"When those who benefit are not those who do the work, then the technology is likely to fail or, at least, be subverted"
This quote perfectly sums up my last blog post on how I failed to introduce dbt in my previous company.
It also brought up something I've been thinking about lately: should data warehouse technologies go further in their bundling of the stack?
Snowflake introduced a Git interface and notebooks, BigQuery offers something similar, and they all have their own dedicated SQL operations, which makes interoperability complicated, etc.
Should a data warehouse be about data (only)?
The unbundling vs bundling debate is at play here. I feel we're coming back into a bundling phase with Snowflake and Databricks as the main leaders in the field (followed by cloud providers).
From my personal experience, I feel unbundling our stack makes it easier to deal with new projects and maintain them. While having one tool bundling everything is great, it simply doesn't allow space for new projects and innovation. Again, it's a feeling. I might dive deep into this subject soon.
📡 Expected Contents
Home-Cooked Software and Barefoot Developers
This is top-notch 🚀
Maggie is worth following. She always writes thoughtful posts and this one doesn't disappoint.
She coins a nice term here - the "Barefoot Developers":
"we're on a verge of a golden age of local, home-cooked software and a new kind of developer – what I've called the barefoot developer."
And how LLMs can finally be useful...
Don't Mention AI Again
So it is with great regret that I announce that the next person to talk about rolling out AI is going to receive a complimentary chiropractic adjustment in the style of Dr. Bourne, i.e, I am going to fucking break your neck. I am truly, deeply, sorry.
Beyond the provocative tone, this post perfectly describes a general feeling about AI and LLMs shared by machine learning engineers and data scientists.
Stateless vs Stateful
Working with "K" tools (Kubernetes, Kestra, Kafka), I'm now used to stateless systems.
In these, tasks or events are treated as independent. Every request must include all the data the server needs to process it. This makes them simpler to design and scale.
The cool thing about stateless systems is they practically force you to do things the "right way". Like keeping different parts of the system separate (service isolation), making sure tasks can run multiple times without messing things up (idempotency), and being able to handle more work easily (scalability).
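To make the idempotency point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the in-memory `Warehouse` dictionary stands in for a real table, and `load_partition` and `append_rows` are made-up names, not functions from any of the tools above. The idea is simply that a task which receives everything it needs as arguments and overwrites its target can be retried safely, while one that appends cannot.

```python
from typing import Dict, List

# Hypothetical in-memory "warehouse": partition key -> rows (illustration only).
Warehouse = Dict[str, List[dict]]


def load_partition(warehouse: Warehouse, day: str, rows: List[dict]) -> Warehouse:
    """Stateless and idempotent: all inputs arrive as arguments, and the
    target partition is overwritten instead of appended to."""
    updated = dict(warehouse)   # do not mutate the caller's data
    updated[day] = list(rows)   # overwrite, so a retry gives the same result
    return updated


def append_rows(warehouse: Warehouse, day: str, rows: List[dict]) -> Warehouse:
    """Non-idempotent counterpart: a retry duplicates the rows."""
    updated = dict(warehouse)
    updated[day] = updated.get(day, []) + list(rows)
    return updated


rows = [{"order_id": 1, "amount": 10}]

# Running the idempotent load twice leaves the same state as running it once.
once = load_partition({}, "2024-07-01", rows)
twice = load_partition(once, "2024-07-01", rows)
assert once == twice

# Running the append twice duplicates the data.
appended_twice = append_rows(append_rows({}, "2024-07-01", rows), "2024-07-01", rows)
assert len(appended_twice["2024-07-01"]) == 2
```

In a real pipeline the same pattern usually shows up as "replace the partition for this run date" rather than "insert into the table", which is what lets an orchestrator retry a failed task without any cleanup.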
I sometimes feel that a lot of the bad habits we're used to come from older tools. Most of them rely on remembering things (stateful design). It's a whole topic on its own - maybe I'll write a blog post about it in the coming months.
What Factors Explain the Nature of Software?
Very enlightening post. Most of what's highlighted here can apply to data:
Are there deeper, more fundamental aspects of software that can help us think beyond surface-level matters? I’ve come to think that a triad of interacting factors best explains what software is and, by extension, why software is difficult:
Software occupies a liminal state between the constraints of the physical world and an anything-goes fantasy world. We frequently mistake the constraints that software faces.
👉 Data also occupies a liminal state. It's the best asset we have to understand reality, but it's only a proxy of it.
Our ability to specify what a given piece of software should be is limited by the circular specification problem. We nearly always have to fully build the software in order to know precisely what we want it to be.
👉 To be sure our data models the world we want to model, we should explore, transform, and join missing pieces before getting to a decision.
Software is subject to the observer effect. The act of seeing the software in action changes what we – or more often others – think the software should be, sometimes radically.
👉 Sounds familiar, doesn't it? After shipping a dashboard, people will have new demands, new graphics, new questions, etc. The observer effect is at play here.
📰 The Blog Post
No new blog post this month. I have several ideas in progress, so expect some new ones soon 😉
In the meantime, I'm sharing an old one - I'm still proud of it [1]
🎨 Beyond The Bracket
This past week I had the opportunity to teach a "machine learning orchestration" course at a French engineering school. Students were in their very last course before joining companies and working as data engineers (most of them already had some contract discussions ongoing). It was very interesting to see the actual level of young engineers entering the workforce.
Most of their challenges revolved around two things: installing tools and understanding the big picture.
The latter is no surprise: many of us wonder about the stack and the "it depends". Actually, I use this newsletter as my own space to reflect on the topic.
More surprisingly, installing tools was still painful, even in a world where everything is Docker: Windows and WSL incompatibilities, Python dependency management, port conflicts, etc.
I didn't plan to do a Docker recap - they already had a course on it, and they said they were quite OK with it. In fact, they had only scratched the surface, and few of them really understood what was going on on their laptops.
This needs more nuance [2], but my almost serious advice is: stop using Windows, stop using raw Python installations, and stop using tools on projects without properly playing with them beforehand.
After a great off-site in Lille, many projects at work, and this course in Paris, I'm looking forward to taking a bit of a break this summer ☀️
To relax, slow the pace, and reflect on what's next.
The sun is finally shining; it's hot out there!
I might skip the August issue, we'll see 😉
Until next time, take care of yourself 🙏
[1] Looking back at my old writing is both a terrible and a gratifying feeling...
[2] This week didn't help me resolve my feelings about Windows or Python…