Issue 52 - Closing Doors & Brackets
Metastasize
Consultancy firms all have internal tools. They are usually built between contracts, by engineers tired of building the same thing over and over, from customer to customer. Ultimately, a library as simple as templating SQL with Jinja succeeds in gathering a community, in “building a product”.
It’s easy to say after the fact, but merging with a company that succeeded in providing value by externalizing data movement seems pretty natural. If we remove open source from the debate, it’s kinda surprising it didn’t come earlier.
Some of us will look for the alternatives. For both a community and a “real” open source project for transforming data with SQL.
As engineers, we continually search for something better, but a moment’s reflection reveals that it’s hard to ask for much more than these templating and model dependency engines.
I am. Either because I’m tired of seeing the same patterns. Or because I feel we are mature enough now. Probably both. Templating SQL queries with Jinja sounds so much like a graduate project to me now. Don’t get me wrong, I built such a thing myself before dbt. I used dbt. I use dbt. I will use dbt in the future. And I don’t have an off-the-shelf solution here; I’m just saying we should think deeper about the basis. SQL wasn’t designed for analytics. Not a lot of things in our industry really were.
Now is a moment to pause, to reflect on our codebase, on our practices. Is templating a language not designed for analytics, with a mix of brackets, untestable macros, and markup files, the best we can do?
I don’t think so. For now, it indeed seems hard to ask for much more than that. And upstairs, people really don’t care about how we provide the figures. The decision around this company merger probably involved more Excel files than dbt models1.
So while we should be at our desks bringing value to stakeholders, there’s still a gap to fill. For now, it will still involve a lot of brackets, of YAML, of templated SQL, and ultimately humans. And while the latter is not about to go, I really feel the former should be part of history.
📡 Expected Contents
The Long Nose Innovation
What technologies have been here for 10 years, yet are still not really in production at scale?
In data, I can’t really find one. This suggests we are at a plateau.
Looking at a 5-year timespan brings a few more novelties: the emphasis on lakehouse, declarativeness, or BI as code seem to be next on the path to productization.
Here is a nice article reminding us that technological innovation occurs at a slower pace than we prefer to admit.
My belief is there is a mirror-image of the long tail that is equally important to those wanting to understand the process of innovation. It states that the bulk of innovation behind the latest “wow” moment is also low-amplitude and takes place over a long period—but well before the “new” idea has become generally known, fully refined, much less reached the tipping point where it becomes widely adopted. It is what I call The Long Nose of Innovation.
What’s DevRel?
Pedram shares a good roundup of what developer relations is, after two years leading the initiative at Dagster.
Beyond the nice read, it’s also good to see that such a position can lead to a job at Anthropic. Congrats on that, Pedram!
Docker-compose with AI models
Here is a nice blog post from Pradumna, a new teammate at Kestra!
Even if I’m moving away from Docker more and more, this one specification from Docker is nice to see.
What Makes the Business Tick?
If it’s not the latest lakehouse architecture, what is it?
For me, the most painful part is BI. Yes, we have tools for building dashboards. But users of these dashboards still have a hard time making decisions. And if they make decisions, it is still not clear what the outcomes are (that’s often another subject...).
I’ve been working close to C-levels for more than 7 years, either directly or one step removed. And what remains in the end is the complexity of making decisions. That’s their job. It’s a complex one.
Intuition is their most valuable asset, and they build it on top of their experiences. Data is here to provide facts - to bring signal - not decisions. It’s here to validate intuition. When done best, it’s building intuition.
The decision is mainly about an intuition, i.e. the result of the latent space built over data analysis discovery and their own experience.
Do you see the real bottom line here? Experience, discovery, intuition. Not sure a new architecture for data storage resonates with these.
📰 The Blog Post
New blog post coming very soon!!
This one has been in the draft folder for over a year now. Writing is about patience, and seeing the words unfold almost naturally after such a long span of time provides great joy.
Here is a short spoiler - the blog post is likely to be titled “The Metastasize of Templating in Configuration Language”:
We added Jinja to our dbt project to avoid copy-pasting SQL. Then we needed loops. Then conditionals. Then macros that call other macros. Now we’re debugging template rendering errors at 2am, and somewhere along the way, our “configuration” became a full programming language - just one without a debugger, type system, or any of the tooling we’d expect from actual code.
That’s basically the state of data engineering nowadays. Actually, the state of any ops-oriented engineering work.
When we move further in the configuration complexity clock, we often find ourselves templating configuration files.
We start with clean YAML, then gradually pollute it with Jinja or any similar templating language. The YAML becomes unreadable - neither a declarative configuration nor proper code. These templating languages emerge as expression languages to bridge gaps, then become the problem themselves.
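To make that drift concrete, here is a minimal sketch, not taken from any real project. It keeps a typical Jinja loop (pivoting payment methods) as plain text for reference, next to the same logic written as an ordinary Python function. The `payments` table, the column names, and the `pivot_payments_sql` helper are all hypothetical. The point is not that Python is the answer, only that the logic living in the template is already a program, and that the plain version can be tested and debugged like one.

```python
# The kind of Jinja-templated SQL described above. It is kept as text only and
# is not rendered here. Table and column names are hypothetical.
JINJA_VERSION = """
select
  order_id,
  {% for method in payment_methods %}
  sum(case when payment_method = '{{ method }}' then amount else 0 end) as {{ method }}_amount{% if not loop.last %},{% endif %}
  {% endfor %}
from payments
group by order_id
"""


def pivot_payments_sql(payment_methods: list[str]) -> str:
    """Build the same query with plain Python, so it can be unit-tested."""
    if not payment_methods:
        raise ValueError("at least one payment method is required")
    for method in payment_methods:
        # Reject names that could not be used as a SQL column alias.
        if not method.isidentifier():
            raise ValueError(f"invalid payment method name: {method!r}")
    columns = ",\n".join(
        f"  sum(case when payment_method = '{m}' then amount else 0 end) as {m}_amount"
        for m in payment_methods
    )
    return f"select\n  order_id,\n{columns}\nfrom payments\ngroup by order_id"


if __name__ == "__main__":
    print(pivot_payments_sql(["card", "transfer"]))
```

Both versions describe the same loop; only the second one fails loudly on bad input and can sit under a regular test suite.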
Be sure to subscribe to receive it in your inbox soon ;)
🎨 Beyond The Bracket
This is what’s behind the scenes. In a giant hall, 12 hours before the party starts. People who have no clue what ETL is are building high-quality booths for a venue expecting thousands of a different kind of people.
I had such a great time in London: speaking with fellow data product builders, playing darts in English pubs, looking at smart marketing hacks from the Duck people, eating pies, wandering through second-hand bookstores, etc.
But I also saw the truth behind our industry: it’s serious money, propelled by marketing efforts. It was really surprising to be one of the few teams building a product based on an open-core strategy. To be here with engineers, not only salespeople.
Half of the people crossing the booth were here for the stickers, the t-shirts and the iPad. At other booths, marketing teams brought free coffee, Lego games, and PlayStations.
Because you know, it should be fun2.
Demos were mostly run by salespeople. I was expecting more from a demo given by an industry leader than just showing a notebook and running SQL queries.
AI was not really a thing. Making sense of all these logos was the real moat. Business analysts were lost, their data engineering fellows too focused on technical artifacts - as always.
I realize after writing this how pessimistic it may sound. But don’t get me wrong. I want the fun too, but seriousness first. I want open source, but independence first. I want templating, but free of brackets3. I want goodies, but without the branding.
Anyway, definitely looking forward to the nighttime here. I’ve been moving to Qwerty lately4, entering the mechanical keyboard fantasy, doubling down on good cake and tea, chess, and reading5. I’ve also just returned from Lake Como and the countryside in Italy - it was lovely!
Hope you had a good weekend! See you next time!
I actually don’t know. Fivetran pivoted away from BI, and so does dbt with this move. Interestingly enough, BI is still not a dead subject... Good prospects such as Rill, Lightdash, Hex, Omni or Evidence are drawing the next-gen landscape, but I cannot express how odd this field still seems to me.
You already know it’s not really possible. But please don’t mix them up. Please.
Because you know, as French people we like to make things harder: we use the Azerty layout because, back then, people were messing up their sheets by typing too fast. So we chose a layout that slows our fingers down... And we have never updated it since.
Just finished The Social Leap by William von Hippel - definitely recommend it if you want a great summary of our evolutionary process and a better understanding of our human incentives.








