Entropy describes how energy is configured within a system: a measure of disorder, of how spread out energy is.
It explains why things move forward; it underlies the concept of "time", in a way.
Reducing entropy is hard. I recently wondered whether there are patterns to optimize that process: the reduction of a system's entropy through engineering? 🤔
I guess it's probably what we're doing at work (any work). The more successful we are at minimizing entropy without inadvertently creating more, the greater the value we contribute through our work. Hence the advent of computing and digitalization...
This is more a summary of a thinking session, with more questions than answers. But it felt interesting to share as an introduction 🙂
📡 Expected Contents
Semantic Layer: the interface for LLMs?
I'm currently observing the trends in generative AI and LLMs from a distance. However, what truly excites me at this moment is the idea of the Semantic Layer.
The dream of turning natural language prompts into SQL queries feels close, but LLMs alone don't seem to be enough here.
The real challenge with LLMs is trust. Even with recent advances like ChatGPT functions, LLMs sometimes "hallucinate", especially when they have to produce complex queries (many joins) or when they lack context (how are users defined? what is an order?).
Folks are now using the semantic layer as a gateway. By putting a real semantic layer in front of our LLMs, we control the context and the structure of the information we give to the model. This should result in fewer hallucinations and better answers from the engine.
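To make the gateway idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `SEMANTIC_LAYER` dict and `build_prompt` helper are not any real product's API): the point is only that the LLM receives curated business definitions instead of raw table DDL, so it generates SQL against concepts we control.

```python
# Hypothetical semantic layer: curated entity definitions, not raw schemas.
SEMANTIC_LAYER = {
    "users": {
        "description": "One row per registered customer; excludes internal test accounts.",
        "sql_table": "analytics.dim_users",
    },
    "orders": {
        "description": "One row per completed order; cancelled orders are filtered out.",
        "sql_table": "analytics.fct_orders",
        "joins": {"users": "orders.user_id = users.id"},
    },
}

def build_prompt(question: str) -> str:
    """Assemble an LLM prompt grounded in the semantic layer definitions."""
    lines = ["You may only use the entities below. Answer with a single SQL query.", ""]
    for name, entity in SEMANTIC_LAYER.items():
        lines.append(f"- {name} ({entity['sql_table']}): {entity['description']}")
        for other, condition in entity.get("joins", {}).items():
            lines.append(f"  join to {other} on {condition}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)

print(build_prompt("How many orders per user last month?"))
```

The controlled vocabulary is what reduces hallucination: the model no longer has to guess what an "order" is, because the gateway told it.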
Leading products built on this idea include Delphi (an LLM-powered assistant in Slack), Cube (a semantic layer API), and Malloy (an experimental language on top of SQL). I think they're worth following.
The Tale of Modern Data Stack
I shared the story of big data in a previous issue. Here is now the story of the modern data stack.
There's a growing feeling that a new cycle is about to begin. It's good to know what we have now, and what we should have soon 👀.
Multi Layered Calendar
The most popular time device is the watch. A watch is a useful tool, but its functionality is limited to the present moment. It allows us to see time, but not to manage it. It only tells us the status quo.
Calendars, on the other hand, cover the entire spectrum of time. Past, present and future. They are the closest thing we have to a time machine. Calendars allow us to travel forward in time and see the future. More importantly, they allow us to change the future.
Changing the future means dedicating time to things that matter. It means allocating our most precious resource to activities with the highest expected return on investment.
You would expect technologists and entrepreneurs to be intensely focused on perfecting such a magical time travel device, but surprisingly, that has not been the case. Our digital calendars turned out to be just marginally better than their pen and paper predecessors. And since their release, neither Outlook nor Google Calendar have really changed in any meaningful way.
Isn’t it ironic that, of all things, it’s our time machines that are stuck in the past?
A very nice post about calendars and what could be improved in that field. I've personally been using Cron (acquired by Notion) for a month now, and it's pleasant to know the people behind it have a proper vision.
dbt is jQuery, not Terraform
We often like to think that React brought proper declarative semantics over jQuery, the same way Terraform did over Ansible (or Kestra over Python-based orchestrators). I used to think the same about dbt over Spark/Python transformation engines, but this post changed my mind.
dbt is jQuery, not Terraform: like jQuery, a dbt project made of 1,000 models is hard to manage; like jQuery, there is a myriad of packages and snippets trying to pair with core dbt.
But the biggest reason jQuery isn’t the de facto framework today is because it was hard to scale to large teams.
The same flexibility that allowed small teams to deliver value quickly left behind messes of spaghetti code for large teams to maintain.
I feel the same about dbt or Airflow code... What would be the Terraform equivalent in data processing? SQLMesh maybe? Malloy in the future?
📰 The Blog Post
Scrape & Analyze Football Data with Kestra, Malloy and DuckDB
I recently wrote a blog post showing how to scrape football data in Python, process it with Malloy, store it in DuckDB, and orchestrate the whole thing with Kestra.
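As a stdlib-only stand-in for the pipeline described in the post (the real version uses Kestra for orchestration, Malloy for modelling and DuckDB for storage), here is a sketch of the "scrape then normalise" step. The match payload shape is hypothetical; the idea is just to flatten scraped JSON into a tabular file that DuckDB could then query directly.

```python
import csv
import io
import json

# Hypothetical scraped payload; a real scraper would fetch this from a site.
RAW = json.dumps([
    {"home": "PSG", "away": "Lille", "score": "2-1"},
    {"home": "Lens", "away": "PSG", "score": "0-0"},
])

def to_rows(payload: str) -> list[dict]:
    """Flatten scraped match JSON into flat rows, splitting the score string."""
    rows = []
    for match in json.loads(payload):
        home_goals, away_goals = (int(g) for g in match["score"].split("-"))
        rows.append({"home": match["home"], "away": match["away"],
                     "home_goals": home_goals, "away_goals": away_goals})
    return rows

def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV, ready for DuckDB's CSV reader."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(to_rows(RAW)))
```

From there, DuckDB can query the resulting file with plain SQL, and Kestra schedules the whole chain.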
🎨 Beyond The Bracket
I recently reread Le Petit Prince by Antoine de Saint-Exupéry.
It is truly a masterpiece, written in the simplest prose while being so thoughtful.
It gave me two insights:
Writing is one of the most powerful things we can do. For ourselves, to better arrange our thoughts. For others, to give incentives and share sentiments.
Just as debugging ML models in production leads to frustration and exasperation when you focus on a few individual mispredicted data points, it's hard to draw any general conclusions from such narrow debugging investigations.
As individuals, we have to build a broader internal debugging system for life. To me, being open-minded and skeptical are two keystones to embracing life at its best. Reading this book was a blunt reminder of that.
I really encourage you to read this classic (again and again).
Don't know about you, but September was harsh here 😮💨
Giving a talk in front of 60 people, attending a summit and doing a demo there, moving around Paris and the north of France, talking with many smart people, learning React, repairing my childhood Nintendo GameCube 😅, etc.
I feel the urge to get some calm and focus on my own things. I also feel a new maturity, and the ambition to keep growing...
See you in November 😉