Issue 23 - Search, Curation & Beyond
Data Engineer is a transitional job
This marks the twenty-third issue of this newsletter. Through my writing, I have compiled a wealth of valuable links that I frequently revisit. However, after two years of monthly issues, finding and organizing these curated resources has become increasingly challenging.
Do I need to build a search feature? Add tags to each piece of content?
Sometimes, I try to view this newsletter through the eyes of a new subscriber. It's likely that I would glance through the latest issues without delving into all twenty-two previous editions. However, in doing so, I might miss out on some great ideas and links.
Considering the perspective of new subscribers and the challenge of managing a growing collection of curated resources, it's worth reflecting on a timeless adage: all curation grows until it requires search, and all search grows until it requires curation.
With that in mind, it's important to remember what reading and writing truly bring to the table: a call to action. As Aristotle famously said, "the purpose of knowledge is action, not knowledge".
While it's easy to spend hours scrolling and reading online content, the ultimate goal should not be simply to seek dopamine hits, but to inspire action and promote innovative thinking.
So even if this newsletter serves as a curation medium, its true underlying goal is to inspire action: for myself, for you, and for our future selves[1].
📡 Expected Contents
Data Modeling at its best
I recently worked on a project at work that involved revamping a big pipeline. It's a critical one: it processes core payment data and produces the company KPIs followed by the CEO and their C-level friends.
After some brainstorming with data analysts and data engineers, we ended up creating a "full table of events" that we can explode or filter for downstream processing.
We couldn't find any resource discussing how to model such "event" data, and we were actually pretty proud of our design.
To summarize briefly: traditional modeling approaches, such as Kimball-style star schemas, often involve many layers of dependencies, which makes them difficult to manage and maintain.
"An activity schema transforms source tables into a single, time series table called an activity stream".
That single table is then much easier to maintain and to expose to other tools (BI, product, etc.). It could definitely help with building a semantic layer...
This is not really a new idea, but this project is nicely designed and full of clear explanations.
I'm actually all-in on this methodology 😅.
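To make the idea concrete, here is a minimal sketch of "many source tables in, one time-series table out", assuming two hypothetical source tables (orders and payments) given as lists of dicts. The row shape (ts, customer, activity, features) is a simplified illustration of an activity stream, not the exact column list of the Activity Schema spec.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class Activity:
    # Simplified, hypothetical activity-stream row (not the official spec columns).
    ts: datetime
    customer: str
    activity: str
    features: dict[str, Any] = field(default_factory=dict)


def build_activity_stream(orders: list[dict], payments: list[dict]) -> list[Activity]:
    """Map each source table onto the same row shape and merge them into one time series."""
    stream = [
        Activity(o["created_at"], o["customer_id"], "placed_order", {"order_id": o["id"]})
        for o in orders
    ]
    stream += [
        Activity(p["paid_at"], p["customer_id"], "completed_payment", {"amount": p["amount"]})
        for p in payments
    ]
    # One table ordered by time: every downstream use starts from here.
    return sorted(stream, key=lambda a: a.ts)


def activities_of(stream: list[Activity], name: str) -> list[Activity]:
    """Downstream consumers filter the stream instead of joining many tables."""
    return [a for a in stream if a.activity == name]


if __name__ == "__main__":
    orders = [{"id": 1, "customer_id": "c1", "created_at": datetime(2023, 3, 1, 10, 0)}]
    payments = [{"customer_id": "c1", "amount": 42.0, "paid_at": datetime(2023, 3, 1, 10, 5)}]
    stream = build_activity_stream(orders, payments)
    print([a.activity for a in stream])
    print(sum(a.features["amount"] for a in activities_of(stream, "completed_payment")))
```

Adding a new source then means writing one more mapping into the same row shape, rather than adding another layer of dependent tables.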
GPT is all you need for backend
That's the next big thing: completely replacing an application's backend with an LLM (Large Language Model) that both runs the logic and stores the memory.
What if your data schema were entirely malleable on the fly? What if the backend logic were also entirely flexible (meaning you don't even write code)?
This is the future we imagine
You can iterate on your frontend without knowing exactly what the backend needs to look like.
Backend gives you the wrong format?
Mistype an API name? It doesn't matter!
Serverless w/o the cold start: The only difference between your server and someone else's is the 1KB of state and the LLM instructions, and these can be swapped out in milliseconds.
🤯 I honestly have no words for what's coming next...
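The loop behind this idea can be sketched as follows: the frontend sends a request, the backend hands the current state, the app instructions and the request to the model, then keeps whatever state the model sends back. This is a minimal sketch under loud assumptions, not how the project itself is implemented: the `LLM` callable type, the JSON reply format and the `fake_llm` stub are all hypothetical, and no real model or API is called.

```python
import json
from typing import Callable

# Hypothetical model interface: prompt in, text reply out.
LLM = Callable[[str], str]


def handle_request(state: dict, instructions: str, request: str, llm: LLM) -> tuple[dict, str]:
    """Use the model as the whole backend: it applies the app logic to the
    current state and returns the new state plus a response."""
    prompt = (
        f"{instructions}\n"
        f"Current state (JSON): {json.dumps(state)}\n"
        f"Request: {request}\n"
        'Reply with JSON: {"state": <new state>, "response": <answer>}'
    )
    reply = json.loads(llm(prompt))  # assumes the model answers with valid JSON
    return reply["state"], reply["response"]


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, hard-coded for a to-do app."""
    state = json.loads(prompt.split("Current state (JSON): ")[1].split("\n")[0])
    item = prompt.split("Request: add ")[1].split("\n")[0]
    state["todos"] = state.get("todos", []) + [item]
    return json.dumps({"state": state, "response": f"added {item}"})


if __name__ == "__main__":
    state = {"todos": []}
    state, answer = handle_request(state, "You are a to-do list backend.", "add buy milk", fake_llm)
    print(state, answer)
```

In this framing, "swapping the backend" only means swapping the `instructions` string and the small `state` blob, which is the point behind the "serverless w/o the cold start" claim.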
The definitive visual guide to Pandas
I think "definitive" is the word.
This guide goes through all the main pandas routines, each one with a visual attached.
No more excuses not to use pandas the right way (chaining please 🙏 👀)
(Next step is trying Polars)
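Since chaining is the plea here, a small sketch of what it looks like on a made-up payments DataFrame (the columns and values are hypothetical, not taken from the guide). Each method returns a new DataFrame, so the whole transformation reads top to bottom without intermediate variables.

```python
import pandas as pd

# Hypothetical payments data, for illustration only.
payments = pd.DataFrame(
    {
        "customer": ["a", "b", "a", "c", "b"],
        "amount": [10.0, 25.0, 5.0, 30.0, 15.0],
        "status": ["ok", "ok", "refunded", "ok", "ok"],
    }
)

# One chained pipeline: each method returns a new DataFrame.
revenue_per_customer = (
    payments
    .query("status == 'ok'")  # drop refunded payments
    .assign(amount_cents=lambda df: (df["amount"] * 100).round().astype(int))
    .groupby("customer", as_index=False)
    .agg(total_cents=("amount_cents", "sum"), n_payments=("amount_cents", "count"))
    .sort_values("total_cents", ascending=False)
    .reset_index(drop=True)
)

print(revenue_per_customer)
```

The `lambda df: ...` inside `assign` is what keeps the chain intact: it refers to the DataFrame at that step of the pipeline rather than to the original `payments` variable.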
Unit testing SQL queries with DuckDB
We are working more and more with SQL, and even though the surrounding tooling has helped a lot in applying software engineering principles to it (dbt 👀), some pieces are still missing.
Virgile, one of the best managers I have ever had ♥️, recently wrote a great blog post exploring solutions for putting SQL code into production with proper unit testing.
No surprise to see DuckDB here again, and I really think we will see very similar designs across the industry this year.
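The general pattern behind unit-testing SQL is to run the query against tiny, hand-written fixture tables in an in-memory database, then compare the result with the expected rows. The sketch below illustrates that generic pattern; it is not the approach from Virgile's post. To stay dependency-free it uses Python's built-in `sqlite3` as a stand-in for DuckDB, and the `payments` table, `REVENUE_QUERY` and `run_query` helper are hypothetical.

```python
import sqlite3
import unittest

# Hypothetical query under test: revenue per customer from successful payments.
REVENUE_QUERY = """
SELECT customer, SUM(amount) AS revenue
FROM payments
WHERE status = 'ok'
GROUP BY customer
ORDER BY customer
"""


def run_query(conn, query, tables):
    """Load small fixture tables into the database, then run the query under test."""
    for name, (columns, rows) in tables.items():
        conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
    return conn.execute(query).fetchall()


class RevenueQueryTest(unittest.TestCase):
    def test_refunds_are_excluded(self):
        conn = sqlite3.connect(":memory:")  # fresh, throwaway database per test
        fixtures = {
            "payments": (
                ["customer", "amount", "status"],
                [("a", 10, "ok"), ("a", 5, "refunded"), ("b", 25, "ok")],
            )
        }
        result = run_query(conn, REVENUE_QUERY, fixtures)
        self.assertEqual(result, [("a", 10), ("b", 25)])
        conn.close()


if __name__ == "__main__":
    unittest.main(argv=["sql-tests"], exit=False)
```

With DuckDB, the main change would be opening the connection with `duckdb.connect()`, which gives an in-memory database by default, plus any SQL dialect adjustments.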
📰 The Blog Post
Data Engineer is a Transitional Job
This blog post is the backbone of the discussion; I hope you'll like it as much as I did back at the bar.
Thanks for reading From An Engineer Sight! Subscribe for free to receive new posts and support my work.
🎨 Beyond The Bracket
Last weekend, I finally had the opportunity to attend the photographic development and print course that my partner had gifted me for Christmas.
The experience was incredible. We spent the whole day taking shots in the heart of Paris, with a professional photographer by my side. Then we developed the film and printed some of the pictures on paper in the darkroom, which was a pretty unique and fulfilling experience.
Although I'm a digital nerd, I find something beautiful in everything made with analog technology.
I believe that even though our eyes and ears may not be able to tell the difference between digital and analog, there is still something unique and special about the process of creating analog work.
Film photography, even if less sharp than its digital counterpart, renders a color range that's almost impossible to achieve with digital technology.
Similarly, listening to live music or music recorded with real instruments is a completely different experience from listening to electronic-based music. While the two worlds can coexist symbiotically, I find some analog craftsmanship closer to my heart than any digital equivalent.
This experience also reminded me that even in today's digital age, research is being done on analog computers that can outperform our current CPU/GPU technology in energy efficiency[3].
It makes me wonder if the imperfections of analog technology reflect the real world better than our best digital binaries. Perhaps there's something about analog technology that captures the essence of the world around us in a way that digital technology cannot.
Sun shining again 🌞[4].
Assuming everything continues to go well, the next few months will be crucial in shaping my short to mid-term future...
Hopefully you still like this newsletter 🙏. Don't hesitate to reach out with any questions, comments or feedback.
See you in April 🐟.
[1] That's also why newsletters are a very interesting medium: you get a better understanding of the writer's intentions as you keep reading them.
[3] The M1076 delivers the AI compute performance of a desktop GPU while consuming up to 1/10th the power, all in a single chip.
[4] I think I have a thing for the weather, as I often end this newsletter with some weather news 😅.