Hello there!
I recently subscribed to some great newsletters, and they really pushed me to create one myself. The format seems perfect for mixing themes and getting into more personal stuff. I also feel it’s easier to interact with writers and subscribers, since email is more private than social networks…
So I’m very happy to launch this newsletter with subscribers who are already incredible.
Every four weeks or so I’m going to share some resources on data, engineering and design.
What do I mean by data, engineering, design?
Data. The new oil, you know…
Data-science, machine learning, data-visualization, ETL, data-mining, statistics, etc… No need for more buzzwords. We will look at the whole scope of data.
Engineering. “This is the use of scientific principles to design and build machines, structures, and other items” - Wikipedia.
Here again, the word is broad. For me it’s also a mindset. Doing things slowly, as “cleanly” as possible. Thinking long-term. Taking your time to digest new learnings.
I believe everyone can adopt this mindset. It can sound obscure. Difficult. But it may be the most valuable thing in the end.
Design. It has a double meaning.
The first one is close to engineering, in that it takes time to design good solutions. This is the act of engineering. Any other work is mostly about technique.
It’s also design in its cultural and urban sense. Art, architecture, decoration, color mixing, photography, clothes, movies, games… These are the things that entertain us. They inspire us.
Data is just another field of engineering. A modern and digital one. Still, it is at its best when tied to good design and engineering principles.
That’s way too long for a first issue introduction.
Let’s get straight to it then!
Expected Contents
Why 90 percent of all machine learning models never make it into production?
This is a huge trend nowadays. It seems companies are finally noticing that code left in a Jupyter Notebook is somewhat useless.
I’m being a little mean to notebooks: they are a very (very) good tool for exploration and analysis. Still, in my opinion they are way overrated in the data-science field: if the code never reaches production, what’s the point?
The range of tools that currently exists is nearing maturity: cloud provider features, MLflow, Kubeflow, etc. There are quite good things out there to be tested and put in production.
The real "problem" splits into two components: the old well-known "data-processing" and project management/leadership issues.
The first one is not new: data is like plumbing. Most of the water flows correctly, but when your wall is twisted you get leaks: a lack of quality code pipelines, tests, data lineage, data governance, etc.
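To make the "tests" part a bit more concrete, here is a minimal sketch of the kind of check a pipeline could run before passing data along. Everything in it is an assumption for illustration: the function name `find_problems`, the field names `id` and `amount`, and the sample rows are hypothetical and do not come from any specific tool.

```python
def find_problems(rows, required_fields, key_field):
    """Return a list of readable problems found in a batch of rows.

    Hypothetical helper: it only checks for missing values and
    duplicated keys, two very common "leaks" in a pipeline.
    """
    problems = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # A required field that is absent or None counts as missing.
        for field in required_fields:
            if row.get(field) is None:
                problems.append(f"row {index}: missing {field}")
        # The key field is expected to be unique across the batch.
        key = row.get(key_field)
        if key is not None:
            if key in seen_keys:
                problems.append(f"row {index}: duplicate {key_field}={key}")
            seen_keys.add(key)
    return problems


# Made-up sample batch: one missing amount, one duplicated id.
sample_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": 5.0},
]
problems = find_problems(sample_rows, required_fields=("id", "amount"), key_field="id")
for problem in problems:
    print(problem)
```

A check like this is deliberately boring: the point is that it runs on every batch, so a leak is reported where it happens instead of three steps downstream.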
Project management problems are more subtle. Agile or Scrum workflows don't fully work when applied to data work, partly because of its cyclic flow, but also because most of the current leadership working with data does not necessarily know all the concerns and recurring issues of the field.
The data industry is only a decade or so old, and movements such as "DataOps" seem to be the path to great improvements in the near future... More to come on this…
Video tracking to map. Recently I really wanted to understand how some systems map a camera recording onto an actual 2D map. In the football business, some companies use live match broadcasts to provide clubs with tracking and analytics data. They have built computer vision systems that analyze raw video and extract player and ball positions at more than 10 frames per second.
Beyond the technical challenge of data volume and velocity, it's interesting to understand how to translate a raw recording into position coordinates.
The math behind this is called homography (it is pure linear algebra).
Homography is quite simple but very powerful. I tested the concept on a street photo to map tagged elements onto a 2D map: it works pretty well.
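For the curious, here is a minimal sketch of the idea in Python with NumPy. It is not the code I used on the street photo, and certainly not what tracking companies run: it is the textbook direct linear transform, which estimates a 3x3 matrix from at least four matching points. The function names and the pixel coordinates are made up for illustration, and the 105 x 68 pitch size is just an assumed target map in meters.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping src points onto dst points.

    Plain direct linear transform (DLT), with no coordinate
    normalization and no outlier handling: a sketch, not a robust solver.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or len(src) < 4:
        raise ValueError("need at least 4 matching (x, y) pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    # The default full SVD is needed here so that the last row of vt exists.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    # Fix the arbitrary scale (assumes the bottom-right entry is not zero).
    return h / h[2, 2]


def apply_homography(h, points):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ h.T
    # Divide by the third coordinate to get back to 2D.
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical pixel positions of the four pitch corners in a video frame...
image_pts = [(100, 600), (1180, 600), (900, 200), (380, 200)]
# ...and where those corners sit on an assumed 105 x 68 m top-down map.
pitch_pts = [(0, 0), (105, 0), (105, 68), (0, 68)]

H = estimate_homography(image_pts, pitch_pts)
# Any tagged pixel (a player's feet, for example) can now be placed on the map.
player_on_map = apply_homography(H, [(640, 600)])
print(player_on_map)
```

Once the matrix is known for a fixed camera, every detected position in the frame goes through the same multiplication, which is why the approach scales to many frames per second. A moving broadcast camera would need the matrix re-estimated as the view changes.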
Homemade tool. When analyzing data we like to explore dashboards and charts dynamically, with well-thought-out filters and tooltips. Morphocode is a team of data and development enthusiasts focusing on architecture, design, maps and urban planning.
Their work is top quality. Here is a blog post covering the making of their exploratory interface, built for planning professionals, architects, and analysts to do spatial research faster than ever.
This kind of "in-house" tool can be very powerful when you can't find a product that meets your requirements. Though complex and time-consuming to build, the result can be very valuable: it gives the engineering team leverage and can add a new source of revenue when sold publicly (sometimes it's open-sourced instead).
Many companies open-source their tools, like Spotify with Backstage, Uber with Ludwig or Airbnb with the famous Airflow job orchestrator.
If you don't already know Streamlit, you really have to give it a try!
This Python web application framework, somewhere between Jupyter Notebook and classical dashboarding, is growing really fast. With more and more use cases and powerful features, it can be used in almost any data science project. (+ here is a great cheatsheet)
The Blog Post
Gotta Grid’em All! Do you like Pokémon and scatter plots? These two themes have nothing in common, but you might like this post.
PS: If you wonder about the code behind those plots, it’s full ggplot (and a bit of Photoshop…).
Beyond The Bracket
I'm a big fan of fonts. Like, really. I even set up a GitHub repository to access my favorite ones whenever I want. You might already know this website too.
Fonts are probably the first aesthetic choice people unconsciously notice on your slides. Like naming variables in your code, choosing the right font for the right use can be very time-consuming.
Recursive Sans & Mono is a modern, flexible font: you can change its style along five different axes: Monospace, Casual, Weight, Slant and Cursive.
This one can be used in technical presentations for titles or to highlight specific parts. Some (not me) will even take the risk of setting it as the default font in their code editor. By the way, it is open-source!
Its designer, Stephen Nixon, shared the story behind his creative process.
That's all for this first issue! Thanks for reading. I hope you enjoyed it.
Don’t hesitate to reach me at pimpaudben@gmail.com if you have suggestions, comments or any other thoughts.