Reminder: code is technical debt. The optimal spot for engineers is to write the least code possible but still write code.
And SQL makes this particularly painful. Like every data person, I've written my share of 300+ line queries - cluttered with nested CASE WHEN statements, complex window functions, and endless ORDER BY clauses. We've all been there.
It's very hard to build things up in SQL: to write a piece of code once and reuse it across different projects. The core reusable elements are tables, not the code that creates them.
That's why BigFunctions caught my attention. It's built specifically to solve these pains, and honestly, much more.
The real benefit of tools like BigFunctions is simple: they move us closer to that sweet spot of "write the least code possible but still write code."
Remember: code is for humans, not machines. There's immense value in reducing code footprint and accelerating the "insight to action" motion.
Think about it - the progression from Assembly to C to C++ to Java/Scala to Python to SQL wasn't random. Each step made code (and data) more human-readable, and more accessible.
I've written before about SQL's limitations for analytics, and I stand by that critique. But BigFunctions made me realize something important: the issue isn't really about SQL's syntax. It's about the semantics underneath, the core capabilities, and how we interface with our data through that syntax.
SQL syntax is not enough
As far as I know, the most advanced tool for building reusable analytics patterns in SQL is dbt, and in particular the piece dbt leans on heavily: Jinja.
The more I use Jinja in this setup, the more I feel it's a missed opportunity. Jinja is a templating engine, not an expression language. That's why there's either a painful build process involved, or some intermediate black box that's very hard to inspect. When something goes wrong, it's really hard to understand why: is it a problem with the underlying dataset, or a bug in the codegen system?
In "normal" software engineering, the language itself has all the necessary primitives for reusability. Java and Python's notion of classes, functions, and types mean you can write logic that can be used on any input data that conforms to a given interface. Not so with SQL.1
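To make the templating point concrete, here is a minimal sketch in Python. It uses the standard library's `string.Template` as a stand-in for Jinja and an in-memory SQLite database as a stand-in for the warehouse; the `orders` table, the column names, and the `render_revenue_query` helper are all hypothetical. The template renders without complaint even when handed a column that doesn't exist, and the mistake only surfaces once the database runs the generated text.

```python
import sqlite3
from string import Template

# A text template: it knows nothing about SQL, tables, or columns.
REVENUE_QUERY = Template("SELECT SUM($amount_column) FROM orders")


def render_revenue_query(amount_column: str) -> str:
    """Substitute the column name into the query text (pure string work)."""
    return REVENUE_QUERY.substitute(amount_column=amount_column)


def run(sql: str):
    """Run a query against a tiny in-memory table and return the first value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (5.5,)])
        return conn.execute(sql).fetchone()[0]
    finally:
        conn.close()


good_sql = render_revenue_query("amount")
bad_sql = render_revenue_query("amout")  # typo: rendering still succeeds

total = run(good_sql)

try:
    run(bad_sql)
    bad_error = None
except sqlite3.OperationalError as exc:
    # The failure shows up at execution time, far from the template.
    bad_error = str(exc)
```

Nothing in the rendering step can tell a valid column from a typo, which is the gap a real function with declared arguments is meant to close.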
That's why we see technologies like BigFunctions emerging to bridge the gap. BigFunctions is more than SQL:
It brings functionality, composability, to SQL.
It brings the syntax that best fits the semantics at play. It moves data work closer to the "insight to action" motion we need, all inside the data warehouse, where SQL is the lingua franca.
In SQL, a SELECT statement is a declarative layer over C++ machinery under the hood. BigFunctions extends SQL with the same idea, except that what sits under the hood can be vanilla SQL, Python, or anything else.
Instead of writing long CASE WHEN statements, you can embed the logic in functions. That improves the code you write, from both a maintenance and a readability perspective.
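As a rough illustration of the idea (not BigFunctions itself), the sketch below uses Python's built-in `sqlite3` module, which lets you register a Python function and call it from SQL. The `order_size` function, its thresholds, and the `orders` table are hypothetical. The bucketing logic lives in one named, testable place instead of being copy-pasted as a CASE WHEN into every query.

```python
import sqlite3


def order_size(amount):
    """Bucket an order amount; replaces a repeated CASE WHEN expression."""
    if amount is None:
        return None
    if amount < 20:
        return "small"
    if amount < 100:
        return "medium"
    return "large"


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name order_size (1 argument).
conn.create_function("order_size", 1, order_size)

conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 5.0), (2, 42.0), (3, 250.0), (4, None)],
)

# The query stays short: the branching lives inside the function.
rows = conn.execute(
    "SELECT id, order_size(amount) FROM orders ORDER BY id"
).fetchall()
conn.close()
```

Changing a threshold now means editing one function rather than hunting down every query that repeats the same branches.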
Separation of Concerns
As I wrote some time ago, data engineers may find themselves moving towards business-related roles (Software engineers, ML engineers, Analytics engineers, Product managers, Data analysts) where they can use their data skills to develop data-driven solutions for the organization’s needs.
Alternatively, data engineers may move towards operations-focused roles (SRE, DevOps, DevXP, Platform Engineer) where they can apply their expertise in managing data infrastructure and ensuring its reliability and scalability.
With BigFunctions we clearly see this separation of concerns:
Data analysts and analytics engineers should use the tool and focus on the business (insight to action): their daily tools shouldn't be 300 lines of SQL but a handful of expressive routines. BigFunctions allows them to use a syntax closer to the semantics of their work and still express things in a declarative, "as code" way.
Data platform engineers should build and maintain the platform: they are ultimately software engineers who write code. Again, they want to write as little code as possible, but they operate a level below the data analyst. Hence the use of declarative technologies - Terraform, Kestra, etc. - and so BigFunctions.
It's a call for YAML engineering again. YAML is a nice interface: you still write code, but mainly to hide inherent complexity on the backend.
BigFunctions uses the same principle as you declare a function within a YAML file: it makes the function easy to maintain, deploy, and share.
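The principle can be sketched in a few lines of Python. The spec below is a plain dictionary standing in for a YAML file; its field names (`name`, `arguments`, `output_type`, `code`), the `analytics` dataset, and the `render_create_function` helper are hypothetical and are not BigFunctions' actual schema. The point is only that a small declaration can carry everything needed to generate a deployable statement.

```python
# A declarative description of a function (stand-in for a YAML file).
# Field names here are illustrative, not BigFunctions' real format.
spec = {
    "name": "order_size",
    "arguments": [{"name": "amount", "type": "FLOAT64"}],
    "output_type": "STRING",
    "code": "CASE WHEN amount < 20 THEN 'small' "
            "WHEN amount < 100 THEN 'medium' ELSE 'large' END",
}


def render_create_function(spec: dict, dataset: str) -> str:
    """Turn a declarative spec into a CREATE FUNCTION statement."""
    args = ", ".join(f"{a['name']} {a['type']}" for a in spec["arguments"])
    return (
        f"CREATE OR REPLACE FUNCTION {dataset}.{spec['name']}({args})\n"
        f"RETURNS {spec['output_type']}\n"
        f"AS ({spec['code']});"
    )


ddl = render_create_function(spec, dataset="analytics")
```

The declaration is what gets reviewed, versioned, and shared; the generated statement is a by-product that the platform side owns.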
But SQL is not Designed for Analytics
Let's be clear: SQL is here to stay. It's the most effective interface we have to the database, and that's not changing anytime soon.
What's changing is how we use SQL. Projects like BigFunctions are transforming how we bring maturity and data literacy to companies. They're accelerating the "insight to action" motion in ways we couldn't imagine before.
Yes, SQL was originally designed for OLTP applications, roughly 50 years ago. That's why reusable functions were never a first-class part of the language.
But that's also exactly why tools like BigFunctions matter so much – they bridge the gap between SQL's origins and modern analytics needs. They bring proper separation of concerns to the data stack: data platform engineers get powerful tools designed for their work, analytics engineers get the abstractions they need, and data analysts get intuitive interfaces for their analysis.
At some point, we might move away from it. We will find other designs that better fit the semantics of what we want to do with data:
I’ve worked almost exclusively in SQL for much of my career; just as English defines how I see the world, my understanding of data has been rewired around relational notions of tables and joins. Some people think in sentences, some in non-verbal thoughts—and data people, it seems, think in SQL. The more we can do in our native language, I’ve long thought, the easier our jobs will be.
In recent weeks, however, my faith has started to waver. Perhaps, perhaps, we’ve overreached. Perhaps, to nail a disputation on the church door, one of the core responsibilities of a data team—modeling a business, and defining a semantic layer—is best done in another language.2
For now, the only thing I can act on is how I write SQL: thoughtfully, as little as possible, but still writing it.
Here, BigFunctions gives us the best of both worlds: the familiarity and power of SQL, enhanced with the software engineering principles we need to build robust, maintainable data systems.
By the way: BigFunctions is open-source! I encourage you to star the GitHub repository and explore the different functions it provides out of the box. After playing with it a few times, it definitely made me want to invest more time into cleaning my SQL queries and uncluttering my data modeling.
https://www.linkedin.com/in/carlineng/