We've been using DuckDB in production for a while now, and tested several approaches.
So far, the most cost-efficient approach we have found is to stream Parquet files from an NFS-based host.
The object storage model is way too expensive, as you're paying on a per-query basis. An NFS file server makes it much easier, especially for the streaming part.
Oh interesting. Do you have many concurrent read/write IOs there?
Also wondering how you manage file versioning (though I can think of a snapshot/archive strategy, as storage there is very cheap...)?
Small world: you live/work 10 minutes from my place and my boss is an investor in your company 🤣.
Not that much concurrency, but we read thousands of files in a single user request. And we don't need versioning, as these files are the final result of aggregated data.
Haha, I've DM'd you on LinkedIn 😉