For much of the past year, I have been working with Hexvarium. Based in Atherton, California, the company builds and manages fiber-optic networks. At present, they have a handful of networks in the Bay Area but have plans to expand across the US.

My role is to manage a data platform that holds 30 billion records from some 70 sources of information. This data is used by engineers working on optimal fiber-optic network rollout plans using LocalSolver. Below is a figurative example of one of these plans.

Our data platform is largely made up of PostGIS, ClickHouse and BigQuery. Data arrives in a wide variety of formats, and often not optimal ones. We then shape and enrich the data, often with PostGIS or ClickHouse, before shipping it off to BigQuery. Engineers working with LocalSolver will then source their data from a pristine and up-to-date version on BigQuery.

Analysis feeds the imagination of the engineers, so we're often visualising data we've received using Unfolded. Below is a visualisation we produced of Ookla's Speedtest data from last summer. These speeds were reported to the FCC by broadband providers across the US. We grouped records by H3's zoom level 9 and took the highest average download speed from each hexagon, plotting them on the globe.

Below is another visualisation we produced showing the difference in top broadband speeds between June and November last year.

In recent months, I've been exploring ways to both increase the throughput of our pipeline and reduce engineering labour. One of the tools I've been examining to help me with this is DuckDB.

DuckDB is primarily the work of Mark Raasveldt and Hannes Mühleisen. It's made up of a million lines of C++ and runs as a stand-alone binary. Development is very active, with the commit count on its GitHub repo doubling nearly every year since the project began in 2018. DuckDB uses PostgreSQL's SQL parser, Google's RE2 regular expression engine and SQLite's shell.

SQLite supports five data types: NULL, INTEGER, REAL, TEXT and BLOB. I was always frustrated by this, as working with time would require transforms in every SELECT statement, and not being able to describe a field as a boolean meant analysis software couldn't automatically recognise such fields and provide specific UI controls and visualisations for them. Thankfully, DuckDB supports 25 data types out of the box, and more can be added via extensions.

Work is underway on an extension which will port Paul Ramsey's PostGIS geospatial functionality to DuckDB. Isaac Brodsky, a Principal Engineer at Foursquare, Unfolded's parent company, has also recently published an H3 extension for DuckDB. And although nothing has been published yet, there are efforts underway to embed GDAL via its Arrow-based interface into DuckDB.

A min-max index is created for every column segment in DuckDB.
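The min-max (zone map) idea behind that per-segment index can be sketched in a few lines. This is an illustrative model only, not DuckDB's actual implementation: each column segment carries its minimum and maximum value, and a range filter can skip any segment whose bounds cannot possibly match.

```python
def build_segments(values, segment_size):
    """Split a column into fixed-size segments, each with min/max metadata."""
    segments = []
    for i in range(0, len(values), segment_size):
        chunk = values[i:i + segment_size]
        segments.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return segments

def scan_greater_than(segments, threshold):
    """Return rows > threshold, skipping segments whose max rules them out."""
    matches, segments_skipped = [], 0
    for seg in segments:
        if seg["max"] <= threshold:   # no row in this segment can qualify
            segments_skipped += 1
            continue
        matches.extend(v for v in seg["rows"] if v > threshold)
    return matches, segments_skipped

# A sorted column prunes especially well: only the last segment is read.
column = list(range(1000))            # 0 .. 999
segs = build_segments(column, segment_size=100)
hits, skipped = scan_greater_than(segs, 950)
print(len(hits), skipped)             # 49 matching rows, 9 of 10 segments skipped
```

The pruning pays off most when data is loaded in roughly sorted order, since each segment then covers a narrow value range.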
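The SQLite typing frustration mentioned earlier is easy to reproduce with Python's built-in `sqlite3` module: SQLite accepts rich type names like BOOLEAN and TIMESTAMP in DDL, but they only set a storage affinity, so the values come back as plain integers and text.

```python
import sqlite3

# SQLite has no real BOOLEAN or TIMESTAMP storage class; declared types
# only influence the column's affinity.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ok BOOLEAN, seen_at TIMESTAMP)")
con.execute("INSERT INTO events VALUES (?, ?)", (True, "2023-01-15 08:30:00"))

row = con.execute("SELECT typeof(ok), typeof(seen_at), ok FROM events").fetchone()
print(row)  # ('integer', 'text', 1) -- the boolean came back as a plain int
```

In DuckDB, by contrast, `BOOLEAN` and `TIMESTAMP` are genuine types, so downstream tools can recognise them without per-query casts.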
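The hexagon aggregation described earlier, grouping records by H3 cell and keeping the highest average download speed per cell, can be sketched as below. The cell ids and speeds here are hypothetical sample data; in the real pipeline the resolution-9 cell ids would come from an H3 library or SQL extension.

```python
from collections import defaultdict

# Each record: (precomputed H3 resolution-9 cell id, average download Mbps).
# Sample values only, for illustration.
records = [
    ("8928308280fffff", 210.5),
    ("8928308280fffff", 340.0),
    ("8928308283bffff", 95.2),
]

# Keep the highest average download speed seen in each hexagon.
top_speed_per_hexagon = defaultdict(float)
for cell, avg_mbps in records:
    top_speed_per_hexagon[cell] = max(top_speed_per_hexagon[cell], avg_mbps)

print(dict(top_speed_per_hexagon))
# {'8928308280fffff': 340.0, '8928308283bffff': 95.2}
```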