05/12/2021
Percona Live Online 2021
Why We Chose Trino. Choosing, Using, and Extending Trino (fka PrestoSQL) For a Primary Datastore by Rob Dickinson, CTO, Resurface Labs.
There are a lot of capture-first pipelines out there that are very good at squirreling data away but relatively slow or cumbersome to query. For example, Kafka and Pulsar are great for write performance, but slow at scanning all the data in the queue.
For a query-first architecture, a different mindset and approach is required. You can’t build the whole data pipeline and then hope to tune the queries after the fact.
Instead, you first model the query behaviors you hope to achieve, and then work backward to define ingestion and indexing requirements. Build it fast, keep it fast. Enter Trino, a distributed query engine that is an ideal starting point for a query-first data architecture like ours. We built a custom memory connector so that Trino itself serves as our primary memory store. An unusual, but fun, use for Trino.
We’ve developed Trino connectors that are optimized to work with local data, so that there is no network hop between the query engine and the data being queried. This gives a 5-20x improvement for our workloads compared with running against even the fastest remote datastores. I’ll walk through the discovery process that led us to Trino, and how we built a new Trino connector.
Speaker: Rob Dickinson – Resurface Labs
