Trino to Boost Open Source SQL Query Performance


The open source distributed SQL query engine Trino has had a big year in 2021 and is gearing up for more innovations in the year to come.

At the recent Trino Summit virtual event, Trino supporters and users detailed their use cases for the open source distributed SQL query engine. The event was sponsored by commercial vendor Starburst, a major contributor to the open source Trino project.

Until its rename in December 2020, Trino was known as PrestoSQL, a project that competes with PrestoDB, the related technology supported by the Linux Foundation.

At the Trino Summit, several users, including LinkedIn, Electronic Arts, Robinhood, and DoorDash, took the virtual stage to explain how their organizations are using Trino at scale to enable distributed data queries.

“We are using Trino to build our master data query platform that enables us to make data-driven decisions and analyses,” said Grace Lu, senior software engineer at investment application provider Robinhood, in a user session on October 22.

How Trino helps Robinhood with a distributed SQL engine

Robinhood uses Trino for its own internal applications. These applications include data analysis and business intelligence, as well as global platform visibility to help resolve availability and performance issues.

Robinhood has multiple Trino clusters that connect to different data sources and allow business users to run queries against those data sources.

Among the data sources are several PostgreSQL databases that Robinhood uses as its primary source of transactional data. Robinhood also uses an Alation data catalog and the Looker analytics platform, both of which connect to Robinhood’s data sources through Trino so that users can query the data.
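
To make the idea of querying several data sources through one engine concrete, here is a minimal sketch in Python that builds a single SQL statement joining tables from two catalogs. Trino addresses tables as catalog.schema.table; the catalog, schema, table, and column names below are hypothetical and are not taken from Robinhood's setup.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified, double-quoted name: "catalog"."schema"."table"."""
    parts = (catalog, schema, table)
    # Double any embedded quote so the identifier stays valid SQL.
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


def build_federated_query() -> str:
    """Build one query that joins tables living in two different catalogs."""
    # Hypothetical names, for illustration only.
    orders = qualified("postgresql", "public", "orders")
    events = qualified("hive", "analytics", "order_events")
    return (
        "SELECT o.order_id, count(*) AS event_count\n"
        f"FROM {orders} AS o\n"
        f"JOIN {events} AS e ON e.order_id = o.order_id\n"
        "GROUP BY o.order_id"
    )


print(build_federated_query())
```

The point of the sketch is only that one statement can reference more than one source; how catalogs are configured and secured depends on the deployment.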

DoorDash integrates Trino for distributed SQL queries

The pandemic has sparked an upsurge in business for food delivery services, including DoorDash. During a user session on October 21, Akshat Nair, head of engineering for the San Francisco-based company, explained in detail how the organization uses Trino to enable distributed data queries.

DoorDash has a complex data architecture that uses PostgreSQL, Apache Cassandra, and CockroachDB as primary data sources. For real-time event streaming, DoorDash uses Kafka. Some of the data ends up in a Snowflake cloud data warehouse, while some is routed to an Amazon S3-based data lake.

DoorDash is now in an early adoption phase for Trino and uses it to enable queries across its data architecture, Nair said. DoorDash’s initial use case is similar to Robinhood’s: allowing internal users to run data analytics on business processes and operations.

“We’re in an adoption phase right now, so the volume of queries isn’t huge, but the data processed is measured in terabytes and petabytes for some of these tables,” Nair said.

The state of Trino moves forward

In a keynote presentation on October 21, Martin Traverso, co-creator of Trino and CTO of Starburst, gave an overview of the technical progress Trino has made this year and the direction the project is taking.

Traverso explained that PrestoSQL, which was renamed Trino in December 2020, and PrestoDB really started to diverge in 2019. He noted that although the two projects share a common history, more than 40% of the changes have occurred since 2019, and all of these modifications are exclusive to Trino.

A number of new capabilities will arrive in Trino over the next few months, Traverso said. Among them is a capability that Traverso called granular fault tolerance.

One of the big limitations of Trino now is that if a query exceeds the amount of memory available in a cluster, the query will fail. With the granular fault tolerance capability, the query engine will be able to retry a query to help it succeed, instead of just failing completely.
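
The general idea of retrying failed work instead of failing outright can be illustrated with a toy Python sketch. This is not Trino's implementation: it assumes a query can be split into independent tasks, and the function names and the use of MemoryError to simulate a failure are hypothetical choices made for the example.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Run one unit of work, retrying it if it raises MemoryError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except MemoryError:
            # Give up only after the last allowed attempt.
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def run_query(tasks: List[Callable[[], T]], max_attempts: int = 3) -> List[T]:
    """Run every task; a failing task is retried on its own, not the whole list."""
    return [run_with_retries(task, max_attempts) for task in tasks]


def make_flaky(value: int, failures: int) -> Callable[[], int]:
    """Return a task that fails `failures` times, then returns `value`."""
    state = {"left": failures}

    def task() -> int:
        if state["left"] > 0:
            state["left"] -= 1
            raise MemoryError("simulated: not enough memory")
        return value

    return task


results = run_query([make_flaky(1, 0), make_flaky(2, 1), make_flaky(3, 2)])
print(results)  # [1, 2, 3]
```

In this sketch a task that keeps failing still fails the whole run once its attempts are exhausted, so retries improve the odds of success rather than guaranteeing it.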

Trino uses the Java programming language at its core. Traverso noted that Trino is currently based on Java 11, which is several years old. In the coming months, Trino will switch to the new Java 17 as a base.

“We’ve actually started doing benchmarking with Java 17, and we’re seeing that we’re getting a 20% performance improvement,” Traverso said. “So it’s very important to be able to move to Java 17 as the platform Trino is built on.”
