Presto on Apache Spark

Senthil Nayagan
Oct 6, 2022 - 1 Min Read

Overview

Note: "Presto" here refers to PrestoDB, not PrestoSQL (now known as Trino).

Presto’s strengths and weaknesses

Presto has both strengths and weaknesses.

Strengths

Supports ANSI SQL.
Widely adopted.
Built for interactive queries.
Federated query design: a single query can span multiple data sources.
Extensively used for scheduled (batch) workloads.

Weaknesses

Limited scalability.
Poor reliability for memory-intensive queries.
Poor reliability for long-running queries.

Frequently asked questions (FAQ)

What is a federated query engine?

A federated query is a single query statement that runs across data held in multiple external data sources, such as relational, non-relational, object, or custom data sources. A federated query engine uses a fully decoupled architecture, with compute on one side and storage on the other.

Query federation matters because it lets a single query reach data in many different sources. Without it, combining data from several sources is time-consuming and tedious, because ETL operations are needed first to bring the sources into a single, standardized format.
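To make the idea concrete, here is a toy sketch in Python. It is not how Presto is implemented. The two sources, the table and column names, and the helper functions are all hypothetical, chosen only to show one piece of compute logic reading from two unrelated stores without copying them into a common format first.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational store, here an in-memory SQLite database.
def make_orders_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.5)],
    )
    return conn

# Source 2 (hypothetical): a file in object storage, here CSV text held in memory.
CUSTOMERS_CSV = "customer_id,name\n10,Asha\n11,Ravi\n"

def scan_orders(conn):
    """Connector for the relational source: yields rows as dicts."""
    cur = conn.execute("SELECT order_id, customer_id, amount FROM orders")
    for order_id, customer_id, amount in cur:
        yield {"order_id": order_id, "customer_id": customer_id, "amount": amount}

def scan_customers(text):
    """Connector for the file source: yields rows as dicts."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer_id": int(row["customer_id"]), "name": row["name"]}

def total_spend_by_customer(orders, customers):
    """The 'engine': a hash join plus aggregation, run outside both stores."""
    names = {c["customer_id"]: c["name"] for c in customers}
    totals = {}
    for o in orders:
        name = names.get(o["customer_id"])
        if name is not None:
            totals[name] = totals.get(name, 0.0) + o["amount"]
    return totals

conn = make_orders_db()
result = total_spend_by_customer(scan_orders(conn), scan_customers(CUSTOMERS_CSV))
print(result)
```

The join and the aggregation happen in the compute layer, while each source is read in place through its own small connector. In Presto the same idea is expressed as one SQL statement that refers to tables in different catalogs, with each catalog backed by a connector.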

Federated query engines are well suited to infrequent analytics use cases where the data cannot be kept in a single place and second-level query latency is not required.

Why are optimized data warehouses faster than federated query engines?

TODO
