Dremio Overview

Confidential - Do Not Share or Distribute
Dremio
The Easy and Open Lakehouse Platform

About Dremio
The Easy and Open Data Lakehouse Platform
– Data warehouse performance directly on the lake
– Query acceleration to eliminate copies and BI extracts
– Semantic layer to enable governed self-service
– Database connectors to enable queries on other sources
Enterprise Adoption
– 1000s of companies across all industries
– 5 of the Fortune 10
Open Source & Community
– Apache Arrow (60M+ downloads/month), Apache Iceberg, Nessie
– Creator and host of Subsurface LIVE conference

Companies Want to Democratize Data… But How?
▪ Everyone wants access
▪ Data volumes are exploding
▪ Security risks
▪ Compliance requirements
▪ Limited resources
[Diagram: consumers (SQL, Data Science, Dashboards, Apps); Continuous New Data (Application Databases | IoT | Web | Logs); Cloud Object Storage (ADLS, S3, GCS) and On-Prem (RDBMS)]

Data Warehouses: Expensive, Proprietary, Complex
✗ Skyrocketing costs
✗ Vendor lock-in
✗ Exploding backlog
✗ Can’t explore data
✗ No self-service
[Diagram: SQL; Continuous New Data; ADLS, S3, GCS, RDBMS]

Dremio Data Lakehouse: Easy, Open, 1/10th the Cost
✓ Sub-second performance
✓ Eliminate Data Silos
✓ Improve Data Discovery and Access
✓ No Data Movement Required
✓ No Copies
✓ Inexpensive
✓ No lock-in
[Diagram: SQL; ⇅ ODBC | JDBC | REST | Arrow Flight ⇅; ⇅ Parallelism | Caching | Optimized Push-Downs ⇅; Continuous New Data; ADLS, S3, GCS, RDBMS]

IT-Governed Self-Service Semantic Layer
Standardized, User-Defined Abstraction Layer Enabling Virtual Data Sets, with an Easy-to-Use UI
[Diagram: Raw zone with physical datasets (ADLS or S3), Data Engineers; Semantic zone with virtual datasets, Data Analysts and Engineers; Acceleration (Data Reflections); BI Users and Data Scientists via SQL]
Data Analysts
✓ Consistent business logic & KPIs
✓ No more waiting for IT
✓ Use visualization tool(s) of choice
Data Engineers & Architects
✓ Centralize data security & governance
✓ No more reactive, tedious work
✓ Easy collaboration with data analysts
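As a rough illustration of the split between physical datasets (raw zone) and virtual datasets (semantic zone), the sketch below uses Python's built-in sqlite3 module as a stand-in for a SQL engine. It is not Dremio, and the table, view, and column names (raw_orders, sales_by_region, and so on) are hypothetical. A physical table plays the role of a physical dataset, and a view plays the role of a virtual dataset that encodes a KPI once so that every consumer sees the same business logic.

```python
import sqlite3


def build_semantic_layer(conn: sqlite3.Connection) -> None:
    """Create a hypothetical physical dataset and a virtual dataset (view) on top of it."""
    conn.executescript(
        """
        -- Raw zone: a physical dataset (hypothetical schema).
        CREATE TABLE raw_orders (
            order_id     INTEGER PRIMARY KEY,
            region       TEXT    NOT NULL,
            amount_cents INTEGER NOT NULL,
            status       TEXT    NOT NULL
        );
        INSERT INTO raw_orders (order_id, region, amount_cents, status) VALUES
            (1, 'EMEA', 12000, 'completed'),
            (2, 'EMEA',  8000, 'cancelled'),
            (3, 'APAC',  5000, 'completed'),
            (4, 'EMEA',  3000, 'completed');

        -- Semantic zone: no data is copied; the KPI definition lives in the view.
        CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount_cents) AS revenue_cents
        FROM raw_orders
        WHERE status = 'completed'
        GROUP BY region;
        """
    )


def revenue_by_region(conn: sqlite3.Connection) -> dict:
    """Query the virtual dataset the way a BI tool would, without knowing the raw schema."""
    rows = conn.execute(
        "SELECT region, revenue_cents FROM sales_by_region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_semantic_layer(connection)
    print(revenue_by_region(connection))
```

Because the rule "only completed orders count as revenue" is defined once in the view, analysts querying sales_by_region cannot accidentally apply a different definition.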

A Realistic Example: DW Offload
✗ Maxed capacity
✗ End-of-life support
✗ Complex ETL processes
✗ Legacy query engine performance
[Diagram: SQL; Continuous New Data; RDBMS]

A Realistic Example: DW Offload
✓ Unified layer
✓ Combine DW and DL data
✓ Address DW capacity issues
✓ Smooth transition
[Diagram: SQL; Semantic Model; Continuous New Data; RDBMS; Third-Party Data (Bloomberg, S&P, AWS Data Exchange…)]

Dremio at a glance
Fast Performance
Apache Arrow-based columnar execution increases throughput and reduces cost.
Transparent Acceleration
Reflections enable sub-second queries and eliminate copies and BI extracts.
Semantic Layer
Data teams define and expose a logical data model for governed self-service.
Ingest & Transform Data
DML and dbt integration help ingest data into the lakehouse and transform it as needed.
Open Data Formats
Apache Iceberg ensures no vendor lock-in and the flexibility to use any engine.
Enterprise-Grade Security
Role-based access control, native row/column-level policies and advanced integrations.

Powering Analytics for Thousands of Companies

Thank you!

Open Source Roots: Apache Arrow Inside
Apache Arrow was created by Dremio
– Dremio seeded the market with its internal memory format
– Arrow is now downloaded over 60M times per month
– Dremio is the only Arrow-based engine in the market
Arrow-based vectorized execution
– Data is immediately read into Arrow
– All operators use Arrow as input and output
– Gandiva: LLVM-based vectorized execution on Apache Arrow
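To make the point about operators sharing one columnar format more concrete, here is a minimal pure-Python sketch. It does not use Apache Arrow or Gandiva: the standard-library array type stands in for a typed, contiguous column, and the Batch alias and the filter_gt and sum_column operators are hypothetical names chosen for illustration. The property it shows is that each operator takes a column batch and returns a column batch, so no row-by-row format conversion happens between steps.

```python
from array import array
from typing import Dict

# A batch maps column names to typed, contiguous columns (a stand-in for an Arrow record batch).
Batch = Dict[str, array]


def filter_gt(batch: Batch, column: str, threshold: float) -> Batch:
    """Keep the rows where batch[column] > threshold; input and output are both columnar."""
    keep = [i for i, value in enumerate(batch[column]) if value > threshold]
    return {
        name: array(col.typecode, (col[i] for i in keep))
        for name, col in batch.items()
    }


def sum_column(batch: Batch, column: str):
    """Aggregate a single column; only that column's memory is touched."""
    return sum(batch[column])


if __name__ == "__main__":
    orders: Batch = {
        "price": array("d", [10.0, 25.0, 40.0]),
        "qty": array("q", [1, 2, 3]),
    }
    expensive = filter_gt(orders, "price", 20.0)
    print(sum_column(expensive, "qty"))
```

A real vectorized engine would evaluate the filter over whole column chunks in compiled code rather than in a Python loop; the sketch only illustrates the data flow between operators.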

Dremio deployment architecture
[Diagram: Data Consumer Tools (BI Users, Data Scientists) issue SQL to the DREMIO Data Lake Engine, made up of Coordinator Nodes and Executor Nodes orchestrated via Cloud, Kubernetes or YARN; Data Reflection Stores and External Data Reflection Stores; ⇅ Optimized Push-Downs ⇅ to the Data Sources]

Query Acceleration: BI on Data Lakes
Columnar Cloud Cache (C3)
– NVMe-level I/O performance on S3/ADLS/GCS
– Eliminate S3/ADLS I/O costs (10-15% of cost per query)
– Use existing NVMe/SSD on EC2 instances & Azure VMs
– Transparent to analysts and engineers
[Diagram: ENGINE of four Executors, each with local NVMe, forming the Columnar Cloud Cache (C3) in front of the Data Lake]
Data Reflections
– Enable low-latency (including sub-second) BI queries
– Eliminate cubes and BI extracts
– Reduce infrastructure costs by up to 100x
– Persisted on Data Lake as Parquet/Iceberg tables
– Transparent to analysts (advanced query plan rewrites)
[Diagram: TRADITIONAL (DL/DW): user-specific cubes, extracts, aggregations and domain-specific data marts; user picks the best optimization. DREMIO (DL): Reflections; Dremio picks the best optimization]
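The contrast between "user picks the best optimization" and "Dremio picks the best optimization" can be sketched as a small matching routine. This is a deliberately simplified toy model, not Dremio's actual planner: the Reflection and Query classes, their fields, and the fewest-rows heuristic are all assumptions made for illustration. A pre-aggregated materialization can answer a query if it covers the query's table, grouping columns, and measures; among the candidates the sketch picks the smallest one, and it returns None (meaning "scan the base table") when nothing matches.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Reflection:
    """Hypothetical description of a pre-aggregated materialization."""
    name: str
    table: str
    dimensions: FrozenSet[str]
    measures: FrozenSet[str]   # assumed re-aggregatable, e.g. sums
    row_count: int


@dataclass(frozen=True)
class Query:
    """Hypothetical description of an aggregate query."""
    table: str
    group_by: FrozenSet[str]
    measures: FrozenSet[str]


def covers(reflection: Reflection, query: Query) -> bool:
    """A reflection can answer a query if it holds every column the query needs."""
    return (
        reflection.table == query.table
        and query.group_by <= reflection.dimensions
        and query.measures <= reflection.measures
    )


def pick_reflection(query: Query, reflections: Iterable[Reflection]) -> Optional[Reflection]:
    """Choose the smallest covering reflection, or None to fall back to the base table."""
    candidates = [r for r in reflections if covers(r, query)]
    return min(candidates, key=lambda r: r.row_count, default=None)


if __name__ == "__main__":
    available = [
        Reflection("by_region_day", "sales", frozenset({"region", "day"}),
                   frozenset({"sum_amount"}), 50_000),
        Reflection("by_region", "sales", frozenset({"region"}),
                   frozenset({"sum_amount"}), 20),
    ]
    dashboard_query = Query("sales", frozenset({"region"}), frozenset({"sum_amount"}))
    chosen = pick_reflection(dashboard_query, available)
    print(chosen.name if chosen else "base table")
```

The analyst's SQL never names a reflection; the substitution happens during planning, which is what makes the acceleration transparent.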

Multi-Engine Architecture
Engine Routing Rules
● User
● Roles
● Query type
● Query cost
● Connection parameters
● Date & time
● ...
[Diagram: each Query passes through the routing rules into Queues, which feed Engines of different sizes (M, L, XL)]
60% LOWER EC2 COSTS
Auto-stop/start and right-sized engines eliminate the need to over-provision infrastructure.
0 NOISY NEIGHBOR CONCERNS
Workloads are physically separated so one workload can’t impact the performance of another workload.
100% CONTROL OF RESOURCES
Control resource allocation with policies such as query priority, max query cost, max queue time, max runtime, etc.
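Routing rules of this kind can be pictured as an ordered list of predicates evaluated against each incoming query, where the first match decides the engine. The sketch below is a hypothetical model, not Dremio's configuration format or API: the QueryContext fields, the rule order, the cost threshold, and the query-type labels are assumptions, and the engine names M, L, and XL are borrowed from the sizes shown on the slide.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class QueryContext:
    """Attributes a routing rule may inspect (hypothetical field names)."""
    user: str
    roles: FrozenSet[str]
    query_type: str          # e.g. "JDBC", "UI", "REFLECTION" (illustrative labels)
    estimated_cost: float
    submitted_at: time


# Each rule is (description, predicate, target engine); order matters.
Rule = Tuple[str, Callable[[QueryContext], bool], str]


def is_night(t: time) -> bool:
    """True between 22:00 and 06:00 (an assumed batch window)."""
    return t >= time(22) or t < time(6)


RULES: List[Rule] = [
    ("reflection refreshes", lambda q: q.query_type == "REFLECTION", "XL"),
    ("expensive queries", lambda q: q.estimated_cost > 1_000_000, "L"),
    ("night-time ETL", lambda q: "etl" in q.roles and is_night(q.submitted_at), "L"),
]
DEFAULT_ENGINE = "M"


def route(query: QueryContext, rules: List[Rule] = RULES, default: str = DEFAULT_ENGINE) -> str:
    """Return the engine of the first matching rule, or the default engine."""
    for _description, predicate, engine in rules:
        if predicate(query):
            return engine
    return default


if __name__ == "__main__":
    dashboard = QueryContext("ana", frozenset({"analyst"}), "JDBC", 500.0, time(10, 30))
    print(route(dashboard))
```

Keeping workloads on separate engines is what isolates them from each other; the rule list only decides which engine a given query lands on.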

The Dremio Advantage
Self-Service Analytics
● Modern and Intuitive User Interface
● Unified View of Data (on-prem, hybrid and Cloud)
● Federated Queries
Open Data, No Lock-In
Based on community-driven standards:
● Apache Parquet
● Apache Iceberg
● Apache Arrow
Sub-Second Performance at 1/10th the Cost
● Lightning-fast queries
● High concurrency
● No expensive data copies to manage
● No semantic layer
● No federated queries
● Cloud only
● Proprietary platform
● Must ingest data in order to query it
● Limited Apache Iceberg support
● Very expensive
● Data duplication
● No query acceleration
● Poor performance with open standards
● Designed for batch processing (ETL/data science)
● No semantic layer
● Experimental federated queries
● Cloud only
● Focused on Delta Lake, not Apache Iceberg
● No query acceleration, BI extracts/imports required for low latency
● Limited and expensive for data serving
● Proven cost reduction after replacement by Dremio
