SlideShare a Scribd company logo
1 of 16
Confidential - Do Not Share or Distribute
Dremio
The Easy and Open Lakehouse Platform
1
Confidential - Do Not Share or Distribute
The Easy and Open Data Lakehouse Platform
– Data warehouse performance directly on the lake
– Query acceleration to eliminate copies and BI extracts
– Semantic layer to enable governed self-service
– Database connectors to enable queries on other sources
Enterprise Adoption
– 1000s of companies across all industries
– 5 of the Fortune 10
Open Source & Community
– Apache Arrow (60M+ downloads/m), Apache Iceberg, Nessie
– Creator and host of Subsurface LIVE conference
About Dremio
3 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
Companies Want to Democratize Data… But How?
▪ Everyone wants access
▪ Data volumes are
exploding
▪ Security risks
▪ Compliance requirements
▪ Limited resources
Application Databases | IoT | Web | Logs
Continuous New Data
ADLS RDBMS
S3 GCS
Cloud Object Storage On-Prem
4 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
Data Warehouses: Expensive, Proprietary, Complex
Application Databases | IoT | Web | Logs
Continuous New Data
✗ Skyrocketing costs
✗ Vendor lock-in
✗ Exploding backlog
✗ Can’t explore data
✗ No self-service
ADLS RDBMS
S3 GCS
Cloud Object Storage On-Prem
5 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
Dremio Data Lakehouse: Easy, Open, 1/10th the Cost
Application Databases | IoT | Web | Logs
Continuous New Data
⇅ ODBC | JDBC | REST | Arrow Flight ⇅
⇅ Parallelism | Caching | Optimized Push-Downs ⇅
✓ Sub-second performance
✓ Eliminate Data Silos
✓ Improve Data Discovery and
Access
✓ No Data Movement Required
✓ No Copies
✓ Inexpensive
✓ No lock-in
ADLS RDBMS
S3 GCS
Cloud Object Storage On-Prem
6 Confidential - Do Not Share or Distribute
Raw
zone
Physical
datasets
Semantic
zone
Virtual
datasets
Data
Engineers
BI Users
SQL
Data Scientists
⇅ ODBC | JDBC | REST | Arrow Flight ⇅
ADLS S3
or or
Acceleration
(Data Reflections)
Data
Analysts
and
Engineers
IT-Governed Self-Service Semantic Layer
Standardized, User-Defined Abstraction Layer Enabling Virtual Data Sets, with an Easy-to-Use UI
Data Analysts
✓ Consistent business logic & KPIs
✓ No more waiting for IT
✓ Use visualization tool(s) of choice
Data Engineers & Architects
✓ Centralize data security & governance
✓ No more reactive, tedious work
✓ Easy collaboration with data analysts
7 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
A Realistic Example: DW Offload
Application Databases | IoT | Web | Logs
Continuous New Data
✗ Maxed capacity
✗ End-of-life support
✗ Complex ETL processes
✗ Legacy query engines
performance
RDBMS
8 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
A Realistic Example: DW Offload
Application Databases | IoT | Web | Logs
Continuous New Data
RDBMS
✓ Unified layer
✓ Combine DW and DL data
✓ Address DW capacity issues
✓ Smooth transition
Third-Party Data
Bloomberg, S&P, AWS Data Exchange…
Semantic Model
Fast Performance
Apache Arrow-based columnar
execution increases throughput and
reduces cost
Transparent Acceleration
Reflections enable sub-second
queries and eliminate copies and BI
extracts
Semantic Layer
Data teams define and expose a
logical data model for governed
self-service
Ingest & Transform Data
DML and dbt integration help ingest
data into the lakehouse and transform
it as needed.
Open Data Formats
Apache Iceberg ensures no vendor
lock-in and the flexibility to use any
engine.
Enterprise-Grade Security
Role-based access control, native
row/column-level policies and
advanced integrations.
Dremio at a glance
Confidential - Do Not Share or Distribute
Powering Analytics for Thousands of Companies
16
Confidential - Do Not Share or Distribute
Merci !
10
Confidential - Do Not Share or Distribute
Open Source Roots: Apache Arrow Inside
– Dremio seeded the market with its internal memory format
– Arrow now downloaded over 60M times per month
– Dremio is the only Arrow-based engine in the market
11
Apache Arrow was created by Dremio
– Data is immediately read into Arrow
– All operators use Arrow as input and output
– Gandiva: LLVM-based vectorized execution on Apache Arrow
Arrow-based vectorized execution
Confidential - Do Not Share or Distribute
Data Sources
Data Lake Engine
BI Users
SQL
Data Scientists
Data
Consumer
Tools
⇅ ODBC | JDBC | REST | Arrow Flight ⇅
⇅ Optimized Push-Downs ⇅
Coordinator
Node
Executor
Nodes
Orchestrated via Cloud, Kubernetes or YARN
External Data Reflection Stores Data Reflection Stores
Executor
Nodes
Executor
Nodes
Coordinator
Node
Coordinator
Node
DREMIO
Dremio deployment architecture
Confidential - Do Not Share or Distribute
Query Acceleration: BI on Data Lakes
Columnar Cloud Cache (C3) Data Reflections
13
– NVMe-level I/O performance on S3/ADLS/GCS
– Eliminate S3/ADLS I/O costs (10-15% of cost per query)
– Use existing NVMe/SSD on EC2 instances & Azure VMs
– Transparent to analysts and engineers
– Enable low-latency (including sub-second) BI queries
– Eliminate cubes and BI extracts
– Reduce infrastructure costs by up to 100x
– Persisted on Data Lake as Parquet/Iceberg tables
– Transparent to analysts (advanced query plan rewrites)
NVMe NVMe NVMe NVMe
Data Lake
Columnar Cloud Cache (C3)
Executor Executor Executor Executor
ENGINE
User-specific cubes, extracts, aggregations
Domain-specific data marts
User picks the best optimization
DL/DW
Dremio picks the best optimization
DL
TRADITIONAL DREMIO
Reflections
Confidential - Do Not Share or Distribute
Multi-Engine Architecture
XL
M
L
Engine Routing Rules
● User
● Roles
● Query type
● Query cost
● Connection parameters
● Date & time
● ...
Queues Engines
Query
14
LOWER EC2 COSTS
Auto-stop/start and right-sized
engines eliminate the need to
over-provision infrastructure.
60% NOISY NEIGHBOR CONCERNS
Workloads are physically separated
so one workload can’t impact the
performance of another workload.
0 CONTROL OF RESOURCES
Control resource allocation with policies
such as query priority, max query cost,
max queue time, max runtime, etc.
100%
15 Confidential - Do Not Share or Distribute
The Dremio Advantage
Open Data, No Lock-In
● Modern and Intuitive User Interface
● Unified View of Data (on-prem,
hybrid and Cloud)
● Federated Queries
Based on community-driven standards:
● Apache Parquet
● Apache Iceberg
● Apache Arrow
Sub-Second Performance
at 1/10th the Cost
Self–Service Analytics
● Lightning-fast queries
● High concurrency
● No expensive data copies to manage
● No semantic layer
● No federated queries
● Cloud only
● Proprietary platform
● Must ingest data in order to query it
● Limited Apache Iceberg support
● Very expensive
● Data duplication
● No query acceleration
● Poor performance with open standards
● Designed for batch processing
(ETL/data science)
● No semantic layer
● Experimental federated queries
● Cloud only
● Focused on Delta Lake, not Apache
Iceberg
● No query acceleration, BI
extracts/imports required for low latency
● Limited and expensive for data serving
● Proven cost reduction after replacement
by Dremio

More Related Content

What's hot

Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Session découverte de la Data Virtualization
Session découverte de la Data VirtualizationSession découverte de la Data Virtualization
Session découverte de la Data VirtualizationDenodo
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesCarole Gunst
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...HostedbyConfluent
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingDatabricks
 
Data Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best PracticesData Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best PracticesCitiusTech
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricCambridge Semantics
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleDatabricks
 

What's hot (20)

Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Session découverte de la Data Virtualization
Session découverte de la Data VirtualizationSession découverte de la Data Virtualization
Session découverte de la Data Virtualization
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data Pipelines
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
Data Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best PracticesData Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best Practices
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 

Similar to Vue d'ensemble Dremio

Data Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWSData Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWSDenodo
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionCloudera, Inc.
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...DataStax Academy
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Cloudera, Inc.
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionCloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
Updates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDSUpdates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDSShapeBlue
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloudera, Inc.
 
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)Manoj Kumar
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Platform Deep Dive
Platform Deep DivePlatform Deep Dive
Platform Deep DiveConrad23
 

Similar to Vue d'ensemble Dremio (20)

Data Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWSData Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWS
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Updates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDSUpdates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDS
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Platform Deep Dive
Platform Deep DivePlatform Deep Dive
Platform Deep Dive
 

More from Modern Data Stack France

Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupModern Data Stack France
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Modern Data Stack France
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Modern Data Stack France
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...Modern Data Stack France
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with sparkModern Data Stack France
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlusModern Data Stack France
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)Modern Data Stack France
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Modern Data Stack France
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Modern Data Stack France
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015Modern Data Stack France
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandationModern Data Stack France
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Modern Data Stack France
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielModern Data Stack France
 
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REXHadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REXModern Data Stack France
 
The Cascading (big) data application framework
The Cascading (big) data application frameworkThe Cascading (big) data application framework
The Cascading (big) data application frameworkModern Data Stack France
 

More from Modern Data Stack France (20)

Stash - Data FinOPS
Stash - Data FinOPSStash - Data FinOPS
Stash - Data FinOPS
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
 
Hug janvier 2016 -EDF
Hug   janvier 2016 -EDFHug   janvier 2016 -EDF
Hug janvier 2016 -EDF
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlus
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
 
Spark dataframe
Spark dataframeSpark dataframe
Spark dataframe
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandation
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)
 
Spark meetup at viadeo
Spark meetup at viadeoSpark meetup at viadeo
Spark meetup at viadeo
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
 
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REXHadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
 
The Cascading (big) data application framework
The Cascading (big) data application frameworkThe Cascading (big) data application framework
The Cascading (big) data application framework
 

Recently uploaded

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 

Recently uploaded (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 

Vue d'ensemble Dremio

  • 1. Confidential - Do Not Share or Distribute Dremio The Easy and Open Lakehouse Platform 1
  • 2. Confidential - Do Not Share or Distribute The Easy and Open Data Lakehouse Platform – Data warehouse performance directly on the lake – Query acceleration to eliminate copies and BI extracts – Semantic layer to enable governed self-service – Database connectors to enable queries on other sources Enterprise Adoption – 1000s of companies across all industries – 5 of the Fortune 10 Open Source & Community – Apache Arrow (60M+ downloads/m), Apache Iceberg, Nessie – Creator and host of Subsurface LIVE conference About Dremio
  • 3. 3 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps Companies Want to Democratize Data… But How? ▪ Everyone wants access ▪ Data volumes are exploding ▪ Security risks ▪ Compliance requirements ▪ Limited resources Application Databases | IoT | Web | Logs Continuous New Data ADLS RDBMS S3 GCS Cloud Object Storage On-Prem
  • 4. 4 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps Data Warehouses: Expensive, Proprietary, Complex Application Databases | IoT | Web | Logs Continuous New Data ✗ Skyrocketing costs ✗ Vendor lock-in ✗ Exploding backlog ✗ Can’t explore data ✗ No self-service ADLS RDBMS S3 GCS Cloud Object Storage On-Prem
  • 5. 5 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps Dremio Data Lakehouse: Easy, Open, 1/10th the Cost Application Databases | IoT | Web | Logs Continuous New Data ⇅ ODBC | JDBC | REST | Arrow Flight ⇅ ⇅ Parallelism | Caching | Optimized Push-Downs ⇅ ✓ Sub-second performance ✓ Eliminate Data Silos ✓ Improve Data Discovery and Access ✓ No Data Movement Required ✓ No Copies ✓ Inexpensive ✓ No lock-in ADLS RDBMS S3 GCS Cloud Object Storage On-Prem
  • 6. 6 Confidential - Do Not Share or Distribute Raw zone Physical datasets Semantic zone Virtual datasets Data Engineers BI Users SQL Data Scientists ⇅ ODBC | JDBC | REST | Arrow Flight ⇅ ADLS S3 or or Acceleration (Data Reflections) Data Analysts and Engineers IT-Governed Self-Service Semantic Layer Standardized, User-Defined Abstraction Layer Enabling Virtual Data Sets, with an Easy-to-Use UI Data Analysts ✓ Consistent business logic & KPIs ✓ No more waiting for IT ✓ Use visualization tool(s) of choice Data Engineers & Architects ✓ Centralize data security & governance ✓ No more reactive, tedious work ✓ Easy collaboration with data analysts
  • 7. 7 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps A Realistic Example: DW Offload Application Databases | IoT | Web | Logs Continuous New Data ✗ Maxed capacity ✗ End-of-life support ✗ Complex ETL processes ✗ Legacy query engines performance RDBMS
  • 8. 8 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps A Realistic Example: DW Offload Application Databases | IoT | Web | Logs Continuous New Data RDBMS ✓ Unified layer ✓ Combine DW and DL data ✓ Address DW capacity issues ✓ Smooth transition Third-Party Data Bloomberg, S&P, AWS Data Exchange… Semantic Model
  • 9. Fast Performance Apache Arrow-based columnar execution increases throughput and reduces cost Transparent Acceleration Reflections enable sub-second queries and eliminate copies and BI extracts Semantic Layer Data teams define and expose a logical data model for governed self-service Ingest & Transform Data DML and dbt integration help ingest data into the lakehouse and transform it as needed. Open Data Formats Apache Iceberg ensures no vendor lock-in and the flexibility to use any engine. Enterprise-Grade Security Role-based access control, native row/column-level policies and advanced integrations. Dremio at a glance
  • 10. Confidential - Do Not Share or Distribute Powering Analytics for Thousands of Companies 16
  • 11. Confidential - Do Not Share or Distribute Merci ! 10
  • 12. Confidential - Do Not Share or Distribute Open Source Roots: Apache Arrow Inside – Dremio seeded the market with its internal memory format – Arrow now downloaded over 60M times per month – Dremio is the only Arrow-based engine in the market 11 Apache Arrow was created by Dremio – Data is immediately read into Arrow – All operators use Arrow as input and output – Gandiva: LLVM-based vectorized execution on Apache Arrow Arrow-based vectorized execution
  • 13. Confidential - Do Not Share or Distribute Data Sources Data Lake Engine BI Users SQL Data Scientists Data Consumer Tools ⇅ ODBC | JDBC | REST | Arrow Flight ⇅ ⇅ Optimized Push-Downs ⇅ Coordinator Node Executor Nodes Orchestrated via Cloud, Kubernetes or YARN External Data Reflection Stores Data Reflection Stores Executor Nodes Executor Nodes Coordinator Node Coordinator Node DREMIO Dremio deployment architecture
  • 14. Confidential - Do Not Share or Distribute Query Acceleration: BI on Data Lakes Columnar Cloud Cache (C3) Data Reflections 13 – NVMe-level I/O performance on S3/ADLS/GCS – Eliminate S3/ADLS I/O costs (10-15% of cost per query) – Use existing NVMe/SSD on EC2 instances & Azure VMs – Transparent to analysts and engineers – Enable low-latency (including sub-second) BI queries – Eliminate cubes and BI extracts – Reduce infrastructure costs by up to 100x – Persisted on Data Lake as Parquet/Iceberg tables – Transparent to analysts (advanced query plan rewrites) NVMe NVMe NVMe NVMe Data Lake Columnar Cloud Cache (C3) Executor Executor Executor Executor ENGINE User-specific cubes, extracts, aggregations Domain-specific data marts User picks the best optimization DL/DW Dremio picks the best optimization DL TRADITIONAL DREMIO Reflections
  • 15. Confidential - Do Not Share or Distribute Multi-Engine Architecture XL M L Engine Routing Rules ● User ● Roles ● Query type ● Query cost ● Connection parameters ● Date & time ● ... Queues Engines Query 14 LOWER EC2 COSTS Auto-stop/start and right-sized engines eliminate the need to over-provision infrastructure. 60% NOISY NEIGHBOR CONCERNS Workloads are physically separated so one workload can’t impact the performance of another workload. 0 CONTROL OF RESOURCES Control resource allocation with policies such as query priority, max query cost, max queue time, max runtime, etc. 100%
  • 16. 15 Confidential - Do Not Share or Distribute The Dremio Advantage Open Data, No Lock-In ● Modern and Intuitive User Interface ● Unified View of Data (on-prem, hybrid and Cloud) ● Federated Queries Based on community-driven standards: ● Apache Parquet ● Apache Iceberg ● Apache Arrow Sub-Second Performance at 1/10th the Cost Self–Service Analytics ● Lightning-fast queries ● High concurrency ● No expensive data copies to manage ● No semantic layer ● No federated queries ● Cloud only ● Proprietary platform ● Must ingest data in order to query it ● Limited Apache Iceberg support ● Very expensive ● Data duplication ● No query acceleration ● Poor performance with open standards ● Designed for batch processing (ETL/data science) ● No semantic layer ● Experimental federated queries ● Cloud only ● Focused on Delta Lake, not Apache Iceberg ● No query acceleration, BI extracts/imports required for low latency ● Limited and expensive for data serving ● Proven cost reduction after replacement by Dremio