SlideShare a Scribd company logo
Confidential - Do Not Share or Distribute
Dremio
The Easy and Open Lakehouse Platform
1
Confidential - Do Not Share or Distribute
The Easy and Open Data Lakehouse Platform
– Data warehouse performance directly on the lake
– Query acceleration to eliminate copies and BI extracts
– Semantic layer to enable governed self-service
– Database connectors to enable queries on other sources
Enterprise Adoption
– 1000s of companies across all industries
– 5 of the Fortune 10
Open Source & Community
– Apache Arrow (60M+ downloads/m), Apache Iceberg, Nessie
– Creator and host of Subsurface LIVE conference
About Dremio
3 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
Companies Want to Democratize Data… But How?
▪ Everyone wants access
▪ Data volumes are
exploding
▪ Security risks
▪ Compliance requirements
▪ Limited resources
Application Databases | IoT | Web | Logs
Continuous New Data
ADLS RDBMS
S3 GCS
Cloud Object Storage On-Prem
4 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
Data Warehouses: Expensive, Proprietary, Complex
Application Databases | IoT | Web | Logs
Continuous New Data
✗ Skyrocketing costs
✗ Vendor lock-in
✗ Exploding backlog
✗ Can’t explore data
✗ No self-service
ADLS RDBMS
S3 GCS
Cloud Object Storage On-Prem
5 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
Dremio Data Lakehouse: Easy, Open, 1/10th the Cost
Application Databases | IoT | Web | Logs
Continuous New Data
⇅ ODBC | JDBC | REST | Arrow Flight ⇅
⇅ Parallelism | Caching | Optimized Push-Downs ⇅
✓ Sub-second performance
✓ Eliminate Data Silos
✓ Improve Data Discovery and
Access
✓ No Data Movement Required
✓ No Copies
✓ Inexpensive
✓ No lock-in
ADLS RDBMS
S3 GCS
Cloud Object Storage On-Prem
6 Confidential - Do Not Share or Distribute
Raw
zone
Physical
datasets
Semantic
zone
Virtual
datasets
Data
Engineers
BI Users
SQL
Data Scientists
⇅ ODBC | JDBC | REST | Arrow Flight ⇅
ADLS S3
or or
Acceleration
(Data Reflections)
Data
Analysts
and
Engineers
IT-Governed Self-Service Semantic Layer
Standardized, User-Defined Abstraction Layer Enabling Virtual Data Sets, with an Easy-to-Use UI
Data Analysts
✓ Consistent business logic & KPIs
✓ No more waiting for IT
✓ Use visualization tool(s) of choice
Data Engineers & Architects
✓ Centralize data security & governance
✓ No more reactive, tedious work
✓ Easy collaboration with data analysts
7 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
A Realistic Example: DW Offload
Application Databases | IoT | Web | Logs
Continuous New Data
✗ Maxed capacity
✗ End-of-life support
✗ Complex ETL processes
✗ Legacy query engines
performance
RDBMS
8 Confidential - Do Not Share or Distribute
SQL
Data Science Dashboards Apps
A Realistic Example: DW Offload
Application Databases | IoT | Web | Logs
Continuous New Data
RDBMS
✓ Unified layer
✓ Combine DW and DL data
✓ Address DW capacity issues
✓ Smooth transition
Third-Party Data
Bloomberg, S&P, AWS Data Exchange…
Semantic Model
Fast Performance
Apache Arrow-based columnar
execution increases throughput and
reduces cost
Transparent Acceleration
Reflections enable sub-second
queries and eliminate copies and BI
extracts
Semantic Layer
Data teams define and expose a
logical data model for governed
self-service
Ingest & Transform Data
DML and dbt integration help ingest
data into the lakehouse and transform
it as needed.
Open Data Formats
Apache Iceberg ensures no vendor
lock-in and the flexibility to use any
engine.
Enterprise-Grade Security
Role-based access control, native
row/column-level policies and
advanced integrations.
Dremio at a glance
Confidential - Do Not Share or Distribute
Powering Analytics for Thousands of Companies
16
Confidential - Do Not Share or Distribute
Merci !
10
Confidential - Do Not Share or Distribute
Open Source Roots: Apache Arrow Inside
– Dremio seeded the market with its internal memory format
– Arrow now downloaded over 60M times per month
– Dremio is the only Arrow-based engine in the market
11
Apache Arrow was created by Dremio
– Data is immediately read into Arrow
– All operators use Arrow as input and output
– Gandiva: LLVM-based vectorized execution on Apache Arrow
Arrow-based vectorized execution
Confidential - Do Not Share or Distribute
Data Sources
Data Lake Engine
BI Users
SQL
Data Scientists
Data
Consumer
Tools
⇅ ODBC | JDBC | REST | Arrow Flight ⇅
⇅ Optimized Push-Downs ⇅
Coordinator
Node
Executor
Nodes
Orchestrated via Cloud, Kubernetes or YARN
External Data Reflection Stores Data Reflection Stores
Executor
Nodes
Executor
Nodes
Coordinator
Node
Coordinator
Node
DREMIO
Dremio deployment architecture
Confidential - Do Not Share or Distribute
Query Acceleration: BI on Data Lakes
Columnar Cloud Cache (C3) Data Reflections
13
– NVMe-level I/O performance on S3/ADLS/GCS
– Eliminate S3/ADLS I/O costs (10-15% of cost per query)
– Use existing NVMe/SSD on EC2 instances & Azure VMs
– Transparent to analysts and engineers
– Enable low-latency (including sub-second) BI queries
– Eliminate cubes and BI extracts
– Reduce infrastructure costs by up to 100x
– Persisted on Data Lake as Parquet/Iceberg tables
– Transparent to analysts (advanced query plan rewrites)
NVMe NVMe NVMe NVMe
Data Lake
Columnar Cloud Cache (C3)
Executor Executor Executor Executor
ENGINE
User-specific cubes, extracts, aggregations
Domain-specific data marts
User picks the best optimization
DL/DW
Dremio picks the best optimization
DL
TRADITIONAL DREMIO
Reflections
Confidential - Do Not Share or Distribute
Multi-Engine Architecture
XL
M
L
Engine Routing Rules
● User
● Roles
● Query type
● Query cost
● Connection parameters
● Date & time
● ...
Queues Engines
Query
14
LOWER EC2 COSTS
Auto-stop/start and right-sized
engines eliminate the need to
over-provision infrastructure.
60% NOISY NEIGHBOR CONCERNS
Workloads are physically separated
so one workload can’t impact the
performance of another workload.
0 CONTROL OF RESOURCES
Control resource allocation with policies
such as query priority, max query cost,
max queue time, max runtime, etc.
100%
15 Confidential - Do Not Share or Distribute
The Dremio Advantage
Open Data, No Lock-In
● Modern and Intuitive User Interface
● Unified View of Data (on-prem,
hybrid and Cloud)
● Federated Queries
Based on community-driven standards:
● Apache Parquet
● Apache Iceberg
● Apache Arrow
Sub-Second Performance
at 1/10th the Cost
Self–Service Analytics
● Lightning-fast queries
● High concurrency
● No expensive data copies to manage
● No semantic layer
● No federated queries
● Cloud only
● Proprietary platform
● Must ingest data in order to query it
● Limited Apache Iceberg support
● Very expensive
● Data duplication
● No query acceleration
● Poor performance with open standards
● Designed for batch processing
(ETL/data science)
● No semantic layer
● Experimental federated queries
● Cloud only
● Focused on Delta Lake, not Apache
Iceberg
● No query acceleration, BI
extracts/imports required for low latency
● Limited and expensive for data serving
● Proven cost reduction after replacement
by Dremio

More Related Content

What's hot

The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
Kujambu Murugesan
 
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Denodo
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
Databricks
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
Data Science Thailand
 
adb.pdf
adb.pdfadb.pdf
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
Alexey Grishchenko
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Kent Graziano
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on Databricks
Knoldus Inc.
 
Traditional data warehouse vs data lake
Traditional data warehouse vs data lakeTraditional data warehouse vs data lake
Traditional data warehouse vs data lake
BHASKAR CHAUDHURY
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Data Vault Overview
Data Vault OverviewData Vault Overview
Data Vault Overview
Empowered Holdings, LLC
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
Kent Graziano
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 

What's hot (20)

The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
 
adb.pdf
adb.pdfadb.pdf
adb.pdf
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on Databricks
 
Traditional data warehouse vs data lake
Traditional data warehouse vs data lakeTraditional data warehouse vs data lake
Traditional data warehouse vs data lake
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Vault Overview
Data Vault OverviewData Vault Overview
Data Vault Overview
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 

Similar to Vue d'ensemble Dremio

Data Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWSData Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWS
Denodo
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
Cloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
Cloudera, Inc.
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
DataStax Academy
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Cloudera, Inc.
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
Francisco González Jiménez
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
Raul Chong
 
Updates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDSUpdates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDS
ShapeBlue
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Sam Palani
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloudera, Inc.
 
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Manoj Kumar
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Platform Deep Dive
Platform Deep DivePlatform Deep Dive
Platform Deep Dive
Conrad23
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
DataStax
 

Similar to Vue d'ensemble Dremio (20)

Data Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWSData Driven Advanced Analytics using Denodo Platform on AWS
Data Driven Advanced Analytics using Denodo Platform on AWS
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
Updates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDSUpdates to Apache CloudStack and LINBIT SDS
Updates to Apache CloudStack and LINBIT SDS
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
Cloud Computing – Opportunities, Definitions, Options, and Risks (Part-1)
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Platform Deep Dive
Platform Deep DivePlatform Deep Dive
Platform Deep Dive
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 

More from Modern Data Stack France

Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
Modern Data Stack France
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017
Modern Data Stack France
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Modern Data Stack France
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
Modern Data Stack France
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
Modern Data Stack France
 
Hug janvier 2016 -EDF
Hug   janvier 2016 -EDFHug   janvier 2016 -EDF
Hug janvier 2016 -EDF
Modern Data Stack France
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlus
Modern Data Stack France
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
Modern Data Stack France
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
Modern Data Stack France
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Modern Data Stack France
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Modern Data Stack France
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Modern Data Stack France
 
Spark dataframe
Spark dataframeSpark dataframe
Spark dataframe
Modern Data Stack France
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandation
Modern Data Stack France
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Modern Data Stack France
 
Spark meetup at viadeo
Spark meetup at viadeoSpark meetup at viadeo
Spark meetup at viadeo
Modern Data Stack France
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Modern Data Stack France
 
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REXHadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Modern Data Stack France
 
The Cascading (big) data application framework
The Cascading (big) data application frameworkThe Cascading (big) data application framework
The Cascading (big) data application framework
Modern Data Stack France
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Modern Data Stack France
 

More from Modern Data Stack France (20)

Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
 
Hug janvier 2016 -EDF
Hug   janvier 2016 -EDFHug   janvier 2016 -EDF
Hug janvier 2016 -EDF
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlus
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
 
Spark dataframe
Spark dataframeSpark dataframe
Spark dataframe
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandation
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)
 
Spark meetup at viadeo
Spark meetup at viadeoSpark meetup at viadeo
Spark meetup at viadeo
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
 
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REXHadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
Hadoop User Group 29Jan2015 Apache Flink / Haven / CapGemnini REX
 
The Cascading (big) data application framework
The Cascading (big) data application frameworkThe Cascading (big) data application framework
The Cascading (big) data application framework
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
 

Recently uploaded

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

Vue d'ensemble Dremio

  • 1. Confidential - Do Not Share or Distribute Dremio The Easy and Open Lakehouse Platform 1
  • 2. Confidential - Do Not Share or Distribute The Easy and Open Data Lakehouse Platform – Data warehouse performance directly on the lake – Query acceleration to eliminate copies and BI extracts – Semantic layer to enable governed self-service – Database connectors to enable queries on other sources Enterprise Adoption – 1000s of companies across all industries – 5 of the Fortune 10 Open Source & Community – Apache Arrow (60M+ downloads/m), Apache Iceberg, Nessie – Creator and host of Subsurface LIVE conference About Dremio
  • 3. 3 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps Companies Want to Democratize Data… But How? ▪ Everyone wants access ▪ Data volumes are exploding ▪ Security risks ▪ Compliance requirements ▪ Limited resources Application Databases | IoT | Web | Logs Continuous New Data ADLS RDBMS S3 GCS Cloud Object Storage On-Prem
  • 4. 4 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps Data Warehouses: Expensive, Proprietary, Complex Application Databases | IoT | Web | Logs Continuous New Data ✗ Skyrocketing costs ✗ Vendor lock-in ✗ Exploding backlog ✗ Can’t explore data ✗ No self-service ADLS RDBMS S3 GCS Cloud Object Storage On-Prem
  • 5. 5 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps Dremio Data Lakehouse: Easy, Open, 1/10th the Cost Application Databases | IoT | Web | Logs Continuous New Data ⇅ ODBC | JDBC | REST | Arrow Flight ⇅ ⇅ Parallelism | Caching | Optimized Push-Downs ⇅ ✓ Sub-second performance ✓ Eliminate Data Silos ✓ Improve Data Discovery and Access ✓ No Data Movement Required ✓ No Copies ✓ Inexpensive ✓ No lock-in ADLS RDBMS S3 GCS Cloud Object Storage On-Prem
  • 6. 6 Confidential - Do Not Share or Distribute Raw zone Physical datasets Semantic zone Virtual datasets Data Engineers BI Users SQL Data Scientists ⇅ ODBC | JDBC | REST | Arrow Flight ⇅ ADLS S3 or or Acceleration (Data Reflections) Data Analysts and Engineers IT-Governed Self-Service Semantic Layer Standardized, User-Defined Abstraction Layer Enabling Virtual Data Sets, with an Easy-to-Use UI Data Analysts ✓ Consistent business logic & KPIs ✓ No more waiting for IT ✓ Use visualization tool(s) of choice Data Engineers & Architects ✓ Centralize data security & governance ✓ No more reactive, tedious work ✓ Easy collaboration with data analysts
  • 7. 7 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps A Realistic Example: DW Offload Application Databases | IoT | Web | Logs Continuous New Data ✗ Maxed capacity ✗ End-of-life support ✗ Complex ETL processes ✗ Legacy query engines performance RDBMS
  • 8. 8 Confidential - Do Not Share or Distribute SQL Data Science Dashboards Apps A Realistic Example: DW Offload Application Databases | IoT | Web | Logs Continuous New Data RDBMS ✓ Unified layer ✓ Combine DW and DL data ✓ Address DW capacity issues ✓ Smooth transition Third-Party Data Bloomberg, S&P, AWS Data Exchange… Semantic Model
  • 9. Fast Performance Apache Arrow-based columnar execution increases throughput and reduces cost Transparent Acceleration Reflections enable sub-second queries and eliminate copies and BI extracts Semantic Layer Data teams define and expose a logical data model for governed self-service Ingest & Transform Data DML and dbt integration help ingest data into the lakehouse and transform it as needed. Open Data Formats Apache Iceberg ensures no vendor lock-in and the flexibility to use any engine. Enterprise-Grade Security Role-based access control, native row/column-level policies and advanced integrations. Dremio at a glance
  • 10. Confidential - Do Not Share or Distribute Powering Analytics for Thousands of Companies 16
  • 11. Confidential - Do Not Share or Distribute Merci ! 10
  • 12. Confidential - Do Not Share or Distribute Open Source Roots: Apache Arrow Inside – Dremio seeded the market with its internal memory format – Arrow now downloaded over 60M times per month – Dremio is the only Arrow-based engine in the market 11 Apache Arrow was created by Dremio – Data is immediately read into Arrow – All operators use Arrow as input and output – Gandiva: LLVM-based vectorized execution on Apache Arrow Arrow-based vectorized execution
  • 13. Confidential - Do Not Share or Distribute Data Sources Data Lake Engine BI Users SQL Data Scientists Data Consumer Tools ⇅ ODBC | JDBC | REST | Arrow Flight ⇅ ⇅ Optimized Push-Downs ⇅ Coordinator Node Executor Nodes Orchestrated via Cloud, Kubernetes or YARN External Data Reflection Stores Data Reflection Stores Executor Nodes Executor Nodes Coordinator Node Coordinator Node DREMIO Dremio deployment architecture
  • 14. Confidential - Do Not Share or Distribute Query Acceleration: BI on Data Lakes Columnar Cloud Cache (C3) Data Reflections 13 – NVMe-level I/O performance on S3/ADLS/GCS – Eliminate S3/ADLS I/O costs (10-15% of cost per query) – Use existing NVMe/SSD on EC2 instances & Azure VMs – Transparent to analysts and engineers – Enable low-latency (including sub-second) BI queries – Eliminate cubes and BI extracts – Reduce infrastructure costs by up to 100x – Persisted on Data Lake as Parquet/Iceberg tables – Transparent to analysts (advanced query plan rewrites) NVMe NVMe NVMe NVMe Data Lake Columnar Cloud Cache (C3) Executor Executor Executor Executor ENGINE User-specific cubes, extracts, aggregations Domain-specific data marts User picks the best optimization DL/DW Dremio picks the best optimization DL TRADITIONAL DREMIO Reflections
  • 15. Confidential - Do Not Share or Distribute Multi-Engine Architecture XL M L Engine Routing Rules ● User ● Roles ● Query type ● Query cost ● Connection parameters ● Date & time ● ... Queues Engines Query 14 LOWER EC2 COSTS Auto-stop/start and right-sized engines eliminate the need to over-provision infrastructure. 60% NOISY NEIGHBOR CONCERNS Workloads are physically separated so one workload can’t impact the performance of another workload. 0 CONTROL OF RESOURCES Control resource allocation with policies such as query priority, max query cost, max queue time, max runtime, etc. 100%
  • 16. 15 Confidential - Do Not Share or Distribute The Dremio Advantage Open Data, No Lock-In ● Modern and Intuitive User Interface ● Unified View of Data (on-prem, hybrid and Cloud) ● Federated Queries Based on community-driven standards: ● Apache Parquet ● Apache Iceberg ● Apache Arrow Sub-Second Performance at 1/10th the Cost Self–Service Analytics ● Lightning-fast queries ● High concurrency ● No expensive data copies to manage ● No semantic layer ● No federated queries ● Cloud only ● Proprietary platform ● Must ingest data in order to query it ● Limited Apache Iceberg support ● Very expensive ● Data duplication ● No query acceleration ● Poor performance with open standards ● Designed for batch processing (ETL/data science) ● No semantic layer ● Experimental federated queries ● Cloud only ● Focused on Delta Lake, not Apache Iceberg ● No query acceleration, BI extracts/imports required for low latency ● Limited and expensive for data serving ● Proven cost reduction after replacement by Dremio