SlideShare a Scribd company logo
Presto for Applications
Presto Conference, Tel Aviv, April 2019
@DavidKrakov
Co-Founder & CTO
Who are we?
The platform for fast analytics on your cloud data lake
Founded
in 2017
Beta,
Pre-GA
$7.5M
Seed
Backed by
Shift is happening in how big data is used
Interactive
applications
Offline, Batch,
Exploratory
Data lake
Reporting
Ad-hoc queries
Batch processing
Business dashboards
Production applications
Real time decision
A task for Presto on S3?
Data Workload
Many sources
Complex schemas
Huge volumes
Diverse Queries
High Concurrency
Unpredictable load
User Expectations
Predictably Interactive
Previously: Presto Raptor
• Connector for low latency queries
• Store data copy on local SSDs
• Shared nothing (data is sharded)
• Columnar and sorted
• Use min/max to skip shards
Previously: Presto Raptor
Data modeling:
Sorting, Distribution
ETL & copy
management
Workload
optimization
Deployment
maintenance
➔ Long implementation cycles
➔ Expertise bottleneck
➔ Limited workloadsNeeds:
What is needed to shorten the cycle
No data modeling
No copy management
No workload bottlenecks
Simple Deployment
How Varada works
Data lake
Interactive response
SQL
Data view
Coordinator Worker
NVMe
SSD
Worker
Shared Everything: no bottlenecks
Worker Worker
Interconnected in Placement Group
NVMe
SSD
NVMe
SSD
NVMe
SSD
Coordinator Worker
NVMe
SSD
Worker
Shared Everything: no bottlenecks, no skew
Worker Worker
All
NVMe
SSDs
Worker
Worker
Worker
Worker
Shared Everything
Architecture
With NVMeF
Interconnected in Placement Group
NVMe
SSD
NVMe
SSD
NVMe
SSD
Coordinator
Materialized view: set the data with SQL
SSD
CREATE MATERIALIZED VIEW v
AS SELECT columns
FROM hive.orc_s3_table
WHERE condition
Define view
with SQL
Data Lake
Inline Indexing: no data modeling
SSD
Break all
dimensions to
nanoblocks
Inline index
across all
nanoblocks
All the dimensions are indexedDefine view
with SQL
Data Lake
Real Time Sync: no copy management
SSD
Auto synchronization from S3
Index & Data
synchronized to the
data lake on content
and on schema+
Track Data
Change
Data Lake
Ingredients of Index on Big Data
Random
I/O
Small
Blocks
Low
Latency
High
Bandwidth
Break data to small blocks
Up to few KBs of columnar
storage
Temp
31°
31°
32°
70°
Userid
ab32f27d
12eaaad4
ffff5ab1
NULL
Keyword
“four”
“score”
“and”
“seven”
Block on SSD
Block on SSD
Block on SSD
Block on SSD
Block on SSD
Block on SSD
Index across all dimensions
Every 100k rows of every column
Have an optimal index of their own
Temp
31°
31°
32°
70°
Index for the range
31° - 32°
Userid
ab32f27d
12eaaad4
ffff5ab1
NULL
Index for user id
High cardinality
Keyword
“four”
“score”
“and”
“seven”
Index for string
matching
Random walk the indexes in parallel
Temp
31°
31°
32°
70°
Index for the range
31° - 32°
Userid
ab32f27d
12eaaad4
ffff5ab1
NULL
Index for user id
High cardinality
Keyword
“four”
“score”
“and”
“seven”
Index for string
matching
SELECT WHERE temp > 40
Every query finds an index
Temp
31°
31°
32°
70°
Index for the range
31° - 32°
Userid
ab32f27d
12eaaad4
ffff5ab1
NULL
Index for user id
High cardinality
Keyword
“four”
“score”
“and”
“seven”
Index for string
matching
JOIN on a.UserId = b.UserID
Varada fast analytics platform for data lake
StorageCatalog
Metastore
AWS Cloud Data Lake
Data Applications
External Data SourcesVarada Platform
Relational
NoSQL
SQL engine
Presto
Auto Synchronization from S3 (Hive)
Connector
Materialized View and Index
Connectors
i3
Clusters
Thank you!

More Related Content

What's hot

Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
Databricks
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
Pramit Choudhary
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
Omid Vahdaty
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
Flink Forward
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
Databricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
Databricks
 
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationDynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
Ori Reshef
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
Amazon Web Services
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
Fabian Reinartz
 
Creating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized AggregatesCreating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized Aggregates
EDB
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
DataWorks Summit
 
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Databricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep Learning
Kai Wähner
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 

What's hot (20)

Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationDynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
Creating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized AggregatesCreating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized Aggregates
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep Learning
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 

Similar to Presto for apps deck varada prestoconf

Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
Amazon Web Services
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of Webops
Datadog
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
James Serra
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Amazon Web Services
 
re:Invent re:Cap - Big Data & IoT at Any Scale
re:Invent re:Cap - Big Data & IoT at Any Scalere:Invent re:Cap - Big Data & IoT at Any Scale
re:Invent re:Cap - Big Data & IoT at Any Scale
Adrian Hornsby
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
Jeff Smoley
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
James Serra
 
Expert summit SQL Server 2016
Expert summit   SQL Server 2016Expert summit   SQL Server 2016
Expert summit SQL Server 2016
Łukasz Grala
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Trivadis
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Rakesh Jayaram
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
huguk
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
Clusterpoint
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
Amazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
Amazon Web Services
 
How and when to use NoSQL
How and when to use NoSQLHow and when to use NoSQL
How and when to use NoSQL
Amazon Web Services
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
Directi Group
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options Compared
Sergey Bushik
 

Similar to Presto for apps deck varada prestoconf (20)

Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of Webops
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
 
re:Invent re:Cap - Big Data & IoT at Any Scale
re:Invent re:Cap - Big Data & IoT at Any Scalere:Invent re:Cap - Big Data & IoT at Any Scale
re:Invent re:Cap - Big Data & IoT at Any Scale
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Expert summit SQL Server 2016
Expert summit   SQL Server 2016Expert summit   SQL Server 2016
Expert summit SQL Server 2016
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
How and when to use NoSQL
How and when to use NoSQLHow and when to use NoSQL
How and when to use NoSQL
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options Compared
 

Recently uploaded

Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
David Wilson
 
Tailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer InsightsTailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer Insights
SynapseIndia
 
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptxMAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
janagijoythi
 
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
bellared2
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
Google Developer Group - Harare
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
shyamraj55
 
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
Priyanka Aash
 
(CISOPlatform Summit & SACON 2024) Workshop _ Most Dangerous Attack Technique...
(CISOPlatform Summit & SACON 2024) Workshop _ Most Dangerous Attack Technique...(CISOPlatform Summit & SACON 2024) Workshop _ Most Dangerous Attack Technique...
(CISOPlatform Summit & SACON 2024) Workshop _ Most Dangerous Attack Technique...
Priyanka Aash
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
313mohammedarshad
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
Ivanti
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
SubhamMandal40
 
Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptxAcumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
BrainSell Technologies
 
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
Priyanka Aash
 
Step-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From ScratchStep-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From Scratch
softsuave
 
Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
ssuser1915fe1
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
Brian Pichman
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
Baishakhi Ray
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
Bhajan Mehta
 

Recently uploaded (20)

Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
 
Tailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer InsightsTailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer Insights
 
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptxMAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
 
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
 
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
 
(CISOPlatform Summit & SACON 2024) Workshop _ Most Dangerous Attack Technique...
(CISOPlatform Summit & SACON 2024) Workshop _ Most Dangerous Attack Technique...(CISOPlatform Summit & SACON 2024) Workshop _ Most Dangerous Attack Technique...
(CISOPlatform Summit & SACON 2024) Workshop _ Most Dangerous Attack Technique...
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
 
Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptxAcumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
 
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
 
Step-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From ScratchStep-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From Scratch
 
Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
 

Presto for apps deck varada prestoconf

  • 1. Presto for Applications Presto Conference, Tel Aviv, April 2019 @DavidKrakov Co-Founder & CTO
  • 2. Who are we? The platform for fast analytics on your cloud data lake Founded in 2017 Beta, Pre-GA $7.5M Seed Backed by
  • 3. Shift is happening in how big data is used Interactive applications Offline, Batch, Exploratory Data lake Reporting Ad-hoc queries Batch processing Business dashboards Production applications Real time decision
  • 4. A task for Presto on S3? Data Workload Many sources Complex schemas Huge volumes Diverse Queries High Concurrency Unpredictable load User Expectations Predictably Interactive
  • 5. Previously: Presto Raptor • Connector for low latency queries • Store data copy on local SSDs • Shared nothing (data is sharded) • Columnar and sorted • Use min/max to skip shards
  • 6. Previously: Presto Raptor Data modeling: Sorting, Distribution ETL & copy management Workload optimization Deployment maintenance ➔ Long implementation cycles ➔ Expertise bottleneck ➔ Limited workloadsNeeds:
  • 7. What is needed to shorten the cycle No data modeling No copy management No workload bottlenecks Simple Deployment
  • 8. How Varada works Data lake Interactive response SQL Data view
  • 9. Coordinator Worker NVMe SSD Worker Shared Everything: no bottlenecks Worker Worker Interconnected in Placement Group NVMe SSD NVMe SSD NVMe SSD
  • 10. Coordinator Worker NVMe SSD Worker Shared Everything: no bottlenecks, no skew Worker Worker All NVMe SSDs Worker Worker Worker Worker Shared Everything Architecture With NVMeF Interconnected in Placement Group NVMe SSD NVMe SSD NVMe SSD Coordinator
  • 11. Materialized view: set the data with SQL SSD CREATE MATERIALIZED VIEW v AS SELECT columns FROM hive.orc_s3_table WHERE condition Define view with SQL Data Lake
  • 12. Inline Indexing: no data modeling SSD Break all dimensions to nanoblocks Inline index across all nanoblocks All the dimensions are indexedDefine view with SQL Data Lake
  • 13. Real Time Sync: no copy management SSD Auto synchronization from S3 Index & Data synchronized to the data lake on content and on schema+ Track Data Change Data Lake
  • 14. Ingredients of Index on Big Data Random I/O Small Blocks Low Latency High Bandwidth
  • 15. Break data to small blocks Up to few KBs of columnar storage Temp 31° 31° 32° 70° Userid ab32f27d 12eaaad4 ffff5ab1 NULL Keyword “four” “score” “and” “seven” Block on SSD Block on SSD Block on SSD Block on SSD Block on SSD Block on SSD
  • 16. Index across all dimensions Every 100k rows of every column Have an optimal index of their own Temp 31° 31° 32° 70° Index for the range 31° - 32° Userid ab32f27d 12eaaad4 ffff5ab1 NULL Index for user id High cardinality Keyword “four” “score” “and” “seven” Index for string matching
  • 17. Random walk the indexes in parallel Temp 31° 31° 32° 70° Index for the range 31° - 32° Userid ab32f27d 12eaaad4 ffff5ab1 NULL Index for user id High cardinality Keyword “four” “score” “and” “seven” Index for string matching SELECT WHERE temp > 40
  • 18. Every query finds an index Temp 31° 31° 32° 70° Index for the range 31° - 32° Userid ab32f27d 12eaaad4 ffff5ab1 NULL Index for user id High cardinality Keyword “four” “score” “and” “seven” Index for string matching JOIN on a.UserId = b.UserID
  • 19. Varada fast analytics platform for data lake StorageCatalog Metastore AWS Cloud Data Lake Data Applications External Data SourcesVarada Platform Relational NoSQL SQL engine Presto Auto Synchronization from S3 (Hive) Connector Materialized View and Index Connectors i3 Clusters

Editor's Notes

  1. Taking on big data New approach to big data Index 1st, query 2nd / Indexing before/pre query Pre-ready indexing
  2. Applications or BI tools connected to Varada perform complex queries with sub-second response times—on any dimension and schema—saving months of data preparation and dramatically improving an analyst’s time to insight.