SlideShare a Scribd company logo
1 of 40
Download to read offline
Our Presto use case
and
performance test
Hironori Ogibayashi
Shin Matsuura
About us
● Hironori Ogibayashi(@angostura11)
● Shin Matsuura
○ IT Infrastructure team in Japanese
telecommunications carrier
○ Mainly working on middleware - test,
installation, deployment.
Todays Topic
● Presto use case
○ Deployment
○ Use case
○ Challenges
○ Future work
● Performance comparison between
Hive+Tez and Presto
Presto use case
Log Collection Flow
Fluentd
Aggregator
Hadoop Cluster Application
WebHDFS
・1500 Fluentd instances
・25,000 msg / sec
・400GB / day
・150 types of log
Log Usage
● Systems Infrastructure team
○ Checking trends in server performance
○ Performance analysis of Oracle
Database
● Application development team
○ Improving system and business
operations.
Application for Oracle DB Performance Analysis
- Check existing/potential problems of
Oracle database, for certain system,
certain period.
- Utilize logs stored in HDFS. Queries
were executed on Hive.
- But, it took more than one hour to
get the result...
- (So, we migrated to Presto.)
Why Presto?
● Frequent use of Interactive / ad-hoc
queries.
● Of cource, faster is better.
Hadoop Slave
Presto Deployment
Hadoop Slave
DataNode
TaskTracker
Presto Worker
Presto
Coordinator
Hive Metastore
Application/Client
・・・
● A decicated physical machine as a
Coordinator.
● Workers run on each Hadoop slaves.
● Logs in HDFS are periodically
converted to RCfiles.
● Presto versions
○ 0.66⇒0.73⇒0.75⇒0.82
Deployment Effect - Elapsed time of a single query
230sec
7sec
- Elapsed time of one of
the queries issued by the
application.
- Query was run on CDH4
(MRv1) cluster.
Deployment and Operation
● Deployment
○ Easy;Just extract binaries in each server and modify
configuration file.
○ Automated by Ansible + yum.
● What we use in operation
○ Query history
■ Coordinator Web UI
○ Logs
■ /var/presto/data/logs/{server.log,launcher.log}
○ Metrics
■ presto-metrics(https://github.com/xerial/presto-
metrics)⇒Fluentd⇒Elasticsearch + Kibana
○ sys schema
Challenges
● Worker crash / hang.
○ OutOfMemory. In case of hanging, we resolve to “kill -9”.
○ We Modified the memory parameter: task.shard.max-
threads×task.max-memory < -Xmx
● At first, we set node-scheduler.include-coordinator=true.
In which case, Coordinator crashed due to heavy query.
● SQL difference from HiveQL
○ At first our Application used both Hive and Presto because we used
Presto experimentally.Hence the Application had to support both
HiveQL and Presto(ANSI SQL).
○ Now, the application no longer use Hive.
Future work
● Improve Coodinator’s availability.
● Security
○ Now, all queries are executed as Presto’s daemon user.
● Resource isolation between Presto and Hadoop daemons.
Presto VS Hive+Tez
Contents
From a Performance perspective
Presto VS Hive+Tez
(not tuning any parameteres)
Conclusion
Presto VS Hive+Tez
Win Lose
How Fast??
Presto VS Hive+Tez
2.0~136 times
more details
Testing environment Configurations 2p12c
64GB Mem
36TB Disk
NN
DN DN DN
Hadoop(HDP2.1)
Presto(0.82)
Coodinator
Worker Worker Worker
Master : 3nodes
Slave : 3nodes
NN
Metastore
Sample data
300GB
csv file
50 columns
1.1B records
Performance measurement perspectives
• Query patterns
• Data format patterns
• Repetitive Querying
Query patterns
Queries
Query1: select count(*) from TestTBL
Query2: select * from TestTBL where col1 = ‘XXX’
Query3: select * from TestTBL where col1 = ‘XXX’ and col2 = ‘YYY’
Query4: select col1, count(*) from TestTBL group by col1
Query5: select col1, count(*) from TestTBL where col2 = ‘YYY’ group by col1
data format :Txt
Results: Query patterns
data format :Txt
Results: Query patterns
100x faster
Presto was faster in processing speed than
Hive+Tez in all queries.
Data format patterns
Data formats
• Text File (Textfile)
• Record Columnar File (RCfile)
• Optimized Row Columnar File (ORCfile)
Results: Data format patterns
※Query: Query2
Results: Data format patterns
※Query: Query2
Presto was faster in processing speed
than Hive+Tez in all data formats.
Repetitive Querying
Change in processing time with repetitions(Presto)
※Query: Query2
※Data format: Txt
Change in processing time with repetitions (Presto)
※Query: Query2
※Data format: Txt
Became faster After the second time.
Cache ???
2.5x faster
Change in processing time with repetitions (Hive+Tez)
※Query: Query2
※Data format: Txt
Change in processing time with repetitions (Hive+Tez)
※Query: Query2
※Data format: Txt
No real change in processing time
+α
Engine:Presto
Query × Data format
Engine:Presto
Query × Data format
Is using RCfile the most stable and fastest
way ??
Summary
Result
● Presto was faster than Hive+Tez in all queries.
● Presto was faster than Hive+Tez in all data formats.
● With repetitive Querying, presto became faster.
● By Using RCfile, Presto was the most stable and fastest.
Next
● Benchmark from node scaling and data volumn
perspectives.
● Benchmark while using compression functions of
ORCfile.
● Benchmark with HDP2.2.
Appendix
ほぼすべての条件で
2回目以降高速

More Related Content

What's hot

Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small StartHiroshi Toyama
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Martin Traverso
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookTreasure Data, Inc.
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016kbajda
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178Kai Sasaki
 
Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Wojciech Biela
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...viirya
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseMatt Fuller
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FutureDataWorks Summit
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessSATOSHI TAGOMORI
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsSATOSHI TAGOMORI
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiDatabricks
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
 

What's hot (20)

Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
 
Presto
PrestoPresto
Presto
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
Presto
PrestoPresto
Presto
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the Enterprise
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 

Viewers also liked

Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetupstevemcpherson
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
A Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezA Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezGw Liu
 
SQL on Hadoop 比較検証 【2014月11日における検証レポート】
SQL on Hadoop 比較検証 【2014月11日における検証レポート】SQL on Hadoop 比較検証 【2014月11日における検証レポート】
SQL on Hadoop 比較検証 【2014月11日における検証レポート】NTT DATA OSS Professional Services
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Sadayuki Furuhashi
 

Viewers also liked (6)

Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetup
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
A Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on TezA Benchmark Test on Presto, Spark Sql and Hive on Tez
A Benchmark Test on Presto, Spark Sql and Hive on Tez
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
SQL on Hadoop 比較検証 【2014月11日における検証レポート】
SQL on Hadoop 比較検証 【2014月11日における検証レポート】SQL on Hadoop 比較検証 【2014月11日における検証レポート】
SQL on Hadoop 比較検証 【2014月11日における検証レポート】
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 

Similar to 20140120 presto meetup_en

Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioAlluxio, Inc.
 
ApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxXinliShang1
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudZhenxiao Luo
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simpleDori Waldman
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodbPGConf APAC
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationScott Miao
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overviewjoeyrobert
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...javier ramirez
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Junping Du
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopPrasanna Rajaperumal
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.comRenzo Tomà
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...Hisham Mardam-Bey
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloudOVHcloud
 

Similar to 20140120 presto meetup_en (20)

Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
ApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptx
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS Cloud
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter Migration
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overview
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 

Recently uploaded

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

20140120 presto meetup_en

  • 1. Our Presto use case and performance test Hironori Ogibayashi Shin Matsuura
  • 2. About us ● Hironori Ogibayashi(@angostura11) ● Shin Matsuura ○ IT Infrastructure team in Japanese telecommunications carrier ○ Mainly working on middleware - test, installation, deployment.
  • 3. Todays Topic ● Presto use case ○ Deployment ○ Use case ○ Challenges ○ Future work ● Performance comparison between Hive+Tez and Presto
  • 5. Log Collection Flow Fluentd Aggregator Hadoop Cluster Application WebHDFS ・1500 Fluentd instances ・25,000 msg / sec ・400GB / day ・150 types of log
  • 6. Log Usage ● Systems Infrastructure team ○ Checking trends in server performance ○ Performance analysis of Oracle Database ● Application development team ○ Improving system and business operations.
  • 7. Application for Oracle DB Performance Analysis - Check existing/potential problems of Oracle database, for certain system, certain period. - Utilize logs stored in HDFS. Queries were executed on Hive. - But, it took more than one hour to get the result... - (So, we migrated to Presto.)
  • 8. Why Presto? ● Frequent use of Interactive / ad-hoc queries. ● Of cource, faster is better.
  • 9. Hadoop Slave Presto Deployment Hadoop Slave DataNode TaskTracker Presto Worker Presto Coordinator Hive Metastore Application/Client ・・・ ● A decicated physical machine as a Coordinator. ● Workers run on each Hadoop slaves. ● Logs in HDFS are periodically converted to RCfiles. ● Presto versions ○ 0.66⇒0.73⇒0.75⇒0.82
  • 10. Deployment Effect - Elapsed time of a single query 230sec 7sec - Elapsed time of one of the queries issued by the application. - Query was run on CDH4 (MRv1) cluster.
  • 11. Deployment and Operation ● Deployment ○ Easy;Just extract binaries in each server and modify configuration file. ○ Automated by Ansible + yum. ● What we use in operation ○ Query history ■ Coordinator Web UI ○ Logs ■ /var/presto/data/logs/{server.log,launcher.log} ○ Metrics ■ presto-metrics(https://github.com/xerial/presto- metrics)⇒Fluentd⇒Elasticsearch + Kibana ○ sys schema
  • 12. Challenges ● Worker crash / hang. ○ OutOfMemory. In case of hanging, we resolve to “kill -9”. ○ We Modified the memory parameter: task.shard.max- threads×task.max-memory < -Xmx ● At first, we set node-scheduler.include-coordinator=true. In which case, Coordinator crashed due to heavy query. ● SQL difference from HiveQL ○ At first our Application used both Hive and Presto because we used Presto experimentally.Hence the Application had to support both HiveQL and Presto(ANSI SQL). ○ Now, the application no longer use Hive.
  • 13. Future work ● Improve Coodinator’s availability. ● Security ○ Now, all queries are executed as Presto’s daemon user. ● Resource isolation between Presto and Hadoop daemons.
  • 15. Contents From a Performance perspective Presto VS Hive+Tez (not tuning any parameteres)
  • 17. How Fast?? Presto VS Hive+Tez 2.0~136 times
  • 19. Testing environment Configurations 2p12c 64GB Mem 36TB Disk NN DN DN DN Hadoop(HDP2.1) Presto(0.82) Coodinator Worker Worker Worker Master : 3nodes Slave : 3nodes NN Metastore
  • 20. Sample data 300GB csv file 50 columns 1.1B records
  • 21. Performance measurement perspectives • Query patterns • Data format patterns • Repetitive Querying
  • 23. Queries Query1: select count(*) from TestTBL Query2: select * from TestTBL where col1 = ‘XXX’ Query3: select * from TestTBL where col1 = ‘XXX’ and col2 = ‘YYY’ Query4: select col1, count(*) from TestTBL group by col1 Query5: select col1, count(*) from TestTBL where col2 = ‘YYY’ group by col1
  • 24. data format :Txt Results: Query patterns
  • 25. data format :Txt Results: Query patterns 100x faster Presto was faster in processing speed than Hive+Tez in all queries.
  • 27. Data formats • Text File (Textfile) • Record Columnar File (RCfile) • Optimized Row Columnar File (ORCfile)
  • 28. Results: Data format patterns ※Query: Query2
  • 29. Results: Data format patterns ※Query: Query2 Presto was faster in processing speed than Hive+Tez in all data formats.
  • 31. Change in processing time with repetitions(Presto) ※Query: Query2 ※Data format: Txt
  • 32. Change in processing time with repetitions (Presto) ※Query: Query2 ※Data format: Txt Became faster After the second time. Cache ??? 2.5x faster
  • 33. Change in processing time with repetitions (Hive+Tez) ※Query: Query2 ※Data format: Txt
  • 34. Change in processing time with repetitions (Hive+Tez) ※Query: Query2 ※Data format: Txt No real change in processing time
  • 35.
  • 37. Engine:Presto Query × Data format Is using RCfile the most stable and fastest way ??
  • 38. Summary Result ● Presto was faster than Hive+Tez in all queries. ● Presto was faster than Hive+Tez in all data formats. ● With repetitive Querying, presto became faster. ● By Using RCfile, Presto was the most stable and fastest. Next ● Benchmark from node scaling and data volumn perspectives. ● Benchmark while using compression functions of ORCfile. ● Benchmark with HDP2.2.