Tajo case study bay area hug 20131105

A case study:
Tajo on Big Telco
 
Jeong-shik Jang
System Development & Deployment
Gruter Inc, Seoul, South Korea

©2013 Gruter. All rights reserved.
 
Mobile carriers in S. Korea

 
Test setup
Performance test on Hive / Impala / Tajo
H/W:
  CPU: 24 cores (Xeon 2.5 GHz, HT)
  Memory: 64 GB
  Disks: 3 TB x 6 (NL-SAS, 7200 RPM)
  Network: 10G
  Size: 1 master + 6 data nodes

Versions:
  Hadoop: cdh4.3.0
  Hive: 0.10.0-cdh4.3.0
  Impala: impalad_version_1.1.1_RELEASE
  Tajo: 0.2-SNAPSHOT

Data size: 1.7 TB (4.1B rows) for Q1; 8 GB or less (the results of Q1) for the remaining queries
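To put the data size in perspective, the following rough sketch derives the average row size and the share of data per node from the figures above. It assumes decimal units (1 TB = 10^12 bytes), an even spread of data across the 6 data nodes, 6 disks per data node, and it ignores HDFS replication; none of these assumptions are stated on the slide.

```python
# Back-of-the-envelope figures derived from the test setup slide.
# Assumptions (not from the slide): decimal terabytes, data spread evenly
# across data nodes, 6 disks per data node, HDFS replication ignored.
DATA_BYTES = 1.7e12      # 1.7 TB of input for Q1
ROWS = 4.1e9             # 4.1B rows
DATA_NODES = 6           # 1 master + 6 data nodes
DISKS_PER_NODE = 6       # "3 TB x 6", assumed to be per node

bytes_per_row = DATA_BYTES / ROWS
gb_per_node = DATA_BYTES / DATA_NODES / 1e9
gb_per_disk = gb_per_node / DISKS_PER_NODE

print(f"average row size: {bytes_per_row:.0f} bytes")
print(f"data per data node: {gb_per_node:.0f} GB")
print(f"data per disk: {gb_per_disk:.0f} GB")
```

Under these assumptions each data node holds several times more Q1 input than its 64 GB of memory, so Q1 has to read mostly from disk.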
 
Test setup: Queries
Q1: scan using about 20 text pattern matching filters
Q2: 7 unions with joins
Q3: join
Q4: group by and order by
Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by
 
Results: Q1 – filter scan

Q1: scan using about 20 text pattern matching filters
[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Bar labels: 1445.69, 895.96, 789.09]

NB:
* Tajo showed enhanced performance due to dynamic task scheduling
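As a rough reading of the Q1 chart, the sketch below converts each of the three processing times into an implied aggregate scan rate. It assumes that Q1 reads the full 1.7 TB exactly once, that 1 TB = 10^12 bytes, and that the work is spread evenly over the 6 data nodes; the helper name `scan_rate_gb_per_s` is made up for this illustration.

```python
# Implied scan rate for each Q1 processing time shown on the chart.
# Assumptions (not from the slides): Q1 reads all 1.7 TB once,
# decimal units, and an even spread of work over the data nodes.
DATA_BYTES = 1.7e12
DATA_NODES = 6
Q1_SECONDS = [1445.69, 895.96, 789.09]   # bar labels from the Q1 chart


def scan_rate_gb_per_s(seconds, data_bytes=DATA_BYTES):
    """Return the aggregate scan rate in GB/s for one run."""
    return data_bytes / seconds / 1e9


for t in Q1_SECONDS:
    total = scan_rate_gb_per_s(t)
    per_node = total / DATA_NODES
    print(f"{t:8.2f} s -> {total:.2f} GB/s total, {per_node:.2f} GB/s per data node")
```

The per-node figures are only indicative, since the slides do not say how the input is stored or whether it is compressed.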

Results: Q2 – unions, joins

[Bar chart: processing time (sec.); legend entries Impala and Tajo. Bar labels: 63.64, 38.64]

NB:
* Tajo materializes all query results to HDFS, as is the main goal
* Unions are currently processed in sequence in Tajo (parallel processing is coming soon)

Results: Q3 – join

[Bar chart; legend entries Hive and Impala. Bar labels: 101.45, 36.81]

NB:
* Tajo has an optimal selection/projection push down

Results: Q4 – group by and sort

Q4: group by and order by
[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Bar labels: 24.7, 0.45, 0.65]

Results: Q5 – filters, group by, having and sort

Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by, resulting in a smaller set of output
[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Bar labels: 128.78, 17.03, 6.03]

Results: Wrap up

The project is underway; more findings are expected in the future.
Performance enhancement thanks to dynamic task scheduling: some results showed better performance than Impala, despite Tajo materializing every query result to HDFS, the project still being in its early stages, and Tajo still being an early build.
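
The slides credit dynamic task scheduling but do not describe how Tajo implements it. The toy simulation below is not Tajo's scheduler; it only illustrates the general idea, assuming made-up task durations and two workers. `static_makespan` assigns tasks round-robin up front, while `dynamic_makespan` hands each task to whichever worker becomes idle first.

```python
import heapq


def static_makespan(durations, workers):
    """Assign tasks round-robin up front; return when the last worker finishes."""
    loads = [0.0] * workers
    for i, d in enumerate(durations):
        loads[i % workers] += d
    return max(loads)


def dynamic_makespan(durations, workers):
    """Give each task, in order, to the worker that becomes idle first."""
    free_at = [0.0] * workers        # min-heap of times at which workers go idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest idle worker takes the task
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical task durations with two slow tasks (e.g. skewed input splits).
durations = [8, 1, 1, 1, 8, 1, 1, 1]
print("static :", static_makespan(durations, 2))
print("dynamic:", dynamic_makespan(durations, 2))
```

In this example the round-robin plan places both slow tasks on the same worker, whereas the pull-based plan lets the other worker absorb the short tasks in the meantime. Dynamic assignment is not guaranteed to win for every workload.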

GRUTER: YOUR PARTNER IN THE BIG DATA REVOLUTION

Phone: +82-70-8129-2950
Fax: +82-70-8129-2952
E-mail: contact@gruter.com
Web: www.gruter.com

Gruter, Inc.
5F Sehwa Office Building, 889-70 Daechi-dong, Gangnam-gu, Seoul, South Korea 135-839

Jeong-shik Jang: jsjang@gruter.com