Tajo case study bay area hug 20131105

A presentation note on Apache Tajo from the Bay Area HUG meetup on 2013/11/05: a case study of Tajo at a big telco

  • A case study: Tajo on Big Telco
  • Jeong-shik Jang, System Development & Deployment, Gruter Inc., Seoul, South Korea. ©2013 Gruter. All rights reserved.
  • Mobile carriers in S. Korea
  • Test setup: performance test on Hive / Impala / Tajo
      - H/W: CPU 24 cores (Xeon 2.5 GHz, HT); memory 64 GB; disks 6 x 3 TB (NL-SAS, 7200 RPM); network 10G
      - Size: 1 master + 6 data nodes
      - Versions: Hadoop cdh4.3.0; Hive 0.10.0-cdh4.3.0; Impala impalad_version_1.1.1_RELEASE; Tajo 0.2-SNAPSHOT
      - Data size: 1.7 TB (4.1B rows) for Q1; 8 GB or less (the results of Q1) for the rest of the queries
  • Test setup: queries
      - Q1: scan using about 20 text pattern matching filters
      - Q2: 7 unions with joins
      - Q3: join
      - Q4: group by and order by
      - Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by
  • Results: Q1 (filter scan)
      - Q1: scan using about 20 text pattern matching filters
      - [Chart: processing time (sec.) for Hive, Impala, and Tajo; values shown: 1445.69, 895.96, 789.09]
      - NB: Tajo showed enhanced performance due to dynamic task scheduling.
  • Results: Q2 (unions, joins)
      - Q2: 7 unions with joins
      - [Chart: processing time (sec.) for Hive, Impala, and Tajo; values shown: 63.64, 38.64, 9.11]
      - NB: Tajo materializes all query results to HDFS, as that is its main goal. Unions are currently processed sequentially in Tajo (parallel processing is coming soon).
  • Results: Q3 (join)
      - Q3: join
      - [Chart: processing time (sec.) for Hive, Impala, and Tajo; values shown: 101.45, 36.81, 31.92]
      - NB: Tajo has optimal selection/projection push-down.
  • Results: Q4 (group by and sort)
      - Q4: group by and order by
      - [Chart: processing time (sec.) for Hive, Impala, and Tajo; values shown: 24.7, 0.45, 0.65]
  • Results: Q5 (filters, group by, having, and sort)
      - Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by, resulting in a smaller output set
      - [Chart: processing time (sec.) for Hive, Impala, and Tajo; values shown: 128.78, 17.03, 6.03]
  • Results: Wrap-up
      - The project is underway; more findings are expected in the future.
      - Performance improved thanks to dynamic task scheduling: some results were better than Impala's, even though Tajo materializes every query result to HDFS, the project is still in its early stages, and Tajo is still an early build.
  • GRUTER: YOUR PARTNER IN THE BIG DATA REVOLUTION
      - Phone: +82-70-8129-2950; Fax: +82-70-8129-2952
      - E-mail: contact@gruter.com; Web: www.gruter.com
      - Gruter, Inc., 5F Sehwa Office Building, 889-70 Daechi-dong, Gangnam-gu, Seoul, South Korea 135-839
      - Jeong-shik Jang: jsjang@gruter.com
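The Q1 and wrap-up slides credit Tajo's gains to dynamic task scheduling, but they do not describe how Tajo's scheduler works. As a generic illustration only, the sketch below compares a static round-robin assignment (tasks bound to workers before execution) with a pull-based model in which whichever worker goes idle first takes the next pending task. The function names and the skewed task durations are hypothetical, not taken from Tajo.

```python
import heapq
from typing import List


def static_makespan(durations: List[float], workers: int) -> float:
    """Static plan: task i is bound to worker i % workers before execution starts."""
    loads = [0.0] * workers
    for i, d in enumerate(durations):
        loads[i % workers] += d
    return max(loads)


def dynamic_makespan(durations: List[float], workers: int) -> float:
    """Dynamic plan: the worker that becomes free first pulls the next pending task."""
    free_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest idle worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical skewed scan tasks: two slow splits among many fast ones.
    tasks = [10.0, 1.0, 1.0, 10.0] + [1.0] * 8
    print("static makespan: ", static_makespan(tasks, 3))
    print("dynamic makespan:", dynamic_makespan(tasks, 3))
```

With three workers, the static plan happens to put both slow tasks on the same worker (makespan 22), while the pull-based plan finishes in 11, because the other workers absorb the short tasks while the slow ones run. Skewed input splits are one situation where this kind of scheduling can help a large filter scan.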
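The Q3 note attributes Tajo's join result to selection/projection push-down. The sketch below is a textbook illustration of that optimization, not Tajo's optimizer: it runs one hypothetical query two ways, first joining everything and filtering afterward, then applying each predicate and column pruning to its own table before the join. The tables, columns, and function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical tables; names and columns are illustrative only.
USERS: List[Row] = [
    {"user_id": 1, "region": "Seoul", "plan": "A"},
    {"user_id": 2, "region": "Busan", "plan": "B"},
    {"user_id": 3, "region": "Seoul", "plan": "B"},
]
CALLS: List[Row] = [
    {"user_id": 1, "duration": 30, "cell": "x"},
    {"user_id": 1, "duration": 90, "cell": "y"},
    {"user_id": 2, "duration": 120, "cell": "x"},
    {"user_id": 3, "duration": 75, "cell": "z"},
    {"user_id": 3, "duration": 10, "cell": "z"},
]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner hash join on `key`: build an index on `right`, probe it with `left`."""
    index: Dict[object, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def naive_plan(calls: List[Row], users: List[Row]) -> Tuple[List[Row], int]:
    """Join first, then filter and project. Returns (rows, rows fed into the join)."""
    joined = hash_join(calls, users, "user_id")
    kept = [r for r in joined if r["region"] == "Seoul" and r["duration"] > 60]
    rows = [{"user_id": r["user_id"], "duration": r["duration"]} for r in kept]
    return rows, len(calls) + len(users)


def pushed_down_plan(calls: List[Row], users: List[Row]) -> Tuple[List[Row], int]:
    """Filter and prune columns on each table before the join."""
    # Selection push-down: each predicate references only one table.
    calls_f = [c for c in calls if c["duration"] > 60]
    users_f = [u for u in users if u["region"] == "Seoul"]
    # Projection push-down: keep only the columns needed above the join.
    calls_p = [{"user_id": c["user_id"], "duration": c["duration"]} for c in calls_f]
    users_p = [{"user_id": u["user_id"]} for u in users_f]
    return hash_join(calls_p, users_p, "user_id"), len(calls_p) + len(users_p)


if __name__ == "__main__":
    print(naive_plan(CALLS, USERS))
    print(pushed_down_plan(CALLS, USERS))
```

Both plans return the same two rows, but on this toy data the pushed-down plan feeds 5 rows into the join instead of 8, and each of those rows carries fewer columns.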
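To make the shape of Q5 concrete, here is a minimal sketch of its semantics over in-memory rows. The slides give neither the schema nor the actual patterns, so this assumes regular expressions as a stand-in for the query's text pattern matching filters, uses three hypothetical patterns instead of 30, and treats each row as a hypothetical `(key, text)` pair; the SQL in the docstring is likewise only illustrative.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stand-ins for Q5's 30 text pattern matching filters.
PATTERNS = [re.compile(p) for p in (r"^video/", r"stream", r"\.mp4$")]


def q5_like(rows: Iterable[Tuple[str, str]], min_count: int) -> List[Tuple[str, int]]:
    """Roughly equivalent to (illustrative SQL):
    SELECT key, COUNT(*) FROM t
    WHERE text matches p1 OR text matches p2 OR ...
    GROUP BY key HAVING COUNT(*) >= min_count
    ORDER BY COUNT(*) DESC, key
    """
    counts: Counter = Counter()
    for key, text in rows:
        # WHERE: a row passes if any one of the OR-ed patterns matches.
        if any(p.search(text) for p in PATTERNS):
            counts[key] += 1  # GROUP BY key, COUNT(*)
    # HAVING: drop groups below the threshold.
    kept = [(k, c) for k, c in counts.items() if c >= min_count]
    # ORDER BY: count descending, then key ascending for a deterministic result.
    return sorted(kept, key=lambda kc: (-kc[1], kc[0]))
```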