Alan Gates, Hortonworks_Hadoop&SQL

417 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
417
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
12
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Speedup (y-axis) as a ratio to Hive 10 Text. Bigger is better.
  • Alan Gates, Hortonworks_Hadoop&SQL

    1. 1. Stinger Overview Page 1 •An initiative, not a project or product •Includes changes to Hive and a new project Tez •Two main goals –Improve Hive performance 100x over Hive 0.10 –Extend Hive SQL to include features needed for analytics •Hive will support: –BI tools connecting to Hadoop –Analysts performing ad-hoc, interactive queries –Still excellent at the large batch jobs it is used for today © 2013 Hortonworks
    2. 2. Stinger Mileposts Page 2 © 2013 Hortonworks Stinger Phase 3 •Buffer Cache •Cost Based Optimizer Stinger Phase 2 •YARN Resource Mgmnt •Hive on Apache Tez •Remove startup latency •Vectorized Operators Stinger Phase 1 •Reduce # of MR Jobs •SQL Analytics •ORCFile Format 1 2 Improve existing tools & preserve investments Enable Hive to support interactive workloads Released in Hive 0.11 Current Work Roadmap
    3. 3. © Hortonworks Inc. 2013 Performance Trajectory Page 3 1X 2X 12X 11X 21X 0X 5X 10X 15X 20X 25X Hive 10 Text Hive 10 RC Hive 11 RC Hive 11 ORC Hive 11 CP ORC, Tez… Query 27 Speedup 1X 14X 44X 57X 78X 0X 10X 20X 30X 40X 50X 60X 70X 80X 90X Hive 10 Text Hive 10 RC Hive 11 RC Hive 11 ORC Hive 11 CP ORC, Tez Query 82 Speedup

    ×