Processing High-Volume Hive Updates

Apache Hive provides a convenient SQL-based query language for data stored in HDFS. HDFS provides highly scalable bandwidth to the data but does not support arbitrary writes. One of Hortonworks' customers needs to store a high volume of customer data (> 1 TB/day), and a high percentage of that data (15%) is record updates distributed across years. In many high-update use cases HBase would suffice, but the current lack of filter pushdown from Hive into HBase and HBase's single-level keys make it too expensive. Our solution stores the edit records as separate HDFS files and uses a custom record reader to synthesize the current set of records dynamically as the table is read. This provides an economical solution to the customer's need that works within the framework provided by Hive. We believe this use case applies to many Hive users, and we plan to develop and open-source a reusable solution.
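The read-time synthesis described above can be illustrated with a minimal sketch. This is not the custom record reader from the talk. It assumes, purely for illustration, that the base file and every edit file are sorted by record key, that edit files are supplied oldest to newest, and that the newest version of a key wins. The names `merge_on_read` and `_tag` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _tag(source, generation):
    """Attach a file generation number to every (key, value) record."""
    for key, value in source:
        yield key, generation, value


def merge_on_read(base, edit_files):
    """Yield the current (key, value) for every record, in key order.

    base:       iterable of (key, value) pairs sorted by key.
    edit_files: list of iterables of (key, value) pairs, each sorted by
                key, ordered oldest to newest (an assumed layout).
    """
    # Generation 0 is the base file; edit files get 1..n in arrival order.
    streams = [_tag(src, gen) for gen, src in enumerate([base, *edit_files])]

    # Streaming k-way merge ordered by (key, generation), so all versions
    # of a key arrive together with the newest version last.
    merged = heapq.merge(*streams, key=itemgetter(0, 1))

    for key, versions in groupby(merged, key=itemgetter(0)):
        newest = None
        for _, _, newest in versions:
            pass  # keep only the last (newest) value for this key
        yield key, newest


# Illustrative data only: one base file and two later edit files.
base = [(1, "alice v1"), (2, "bob v1"), (3, "carol v1")]
edits_day1 = [(2, "bob v2")]
edits_day2 = [(2, "bob v3"), (3, "carol v2"), (4, "dave v1")]

current = list(merge_on_read(base, [edits_day1, edits_day2]))
print(current)
```

Because the merge is streaming, no file is rewritten and no file needs to be held in memory as a whole, which fits a file system that does not support arbitrary writes. The cost moves to read time and grows with the number of edit files.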

Published in: Technology, Business

  1. High Volume Updates in Hive. Owen O’Malley, owen@hortonworks.com, @owen_omalley. June 2012. © Hortonworks Inc. 2012
  2. Who Am I?
  3. A Data Flood
  4. The Dataflow
  5. The Approach
  6. Why not HBase?
  7. Limitations of a Single Key
  8. Hive Table Layout
  9. Design
  10. Repeatable Reads
  11. Stitching Buckets Together
  12. Limitations
  13. Additional Challenges from Hive
  14. Hive’s Output Committer
  15. Dynamic Partitions
  16. Conclusion
  17. Thank You! Questions & Answers