Hive acid-updates-summit-sjc-2014

© Hortonworks Inc. 2014
Adding ACID Updates to Hive
April 2014
Owen O’Malley, owen@hortonworks.com, @owen_omalley
Alan Gates, gates@hortonworks.com, @alanfgates
What’s Wrong?
•Hive Only Updates Partitions
–INSERT OVERWRITE rewrites an entire partition
–Forces daily or even hourly partitions
•What Happens to Concurrent Readers?
–OK for inserts, but overwrite causes races
–There is a ZooKeeper lock manager, but…
•No way to delete, update, or insert rows
–Makes ad hoc work difficult
Why is ACID Critical?
•Hadoop and Hive have always…
–Worked without ACID
–This was perceived as a tradeoff for performance
•But your data isn’t static
–It changes daily, hourly, or faster
–Ad hoc solutions require a lot of work
–Managing change makes the user’s life better
•Do or Do Not, There is NO Try
Use Cases
•Updating a Dimension Table
–Changing a customer’s address
•Deleting Old Records
–Remove records for compliance
•Updating/Restating Large Fact Tables
–Fix problems after they are in the warehouse
•Streaming Data Ingest
–A continual stream of data coming in
–Typically from Flume or Storm
Design
•HDFS Does Not Allow Arbitrary Writes
–Store changes as delta files
–Stitched together by the client on read
•Writes get a Transaction ID
–Sequentially assigned by the Metastore
•Reads get Committed Transactions
–Provides snapshot consistency
–No locks required
–Each query sees a snapshot of the data as of the start of the query
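As a rough illustration of the read path described above, the sketch below applies change events in transaction order and shows a reader only what was committed when its query started. The `Event` record, the `snapshot_read` function, and the `high_water_mark` parameter are hypothetical names chosen for this example; they are not Hive classes or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    txn_id: int    # transaction that wrote this event
    key: str       # hypothetical row identifier
    value: object  # None marks a delete


def snapshot_read(events, committed_txns, high_water_mark):
    """Return key -> value as seen by a reader whose snapshot was taken at query start.

    An event is visible only if its transaction had committed when the query
    started and its id is not above the snapshot's high water mark.
    """
    visible = {}
    for ev in sorted(events, key=lambda e: e.txn_id):
        if ev.txn_id > high_water_mark or ev.txn_id not in committed_txns:
            continue  # written by an open, aborted, or later transaction
        if ev.value is None:
            visible.pop(ev.key, None)
        else:
            visible[ev.key] = ev.value
    return visible


events = [
    Event(1, "a", "v1"),
    Event(2, "b", "v1"),
    Event(3, "a", "v2"),  # transaction 3 is still open in this example
    Event(4, "b", None),  # a committed delete
]
snapshot = snapshot_read(events, committed_txns={1, 2, 4}, high_water_mark=4)
```

Because visibility is decided from the set of committed transaction IDs alone, the reader in this sketch needs no lock on the data it scans.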
Stitching Buckets Together
[Figure: diagram not preserved in this transcript]
HDFS Layout
•Partition locations remain unchanged
–Still warehouse/$db/$tbl/$part
•Bucket Files Structured By Transactions
–Base files: $part/base_$tid/bucket_*
–Delta files: $part/delta_$tid_$tid/bucket_*
•Minor Compactions merge deltas
–Read delta_$tid1_$tid1 .. delta_$tid2_$tid2
–Written as delta_$tid1_$tid2
•Compaction doesn’t disturb readers
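The naming scheme above carries enough information for a reader to decide which directories to open. The following is a minimal sketch of one way to do that, assuming delta ranges never partially overlap; `choose_directories` is a hypothetical helper written for this example and is not Hive's own selection logic.

```python
import re

BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def choose_directories(names):
    """Pick the newest base plus the deltas not already covered by that base
    or by a wider (compacted) delta."""
    bases = []
    deltas = []
    for name in names:
        if m := BASE_RE.match(name):
            bases.append((int(m.group(1)), name))
        elif m := DELTA_RE.match(name):
            deltas.append((int(m.group(1)), int(m.group(2)), name))

    chosen = []
    covered_up_to = -1
    if bases:
        covered_up_to, base_name = max(bases)
        chosen.append(base_name)

    # Visit deltas by starting transaction, widest range first, so that a
    # compacted delta is seen before the narrow deltas it replaces.
    for lo, hi, name in sorted(deltas, key=lambda d: (d[0], -d[1])):
        if hi <= covered_up_to:
            continue  # already folded into the base or a wider delta
        chosen.append(name)
        covered_up_to = hi
    return chosen


listing = [
    "base_0028000",
    "delta_0028001_0028100",
    "delta_0028101_0028200",
    "delta_0028001_0028200",  # result of a minor compaction
    "delta_0028201_0028300",
]
to_read = choose_directories(listing)
```

In this sketch the old narrow deltas can stay on disk while readers that started earlier still use them, which is consistent with compaction not disturbing readers.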
Input and Output Formats
•Created new AcidInput/OutputFormat
–Unique key is (transaction, bucket, row)
•Reader returns the most recent update
•Also added a raw API for the compactor
–Provides previous events as well
•ORC implements the new API
–Extends records with change metadata
–Adds operation (d, u, i), transaction, and key
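To make "reader returns the most recent update" concrete, here is a small sketch that keeps, for each (transaction, bucket, row) key, only the event written by the highest transaction and then hides deleted rows. The `ChangeEvent` field names, including `current_txn`, are assumptions made for this example rather than the actual ORC record layout.

```python
from typing import NamedTuple, Optional


class ChangeEvent(NamedTuple):
    operation: str        # "i" insert, "u" update, "d" delete
    original_txn: int     # key part 1: transaction that first inserted the row
    bucket: int           # key part 2
    row_id: int           # key part 3
    current_txn: int      # assumed field: transaction that wrote this event
    row: Optional[dict]   # row payload, None for deletes


def latest_rows(events):
    """Return the current row for every key, skipping keys whose last event is a delete."""
    latest = {}
    for ev in events:
        key = (ev.original_txn, ev.bucket, ev.row_id)
        if key not in latest or ev.current_txn > latest[key].current_txn:
            latest[key] = ev
    return [ev.row for _, ev in sorted(latest.items()) if ev.operation != "d"]


events = [
    ChangeEvent("i", 1, 0, 0, 1, {"name": "ann", "city": "SJC"}),
    ChangeEvent("i", 1, 0, 1, 1, {"name": "bob", "city": "SFO"}),
    ChangeEvent("u", 1, 0, 0, 5, {"name": "ann", "city": "OAK"}),
    ChangeEvent("d", 1, 0, 1, 6, None),
]
rows = latest_rows(events)
```

A raw reader for the compactor would skip the final filtering step and return every event, including superseded ones.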
Distributing the Work
•Need to split buckets for MapReduce
–Need to split base and deltas the same way
–Use key ranges
–Use indexes
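One way to read "split base and deltas the same way" is that a single set of key boundaries is chosen and applied to both. The sketch below assumes integer keys and a hypothetical `assign_to_splits` helper; how Hive actually derives the boundaries from indexes is not shown here.

```python
from bisect import bisect_right


def assign_to_splits(keys, boundaries):
    """Group keys into len(boundaries) + 1 splits.

    boundaries is a sorted list; split i holds keys k with
    boundaries[i - 1] <= k < boundaries[i].
    """
    splits = [[] for _ in range(len(boundaries) + 1)]
    for k in keys:
        splits[bisect_right(boundaries, k)].append(k)
    return splits


boundaries = [100, 200]  # assumed to come from an index over the base file
base_splits = assign_to_splits([5, 99, 100, 150, 250], boundaries)
delta_splits = assign_to_splits([99, 150, 200], boundaries)
```

Because both calls use the same boundaries, the task that reads split i of the base also receives every delta event that could apply to those rows.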
Transaction Manager
•Existing lock managers
–In memory: not durable
–ZooKeeper: requires additional components to install, administer, etc.
•Locks need to be integrated with transactions
–Commit/rollback must atomically release locks
•We sort of have this database lying around which has ACID characteristics (the metastore)
•Transactions and locks are stored in the metastore
•Uses the metastore DB to provide unique, ascending IDs for transactions and locks
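The idea of leaning on an existing ACID database for ID allocation can be sketched with SQLite standing in for the metastore's RDBMS. The table names (`next_txn_id`, `txns`) and the functions below are invented for this example and are not the metastore schema.

```python
import sqlite3


def open_metastore():
    """Create an in-memory database that stands in for the metastore DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE next_txn_id (next_id INTEGER NOT NULL)")
    conn.execute("INSERT INTO next_txn_id VALUES (1)")
    conn.execute("CREATE TABLE txns (txn_id INTEGER PRIMARY KEY, state TEXT NOT NULL)")
    conn.commit()
    return conn


def open_txn(conn):
    """Allocate the next transaction id and record it as open, in one DB transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        (txn_id,) = conn.execute("SELECT next_id FROM next_txn_id").fetchone()
        conn.execute("UPDATE next_txn_id SET next_id = next_id + 1")
        conn.execute("INSERT INTO txns VALUES (?, 'open')", (txn_id,))
    return txn_id


def commit_txn(conn, txn_id):
    """Mark a transaction as committed."""
    with conn:
        conn.execute("UPDATE txns SET state = 'committed' WHERE txn_id = ?", (txn_id,))


conn = open_metastore()
first = open_txn(conn)
second = open_txn(conn)
commit_txn(conn, first)
```

A real deployment with several concurrent clients would also need the database to serialize the read-and-increment step, for example with row locking; this single-connection sketch does not exercise that.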
Transaction Model
•No explicit transactions in 0.13
–First implementation of INSERT, UPDATE, DELETE will be auto-commit
–Will then add BEGIN, COMMIT, ROLLBACK
•Snapshot isolation
–A reader sees consistent data for the duration of its query
–May extend to other isolation levels in the future
•Current transactions can be displayed using the new SHOW TRANSACTIONS statement
Locking Model
•Three types of locks
–Shared
–Semi-shared (can coexist with shared, but not with other semi-shared)
–Exclusive
•Operations require different locks
–SELECT, INSERT: shared
–UPDATE, DELETE: semi-shared
–DROP, INSERT OVERWRITE: exclusive
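The lock rules above amount to a small compatibility table. The sketch below encodes it directly; the constant and function names are made up for this example, and the only facts used are the ones listed on the slide.

```python
SHARED, SEMI_SHARED, EXCLUSIVE = "shared", "semi-shared", "exclusive"

# Pairs of lock types that may be held on the same object at the same time.
# Semi-shared coexists with shared but not with another semi-shared;
# exclusive coexists with nothing.
COMPATIBLE = {
    (SHARED, SHARED),
    (SHARED, SEMI_SHARED),
    (SEMI_SHARED, SHARED),
}

LOCK_FOR = {
    "SELECT": SHARED,
    "INSERT": SHARED,
    "UPDATE": SEMI_SHARED,
    "DELETE": SEMI_SHARED,
    "DROP": EXCLUSIVE,
    "INSERT OVERWRITE": EXCLUSIVE,
}


def can_run_together(op_a, op_b):
    """True if the two operations' locks can coexist on the same table or partition."""
    return (LOCK_FOR[op_a], LOCK_FOR[op_b]) in COMPATIBLE


readers_and_updater = can_run_together("SELECT", "UPDATE")
two_updaters = can_run_together("UPDATE", "DELETE")
```

Under these rules a single writer that modifies rows can proceed alongside any number of readers and inserters, while two row-modifying writers exclude each other.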
Compactor
•Each transaction (or batch of transactions in streaming ingest) creates a new delta file
•Too many files put pressure on the NameNode
•Need a way to
–Collect many deltas into one delta (minor compaction)
–Rewrite base and deltas to a new base (major compaction)
Minor Compaction
•Run when there are 10 or more deltas (configurable)
•Results in base + 1 delta
Before:
/hive/warehouse/purchaselog/ds=201403311000/base_0028000
/hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028100
/hive/warehouse/purchaselog/ds=201403311000/delta_0028101_0028200
/hive/warehouse/purchaselog/ds=201403311000/delta_0028201_0028300
/hive/warehouse/purchaselog/ds=201403311000/delta_0028301_0028400
/hive/warehouse/purchaselog/ds=201403311000/delta_0028401_0028500
After:
/hive/warehouse/purchaselog/ds=201403311000/base_0028000
/hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028500
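The directory listing above suggests how the merged delta is named: it spans from the lowest starting transaction to the highest ending one. The sketch below computes that name once enough deltas exist; `minor_compaction_target` and its `threshold` parameter are hypothetical and only mirror the "10 or more deltas (configurable)" rule.

```python
import re

DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def minor_compaction_target(dir_names, threshold=10):
    """Return the name of the merged delta, or None if there are too few deltas."""
    matches = [m for m in map(DELTA_RE.match, dir_names) if m]
    if len(matches) < threshold:
        return None
    # Keep the original zero-padded strings so the new name matches the old style.
    lowest = min((m.group(1) for m in matches), key=int)
    highest = max((m.group(2) for m in matches), key=int)
    return f"delta_{lowest}_{highest}"


partition_dirs = [
    "base_0028000",
    "delta_0028001_0028100",
    "delta_0028101_0028200",
    "delta_0028201_0028300",
    "delta_0028301_0028400",
    "delta_0028401_0028500",
]
# The example listing has only five deltas, so the threshold is lowered here.
merged = minor_compaction_target(partition_dirs, threshold=5)
```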
Major Compaction
•Run when deltas are 10% the size of base (configurable)
•Results in new base
Before:
/hive/warehouse/purchaselog/ds=201403311000/base_0028000
/hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028100
/hive/warehouse/purchaselog/ds=201403311000/delta_0028101_0028200
/hive/warehouse/purchaselog/ds=201403311000/delta_0028201_0028300
/hive/warehouse/purchaselog/ds=201403311000/delta_0028301_0028400
/hive/warehouse/purchaselog/ds=201403311000/delta_0028401_0028500
After:
/hive/warehouse/purchaselog/ds=201403311000/base_0028500
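A sketch of the major compaction decision, under the assumption that the trigger compares the total size of the deltas with the size of the base. The function names, the byte counts, and the `ratio` parameter are illustrative only.

```python
def needs_major_compaction(base_bytes, delta_bytes, ratio=0.10):
    """True once the deltas together reach the given fraction of the base size."""
    return sum(delta_bytes) >= ratio * base_bytes


def major_compaction_target(dir_names):
    """Name the new base after the highest transaction covered by any delta."""
    highest = max(
        (name.rsplit("_", 1)[1] for name in dir_names if name.startswith("delta_")),
        key=int,
        default=None,
    )
    return None if highest is None else f"base_{highest}"


partition_dirs = [
    "base_0028000",
    "delta_0028001_0028100",
    "delta_0028101_0028200",
    "delta_0028201_0028300",
    "delta_0028301_0028400",
    "delta_0028401_0028500",
]
new_base = major_compaction_target(partition_dirs)
should_compact = needs_major_compaction(base_bytes=1000, delta_bytes=[40, 70])
```

Unlike a minor compaction, the result replaces both the old base and every delta, so later readers open a single directory.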
Compactor Continued
•Metastore Thrift server will schedule and execute compactions
–No need for the user to schedule them
–User can initiate one via the new ALTER TABLE COMPACT statement
•No locking required; compactions run at the same time as selects and inserts
–Compactor is aware of readers and does not remove old files until readers have finished with them
•Current compactions can be viewed via the new SHOW COMPACTIONS statement
Application: Streaming Ingest
•Data is flowing in from generators in a stream
•Without streaming ingest, you have to add it to Hive in batches, often every hour
–Thus your users have to wait an hour before they can see their data
•New interface in hive.hcatalog.streaming lets applications write small batches of records and commit them
–Users can now see data within a few seconds of it arriving from the data generators
•Available for Apache Flume in HDP 2.1
–Working on Apache Storm integration
Streaming Ingest Illustrated
[Figure: Flume agent, HDFS]
Streaming Ingest Illustrated
[Figure: Flume agent, HDFS]
while (…)
write();
commit();
•Commit can be time based or size based, up to the writer
•commit() flushes to disk and sends the commit to the metastore
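The write/commit loop above leaves the commit policy to the writer. The sketch below shows one possible policy that commits when a batch reaches a size limit or has been open too long. `BatchingWriter` is a hypothetical class and is not the hive.hcatalog.streaming API; a plain list stands in for "flush to disk and send the commit to the metastore".

```python
import time


class BatchingWriter:
    """Buffers records and commits them by size or by elapsed time."""

    def __init__(self, sink, max_records=1000, max_seconds=5.0, clock=time.monotonic):
        self.sink = sink              # stand-in for the file plus metastore commit
        self.max_records = max_records
        self.max_seconds = max_seconds
        self.clock = clock            # injectable so the time rule can be tested
        self.pending = []
        self.batch_started = clock()

    def write(self, record):
        self.pending.append(record)
        too_big = len(self.pending) >= self.max_records
        too_old = self.clock() - self.batch_started >= self.max_seconds
        if too_big or too_old:
            self.commit()

    def commit(self):
        if self.pending:
            self.sink.append(list(self.pending))  # readers see the batch only after this
            self.pending.clear()
        self.batch_started = self.clock()


committed = []
writer = BatchingWriter(committed, max_records=2, max_seconds=1000.0)
for record in ["a", "b", "c", "d", "e"]:
    writer.write(record)
writer.commit()  # flush the final partial batch
```

Smaller batches make data visible sooner but create more commits, so the limits trade latency against overhead.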
Streaming Ingest Illustrated
[Figure: Flume agent, HDFS; same write/commit loop]
•The next write() appends to the same file
Streaming Ingest Illustrated
[Figure: Flume agent, HDFS, reader task; same write/commit loop]
•The reader uses the txnid to determine which records to read
Phases of Development
•Phase 1, Hive 0.13
–Transaction and new lock manager
–ORC file support
–Automatic and manual compaction
–Snapshot isolation
–Streaming ingest via Flume
•Phase 2, Hive 0.14 (we hope)
–INSERT … VALUES, UPDATE, DELETE
–BEGIN, COMMIT, ROLLBACK
•Future (all speculative, based on user feedback)
–Versioned or point-in-time queries
–Additional isolation levels such as dirty read or read committed
–MERGE
Limitations
•Only suitable for data warehousing, not for OLTP
•Table must be bucketed, and (currently) not sorted
–Sorting restriction will be removed in the future
Why Not HBase?
•Good
–Handles compactions for us
–Already has a similar data model with LSM
•Bad
–No cross-row transactions
–Would require us to write a transaction manager over HBase; doable, but not less work
–HFile is column-family based rather than columnar
–HBase is focused on point lookups and range scans
–Warehousing tends to require full scans
Conclusion
•JIRA: https://issues.apache.org/jira/browse/HIVE-5317
•Adds ACID semantics to Hive
•Uses SQL standard commands
–INSERT, UPDATE, DELETE
•Provides scalable read and write access
Thank You!
Questions & Answers