
Apache Hive ACID Project


  1. Apache Hive ACID Project, Eugene Koifman, June 2016
  2. Agenda
     • Motivations/Goals
     • What is included in the project
     • End user point of view
     • Architecture
     • Recent progress
     • Possible future directions
  3. Motivations/Goals
     • Continuously adding new data to Hive, historically:
       – INSERT INTO Target SELECT ... FROM Staging
       – ALTER TABLE Target ADD PARTITION (dt='2016-06-30')
       – Frequent small loads create lots of files, which is bad for performance; fewer, larger loads mean users wait longer to see the latest data
     • Modifying existing data:
       – For analyzing log files, not that important; when sourcing data from an Operational Data Store, it may be really important
       – The old way: INSERT OVERWRITE TABLE Target SELECT * FROM Target WHERE …
         - Concurrency: either hope for the best (with multiple concurrent updates) or use the ZooKeeper lock manager's shared/exclusive (S/X) locks, which are restrictive
         - Expensive to do repeatedly, since every change rewrites the data (write side)
  4. Goals
     • Make the above use cases easy and efficient
     • Key requirement: long-running analytics queries should run concurrently with update commands
     • NOT OLTP!!!
       – Support slowly changing tables
       – Not for hundreds of concurrent queries trying to update the same partition
  5. ACID at High Level
     • A new type of table that supports INSERT/UPDATE/DELETE SQL operations (see the sketch below)
     • Concept of an ACID transaction: Atomic, Consistent, Isolated, Durable
     • Streaming Ingest API: write a continuous stream of events to Hive in micro-batches with transactional semantics
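     As a minimal illustrative sketch of those DML verbs (the table T follows the CREATE TABLE shown on slide 7; the row values are invented for the example):

         -- every statement auto-commits as its own ACID transaction
         INSERT INTO T VALUES (1, 2), (3, 4);
         UPDATE T SET a = a * 10 WHERE b = 2;   -- the bucketing column b itself cannot be updated
         DELETE FROM T WHERE b = 4;
         SELECT * FROM T;                       -- readers see a consistent snapshot of committed rows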
  6. ACID at High Level (architecture diagram)
     [Diagram: a SQL client and a Streaming client call openTxn/commit/abort against the Metastore, which keeps its state in an RDBMS and hands back a txnID; the data itself is written by the compute nodes to HDFS.]
  7. User Point of View
     • CREATE TABLE T(a int, b int) CLUSTERED BY (b) INTO 8 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
     • Not all tables support transactional semantics
     • The table must be bucketed; important for query performance
     • The table cannot be sorted, because the ACID implementation requires its own sort order
     • Currently requires ORC File, but any format implementing AcidInputFormat/AcidOutputFormat can be used
     • Snapshot Isolation: the state of the DB as of the start of the query is locked in for the duration of the query
     • autoCommit=true, so each statement is its own transaction (see the configuration sketch below)
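     As a hedged aside that is not on the slides: running these examples presumes an ACID-enabled configuration. The property names below are the ones documented in the Hive Transactions wiki linked on slide 18; exact requirements and defaults vary by Hive version.

         -- client/session side
         SET hive.support.concurrency=true;
         SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
         SET hive.enforce.bucketing=true;   -- Hive 1.x only; bucketing is always enforced in Hive 2.x
         SET hive.exec.dynamic.partition.mode=nonstrict;

         -- metastore side (hive-site.xml), so the compactor actually runs
         -- hive.compactor.initiator.on=true
         -- hive.compactor.worker.threads=1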
  8. Design – Storage Layer
     • The storage layer is enhanced to support an MVCC architecture
       – Multiple versions of each row
       – Allows concurrent readers and writers
     • HDFS is an append-only file system
       – All update operations are written to a delta file first
       – Files are combined on read and during compaction
     • Even if you could update a file in the middle:
       – The architecture of choice for analytics is columnar storage (ORC File)
       – It compresses by column, which makes it difficult to update
     • Random data access is prohibitively slow
  9. Storage Layer Example
     • CREATE TABLE T(a int, b int) CLUSTERED BY (b) INTO 1 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
     • Suppose the table contains (1,2),(3,4)
         hive> update T set a = -3 where a = 3;
         hive> update T set a = -1 where a = 1;
       Now the table has (-1,2),(-3,4)
     • hive> dfs -ls -R /user/hive/warehouse/t;
         /user/hive/warehouse/t/base_0000022/bucket_00000
         /user/hive/warehouse/t/delta_0000023_0000023_0000/bucket_00000
         /user/hive/warehouse/t/delta_0000024_0000024_0000/bucket_00000
  10. Example Continued
      • bin/hive --orcfiledump -j -d /user/hive/warehouse/t/base_0000022/bucket_00000
          {"operation":0,"originalTransaction":22,"bucket":0,"rowId":0,"currentTransaction":22,"row":{"a":3,"b":4}}
          {"operation":0,"originalTransaction":22,"bucket":0,"rowId":1,"currentTransaction":22,"row":{"a":1,"b":2}}
      • bin/hive --orcfiledump -j -d /…/t/delta_0000023_0000023_0000/bucket_00000
          {"operation":1,"originalTransaction":22,"bucket":0,"rowId":0,"currentTransaction":23,"row":{"_col1":-3,"_col2":4}}
      • Each file is sorted by the logical primary key: originalTransaction, bucket, rowId
      • On read, base and deltas are stitched together to produce the correct version of each row
      • Each read operation "knows" the state of all transactions up to the moment it started
  11. Producing The Snapshot

      base_0000022/bucket_00000:
        oTxn  bucket  rowId  cTxn  a   b
        22    0       0      22    3   4
        22    0       1      22    1   2

      delta_0000023_0000023_0000:
        oTxn  bucket  rowId  cTxn  a   b
        22    0       0      23    -3  4

      delta_0000024_0000024_0000:
        oTxn  bucket  rowId  cTxn  a   b
        22    0       1      24    -1  2

      select * from T:
        a   b
        -3  4
        -1  2
  12. Design – Compactor
      • More operations = more delta files
      • The Compactor rewrites the table in the background
        – Minor compaction merges delta files into fewer deltas
        – Major compaction merges deltas with the base; more expensive
        – This amortizes the cost of updates and self-tunes the tables, and makes ORC more efficient (larger stripes, better compression)
      • Compaction can be triggered automatically or on demand (see the sketch below)
        – Various configuration options control when the process kicks in
        – Compaction itself is a Map-Reduce job
      • Key design principle: the Compactor does not affect readers or writers
      • A Cleaner process removes obsolete files
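      A brief sketch of the on-demand path, using statement forms documented in the Hive wiki (T is the running example table; Target and its dt partition come from slide 3):

          ALTER TABLE T COMPACT 'minor';
          ALTER TABLE Target PARTITION (dt='2016-06-30') COMPACT 'major';
          SHOW COMPACTIONS;   -- queue state per table/partition: initiated, working, ready for cleaning

      Automatic triggering is governed by metastore-side thresholds such as hive.compactor.delta.num.threshold and hive.compactor.delta.pct.threshold.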
  13. Design – Concurrency
      • Transaction Manager
        – Manages transaction ID assignment
        – Keeps track of transaction state: open, committed, aborted
      • Lock Manager
        – DDL operations acquire eXclusive locks
        – Read operations acquire Shared locks
        – The main goal is to prevent someone dropping a table while a query is in progress
      • The state of both is persisted in the Hive Metastore (see the sketch below)
      • Write-set tracking prevents write-write conflicts between concurrent transactions
      • Note that two INSERTs are never in conflict, since Hive does not enforce unique constraints
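      Because that state lives in the metastore, it can be inspected from SQL; a short sketch (these statements are documented for Hive releases of this era, though the output columns vary by version):

          SHOW TRANSACTIONS;       -- transaction id, state (open/aborted), user, host
          SHOW LOCKS;              -- every lock currently held or requested
          SHOW LOCKS T EXTENDED;   -- locks on table T, with lock type and acquired/waiting state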
  14. • You are allowed to read ACID and non-ACID tables in the same query
      • You cannot write to both ACID and non-ACID tables from the same statement (multi-insert); see the sketch below
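      To illustrate that restriction, a hedged sketch (Staging is the table from slide 3; acid_target and plain_target are invented, created with and without 'transactional'='true'): a multi-insert like the one below is rejected because one branch writes an ACID table and the other does not.

          FROM Staging s
          INSERT INTO TABLE acid_target  SELECT s.a, s.b   -- transactional (ACID) target
          INSERT INTO TABLE plain_target SELECT s.a, s.b;  -- non-transactional target: not allowed in the same statement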
  15. Design – Streaming Ingest
      • Allows you to continuously write events to a Hive table
        – Commit periodically to make writes durable and visible
        – Call abort to make writes since the last commit/abort invisible
        – Optimized to handle micro-batches of events, e.g. one batch per second
          - Multiple transactions are written to one file
        – Only supports adding new data
      • Streaming tools like Storm and Flume rely on this API to ingest data into Hive
      • The API is public, so it can also be used directly
      • Data written via the Streaming API has the same transactional semantics as the SQL side
  16. Recent Improvements
      • Predicate pushdown (PPD)
      • Schema evolution
      • Split computation (requires Tez 0.7)
      • Usability: better lock info, compaction history, SHOW LOCKS filtering
      • Various safety checks, e.g. a limit on open transactions
      • Metastore-side processes like compaction are no longer singletons
      • Metastore scalability
      • Bug fixes (Hive, Flume, Storm)
  17. Future Work (Uncommitted transaction… may be rolled back)
      • Automatic/dynamic bucketing
      • MERGE statement (SQL Standard 2003), HIVE-10924; see the sketch below
      • Performance
        – Better vectorization: some operations over ACID tables don't vectorize at all, and some do but not as well as they could
      • HCatalog integration (at least the read side), to read ACID tables from Pig/MR
      • Multi-statement transactions, i.e. BEGIN TRANSACTION/COMMIT/ROLLBACK
      • Finer-grained concurrency management and conflict detection
      • Better monitoring/alerting
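      MERGE did not exist in Hive at the time of this talk, so purely as a sketch of the SQL:2003 shape it is expected to take under HIVE-10924 (table and column names invented):

          MERGE INTO customer AS t
          USING updates AS s
          ON t.id = s.id
          WHEN MATCHED THEN UPDATE SET name = s.name
          WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name);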
  18. Etc
      • Documentation
        – https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
        – https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
      • Follow/Contribute
        – https://issues.apache.org/jira/browse/HIVE-14004?jql=project%20%3D%20HIVE%20AND%20component%20%3D%20Transactions
      • user@hive.apache.org
      • dev@hive.apache.org
  19. Thank You
