Transactional Operations
in Hive
Eugene Koifman
June 2017
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
 Motivations/Goals
 End user point of view
 Design
 Performance Improvements/Results
 Roadmap
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Motivations
 Modifying existing data
– INSERT OVERWRITE TABLE Target SELECT * FROM Target WHERE …
• Delete – OK, Update - ?
• Concurrency
– Hope for the best (multiple updates)
– ZooKeeper lock manager S/X locks – restrictive
• Expensive to do repeatedly (write side)
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Motivations
 Continuously adding new data to Hive in the past
– ALTER TABLE Target ADD PARTITION (dt=‘2016-06-30’)
• Lots of files – bad for performance
• Fewer files –users wait longer to see latest data
– INSERT INTO Target as SELECT FROM Staging
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Merge Statement – SQL Standard 2011 (Hive 2.2)
ID State County Value
1 CA LA 19.0
2 MA Norfolk 15.0
7 MA Suffolk 50.15
16 CA Orange 9.1
ID State Value
1 20.0
7 80.0
100 NH 6.0
MERGE INTO TARGET T
USING SOURCE S ON T.ID=S.ID
WHEN MATCHED THEN
UPDATE SET T.Value=S.Value
WHEN NOT MATCHED
INSERT (ID,State,Value)
VALUES(S.ID, S.State, S.Value)
ID State County Value
1 CA LA 20.0
2 MA Norfolk 15.0
7 MA Suffolk 80.0
16 CA Orange 9.1
100 NH null 6.0
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Goals
 Make above use cases easy and efficient
 Key Requirement
– Long running analytics queries should run concurrently with update commands
 NOT OLTP!!!
– Support slowly changing tables
– Not for 100s of concurrent queries trying to update the same partition
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
System at High Level
 A new type of table that supports Insert/Update/Delete/Merge SQL operations
 Concept of ACID transaction
– Atomic, Consistent, Isolated, Durable
 Streaming Ingest API
– Write a continuous stream of events to Hive in micro batches with transactional semantics
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User Point of View
 CREATE TABLE T(a int, b int) CLUSTERED BY (b) INTO 8 BUCKETS STORED AS ORC
TBLPROPERTIES ('transactional'='true');
 Not all tables support transactional semantics
 Table must be bucketed
 Table cannot be sorted
 Currently requires ORC File but anything implementing format
– AcidInputFormat/AcidOutputFormat
 autoCommit=true
 Transactions run at Snapshot Isolation
– Lock in the state of the DB as of the start of the query for the duration of the query
– Between Serializable and Repeatable Read
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design
 Transaction Manager
– Begin transaction and obtain a transaction ID
 Storage layer enhanced to support MVCC architecture
– Each row is tagged with unique ROW_ID (internal)
– Multiple versions of each row to allow concurrent readers and writers
– Result of each write is stored in a new Delta file
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
 CREATE TABLE acidtbl (a INT, b STRING) CLUSTERED BY (a) INTO 1 BUCKETS STORED AS
ORC TBLPROPERTIES ('transactional'='true');
ACID Metadata Columns original_transaction_id
bucket_id
row_id
current_transaction_id
User Columns col_1:
a : INT
col_2:
b : STRING
ACID_PK
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
 INSERT INTO acidtbl (a,b) VALUES (100, “foo”), (200, “xyz”), (300, “bee”);
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
delta_00001_00001/bucket_0000
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
 UPDATE acidTbl SET b = “bar” where a = 300;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
delta_00001_00001/bucket_0000
delta_00002_00002/bucket_0000
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
 DELETE FROM acidTbl where a = 200;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
ACID_PK a b
{ 1, 0, 1 } null null
delta_00001_00001/bucket_0000
delta_00002_00002/bucket_0000 delta_00003_00003/bucket_0000
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
 SELECT * FROM acidtbl;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
ACID_PK a b
{ 1, 0, 1 } null null
delta_00001_00001/bucket_0000 delta_00002_00002/bucket_0000 delta_00003_00003/bucket_0000
{ 1, 0, 0 } 100 “foo” 100 “foo”{ 1, 0, 1 } 200 “xyz”{ 1, 0, 1 } null null{ 1, 0, 2 } 300 “bee”{ 1, 0, 2 } 300 “bar”
300 “bar”
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Compactor
 More operations = more delta files – make reads more expensive
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Compactor
 ALTER TABLE acidTbl COMPACT ‘MAJOR’;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
ACID_PK a b
{ 1, 0, 1 } null nulldelta_00001_00001/bucket_0000
delta_00002_00002/bucket_0000
delta_00003_00003/bucket_0000
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 2 } 200 “bar”
base_00003/bucket_0000
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Compactor
 Compactor rewrites the table in the background
– Minor compaction - merges delta files into fewer deltas
– Major compactor merges deltas with base - more expensive
– This amortizes the cost of updates and self tunes the tables
• Makes ORC more efficient - larger stripes, better compression
 Compaction can be triggered automatically or on demand
– There are various configuration options to control when the process kicks in.
– Compaction itself is a Map-Reduce job
 Key design principle is that compactor does not affect readers/writers
 Cleaner process – removes obsolete files
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Concurrency
 Transaction Manager
– manages transaction ID assignment
– keeps track of transaction state: open, committed, aborted
 Lock Manager
– DDL operations acquire eXclusive locks
– Read operations acquire Shared locks
– Also locks non transactional tables – different logic
• hive.txn.strict.locking.mode
 State of both persisted in Hive Metastore
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Concurrency
 Write Set tracking to prevent Write-Write conflicts in concurrent transactions
 Note that 2 Inserts are never in conflict since Hive does not enforce unique
constraints.
 You are allowed to read acid and non-acid tables in same query.
 You cannot write to acid and non-acid tables at the same time (multi-insert
statement)
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Streaming Ingest
 Allows you to continuously write events to a hive table
– Can commit periodically to make writes durable/visible
– Can also call abort to make writes since last commit/abort invisible.
– Optimized so that it can handle writing micro batches of events - every second.
• Multiple transactions are written to one file
– Only supports adding new data
 Streaming tools like NiFi, Storm and Flume rely on this API to ingest data into hive
 This API is public so it can be used directly
 Data written via Streaming API has the same transactional semantics as SQL side
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Merge Statement – SQL Standard 2011 (Hive 2.2)
ID State County Value
1 CA LA 19.0
2 MA Norfolk 15.0
7 MA Suffolk 50.15
16 CA Orange 9.1
ID State Value
1 20.0
7 80.0
100 NH 6.0
MERGE INTO TARGET T
USING SOURCE S ON T.ID=S.ID
WHEN MATCHED THEN
UPDATE SET T.Value=S.Value
WHEN NOT MATCHED
INSERT (ID,State,Value)
VALUES(S.ID, S.State, S.Value)
ID State County Value
1 CA LA 20.0
2 MA Norfolk 15.0
7 MA Suffolk 80.0
16 CA Orange 9.1
100 NH null 6.0
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SQL Merge
Target
Source
ACID_PK ID Stat
e
Count
y
Value
{ 1, 0, 1 } 1 CA LA 20.0
{ 1, 0, 3 } 7 MA Suffolk 80.0
ACID_PK ID State Coun
ty
Value
{ 2, 0, 1 } 100 NH 6.0
delta_00002_00002/bucket_0000
delta_00002_00002_001/bucket_0000
Right Outer Join
ON T.ID=S.ID
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
W/o MERGE – much less efficient
 UPDATE Target set Value= 20.0 where ID = 1;
 UPDATE Target set Value = 80.0 where ID = 7;
 INSERT INTO Target (ID, State, Value) VALUES(100, ‘NH’, 6.0);
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Performance
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Work-In-Progress
 Split an update into combination of delete and insert
 UPDATE acidTbl SET b = “bar” where a = 300;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
delta_00001_00001/bucket_0000
delta_00002_00002/bucket_0000
ACID_PK a b
{ 2, 0, 0 } 300 “bar”
ACID_PK a b
{ 1, 0, 2 } null null
delta_00002_00002/bucket_0000 delete_delta_00002_00002/bucket_0000
Enabled
PPD
Splits for
Delta files
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Benefits
 Improved PPD
 Better Network Utilization
 Better Memory Utilization
 Full Vectorization of Reads
 Updating bucket/partition columns
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Performance
 TPC-H Benchmark
– 10 node cluster at Scale Factor 1000 (1 TB of data)
– 11 delta files with 90 GB data each
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Future Work
 Multi statement transactions, i.e. BEGIN TRANSACTION/COMMIT/ROLLBACK
 Performance
– Smarter Compaction
 Finer grained concurrency management/conflict detection
 Read Committed w/Lock Based scheduling
 Better Monitoring/Alerting
 LOAD DATA … support
 Optional bucketing
 SMB support – user defined sort order
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Further Reading
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Etc
 Documentation
– https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
– https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
 Follow/Contribute
– https://issues.apache.org/jira/browse/HIVE-
14004?jql=project%20%3D%20HIVE%20AND%20component%20%3D%20Transactions
 user@hive.apache.org
 dev@hive.apache.org
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank You

Transactional SQL in Apache Hive

  • 1.
  • 2.
    2 © HortonworksInc. 2011 – 2016. All Rights Reserved Agenda  Motivations/Goals  End user point of view  Design  Performance Improvements/Results  Roadmap
  • 3.
    3 © HortonworksInc. 2011 – 2016. All Rights Reserved Motivations  Modifying existing data – INSERT OVERWRITE TABLE Target SELECT * FROM Target WHERE … • Delete – OK, Update - ? • Concurrency – Hope for the best (multiple updates) – ZooKeeper lock manager S/X locks – restrictive • Expensive to do repeatedly (write side)
  • 4.
    4 © HortonworksInc. 2011 – 2016. All Rights Reserved Motivations  Continuously adding new data to Hive in the past – ALTER TABLE Target ADD PARTITION (dt=‘2016-06-30’) • Lots of files – bad for performance • Fewer files –users wait longer to see latest data – INSERT INTO Target as SELECT FROM Staging
  • 5.
    5 © HortonworksInc. 2011 – 2016. All Rights Reserved Merge Statement – SQL Standard 2011 (Hive 2.2) ID State County Value 1 CA LA 19.0 2 MA Norfolk 15.0 7 MA Suffolk 50.15 16 CA Orange 9.1 ID State Value 1 20.0 7 80.0 100 NH 6.0 MERGE INTO TARGET T USING SOURCE S ON T.ID=S.ID WHEN MATCHED THEN UPDATE SET T.Value=S.Value WHEN NOT MATCHED INSERT (ID,State,Value) VALUES(S.ID, S.State, S.Value) ID State County Value 1 CA LA 20.0 2 MA Norfolk 15.0 7 MA Suffolk 80.0 16 CA Orange 9.1 100 NH null 6.0
  • 6.
    6 © HortonworksInc. 2011 – 2016. All Rights Reserved Goals  Make above use cases easy and efficient  Key Requirement – Long running analytics queries should run concurrently with update commands  NOT OLTP!!! – Support slowly changing tables – Not for 100s of concurrent queries trying to update the same partition
  • 7.
    7 © HortonworksInc. 2011 – 2016. All Rights Reserved System at High Level  A new type of table that supports Insert/Update/Delete/Merge SQL operations  Concept of ACID transaction – Atomic, Consistent, Isolated, Durable  Streaming Ingest API – Write a continuous stream of events to Hive in micro batches with transactional semantics
  • 8.
    8 © HortonworksInc. 2011 – 2016. All Rights Reserved User Point of View  CREATE TABLE T(a int, b int) CLUSTERED BY (b) INTO 8 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');  Not all tables support transactional semantics  Table must be bucketed  Table cannot be sorted  Currently requires ORC File but anything implementing format – AcidInputFormat/AcidOutputFormat  autoCommit=true  Transactions run at Snapshot Isolation – Lock in the state of the DB as of the start of the query for the duration of the query – Between Serializable and Repeatable Read
  • 9.
    9 © HortonworksInc. 2011 – 2016. All Rights Reserved Design
  • 10.
    10 © HortonworksInc. 2011 – 2016. All Rights Reserved Design  Transaction Manager – Begin transaction and obtain a transaction ID  Storage layer enhanced to support MVCC architecture – Each row is tagged with unique ROW_ID (internal) – Multiple versions of each row to allow concurrent readers and writers – Result of each write is stored in a new Delta file
  • 11.
    11 © HortonworksInc. 2011 – 2016. All Rights Reserved How ACID is implemented in Hive?  CREATE TABLE acidtbl (a INT, b STRING) CLUSTERED BY (a) INTO 1 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true'); ACID Metadata Columns original_transaction_id bucket_id row_id current_transaction_id User Columns col_1: a : INT col_2: b : STRING ACID_PK
  • 12.
    12 © HortonworksInc. 2011 – 2016. All Rights Reserved How ACID is implemented in Hive?  INSERT INTO acidtbl (a,b) VALUES (100, “foo”), (200, “xyz”), (300, “bee”); ACID_PK a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” delta_00001_00001/bucket_0000
  • 13.
    13 © HortonworksInc. 2011 – 2016. All Rights Reserved How ACID is implemented in Hive?  UPDATE acidTbl SET b = “bar” where a = 300; ACID_PK a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” ACID_PK a b { 1, 0, 2 } 300 “bar” delta_00001_00001/bucket_0000 delta_00002_00002/bucket_0000
  • 14.
    14 © HortonworksInc. 2011 – 2016. All Rights Reserved How ACID is implemented in Hive?  DELETE FROM acidTbl where a = 200; ACID_PK a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” ACID_PK a b { 1, 0, 2 } 300 “bar” ACID_PK a b { 1, 0, 1 } null null delta_00001_00001/bucket_0000 delta_00002_00002/bucket_0000 delta_00003_00003/bucket_0000
  • 15.
    15 © HortonworksInc. 2011 – 2016. All Rights Reserved How ACID is implemented in Hive?  SELECT * FROM acidtbl; ACID_PK a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” ACID_PK a b { 1, 0, 2 } 300 “bar” ACID_PK a b { 1, 0, 1 } null null delta_00001_00001/bucket_0000 delta_00002_00002/bucket_0000 delta_00003_00003/bucket_0000 { 1, 0, 0 } 100 “foo” 100 “foo”{ 1, 0, 1 } 200 “xyz”{ 1, 0, 1 } null null{ 1, 0, 2 } 300 “bee”{ 1, 0, 2 } 300 “bar” 300 “bar”
  • 16.
    16 © HortonworksInc. 2011 – 2016. All Rights Reserved Design - Compactor  More operations = more delta files – make reads more expensive
  • 17.
    17 © HortonworksInc. 2011 – 2016. All Rights Reserved Design - Compactor  ALTER TABLE acidTbl COMPACT ‘MAJOR’; ACID_PK a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” ACID_PK a b { 1, 0, 2 } 300 “bar” ACID_PK a b { 1, 0, 1 } null nulldelta_00001_00001/bucket_0000 delta_00002_00002/bucket_0000 delta_00003_00003/bucket_0000 ACID_PK a b { 1, 0, 0 } 100 “foo” { 1, 0, 2 } 200 “bar” base_00003/bucket_0000
  • 18.
    18 © HortonworksInc. 2011 – 2016. All Rights Reserved Design - Compactor  Compactor rewrites the table in the background – Minor compaction - merges delta files into fewer deltas – Major compactor merges deltas with base - more expensive – This amortizes the cost of updates and self tunes the tables • Makes ORC more efficient - larger stripes, better compression  Compaction can be triggered automatically or on demand – There are various configuration options to control when the process kicks in. – Compaction itself is a Map-Reduce job  Key design principle is that compactor does not affect readers/writers  Cleaner process – removes obsolete files
  • 19.
    19 © HortonworksInc. 2011 – 2016. All Rights Reserved Design - Concurrency  Transaction Manager – manages transaction ID assignment – keeps track of transaction state: open, committed, aborted  Lock Manager – DDL operations acquire eXclusive locks – Read operations acquire Shared locks – Also locks non transactional tables – different logic • hive.txn.strict.locking.mode  State of both persisted in Hive Metastore
  • 20.
    20 © HortonworksInc. 2011 – 2016. All Rights Reserved Design - Concurrency  Write Set tracking to prevent Write-Write conflicts in concurrent transactions  Note that 2 Inserts are never in conflict since Hive does not enforce unique constraints.  You are allowed to read acid and non-acid tables in same query.  You cannot write to acid and non-acid tables at the same time (multi-insert statement)
  • 21.
    21 © HortonworksInc. 2011 – 2016. All Rights Reserved Design - Streaming Ingest  Allows you to continuously write events to a hive table – Can commit periodically to make writes durable/visible – Can also call abort to make writes since last commit/abort invisible. – Optimized so that it can handle writing micro batches of events - every second. • Multiple transactions are written to one file – Only supports adding new data  Streaming tools like NiFi, Storm and Flume rely on this API to ingest data into hive  This API is public so it can be used directly  Data written via Streaming API has the same transactional semantics as SQL side
  • 22.
    22 © HortonworksInc. 2011 – 2016. All Rights Reserved Merge Statement – SQL Standard 2011 (Hive 2.2) ID State County Value 1 CA LA 19.0 2 MA Norfolk 15.0 7 MA Suffolk 50.15 16 CA Orange 9.1 ID State Value 1 20.0 7 80.0 100 NH 6.0 MERGE INTO TARGET T USING SOURCE S ON T.ID=S.ID WHEN MATCHED THEN UPDATE SET T.Value=S.Value WHEN NOT MATCHED INSERT (ID,State,Value) VALUES(S.ID, S.State, S.Value) ID State County Value 1 CA LA 20.0 2 MA Norfolk 15.0 7 MA Suffolk 80.0 16 CA Orange 9.1 100 NH null 6.0
  • 23.
    23 © HortonworksInc. 2011 – 2016. All Rights Reserved SQL Merge Target Source ACID_PK ID Stat e Count y Value { 1, 0, 1 } 1 CA LA 20.0 { 1, 0, 3 } 7 MA Suffolk 80.0 ACID_PK ID State Coun ty Value { 2, 0, 1 } 100 NH 6.0 delta_00002_00002/bucket_0000 delta_00002_00002_001/bucket_0000 Right Outer Join ON T.ID=S.ID
  • 24.
    24 © HortonworksInc. 2011 – 2016. All Rights Reserved W/o MERGE – much less efficient  UPDATE Target set Value= 20.0 where ID = 1;  UPDATE Target set Value = 80.0 where ID = 7;  INSERT INTO Target (ID, State, Value) VALUES(100, ‘NH’, 6.0);
  • 25.
    25 © HortonworksInc. 2011 – 2016. All Rights Reserved Performance
  • 26.
    26 © HortonworksInc. 2011 – 2016. All Rights Reserved Work-In-Progress  Split an update into combination of delete and insert  UPDATE acidTbl SET b = “bar” where a = 300; ACID_PK a b { 1, 0, 0 } 100 “foo” { 1, 0, 1 } 200 “xyz” { 1, 0, 2 } 300 “bee” ACID_PK a b { 1, 0, 2 } 300 “bar” delta_00001_00001/bucket_0000 delta_00002_00002/bucket_0000 ACID_PK a b { 2, 0, 0 } 300 “bar” ACID_PK a b { 1, 0, 2 } null null delta_00002_00002/bucket_0000 delete_delta_00002_00002/bucket_0000 Enabled PPD Splits for Delta files
  • 27.
    27 © HortonworksInc. 2011 – 2016. All Rights Reserved Benefits  Improved PPD  Better Network Utilization  Better Memory Utilization  Full Vectorization of Reads  Updating bucket/partition columns
  • 28.
    28 © HortonworksInc. 2011 – 2016. All Rights Reserved Performance  TPC-H Benchmark – 10 node cluster at Scale Factor 1000 (1 TB of data) – 11 delta files with 90 GB data each
  • 29.
    29 © HortonworksInc. 2011 – 2016. All Rights Reserved Future Work  Multi statement transactions, i.e. BEGIN TRANSACTION/COMMIT/ROLLBACK  Performance – Smarter Compaction  Finer grained concurrency management/conflict detection  Read Committed w/Lock Based scheduling  Better Monitoring/Alerting  LOAD DATA … support  Optional bucketing  SMB support – user defined sort order
  • 30.
    30 © HortonworksInc. 2011 – 2016. All Rights Reserved Further Reading
  • 31.
    31 © HortonworksInc. 2011 – 2016. All Rights Reserved Etc  Documentation – https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions – https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest  Follow/Contribute – https://issues.apache.org/jira/browse/HIVE- 14004?jql=project%20%3D%20HIVE%20AND%20component%20%3D%20Transactions  user@hive.apache.org  dev@hive.apache.org
  • 32.
    32 © HortonworksInc. 2011 – 2016. All Rights Reserved Thank You

Editor's Notes

  • #4 Easiest way to explain this is to talk about how you used to do some things in Hive before Hive ACID project.
  • #5 Easiest way to explain this is to talk about how you used to do some things in Hive before Hive ACID project.
  • #6 Target is the table inside the Warehouse Source table contains the changes to apply
  • #23 Target is the table inside the Warehouse Source table contains the changes to apply