Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HIVE-14535: Cloud storage
Gopal V
Cloud “FileSystems” are Strange Beasts
“There are no directories. Only paths.”
“There are no users. Only keys.”
“There are no permissions. Only ACL rules.”
“There is consistency, but not as we know it.”
“Directories vs Paths.”
• Storage of path information can be assumed to be a sorted hash-table
• File listings are no longer walks of a tree, but prefix searches
• A directory does not need to exist for a path below it to exist
• Listing a single level is more complex than a full-depth traversal
• Renames can cause rebalancing and shuffling of the structure
• Operations on adjacent files are sometimes more expensive than random ops
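The listing points above can be sketched with a plain sorted list of keys standing in for the store. This is only an illustration under that assumption: the function names and the sample keys are hypothetical, and no real object-store API is being modelled. A full-depth listing is a single range scan, while a single-level listing has to do the same scan and then group by the next delimiter.

```python
from bisect import bisect_left

def list_prefix(sorted_keys, prefix):
    """Full-depth listing: one contiguous range scan over the sorted keys."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the matching range has ended
        matches.append(key)
    return matches

def list_single_level(sorted_keys, prefix, delimiter="/"):
    """Single-level listing: the same scan, plus grouping by the next delimiter."""
    files, common_prefixes = [], []
    for key in list_prefix(sorted_keys, prefix):
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something deeper exists, so report a "directory" that is
            # never stored as a key of its own.
            child = prefix + head + delimiter
            if not common_prefixes or common_prefixes[-1] != child:
                common_prefixes.append(child)
        else:
            files.append(key)
    return files, common_prefixes

# Hypothetical keys; note that no key named "t/p=1/" exists.
keys = sorted([
    "t/p=1/mm_0/000021_0",
    "t/p=2/mm_0/000022_0",
    "t/p=2/mm_0/000023_0",
    "t/_SUCCESS",
])
print(list_prefix(keys, "t/p=2/"))
print(list_single_level(keys, "t/"))
```

The "directories" returned by the single-level listing are derived from the keys below them, which is why a directory does not need to exist for a path below it.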
“Users & permissions vs keys & ACLs”
• Identifying the user behind an accessing process has no meaning
• Access keys are often rotated and occasionally invalidated
• User identity can be mapped to a key (externally or by identity management)
• Buckets are commonly used to differentiate stores, instead of permissions
• Permissions are rarely set or applied per-file, but across path patterns
• Permissions set on a directory need extra user checks to be useful (chmod +x)
“Consistency”
• Arguably the most complex issue
• Renames needn’t be consistent; creates can have collisions
• Reads can return old data for the same path when overwriting
• Versioned reads are complex to manage, and it is hard to layer a “Time machine” over them
• Cross-Region Replication often lags and doubles stale-read issues
Micro-Managed Hive Tables
• Support for all Hive input formats, including user ones
• Avoid rename operations as much as possible
• Never collide final paths for different inserts
• Ongoing inserts should be atomic across more than one partition
• Snapshot isolation for reads of existing partitions that are being back-filled
• Stage data without accidental partial-reads for bucket replication
Micro-Managed Hive Tables
CREATE TABLE `web_returns_hive_commit`(…
`wr_net_loss` float)
PARTITIONED BY (`wr_returned_date_sk` int)
STORED AS <FORMAT>
LOCATION 's3a://hwdev-hive-14535/web_returns_hive_commit'
TBLPROPERTIES
('transactional'='true',
'transactional_properties'='insert_only');
Micro-Managed Hive Tables
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0
-rw-rw-rw- 1 cloudbreak 1791 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0/000021_0
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821/mm_0
-rw-rw-rw- 1 cloudbreak 2186 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450821/mm_0/000022_0
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822
drwxrwxrwx - cloudbreak 0 2016-12-07 21:42 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822/mm_0
-rw-rw-rw- 1 cloudbreak 1814 2016-12-07 00:55 s3a://hwdev-hive-14535/web_returns_hive_commit/wr_returned_date_sk=2450822/mm_0/000023_0
/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0/000021_0
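The path layout in the listing above can be taken apart with a small parser. This is a sketch that assumes the layout is table location, then `<partition column>=<value>`, then an `mm_<n>` directory, then the data file; the meaning of `<n>` is not stated on the slide, and the helper names `parse_mm_path` and `final_path` are hypothetical.

```python
import re

# Assumed layout, read off the listing above:
#   <table>/<partition_col>=<value>/mm_<n>/<file>
MM_PATH = re.compile(
    r"^(?P<table>.*)/(?P<col>[^/=]+)=(?P<val>[^/]+)/mm_(?P<n>\d+)/(?P<file>[^/]+)$"
)

def parse_mm_path(path):
    """Split a data file path into its parts, or return None if it does not match."""
    m = MM_PATH.match(path)
    if m is None:
        return None
    return {
        "table": m["table"],
        "partition": (m["col"], m["val"]),
        "mm_id": int(m["n"]),
        "file": m["file"],
    }

def final_path(table, col, val, mm_id, file_name):
    """Build the final path directly, so a writer never needs a rename step."""
    return f"{table}/{col}={val}/mm_{mm_id}/{file_name}"

parts = parse_mm_path(
    "/web_returns_hive_commit/wr_returned_date_sk=2450820/mm_0/000021_0"
)
print(parts)
```

Because the `mm_<n>` component is part of the final path, two inserts that use different numbers cannot collide on the same file name.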
“Take a number” for inserts
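The slide gives only the title, so the following is a guess at the idea rather than the actual Hive implementation: each insert asks a central counter for a number before writing, and that number names its private output directory. The class `TicketCounter` and its methods are hypothetical names for illustration.

```python
import itertools
import threading

class TicketCounter:
    """Hands out increasing numbers and tracks which ones are still open."""

    def __init__(self, start=0):
        self._numbers = itertools.count(start)
        self._lock = threading.Lock()
        self.open = set()     # taken, not yet committed or aborted
        self.aborted = set()  # taken, then abandoned

    def take(self):
        """Allocate the next number; the insert writes under mm_<number>."""
        with self._lock:
            n = next(self._numbers)
            self.open.add(n)
            return n

    def commit(self, n):
        """A committed number simply drops out of the tracking data."""
        with self._lock:
            self.open.discard(n)

    def abort(self, n):
        """An aborted number stays on record so readers can skip its files."""
        with self._lock:
            self.open.discard(n)
            self.aborted.add(n)

counter = TicketCounter()
first = counter.take()
second = counter.take()
counter.commit(first)
counter.abort(second)
print(first, second, counter.open, counter.aborted)
```

Since every writer holds a distinct number, concurrent inserts never target the same directory and no rename is needed to publish the data.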
Read: tracking committed data
• Similar to Hive-ACID (ORC)
• Committed txns disappear from the tracking data
• Each query takes the highest known txn plus the list of open/aborted txns
• All valid transactions are < max(transaction_id) and not IN (open_txns)
• The transaction filtering is done at the listing level for all formats
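The read-side rule above can be sketched as a filter over a file listing. This assumes the number in each `mm_<n>` directory is the transaction id of the insert that wrote it, and it applies the strict `<` comparison exactly as the slide states it. The function names and sample paths are hypothetical.

```python
import re

MM_DIR = re.compile(r"/mm_(\d+)/")

def is_visible(path, max_txn, excluded):
    """Apply the slide's rule: txn < max(transaction_id) and not in open/aborted."""
    m = MM_DIR.search(path)
    if m is None:
        return False  # assumption: files outside an mm_<n> directory are not read
    txn = int(m.group(1))
    return txn < max_txn and txn not in excluded

def filter_listing(paths, max_txn, open_txns, aborted_txns=()):
    """Filter at the listing level, so the file format never needs to know."""
    excluded = set(open_txns) | set(aborted_txns)
    return [p for p in paths if is_visible(p, max_txn, excluded)]

# Hypothetical listing of one partition.
paths = [
    "t/p=1/mm_0/000021_0",
    "t/p=1/mm_1/000000_0",  # still open: must not be read
    "t/p=1/mm_2/000000_0",
    "t/p=1/mm_5/000000_0",  # started after this query's snapshot
]
print(filter_listing(paths, max_txn=3, open_txns={1}))
```

Because the decision is made on path names alone, the same filter works for any input format, which matches the last bullet above.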
Branch + Future Work
Current measurements show a 21% reduction in partition load time (+HIVE-15368)
Time taken to load dynamic partitions: 350.846 seconds -> 274.715 seconds
Work continues in the branch for HIVE-14535
Work is ongoing to take advantage of faster recursive listings
Discussions towards incremental refresh for cube engines for backfill
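The reduction figure can be checked from the two timings quoted above. This is just the arithmetic; the variable names are illustrative.

```python
# Timings quoted on the slide, in seconds.
before = 350.846
after = 274.715

# Relative reduction in load time.
reduction = (before - after) / before
print(f"{reduction:.1%}")
```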
Questions?
Suggestions?
