
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!

For Map/Reduce programmers used to HDFS, the mutability of HBase tables poses new challenges: data can change over the duration of a job, multiple jobs can write concurrently, writes are effective immediately, and it is not trivial to clean up partial writes. Revision Manager introduces atomic commits and point-in-time consistent snapshots over a table, guaranteeing repeatable reads and protection from partial writes. Revision Manager is optimized for a relatively small number of concurrent write jobs, which is typical within Hadoop clusters. This session will discuss the implementation of Revision Manager using ZooKeeper and coprocessors, with extra care taken to ensure security in multi-tenant clusters. Revision Manager is available as part of the HBase storage handler in HCatalog, but can easily be used stand-alone with little coding effort.


  1. Relaxed Transactions for HBase. Francis Liu, Software Engineer. 5/22/12
  2. Mutable Data: Writes are effective immediately. [Diagram: Read_Job with Map1, Map2, Map3 reading Table1 (C1=1, C2=2, C3=3)]
  3. Mutable Data: Writes are effective immediately. [Diagram: Read_Job and Write_Job, each with Map1, Map2, Map3, running against Table1 (C1=1, C2=2, C3=4)]
  4. Mutable Data: Partial writes in the midst of failures. [Diagram: Write_Job with Map1, Map2, Map3 writing Table1 (C1=1, C2=2, C3=?)]
  5. Mutable Data: Partial writes in the midst of failures. [Diagram: Write_Job with Map1, Map2, Map3 writing Table1 (C1=1, C2=2, C3=?) while Read_Job reads it]
  6. Revision Manager. Optimized for batch processing: large numbers of writes (e.g. data ingestion, batch updates). Cross-row write transactions within a table. Coprocessor endpoint: leverages HBase security. ZooKeeper for persistence: table revision information. Experimental feature in HCatalog 0.4.
  7. Architecture. [Diagram: Revision Mgr Client (InputFormat/OutputFormat), Revision Mgr Service (Coprocessor) on the RegionServer, ZooKeeper]
  8. API. For reads: RevisionManager.createSnapshot(tableName); SnapshotFilter.filter(result). For writes: RevisionManager.beginWriteTransaction(table, families); RevisionManager.commitWriteTransaction(transaction); RevisionManager.abortWriteTransaction(transaction).
  9. Concepts. Revision: a monotonically increasing number; all Puts of a job are written with the same revision number as the cell version. TableSnapshot: a point-in-time consistent view of a table, used for reading; holds the latest committed revision, the list of aborted revisions, and an upper bound on the visible revision per column family (CF). Transaction: a write transaction; holds the revision number and the list of column families being written to.
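
The three concepts on this slide can be modeled in a few lines. The following is a minimal in-memory sketch in Python, not the HCatalog implementation: the class `InMemoryRevisionManager`, its snake_case method names, and the rule that a column family's upper bound sits one below its oldest running revision are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Write transaction: a revision number plus the column families written."""
    table: str
    families: tuple
    revision: int


@dataclass(frozen=True)
class TableSnapshot:
    """Point-in-time view used for reading."""
    table: str
    latest_committed: int
    aborted: frozenset
    cf_upper_bound: dict  # column family -> highest revision a reader may see


class InMemoryRevisionManager:
    """Hypothetical single-table stand-in; the real service persists to ZooKeeper."""

    def __init__(self, table, families):
        self.table = table
        self.families = tuple(families)
        self.last_issued = 0       # revisions increase monotonically
        self.latest_committed = 0
        self.active = {}           # revision -> Transaction
        self.aborted = set()

    def begin_write_transaction(self, families):
        self.last_issued += 1
        txn = Transaction(self.table, tuple(families), self.last_issued)
        self.active[txn.revision] = txn
        return txn

    def commit_write_transaction(self, txn):
        del self.active[txn.revision]
        self.latest_committed = max(self.latest_committed, txn.revision)

    def abort_write_transaction(self, txn):
        del self.active[txn.revision]
        self.aborted.add(txn.revision)

    def create_snapshot(self):
        bounds = {}
        for cf in self.families:
            running = [t.revision for t in self.active.values() if cf in t.families]
            # Assumption: readers see nothing at or above the oldest running writer.
            bounds[cf] = min(running) - 1 if running else self.latest_committed
        return TableSnapshot(self.table, self.latest_committed,
                             frozenset(self.aborted), bounds)


rm = InMemoryRevisionManager("table1", ["info", "stats"])
t1 = rm.begin_write_transaction(["info"])    # revision 1
t2 = rm.begin_write_transaction(["stats"])   # revision 2
rm.commit_write_transaction(t2)
during = rm.create_snapshot()                # t1 is still running on "info"
rm.abort_write_transaction(t1)
after = rm.create_snapshot()
print(during.cf_upper_bound, after.cf_upper_bound, sorted(after.aborted))
```

In this model a snapshot taken while `t1` is running hides everything in `info` from revision 1 upward, while `stats` is already readable up to revision 2. After the abort, revision 1 is excluded through the aborted list rather than through the bound.
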
  10. Relaxed Transaction Properties: Immutable Input; Change After Commit; Precedence Preservation.
  11. Immutable Input: Consistent Read. [Diagram: Read_Job1 reads Snapshot1 and sees CellA=1 throughout, while Write_Job1 begins t1, writes CellA=2 and commits]
  12. Change After Commit. Revisions are only viewable after commit: a job cannot see its own writes. Aborted revisions are added to a table's aborted list. Timed-out revisions are aborted.
  13. Change After Commit. [Diagram: Read_Job1 reads Snapshot1 and sees CellA=1; Write_Job1 begins t1, writes CellA=2 and commits; after the t1 change, Read_Job2 reads Snapshot2 and sees CellA=2]
  14. Precedence Preservation. Snapshot isolation: a transaction is aborted when a write conflict is detected. Conflicts: concurrent transactions to the same column family; inefficient to abort. Resolved during read time: for every CF, find min_rev = min(active_revision) and only return the closest revision to min_rev; min_rev is what's stored in a snapshot.
  15. Precedence Preservation. [Diagram: initially CellA=1, CellB=1; Write_Job1 begins t1, writes CellA=2 and CellB=2, and commits later; Write_Job2 begins t2, writes CellA=3 and commits, but its changes are not visible due to t1; Read_Job1 reads Snapshot1 and sees CellA=1, CellB=1. CellA and CellB are members of the same column family.]
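
The read-time resolution rule can be traced with the cell values from this diagram. This is a sketch under stated assumptions: the helper names `upper_bound` and `read_cell` are hypothetical, the initial values are assumed to carry revision 1, t1 and t2 are assumed to get revisions 2 and 3, and "closest revision to min_rev" is read as "the newest revision strictly below the oldest active one".

```python
def upper_bound(active_revisions, latest_committed):
    """Highest revision a reader may see in one column family."""
    if active_revisions:
        return min(active_revisions) - 1   # min_rev from the slide, made exclusive
    return latest_committed


def read_cell(versions, bound, aborted=frozenset()):
    """Return the value with the highest visible revision, or None."""
    visible = [rev for rev in versions if rev <= bound and rev not in aborted]
    return versions[max(visible)] if visible else None


# Cell versions keyed by revision; both cells live in the same column family.
cell_a = {1: 1}
cell_b = {1: 1}

# Write_Job1 begins t1 (revision 2) and writes both cells, but has not committed.
cell_a[2] = 2
cell_b[2] = 2
# Write_Job2 begins t2 (revision 3), writes CellA and commits.
cell_a[3] = 3

# Snapshot1 is taken while t1 is still running.
bound1 = upper_bound(active_revisions=[2], latest_committed=3)
snapshot1 = (read_cell(cell_a, bound1), read_cell(cell_b, bound1))

# Once t1 commits, a new snapshot sees the writes of both jobs.
bound2 = upper_bound(active_revisions=[], latest_committed=3)
snapshot2 = (read_cell(cell_a, bound2), read_cell(cell_b, bound2))
print(snapshot1, snapshot2)
```

Holding back t2's committed write until t1 finishes is what lets readers avoid ever seeing CellA=3 next to a CellB that t1 is still in the middle of changing, without aborting either writer.
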
  16. Snapshot Filter. Consumes a TableSnapshot. Read-time filtering of: aborted revisions; revisions written after the snapshot was taken; conflicting/blocked revisions.
  17. Flow - Read. User/Client: RevisionManager.createSnapshot(); the TableSnapshot instance is serialized into the JobConf. RecordReader: uses SnapshotFilter.filter(result).
  18. Flow - Read. [Diagram: SnapshotRecordReader.next(key, value) loops while result != null and filtered == null, calling next() on the ScannerIterator for the next result and SnapshotFilter.filter(result) for the filtered result, then returns the next record]
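
The read loop in this diagram amounts to "keep pulling rows from the scanner until one survives the filter". A minimal Python sketch of that loop follows; the row layout (column name to a dict of revision to value) and the function names `snapshot_filter` and `read_records` are assumptions for illustration, not the HCatalog classes.

```python
def snapshot_filter(row, cf_upper_bound, aborted):
    """Keep, per column, the newest version the snapshot allows.

    Returns None when no cell of the row survives.
    """
    filtered = {}
    for column, versions in row.items():
        family = column.split(":", 1)[0]
        bound = cf_upper_bound[family]
        visible = [rev for rev in versions if rev <= bound and rev not in aborted]
        if visible:
            filtered[column] = versions[max(visible)]
    return filtered or None


def read_records(scanner, cf_upper_bound, aborted):
    """Yield (key, cells), skipping rows that the filter empties out."""
    for key, row in scanner:
        filtered = snapshot_filter(row, cf_upper_bound, aborted)
        if filtered is not None:
            yield key, filtered


table = [
    ("row1", {"info:gpa": {1: "3.1", 2: "3.5"}}),  # revision 2 was aborted
    ("row2", {"info:gpa": {3: "2.9"}}),            # written after the snapshot
    ("row3", {"info:gpa": {1: "4.0"}}),
]
records = list(read_records(iter(table), {"info": 2}, aborted={2}))
print(records)
```

Here `row1` falls back to its revision 1 value because revision 2 is on the aborted list, and `row2` is skipped entirely because its only version lies above the snapshot's bound for `info`.
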
  19. Flow - Write. User/Client: HBaseOutputFormat.checkOutputSpecs(FileSystem, JobConf); the write transaction is started by calling beginWriteTransaction(Transaction); the Transaction instance is serialized into the JobConf. RecordWriter: Puts use the revision number as the version. OutputCommitter: OutputCommitter.commitJob(JobContext) calls RevisionManager.commitWriteTransaction(Transaction); OutputCommitter.abortJob(JobContext) calls RevisionManager.abortWriteTransaction(Transaction).
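
The write flow has three phases: begin the transaction when the output specs are checked, stamp every Put with the revision, then commit or abort with the job. The sketch below mirrors those phases in plain Python. `StubRevisionManager`, `run_write_job`, the dict-based job configuration, and the dict-based store are hypothetical stand-ins; no Hadoop or HBase calls are made.

```python
class StubRevisionManager:
    """Hypothetical stand-in that only records what happened to each revision."""

    def __init__(self):
        self.last_issued = 0
        self.committed = []
        self.aborted = []

    def begin_write_transaction(self, table, families):
        self.last_issued += 1
        return {"table": table, "families": families, "revision": self.last_issued}

    def commit_write_transaction(self, txn):
        self.committed.append(txn["revision"])

    def abort_write_transaction(self, txn):
        self.aborted.append(txn["revision"])


def run_write_job(rm, store, table, families, records):
    """Mirror the three phases: check output specs, write records, commit or abort."""
    # checkOutputSpecs phase: start the transaction and carry it in the job config.
    job_conf = {"transaction": rm.begin_write_transaction(table, families)}
    txn = job_conf["transaction"]
    try:
        # RecordWriter phase: every Put uses the revision number as its cell version.
        for row, column, value in records:
            store.setdefault((row, column), {})[txn["revision"]] = value
    except Exception:
        rm.abort_write_transaction(txn)   # abortJob phase
        return False
    rm.commit_write_transaction(txn)      # commitJob phase
    return True


def failing_records():
    """Simulate a job that dies after writing one record."""
    yield ("row1", "info:gpa", "3.9")
    raise RuntimeError("simulated task failure")


rm = StubRevisionManager()
store = {}
ok = run_write_job(rm, store, "my_table", ["info"], [("row1", "info:gpa", "3.5")])
failed = run_write_job(rm, store, "my_table", ["info"], failing_records())
print(ok, failed, rm.committed, rm.aborted, store)
```

Note that the failed job's partial write stays in the store under revision 2. It is never deleted in this model; readers skip it only because revision 2 is on the aborted list.
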
  20. Usage. With HCatalog, Revision Manager usage is done under the covers. Work is being done to decouple HCatalog from HBaseInputFormat/HBaseOutputFormat. Other frameworks can make use of the RevisionManager API.
  21. Usage: HCatalog
      Create table:
        hcat -e "create table my_table(key string, gpa string) STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler' TBLPROPERTIES ('hbase.columns.mapping'=':key,info:gpa');"
      Using Pig:
        A = LOAD 'table1' USING org.apache.hcatalog.pig.HCatLoader();
        STORE A INTO 'table1' USING org.apache.hcatalog.pig.HCatStorer();
      Using MapReduce:
        HCatInputFormat.setInput(job,…)
        HCatOutputFormat.setOutput(job,…)
  22. Future Work. Compaction of aborted transactions. Server-side filtering using HBase Filters. Compatibility with Hive.
  23. Further Info
  24. Questions?