Apache Accumulo Overview

An overview of the Apache Accumulo high-performance, scalable, distributed key/value store.

Published in: Data & Analytics

  1. Apache Accumulo Overview
     Bill Havanki, Solutions Architect, Cloudera Government Solutions
  2. Agenda (©2014 Cloudera, Inc. All rights reserved.)
     • Quick History
     • Storage Model
     • Loading and Querying
     • Daemons
     • Getting Started, a.k.a. the Pitch
  3. A Quick History
  4. Google BigTable
     Compressed, high-performance, scalable, distributed sorted map
  5. Google BigTable
     • Began development in 2004
     • Built on Google File System
     • Non-relational
     • Byte-oriented and schemaless
     • Stores data in the petabyte range
     • Research paper published in 2006
  6. Child(ren) of BigTable
     • Apache HBase (begun 2006, top-level 2010)
     • Apache Cassandra (begun 2008-ish, top-level 2010)
     • Apache Accumulo ...
  7. From Cloudbase to Accumulo
     • Started in 2008 as National Security Agency project
     • Submitted to Apache Incubator in 2011 (and renamed)
     • Top-level project in 2012
  8. Storage Model
  9. Key / Value Store
     Accumulo stores tables of key / value pairs
  10. Key / Value Store
      A row is a sorted sequence of key / value pairs
      Each pair is a cell
  11. The Key
      row | column (family, qualifier, visibility) | timestamp
  12. An example key
      row: bhavanki | column: family personal, qualifier middle, visibility PII | timestamp: 1401041295
  13. Another example key
      row: brees | column: family employment, qualifier salary, visibility FIN | timestamp: 1401041296
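The sort order of these key parts can be sketched in plain Java. This is a minimal model, not Accumulo code: `CellKey` and `KeyOrderDemo` are hypothetical names, the values and the third timestamp are made up for illustration, and Java string comparison stands in for byte comparison (the two agree for ASCII). Row, family, qualifier, and visibility sort ascending; timestamps sort descending so that the newest version of a cell comes first.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of key ordering; CellKey is NOT Accumulo's own Key class.
final class KeyOrderDemo {

    static final class CellKey {
        final String row, family, qualifier, visibility;
        final long timestamp;

        CellKey(String row, String family, String qualifier, String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
        }
    }

    // Row, family, qualifier, and visibility ascending; timestamp descending.
    static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = a.visibility.compareTo(b.visibility);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp); // newest first
        return c;
    };

    // Builds a tiny sorted table; values and the third timestamp are illustrative only.
    static TreeMap<CellKey, String> sampleTable() {
        TreeMap<CellKey, String> table = new TreeMap<>(ORDER);
        table.put(new CellKey("brees", "employment", "salary", "FIN", 1401041296L), "(some value)");
        table.put(new CellKey("bhavanki", "personal", "middle", "PII", 1401041295L), "(older value)");
        table.put(new CellKey("bhavanki", "personal", "middle", "PII", 1401041300L), "(newer value)");
        return table;
    }

    public static void main(String[] args) {
        // Iteration follows key order regardless of insertion order.
        for (Map.Entry<CellKey, String> e : sampleTable().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Keeping the newest timestamp first means a reader that wants only the current version of a cell can stop at the first entry it meets for that cell.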
  14. It's all bytes
      All key and value data are stored as bytes, except the timestamp, which is a long
      There are no built-in data types, but lexicoders help with common types
      Key components are usually UTF-8 strings
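Why lexicoders matter can be shown with a small sketch: byte-wise sorting only matches numeric sorting if numbers are encoded with that in mind. The encoding below (flip the sign bit, write 8 big-endian bytes) is a simplified stand-in chosen for illustration, not necessarily the byte layout Accumulo's own lexicoders produce, and `SortableLongDemo` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified order-preserving encoding for longs; an assumption for illustration,
// not Accumulo's actual lexicoder format.
final class SortableLongDemo {

    // Flip the sign bit so negatives sort before positives, then write big-endian.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the way a byte-oriented store orders keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] nine = "9".getBytes(StandardCharsets.UTF_8);
        byte[] ten = "10".getBytes(StandardCharsets.UTF_8);
        // As decimal strings, "10" sorts before "9", which is numerically wrong.
        System.out.println("string order 9 vs 10: " + compareBytes(nine, ten));
        // With the order-preserving encoding, 9 sorts before 10 and -5 before 3.
        System.out.println("encoded order 9 vs 10: " + compareBytes(encode(9), encode(10)));
        System.out.println("encoded order -5 vs 3: " + compareBytes(encode(-5), encode(3)));
    }
}
```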
  15. Some rows for you
      row       cf        cq        cv       ts          value
      bhavanki  job       employer           2013-09-01  Cloudera
      bhavanki  personal  beer               2013-09-15  Omission
      bhavanki  personal  house     NOMUGGL  2014-01-25  Ravenclaw
      brees     job       employer           2013-10-01  White Cliffs
      brees     personal  house     NOMUGGL  2014-01-01  Hufflepuff
  16. Visibility Labels
      Boolean expression: Specialist | (Management & SpecTraining)
      Authorizations are provided in each scan
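How a label like the one above is checked against a scan's authorizations can be sketched with a small recursive-descent evaluator. This is a toy, not Accumulo's implementation: `VisibilityDemo` is a hypothetical name, whitespace is skipped for readability (an assumption of this sketch), and mixing `&` and `|` without parentheses is rejected here as a design choice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy evaluator for visibility expressions; not Accumulo's own parser.
final class VisibilityDemo {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilityDemo(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    // Returns true if the given authorizations satisfy the expression.
    static boolean isVisible(String expression, Set<String> auths) {
        if (expression.trim().isEmpty()) return true; // unlabeled data is visible to all
        VisibilityDemo p = new VisibilityDemo(expression, auths);
        boolean result = p.parseExpr();
        p.skipSpaces();
        if (p.pos != expression.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return result;
    }

    // expr := term ('&' term)* | term ('|' term)*
    private boolean parseExpr() {
        boolean result = parseTerm();
        char op = 0;
        while (true) {
            skipSpaces();
            if (pos >= expr.length()) return result;
            char c = expr.charAt(pos);
            if (c != '&' && c != '|') return result;
            if (op != 0 && op != c) {
                throw new IllegalArgumentException("mixing & and | requires parentheses");
            }
            op = c;
            pos++;
            boolean rhs = parseTerm();
            result = (op == '&') ? (result && rhs) : (result || rhs);
        }
    }

    // term := label | '(' expr ')'
    private boolean parseTerm() {
        skipSpaces();
        if (pos >= expr.length()) throw new IllegalArgumentException("unexpected end of expression");
        if (expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpr();
            skipSpaces();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing closing parenthesis");
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length()
                && (Character.isLetterOrDigit(expr.charAt(pos)) || expr.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("expected a label at position " + pos);
        return auths.contains(expr.substring(start, pos));
    }

    private void skipSpaces() {
        while (pos < expr.length() && Character.isWhitespace(expr.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        String label = "Specialist | (Management & SpecTraining)";
        Set<String> manager = new HashSet<>(Arrays.asList("Management"));
        Set<String> trained = new HashSet<>(Arrays.asList("Management", "SpecTraining"));
        System.out.println(isVisible(label, manager)); // Management alone is not enough
        System.out.println(isVisible(label, trained)); // Management plus SpecTraining is
    }
}
```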
  17. Locality Groups
      You can identify sets of one or more column families as locality groups
      Data in a locality group is stored together for improved read performance
  18. Tablets
      A table is comprised of one or more tablets
      (diagram: table employees split into tablets employees;H, employees;S, employees;~)
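Finding the tablet that holds a given row can be sketched with a sorted map keyed by each tablet's end row. This is a minimal model under stated assumptions: the split points H and S come from the tablet names on the slide, a tablet's end row is treated as inclusive, `~` marks the last tablet as on the slide, and `TabletLookupDemo` is a hypothetical name. Java string order stands in for byte order, which matches for ASCII.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy tablet lookup by end row; not how Accumulo clients actually locate tablets.
final class TabletLookupDemo {

    // End row of each tablet -> tablet name. The last tablet has no end row.
    static TreeMap<String, String> tabletsByEndRow() {
        TreeMap<String, String> tablets = new TreeMap<>();
        tablets.put("H", "employees;H");
        tablets.put("S", "employees;S");
        return tablets;
    }

    // The owning tablet is the one with the smallest end row >= the row.
    static String tabletFor(String row) {
        Map.Entry<String, String> entry = tabletsByEndRow().ceilingEntry(row);
        return entry != null ? entry.getValue() : "employees;~";
    }

    public static void main(String[] args) {
        for (String row : new String[] {"Alice", "Mallory", "Trent", "bhavanki"}) {
            System.out.println(row + " -> " + tabletFor(row));
        }
    }
}
```

Note that a lowercase row such as bhavanki lands in the last tablet in this sketch, because lowercase ASCII letters sort after uppercase ones.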
  19. Tablets
      Tablets map to data files in HDFS
      (diagram: tablets employees;H, employees;S, employees;~ and data files rfile 1, rfile 2, rfile 3)
  20. Tablets
      Data is also kept in write-ahead logs and the memtable
      (diagram: tablet employees;H with rfile 1, walogs, memtable)
  21. Loading and Querying
  22. Java Client API
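  23. Java Client API
      Read using scanners
        // Key, Range, Value are in org.apache.accumulo.core.data;
        // Authorizations is in org.apache.accumulo.core.security; Text is in org.apache.hadoop.io
        Scanner s = conn.createScanner("employees", new Authorizations());
        s.setRange(new Range("alice", "eve"));       // rows from "alice" through "eve"
        s.fetchColumnFamily(new Text("personal"));   // only the "personal" family
        for (Entry<Key, Value> e : s)
            employeeIds.add(e.getKey().getRow());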
  24. Java Client API
      Read access via iterator pattern
      • server-side system iterators handle timestamps, authorization checks, and lots more
      • iterators almost always wrap other iterators, forming a chain
      • you can define your own, client-side or server-side
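The wrapping idea can be sketched with plain `java.util.Iterator`. This is an illustration only: real Accumulo iterators implement Accumulo's own interface rather than `java.util.Iterator`, and `IteratorChainDemo`, `FilteringIterator`, and the single-label visibility check are simplifications invented for this sketch. Each cell is a `{row, cf, cq, cv}` array taken from the sample rows earlier in the deck.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Predicate;

// Toy iterator chain; stands in for the idea of stacked server-side iterators.
final class IteratorChainDemo {

    // Wraps a source iterator and passes through only matching elements.
    static final class FilteringIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final Predicate<T> keep;
        private T nextItem;
        private boolean ready;

        FilteringIterator(Iterator<T> source, Predicate<T> keep) {
            this.source = source;
            this.keep = keep;
            advance();
        }

        // Pull from the wrapped iterator until an element passes the filter.
        private void advance() {
            ready = false;
            while (source.hasNext()) {
                T candidate = source.next();
                if (keep.test(candidate)) {
                    nextItem = candidate;
                    ready = true;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return ready;
        }

        @Override
        public T next() {
            if (!ready) throw new NoSuchElementException();
            T result = nextItem;
            advance();
            return result;
        }
    }

    // Cells as {row, cf, cq, cv}; an empty cv means the cell is unlabeled.
    static List<String[]> sampleCells() {
        return Arrays.asList(
                new String[] {"bhavanki", "job", "employer", ""},
                new String[] {"bhavanki", "personal", "beer", ""},
                new String[] {"bhavanki", "personal", "house", "NOMUGGL"},
                new String[] {"brees", "job", "employer", ""},
                new String[] {"brees", "personal", "house", "NOMUGGL"});
    }

    // Chain: source -> visibility filter -> column family filter.
    static List<String> scan(List<String[]> cells, Set<String> auths, String family) {
        Iterator<String[]> source = cells.iterator();
        Iterator<String[]> visible = new FilteringIterator<String[]>(
                source, c -> c[3].isEmpty() || auths.contains(c[3]));
        Iterator<String[]> chain = new FilteringIterator<String[]>(
                visible, c -> c[1].equals(family));
        List<String> out = new ArrayList<>();
        while (chain.hasNext()) {
            String[] c = chain.next();
            out.add(c[0] + " " + c[1] + ":" + c[2]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleCells(), Collections.<String>emptySet(), "personal"));
    }
}
```

Because every stage exposes the same interface, a new stage can be added to the chain without changing the ones around it.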
  25. Java Client API
      Scanners fetch sorted rows from one range
      Batch scanners fetch unsorted rows from multiple ranges in parallel
      Isolated scanners ensure that you do not see a row mid-change
  26. MapReduce
      AccumuloInputFormat
      AccumuloOutputFormat
  27. MapReduce
      AccumuloRowInputFormat
      AccumuloRowOutputFormat
  28. Shell
      Command-line / manual access to Accumulo data
      • scan, insert, delete
      • iterator management
      • table management (creation, deletion, cloning)
      • user and authorization management
      • table splitting and merging
      • ... more
  29. Bulk Import
      Got lots of data to import quickly?
      • Use MR job to format data using AccumuloFileOutputFormat
      • Import files using shell
      Trade off latency / availability for throughput
  30. Daemons
  31. Tablet Server
      Serves tablets (table data)
      • writes data to walog, memtable; deals with compaction
      • serves data for reads from files, memtable
      • handles recovery from walogs in case of server failure
      Most client calls go to tablet servers
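The write and read path described above can be sketched as a toy in-memory model. Everything here is a simplification for illustration: `TabletWritePathDemo` and its method names are hypothetical, keys are plain strings, and reads check the memtable and then files from newest to oldest, whereas a real tablet server merges its sources through iterators and timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of walog + memtable + flushed files; not Accumulo code.
final class TabletWritePathDemo {
    private final List<String[]> walog = new ArrayList<>();                // append-only log of {key, value}
    private TreeMap<String, String> memtable = new TreeMap<>();            // sorted in-memory buffer
    private final List<TreeMap<String, String>> files = new ArrayList<>(); // flushed sorted files, oldest first

    // Log the write first so it survives a crash, then apply it in memory.
    void write(String key, String value) {
        walog.add(new String[] {key, value});
        memtable.put(key, value);
    }

    // Minor compaction: flush the memtable to a new sorted file.
    void minorCompact() {
        if (memtable.isEmpty()) return;
        files.add(memtable);
        memtable = new TreeMap<>();
        walog.clear(); // flushed data no longer needs the log
    }

    // Newest data wins: check the memtable, then files from newest to oldest.
    String read(String key) {
        if (memtable.containsKey(key)) return memtable.get(key);
        for (int i = files.size() - 1; i >= 0; i--) {
            String value = files.get(i).get(key);
            if (value != null) return value;
        }
        return null;
    }

    // Simulates losing the memtable in a crash and rebuilding it from the log.
    void simulateCrashAndRecover() {
        memtable = new TreeMap<>();
        for (String[] entry : walog) {
            memtable.put(entry[0], entry[1]);
        }
    }

    public static void main(String[] args) {
        TabletWritePathDemo tablet = new TabletWritePathDemo();
        tablet.write("bhavanki", "first value");
        tablet.minorCompact();                    // first value now lives in a file
        tablet.write("bhavanki", "second value"); // newer value only in walog + memtable
        tablet.simulateCrashAndRecover();
        System.out.println(tablet.read("bhavanki"));
    }
}
```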
  32. Master
      • assigns tablets to tablet servers
      • detects tablet server failures and reassigns tablets
      • balances tablet assignments over time
      • coordinates table operations
      Multiple supported for failover, only one active
  33. Everybody Else in Accumulo
      Garbage Collector (GC) - identifies and deletes files in HDFS that are no longer needed
      Tracer - listens for and stores distributed trace messages using a special table
  34. Everybody Else in Accumulo
      • Monitor - collects and serves status information
        • server status
        • log inspection
        • performance data
        • table inspection
  35. Everybody Else outside Accumulo
      • HDFS (as part of Apache Hadoop)
        • stores tablet files
        • stores write-ahead logs (1.5+)
      • MapReduce (Hadoop)
        • bulk import
        • batch processing
      • Apache ZooKeeper
  36. Getting Started, a.k.a. the Pitch
  37. Easy as 1-2-3?
      1. Install Hadoop (HDFS and MapReduce)
      2. Install ZooKeeper
      3. Install Accumulo!
  38. Making Steps 1 and 2 Easier
      Use a complete, pre-packaged Hadoop distribution ... like CDH!
      a leading commercial distribution centered on Apache Hadoop
      • many ecosystem components
      • configured / updated to work together
  39. Making Steps 1 and 2 Easier
      Cloudera Manager
      • deployment
      • configuration
      • operation
      • security
  40. Making Step 3 Easier
      Standard Apache Accumulo installation is via tarball
      • no longer shipping RPM / DEB / ...
      Using CDH/CM you can use:
      • a tarball, RPM or DEB with Accumulo packaged for CDH
      • a parcel (like RPM / ZIP) for easier upgrades
      • 1.4.4 and 1.4.5 available now
      • 1.6.0 soon
  41. Where to Go for More
      • http://accumulo.apache.org/
      • http://www.cloudera.com/content/cloudera/en/products-and-service
      • http://www.cloudera.com/content/cloudera/en/products-and-service
      • http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/accumulo.html
  42. Accumulo Summit
      Join us on June 12
  43. Quick Thanks
      • My slide reviewers
        • Sean Busbey
        • Mike Drob
      • Accumulo community
      • You all for listening
  44. Thank you!
      Bill Havanki
      bhavanki@clouderagovt.com
