Introduction to Apache Accumulo

Description of Apache Accumulo including data model, scaling and recovery features, API, security, and applications



  1. 1. Apache Accumulo Introduction
  2. 2. Introduction• Aaron Cordova • Founded Accumulo project with several others • Led development through release 1.0 • aaron@tetraconcepts.com
  3. 3. Agenda• Introduction• Data Model• API• Architecture - scaling, recovery• Security• Data-lifecycle• Applications
  4. 4. Introduction
  5. 5. History • Began writing in summer of 2008, after comparing design goals with the BigTable paper and the existing implementations HBase and Hypertable • Released internal version 1.0 in summer of 2009 • September 2011: accepted as an Apache Incubator project. Doug Cutting, founder of Hadoop, was the Champion Sponsor • Feb 2012: 1.4 released • March 2012: graduates to a top-level Apache project • v1.5 due out soon
  6. 6. Introduction• Accumulo is a sparse, distributed, sorted, multi-dimensional map• Modeled after Google’s BigTable design• Scales to trillions of records and 100s of Terabytes• Features automatic load balancing, high-availability, dynamic control over data layout
  7. 7. Data Model
  8. 8. Data Model - Key (Row ID, Column Family, Column Qualifier, Column Visibility, Timestamp) maps to Value
  9. 9. Data Model (Logical 2D table structure)
             attribute:age   attribute:phone   purchases:sneakers   returns:hat
     bill    49              555-1212          $100                 -
     george  38              -                 $80                  $30
  10. 10. Physical layout (sorted keys)
      row      col fam     col qual   col vis   time       value
      bill     attribute   age        public    Jun 2010   49
      bill     attribute   phone      private   Jun 2010   555-1212
      bill     purchases   sneakers   public    Apr 2010   $100
      george   attribute   age        private   Oct 2009   38
      george   purchases   sneakers   public    Nov 2009   $80
      george   returns     hat        public    Dec 2009   $30
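The physical layout above is just one big map sorted on the full key. A minimal Java sketch of that ordering (illustrative class name and key encoding, not the Accumulo API), using a TreeMap with the key components joined in sort order:

```java
import java.util.TreeMap;

public class SortedKeyModel {
    // Accumulo sorts keys by row, then column family, qualifier,
    // visibility, and finally timestamp. Joining the components with
    // a separator reproduces that ordering for this small data set.
    static String key(String row, String fam, String qual, String vis) {
        return row + "/" + fam + "/" + qual + "/" + vis;
    }

    public static TreeMap<String, String> table() {
        TreeMap<String, String> t = new TreeMap<>();
        t.put(key("bill", "attribute", "age", "public"), "49");
        t.put(key("bill", "attribute", "phone", "private"), "555-1212");
        t.put(key("bill", "purchases", "sneakers", "public"), "$100");
        t.put(key("george", "attribute", "age", "private"), "38");
        t.put(key("george", "purchases", "sneakers", "public"), "$80");
        t.put(key("george", "returns", "hat", "public"), "$30");
        return t;
    }

    public static void main(String[] args) {
        // Entries come back in sorted key order: bill's before george's,
        // and attribute columns before purchases within a row.
        table().forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```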
  11. 11. High-level API
  12. 12. Accumulo API • To use Accumulo, you must write an application using the Accumulo Java client library. There is no SQL (hence NoSQL) • Data is packaged into Mutation objects which are added to a BatchWriter, which sends them to TabletServers • Clients can scan a set of key-value pairs by specifying optional start and end keys (a Range) and obtaining a Scanner. Iterating over the Scanner returns sorted key-value pairs for that range. Each scan takes milliseconds to start • Can scan over a subset of the columns • Can send a set of Ranges to a BatchScanner and get the matching key-value pairs back, unsorted
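The scan semantics above can be sketched without a running cluster by treating the table as a sorted map: a Range becomes a subMap over start and end rows, and fetching a column filters the results. This is a model of the behavior only, with an illustrative key encoding; the real client classes are Mutation, BatchWriter, Scanner, and BatchScanner:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class ScanSketch {
    // Keys are "row/family/qualifier"; a real key also carries
    // visibility and a timestamp.
    static final TreeMap<String, String> TABLE = new TreeMap<>();
    static {
        TABLE.put("bill/attribute/age", "49");
        TABLE.put("bill/purchases/sneakers", "$100");
        TABLE.put("george/attribute/age", "38");
        TABLE.put("george/purchases/sneakers", "$80");
    }

    // Scan rows in [startRow, endRow], optionally restricted to one family.
    public static Map<String, String> scan(String startRow, String endRow, String family) {
        Map<String, String> out = new LinkedHashMap<>();
        // Appending "/\uffff" to the end row includes every key whose
        // row component equals endRow.
        for (Map.Entry<String, String> e :
                TABLE.subMap(startRow, true, endRow + "/\uffff", true).entrySet()) {
            if (family == null || e.getKey().split("/")[1].equals(family)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Range bill - will, fetching only the purchases family.
        System.out.println(scan("bill", "will", "purchases"));
    }
}
```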
  13. 13. Insert - the table holds five entries: bill's age and sneakers purchase, and george's age, sneakers purchase, and hat return
  14. 14. Insert - a new entry arrives: bill attribute phone private Jun 2010 555-1212
  15. 15. Insert - the new entry lands in sorted order, between bill's attribute:age and purchases:sneakers entries
  16. 16. Scan - Full key lookup: bill attribute phone private Jun 2010 returns 555-1212
  17. 17. Scan - Single row: bill returns all three of bill's entries
  18. 18. Scan - Multiple Rows: range bill - will returns all six entries
  19. 19. Scan - Multiple Rows, Selected Columns: range bill - will, fetch purchases, returns bill's and george's sneakers purchases
  20. 20. Architecture - Scaling and Recovery
  21. 21. Performance • Accumulo 'scales' because aggregate read and write performance increases as more machines are added, and because individual read/write performance remains very good even with trillions of key-value pairs already in the system • [Chart: thousands of writes per second (1-10,000, log scale) vs. number of machines (1-1024), comparing Accumulo, BigTable circa 2006, and Cassandra] • Sources: http://www.slideshare.net/acordova00/accumulo-on-ec2 ; http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html ; http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
  22. 22. Accumulo Prerequisites• One to hundreds of computers with local hard drives, connected via ethernet• Password-less SSH access• Local directory for write-ahead logs• Hadoop and ZooKeeper installed, configured, and running
  23. 23. Architecture Accumulo ZooKeeper HDFS MapReduce
  24. 24. Architecture: HDFS HDFS NameNode DataNodes File
  25. 25. Architecture: HDFS HDFS NameNode DataNodes Block 1 Block 2
  26. 26. Architecture: HDFS HDFS NameNode DataNodes
  27. 27. Architecture: Tables Accumulo Master Tablet Servers Table
  28. 28. Architecture: Tables Accumulo Master Tablet Servers P1 P2 P3
  29. 29. Architecture: Tables Accumulo Master Tablet Servers
  30. 30. Architecture: Writes P1 Mem Table File1 HDFS
  31. 31. Architecture: Writes P1 Mem Client Table Write-ahead Log File1 HDFS
  32. 32. Architecture: Writes P1 Mem Table Write-ahead Log File1 File 2 HDFS
  33. 33. Architecture: Writes P1 Mem Table X Write-ahead Log File1 File 2 HDFS
  34. 34. Architecture: Splits
      row      col fam     col qual   col vis   time       value
      bill     attribute   age        public    Jun 2010   49
      bill     attribute   phone      private   Jun 2010   555-1212
      bill     purchases   sneakers   public    Apr 2010   $100
      george   attribute   age        private   Oct 2009   38
      george   purchases   sneakers   public    Nov 2009   $80
      george   returns     hat        public    Dec 2009   $30
  35. 35. Architecture: Splits Accumulo Master Tablet Servers
  36. 36. Architecture: Splits Accumulo Master Tablet Servers
  37. 37. Architecture: Splits Accumulo Master Tablet Servers
  38. 38. Sorted keys - dynamic partitioning • Because keys are sorted, tables can be partitioned based on the data • partitions (tablets) are uniform in size, regardless of data distribution (as long as single rows are smaller than the partition size) • not based on the number of servers • Can add / remove / fail servers at any time; the system is always automatically balanced
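The size-based, row-boundary partitioning described above can be sketched as a single walk over sorted rows. All names and sizes here are illustrative; in a real deployment the split point is driven by configuration such as the table split threshold and the sizes of the underlying files:

```java
import java.util.ArrayList;
import java.util.List;

public class TabletSplit {
    // Walk sorted rows and cut a new tablet whenever the running size
    // reaches the threshold -- always on a row boundary, so a single
    // row is never split across tablets.
    public static List<List<String>> split(List<String> sortedRows,
                                           List<Integer> rowSizes,
                                           int maxTabletSize) {
        List<List<String>> tablets = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (int i = 0; i < sortedRows.size(); i++) {
            current.add(sortedRows.get(i));
            size += rowSizes.get(i);
            if (size >= maxTabletSize) {   // cut here: tablet is full
                tablets.add(current);
                current = new ArrayList<>();
                size = 0;
            }
        }
        if (!current.isEmpty()) tablets.add(current);
        return tablets;
    }

    public static void main(String[] args) {
        // Uniform row sizes give uniform tablets regardless of how many
        // servers exist -- the tablets are then assigned to servers.
        System.out.println(split(
            List.of("a", "b", "c", "d", "e", "f"),
            List.of(10, 10, 10, 10, 10, 10), 30));
    }
}
```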
  39. 39. Partitioning Contrast• Some relational databases allow partitioning. May require users to choose a field or two on which to partition. Hopefully that field is uniformly distributed• Hash-based systems (default Cassandra, CouchDB, Riak, Voldemort) avoid this problem, but at the cost of range scans. Some support range scans via other means.• Many systems couple partition storage with partition service, requiring data movement to rebalance partition service (MongoDB, Cassandra, etc)
  40. 40. Architecture: Reads P1 Mem Client Table Merge File1 File 2
  41. 41. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  42. 42. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  43. 43. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  44. 44. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  45. 45. Architecture: Recovery Accumulo Master Tablet Servers Master reassigns DataNodes NameNode
  46. 46. Architecture: Recovery Accumulo Master Tablet Servers Replay Write-ahead Log DataNodes NameNode
  47. 47. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  48. 48. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  49. 49. Metadata Hierarchy - the root tablet points to the metadata tablets (md1, md2, md3), which point to the tablets of the user tables (user1, user2, index1, index2)
  50. 50. Architecture: Lookup Accumulo Master Tablet Servers ZooKeeper Client
  51. 51. Architecture: Lookup - Client knows ZooKeeper, finds the root tablet
  52. 52. Architecture: Lookup - Scan the root tablet to find the metadata tablet that describes the user table we want
  53. 53. Architecture: Lookup - Read the location info of the user table's tablets and cache it
  54. 54. Architecture: Lookup - Read directly from the server holding the tablets we want
  55. 55. Architecture: Lookup - Find other tablets via cache lookups
  56. 56. Security
  57. 57. Security • Design and Guarantees • Data Labeling • Authentication • User Configuration
  58. 58. Data Security • Accumulo will only return cells whose visibility labels are satisfied by the user credentials presented at scan time • Two necessary conditions • Correctly labeling data on ingest • Presenting the right user credentials
  59. 59. Security Labels - an extension of the BigTable data model: the column gains a visibility element, so the key is (row ID, column family, column qualifier, column visibility, timestamp) mapping to a value
  60. 60. Column Visibility
      row      col fam     col qual   col vis   time       value
      bill     attribute   age        public    Jun 2010   49
      bill     attribute   phone      private   Jun 2010   555-1212
      bill     purchases   sneakers   public    Apr 2010   $100
      george   attribute   age        private   Oct 2009   38
      george   purchases   sneakers   public    Nov 2009   $80
      george   returns     hat        public    Dec 2009   $30
  61. 61. Security Label Syntax • A & B - both A and B required • A | B - must have either A or B • (A | B) & C - must have C and A or B • A | (B & C) - must have A or both B and C • A & (B | (C & D))
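The label syntax above can be checked with a small recursive-descent evaluator. This is a sketch of the semantics only (the real client parses labels with Accumulo's ColumnVisibility class); for simplicity the sketch gives & higher precedence than |, whereas Accumulo requires parentheses when mixing the two operators:

```java
import java.util.Set;

public class VisibilityEval {
    // Recursive-descent evaluator for expressions like "(A|B)&C".
    private final String expr;
    private int pos;

    private VisibilityEval(String expr) { this.expr = expr.replace(" ", ""); }

    public static boolean satisfies(String expression, Set<String> auths) {
        VisibilityEval p = new VisibilityEval(expression);
        boolean result = p.orExpr(auths);
        if (p.pos != p.expr.length())
            throw new IllegalArgumentException("trailing input: " + expression);
        return result;
    }

    // orExpr := andExpr ('|' andExpr)*
    private boolean orExpr(Set<String> auths) {
        boolean v = andExpr(auths);
        while (pos < expr.length() && expr.charAt(pos) == '|') {
            pos++;
            v |= andExpr(auths);   // always evaluate to advance pos
        }
        return v;
    }

    // andExpr := factor ('&' factor)*
    private boolean andExpr(Set<String> auths) {
        boolean v = factor(auths);
        while (pos < expr.length() && expr.charAt(pos) == '&') {
            pos++;
            v &= factor(auths);
        }
        return v;
    }

    // factor := '(' orExpr ')' | token, where a token is satisfied
    // when the user's authorizations contain it
    private boolean factor(Set<String> auths) {
        if (expr.charAt(pos) == '(') {
            pos++;                       // consume '('
            boolean v = orExpr(auths);
            pos++;                       // consume ')'
            return v;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) pos++;
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(satisfies("(A|B)&C", Set.of("B", "C")));  // true
        System.out.println(satisfies("A&B", Set.of("A")));           // false
    }
}
```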
  62. 62. Security Label Example • To drive: • license&over15 • To join the military: • (over17|(over16&parentConsent))&(greencard|USCitizen) • Access to classified data: • TS&SI&(USA|GBR|NZL|CAN|AUS)
  63. 63. Security Model - the Security Perimeter contains the Trusted Client, the Auth Service, and Accumulo. The user presents an ID, password, or cert; the Trusted Client verifies it against the Auth Service, obtains auths, passes the auths to Accumulo, and returns data to the user
  64. 64. Trusted Client Responsibility • Ensure that credentials belong to the user • Ensure that the user is authenticated
  65. 65. Application Authorization • Trusted Client applications must have their maximum authorizations set before any can be passed • The Trusted Client limits the set of authorizations by application
  66. 66. Application Authorization Example • Data may be labeled with any combination of the following: { personal, research, finance, diet, cancer } • We wish to limit certain applications to a subset
  67. 67. Example Table
      row    colF        colQ   col vis                        value
      row0   name        -      personal|finance               John
      row0   age         -      personal|research              49
      row0   phone       -      personal|finance               555-1212
      row0   owed        -      personal|finance               $5440
      row0   diagnosis   -      personal|(research & cancer)   melanoma
      row0   diagnosis   -      personal|(research & diet)     diabetes
  68. 68. Application Authorizations • Cancer Research: cancer diagnoses, age • Diabetes Research: diet info, age • Accounting System: balance, name, phone • Personal Records Management: all
  69. 69. Security Model - the Researcher presents an ID, password, or cert to the Cancer Research App
  70. 70. Security Model - the app verifies the researcher's identity with the Auth Service
  71. 71. Security Model - the Auth Service returns the researcher's authorizations: research, cancer, diabetes
  72. 72. Security Model - the app passes only research, cancer (limited to its own subset) to Accumulo
  73. 73. Security Model - Accumulo returns data whose labels are satisfied by research, cancer
  74. 74. Security Model - the app returns the data to the researcher
  75. 75. Data life-cycle
  76. 76. Data Model - Key (Row ID, Column Family, Column Qualifier, Column Visibility, Timestamp) maps to Value
  77. 77. Versions - What can we do with multiple versions of the same data?
      rowID   family   qualifier   timestamp   value
      row1    fam1     qual1       1005        2
      row1    fam1     qual1       1004        5
      row1    fam1     qual1       1003        3
      row1    fam1     qual1       1002        2
      row1    fam1     qual1       1001        7
  78. 78. Iterators• Mechanism for adding online functionality to tables • Aggregation (called Combiners) • Age-Off • Filtering (including by security label)
  79. 79. Versioning Iterators
      rowID   family   qualifier   timestamp   value
      row1    fam1     qual1       1005        2
      row1    fam1     qual1       1004        5
      row1    fam1     qual1       1003        3
      row1    fam1     qual1       1002        2
      row1    fam1     qual1       1001        7
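Because entries for one (row, family, qualifier) are stored newest-first, the VersioningIterator's behavior reduces to keeping the first maxVersions entries per key. A minimal sketch (illustrative names):

```java
import java.util.ArrayList;
import java.util.List;

public class VersioningSketch {
    // tsValuePairs: {timestamp, value} pairs for a single key, sorted
    // by timestamp descending (as Accumulo stores them). Keep only the
    // newest maxVersions entries.
    public static List<long[]> keepVersions(List<long[]> tsValuePairs, int maxVersions) {
        int n = Math.min(maxVersions, tsValuePairs.size());
        return new ArrayList<>(tsValuePairs.subList(0, n));
    }

    public static void main(String[] args) {
        // The example table: five versions of row1 fam1 qual1.
        List<long[]> entries = List.of(
            new long[]{1005, 2}, new long[]{1004, 5}, new long[]{1003, 3},
            new long[]{1002, 2}, new long[]{1001, 7});
        // With maxVersions = 1, only the newest (1005 -> 2) survives.
        for (long[] e : keepVersions(entries, 1))
            System.out.println(e[0] + " -> " + e[1]);
    }
}
```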
  80. 80. Filtering Iterators • Age Off • RegEx • Arbitrary filtering
  81. 81. Age Off• Can specify a particular date - e.g. delete everything older than July 1, 2007• Can specify a time period - e.g. delete everything older than 6 months
  82. 82. Age-Off - Current Time: 1103. The entries with timestamps 1001 and 1002 are more than 100 sec. old and are filtered out
  83. 83. Age-Off - Current Time: 1104. The entry with timestamp 1003 is now also more than 100 sec. old
  84. 84. Age-Off - Current Time: 1105. The entry with timestamp 1004 is now also more than 100 sec. old
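The age-off walkthrough above is a pure timestamp filter: anything more than a fixed age older than the current time is dropped. A minimal sketch (illustrative names; the real mechanism is Accumulo's AgeOffFilter iterator):

```java
import java.util.List;
import java.util.stream.Collectors;

public class AgeOffSketch {
    // Keep only entries whose age (now - timestamp) is at most maxAge;
    // older entries are filtered out at read and compaction time.
    public static List<long[]> ageOff(List<long[]> tsValuePairs, long now, long maxAge) {
        return tsValuePairs.stream()
                .filter(e -> now - e[0] <= maxAge)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<long[]> entries = List.of(
            new long[]{1005, 2}, new long[]{1004, 5}, new long[]{1003, 3},
            new long[]{1002, 2}, new long[]{1001, 7});
        // At time 1103 with a 100 sec. window, 1001 and 1002 age off.
        System.out.println(ageOff(entries, 1103, 100).size());
    }
}
```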
  85. 85. Manual Deletes • Can insert 'deletes'. They are inserted like other key-value pairs; any keys with an older timestamp are suppressed from reads • Compactions write non-deleted data to new files • Old files are then removed from HDFS • To ensure data is deleted from disk: • write deletes (the data is now absent from query results) • compact (can compact a particular range of a table if it's large)
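The suppression rule above can be sketched as a merge-time filter: a delete marker hides every entry for the same key at or before the marker's timestamp, and the marker itself never appears in query results. Names are illustrative; this models only the rule, not Accumulo's merging-read machinery:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeleteSketch {
    static class Entry {
        final String key; final long ts; final String value; final boolean delete;
        Entry(String key, long ts, String value, boolean delete) {
            this.key = key; this.ts = ts; this.value = value; this.delete = delete;
        }
    }

    // Suppress entries covered by a delete marker for the same key.
    public static List<Entry> applyDeletes(List<Entry> entries) {
        Map<String, Long> deleteTs = new HashMap<>();
        for (Entry e : entries)
            if (e.delete) deleteTs.merge(e.key, e.ts, Math::max);
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            Long d = deleteTs.get(e.key);
            // keep only non-delete entries written after any delete marker
            if (!e.delete && (d == null || e.ts > d)) out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> visible = applyDeletes(List.of(
            new Entry("bill/attribute/age", 100, "49", false),
            new Entry("bill/attribute/age", 150, null, true),    // delete marker
            new Entry("bill/attribute/age", 200, "50", false))); // written later
        System.out.println(visible.size());  // only the ts=200 entry survives
    }
}
```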
  86. 86. Garbage Collection• Garbage collector compares the files in HDFS with the set of files currently active• When files are no longer on the active list, GC waits for a while, then deletes from HDFS
  87. 87. Applications• Fast lookups / scan on extremely large tables with flexible schemas, varying security• Large index across heterogeneous data sets• Continuous Summary Analytics via Iterators• Secure Storage of key value pairs for MapReduce jobs
  88. 88. Where does your data come from? • BigTable was designed to store data for web applications serving millions of users, where the web application creates all the data. Many NoSQL databases are designed solely for this purpose. Accumulo can certainly support that • However, many organizations have lots of data from various sources, with different schemas and different security levels. Bringing them together for analysis is very valuable. Accumulo can support this too
  89. 89. Indexing and queries • The BigTable data model supports building a wide variety of indexes • Simple strings, numbers, geo points, IP addresses, etc. • Each has to be coupled with query code • New applications should examine their data access use cases; the indexes and query code to accomplish those can then be written • The best applications are constructed so each user request is a single scan, or a small number of scans
  90. 90. Compared to MapReduce• Hadoop’s HDFS stores simple files. Usually unsorted.• MapReduce is designed to process all or most of the files at once.• Accumulo maintains a set of sorted files in HDFS• Accumulo scans are designed to access a small portion of the data quickly.• Fairly complementary
  91. 91. Tough use case• Ran MapReduce on some input data set to create a large result set.• Now have a few new records, want to update the result set• MapReduce has to process all the data again, have to wait• Accumulo allows users to perform a limited set of operations to update a result set incrementally, using Iterators• Result sets are always up to date, immediately after insert
  92. 92. Combiners
      row    col fam   col qual       col vis   time      value
      bill   perf      June_calls     P         June 1    9
      bill   perf      June_calls     P         June 4    3
      bill   perf      July_calls     P         July 3    4
      bill   perf      July_calls     P         July 11   7
      bill   perf      August_calls   P         Aug 12    5
      bill   perf      August_calls   P         Aug 29    2
  93. 93. Combiners
      row    col fam   col qual       col vis   time   value
      bill   perf      June_calls     P         -      12
      bill   perf      July_calls     P         -      11
      bill   perf      August_calls   P         -      7
  94. 94. Combiners • Almost equivalent to Reduce of MapReduce except: • Cannot assume we have seen all the values for a particular key • Exactly equivalent to a Combiner function
  95. 95. Combiners• Useful Combiners: • Event count (StringSummation or LongSummation aggregator) • Event hour occurrence histogram (NumArraySummation aggregator) • Event duration histogram (NumArraySummation aggregator)
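A summing combiner reduces all the values seen so far for a key without assuming it has seen every value, exactly as the Combiner contrast above describes. A minimal sketch of the call-count example (illustrative names; the real classes are Accumulo's SummingCombiner and its relatives):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SummingCombinerSketch {
    // Collapse multiple values for the same key into their sum, as a
    // summing combiner does at scan and compaction time.
    public static Map<String, Long> combine(List<Map.Entry<String, Long>> entries) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : entries)
            out.merge(e.getKey(), e.getValue(), Long::sum);
        return out;
    }

    public static void main(String[] args) {
        // Per-event call counts in, monthly totals out.
        List<Map.Entry<String, Long>> raw = List.of(
            Map.entry("bill/perf/June_calls", 9L),
            Map.entry("bill/perf/June_calls", 3L),
            Map.entry("bill/perf/July_calls", 4L),
            Map.entry("bill/perf/July_calls", 7L),
            Map.entry("bill/perf/August_calls", 5L),
            Map.entry("bill/perf/August_calls", 2L));
        System.out.println(combine(raw));  // June 12, July 11, August 7
    }
}
```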
  96. 96. Conceptual Graph Representation [Diagram: directed graph over nodes a, b, c, d, e, f, g]
  97. 97. Edge table
      row   col fam   col qual   col vis   time   value
      a     edge      f                           1.0
      c     edge      b                           1.0
      c     edge      d                           1.0
      d     edge      b                           1.0
      d     edge      e                           1.0
      e     edge      d                           1.0
      f     edge      g                           1.0
      g     edge      e                           1.0
      g     edge      f                           1.0
  98. 98. Edge Weights• Summing Combiners are typically used to efficiently and incrementally update edge weights • See SummingCombiner
  99. 99. Edge table - Incoming: a, edge, f, 1.0 (a->f currently has weight 1.0)
  100. 100. Edge table - a->f is combined to weight 2.0
  101. 101. Edge table - Incoming: c, edge, b, 6.0 (c->b currently has weight 1.0)
  102. 102. Edge table - c->b is combined to weight 7.0
  103. 103. Edge table - Incoming: a, edge, f, 2.3 (a->f currently has weight 2.0)
  104. 104. Edge table - a->f is combined to weight 4.3
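The incremental weight updates above reduce to summing each incoming observation into a map keyed by (source, destination). A minimal sketch (illustrative names; in Accumulo this is a SummingCombiner configured on the edge table, so no read is needed before the write):

```java
import java.util.Map;
import java.util.TreeMap;

public class EdgeWeightSketch {
    // Edge weights keyed by "source->dest"; each incoming observation
    // is summed into the existing weight.
    static final Map<String, Double> EDGES = new TreeMap<>();

    public static double insert(String src, String dst, double weight) {
        return EDGES.merge(src + "->" + dst, weight, Double::sum);
    }

    public static void main(String[] args) {
        insert("a", "f", 1.0);   // initial edge
        insert("a", "f", 1.0);   // a->f combines to 2.0
        insert("c", "b", 1.0);
        insert("c", "b", 6.0);   // c->b combines to 7.0
        insert("a", "f", 2.3);   // a->f accumulates: 1.0 + 1.0 + 2.3
        System.out.println(EDGES);
    }
}
```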
  105. 105. Edge Table Applications• Graph Analytics - traversal, neighbors, connected components• Neighborhood = feature vector. Vector-based machine learning techniques. Nearest neighbor search, clustering, classification• Automated dossiers, fact accumulation - ‘tell me everything we know about X’ in a single scan• Find entities based on features - ‘show me everyone who has feature value > x’ or ‘with < 5 neighbors of type k’
  106. 106. RDF Triples
      row    col fam          col qual   col vis   time   value
      DC     is_capital_of    USA                         1.0
      Don    vacations_in     Arctic                      7.0
      Don    is_employed_by   MI6                         1.0
      Sean   has_status       "007"                       1.0
      Sean   starred_with     Ursula                      1.0
      Sean   starred_with     Anya                        0.7
      Sean   starred_with     Teresa                      0.3
  107. 107. RDF Triples - RYA • See the RYA project: http://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
  108. 108. Additional Training
  109. 109. Additional Training• Talked about the basics today• 3 days of developer training with hands on examples covering • installation, configuration, read / write API, MapReduce, security, table configuration, indexing specific types, querying index tables, combiners, custom iterators, table constraints, storing relational data, joins, high performance considerations, document-partitioned indexing (text search), machine learning, object persistence• 2 days of administrator training covering • hardware selection, process assignment, troubleshooting, maintenance, replication and high availability, cluster modification, failure handling
  110. 110. Next Scheduled Training Sessions• March 5-7 Columbia MD• April 9-11 Columbia MD• http://www.tetraconcepts.com/training• aaron@tetraconcepts.com• brian@tetraconcepts.com
