Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Accumulo meetup 20130109

4,747 views

Published on

Published in: Technology
  • Be the first to comment

Accumulo meetup 20130109

  1. 1. Securely explore your data APPACHE ACCUMULO Adam Fuchs and John Vines sqrrldata, inc. January 9, 2013
  2. 2. APACHE ACCUMULO Sorted, Distributed Key/Value Store Based on Google’s Big Table Design Built on Top of Apache Hadoop and Apache Zookeeper Augments and Integrates With the Hadoop ecosystem © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 2
  3. 3. TODAY’S TALK Overview of the Accumulo Project Accumulo Design Table Design Strategy Live Demonstration © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 3
  4. 4. ACCUMULO TIMELINE NSA open sources Accumulo into incubation at Apache Google publishes Bigtablepaper 2005 2006 Google Publishes Papers: GFS (2003) Map Reduce (2004) 2007 2008 2009 NSA begins development of Accumulo © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Accumulo becomes a top-level Apache project 2010 2011 First sqrrl release planned 2012 2013 sqrrl is founded 4
  5. 5. ACCUMULO’S STRENGTHS Apache Accumulo excels at: - Security Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use Mandatory access control keeps your data safe - Scalability Proven reliability and performance at the multi-petabyte scale High-performance parallel I/O library - Adaptability Flexible schema support to quickly ingest new data sources Sorted key/value paradigm supports a multitude of search and analysis applications Server-side programming framework “iterator trees” support bestin-class aggregation, filtering, and complex query semantics © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 5
  6. 6. BASIC SCHEMA Accumulo stores sorted key/value pairs (entries). An Accumulo key is a 5-tuple, consisting of: - Row: Controls Atomicity - Column Family: Controls Locality - Column Qualifier: Controls Uniqueness - Visibility Label: Controls Access - Timestamp: Controls Versioning Keys are sorted: -Hierarchically: Row first, then column family, and so on. - Lexicographically: Compare first byte, then second, and so on. Values are byte arrays. © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 6
  7. 7. KEY/VALUE EXAMPLES Row Col. Fam. Col. Qual. John Doe Visibility JD Timesta mp Value Jane Doe Friends 20121130 Jane Doe PhoneNumbe 555-1212 r John Doe Friends Jane Doe JD 20121201 John Doe Notes PCP PCP_JD 20120912 Patient suffers from an acute … John Doe Test Results Cholesterol JD|PCP_JD 20120912 183 John Doe Test Results Mental Health JD|PSYCH_JD 20120801 Pass John Doe Test Results Mental Health PSYCH_JD 20120801 Crazy! John Doe Test Results X-Ray JD|PHYS_JD 20120513 1010110110100 … 20090115 © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 7
  8. 8. VISIBILITY SYNTAX & SEMANTICS © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 8
  9. 9. TABLET ORGANIZATION Well-Known Location (zookeeper) Collections of entries from tables Tables are partitioned into Tablets Metadata tablets hold info about other tablets, forming a 3-level hierarchy A Tablet is a unit of work for a Tablet Server Root Tablet -∞ to ∞ Metadata Tablet 1 Metadata Tablet 2 -∞ to “Encyclopedia:Ocelot” “Encyclopedia:Ocelot” to ∞ Table: Adam’s Table Data Tablet -∞ : thing Data Tablet thing : ∞ Table: Encyclopedia Data Tablet -∞ : Ocelot © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Data Tablet Ocelot : Yak Data Tablet Yak : ∞ Table: Foo Data Tablet -∞ to ∞ 9
  10. 10. ACCUMULO ARCHITECTURE Zookeeper Zookeeper Delegate Authority, Configs Zookeeper Delegate Authority, Configs Tablet Server Tablet Read/Write Assign/Balance Tablet Server Master Application Application Tablet Store/Replicate Tablet Server Application HDFS Scan Delete Tablet Garbage Collector © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 10
  11. 11. TABLET DATA FLOW Tablet Writes In-Memory Map Scan Iterator Tree Iterator Tree Minor Compaction Sorted, Indexed File Sorted, Indexed File Write Ahead Log (For Recovery) Iterator Major Tree Merging / Compaction © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Reads Sorted, Indexed File 11
  12. 12. ITERATOR FRAMEWORK Iterator Operations: - File Reads - Block Caching - Merging - Deletion - Isolation - Locality Groups - Range Selection - Column Selection - Cell-level Security - Versioning - Filtering - Aggregation - Partitioned Joins © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 12
  13. 13. CLIENT API new ZooKeeperInstance(...) Instance new MockInstance() getConnector(auth info...) Range IteratorOption Connector TableOperations Authorizations InstanceOperations createScanner(...) createBatchScanner(...) createBatchWriter(...) SecurityOperations Scanner BatchScanner BatchWriter iterator() addMutation(...) Map.Entry Key Mutation Value © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 13
  14. 14. TABLE DESIGN Table: Graphs Document-distributed indexing Multi-dimensional index Custom index © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Inverted Index Row: <UUID> <Term> Column Family: <Type> <Type> + <Field> Column Qualifier: <Field> <UUID> Value: No built-in secondary indices Sort Order  Index Basic design pattern: forward and inverted index tables Additional table design patterns Forward Index <Term> <Digest of Event> 14
  15. 15. DEMO TIME! © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 15
  16. 16. OPEN SOURCE PROJECT Apache Software Foundation project since October 2011 site: http://accumulo.apache.org jira: https://issues.apache.org/jira/browse/ACCUMULO lists: http://accumulo.apache.org/mailing_list.html © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 16
  17. 17. CURRENT CONTRIBUTORS © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 17
  18. 18. CONTACT Adam Fuchs CTO adam@sqrrl.com John Vines Director of Ecosystems john@sqrrl.com sqrrl data, Inc. www.sqrrl.com @sqrrl_inc info@sqrrl.com © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 18

×