• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Accumulo meetup 20130109
 

Accumulo meetup 20130109

on

  • 3,792 views

 

Statistics

Views

Total Views
3,792
Views on SlideShare
1,587
Embed Views
2,205

Actions

Likes
2
Downloads
90
Comments
0

7 Embeds 2,205

http://www.sqrrl.com 1941
http://sqrrl.com 120
http://localhost 119
http://bigdata-guide.blogspot.be 11
https://twitter.com 10
http://bigdata-guide.blogspot.ru 3
http://bigdata-guide.blogspot.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Sort order across all keys.Columns can differ across rows.Value can have many different types.Entry can convey information with an empty value.Each key has a visibility – a given read will see a subset of the data.Visibility is part of key uniqueness – PSYCH_JD is withholding information from JD.
  • Tablet Servers have 4 primary functions:Hosting RPCs (read, write, etc.)Managing resources (RAM, CPU, File I/O, etc.)Scheduling background tasks (compactions, caching, etc.)Handling key/value pairs (via Iterators)BecauseAccumulodoesn’t use hashing to assign key-value pairs to servers, we need:We need to store the mapping of TabletServer-to-Tablet . This mapping is stored in another Tablet in Accumulo called the Metadata Table. A client need only scan a portion of the Metdata table to find which TabletServers have the Tablets it wants. (binary search through the metadata hierarchy (NEED input, is this correct) The Metadata table’sTabletServer-to-Tablet assignments must also be stored somewhere. These are written to the first Tablet of the Metadata table, called the Root Tablet. However, you may notice that the Root Tablet itself is stored in Accumulo! Somewhat of a circular dependency. That’s what we use Zookeeper for: The location of the Root Tablet is always known to ZooKeeper.
  • ApachAccumulo runs on top of Hadoop and ZooKeeperIt relies on HDFS for storage of data and Zookeeper for:- storage of config data (location of metadata Root Tablet) - locking of tabletsZookeeper uses Quorum consistency algorithms for High Availability to prevent Single Point of FailureZookeeper itself is actually a K/V datastore, but holds very little dataAlthough we’ll be running ZK on the same machines as Accumulo and Hadoop, it’s recommended that the Quorum of servers be separate.

Accumulo meetup 20130109 Accumulo meetup 20130109 Presentation Transcript

  • Securely explore your data APPACHE ACCUMULO Adam Fuchs and John Vines sqrrldata, inc. January 9, 2013
  • APACHE ACCUMULO Sorted, Distributed Key/Value Store Based on Google’s Big Table Design Built on Top of Apache Hadoop and Apache Zookeeper Augments and Integrates With the Hadoop ecosystem © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 2
  • TODAY’S TALK Overview of the Accumulo Project Accumulo Design Table Design Strategy Live Demonstration © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 3
  • ACCUMULO TIMELINE NSA open sources Accumulo into incubation at Apache Google publishes Bigtablepaper 2005 2006 Google Publishes Papers: GFS (2003) Map Reduce (2004) 2007 2008 2009 NSA begins development of Accumulo © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Accumulo becomes a top-level Apache project 2010 2011 First sqrrl release planned 2012 2013 sqrrl is founded 4
  • ACCUMULO’S STRENGTHS Apache Accumulo excels at: - Security Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use Mandatory access control keeps your data safe - Scalability Proven reliability and performance at the multi-petabyte scale High-performance parallel I/O library - Adaptability Flexible schema support to quickly ingest new data sources Sorted key/value paradigm supports a multitude of search and analysis applications Server-side programming framework “iterator trees” support bestin-class aggregation, filtering, and complex query semantics © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 5
  • BASIC SCHEMA Accumulo stores sorted key/value pairs (entries). An Accumulo key is a 5-tuple, consisting of: - Row: Controls Atomicity - Column Family: Controls Locality - Column Qualifier: Controls Uniqueness - Visibility Label: Controls Access - Timestamp: Controls Versioning Keys are sorted: -Hierarchically: Row first, then column family, and so on. - Lexicographically: Compare first byte, then second, and so on. Values are byte arrays. © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 6
  • KEY/VALUE EXAMPLES Row Col. Fam. Col. Qual. John Doe Visibility JD Timesta mp Value Jane Doe Friends 20121130 Jane Doe PhoneNumbe 555-1212 r John Doe Friends Jane Doe JD 20121201 John Doe Notes PCP PCP_JD 20120912 Patient suffers from an acute … John Doe Test Results Cholesterol JD|PCP_JD 20120912 183 John Doe Test Results Mental Health JD|PSYCH_JD 20120801 Pass John Doe Test Results Mental Health PSYCH_JD 20120801 Crazy! John Doe Test Results X-Ray JD|PHYS_JD 20120513 1010110110100 … 20090115 © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 7
  • VISIBILITY SYNTAX & SEMANTICS © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 8
  • TABLET ORGANIZATION Well-Known Location (zookeeper) Collections of entries from tables Tables are partitioned into Tablets Metadata tablets hold info about other tablets, forming a 3-level hierarchy A Tablet is a unit of work for a Tablet Server Root Tablet -∞ to ∞ Metadata Tablet 1 Metadata Tablet 2 -∞ to “Encyclopedia:Ocelot” “Encyclopedia:Ocelot” to ∞ Table: Adam’s Table Data Tablet -∞ : thing Data Tablet thing : ∞ Table: Encyclopedia Data Tablet -∞ : Ocelot © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Data Tablet Ocelot : Yak Data Tablet Yak : ∞ Table: Foo Data Tablet -∞ to ∞ 9
  • ACCUMULO ARCHITECTURE Zookeeper Zookeeper Delegate Authority, Configs Zookeeper Delegate Authority, Configs Tablet Server Tablet Read/Write Assign/Balance Tablet Server Master Application Application Tablet Store/Replicate Tablet Server Application HDFS Scan Delete Tablet Garbage Collector © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 10
  • TABLET DATA FLOW Tablet Writes In-Memory Map Scan Iterator Tree Iterator Tree Minor Compaction Sorted, Indexed File Sorted, Indexed File Write Ahead Log (For Recovery) Iterator Major Tree Merging / Compaction © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Reads Sorted, Indexed File 11
  • ITERATOR FRAMEWORK Iterator Operations: - File Reads - Block Caching - Merging - Deletion - Isolation - Locality Groups - Range Selection - Column Selection - Cell-level Security - Versioning - Filtering - Aggregation - Partitioned Joins © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 12
  • CLIENT API new ZooKeeperInstance(...) Instance new MockInstance() getConnector(auth info...) Range IteratorOption Connector TableOperations Authorizations InstanceOperations createScanner(...) createBatchScanner(...) createBatchWriter(...) SecurityOperations Scanner BatchScanner BatchWriter iterator() addMutation(...) Map.Entry Key Mutation Value © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 13
  • TABLE DESIGN Table: Graphs Document-distributed indexing Multi-dimensional index Custom index © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Inverted Index Row: <UUID> <Term> Column Family: <Type> <Type> + <Field> Column Qualifier: <Field> <UUID> Value: No built-in secondary indices Sort Order  Index Basic design pattern: forward and inverted index tables Additional table design patterns Forward Index <Term> <Digest of Event> 14
  • DEMO TIME! © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 15
  • OPEN SOURCE PROJECT Apache Software Foundation project since October 2011 site: http://accumulo.apache.org jira: https://issues.apache.org/jira/browse/ACCUMULO lists: http://accumulo.apache.org/mailing_list.html © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 16
  • CURRENT CONTRIBUTORS © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 17
  • CONTACT Adam Fuchs CTO adam@sqrrl.com John Vines Director of Ecosystems john@sqrrl.com sqrrl data, Inc. www.sqrrl.com @sqrrl_inc info@sqrrl.com © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 18