Securely explore your data

APACHE ACCUMULO
Adam Fuchs and John Vines
sqrrl data, Inc.
January 9, 2013
APACHE ACCUMULO

Sorted, Distributed Key/Value Store
Based on Google’s Bigtable Design
Built on Top of Apache Hadoop and Apache ZooKeeper
Augments and Integrates With the Hadoop ecosystem

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

TODAY’S TALK
Overview of the Accumulo Project
Accumulo Design
Table Design Strategy
Live Demonstration

ACCUMULO TIMELINE
- 2003-2004: Google publishes papers: GFS (2003), MapReduce (2004)
- 2006: Google publishes Bigtable paper
- 2008: NSA begins development of Accumulo
- 2011: NSA open sources Accumulo into incubation at Apache
- 2012: Accumulo becomes a top-level Apache project; sqrrl is founded
- 2013: First sqrrl release planned

ACCUMULO’S STRENGTHS
Apache Accumulo excels at:
- Security
  - Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use
  - Mandatory access control keeps your data safe
- Scalability
  - Proven reliability and performance at the multi-petabyte scale
  - High-performance parallel I/O library
- Adaptability
  - Flexible schema support to quickly ingest new data sources
  - Sorted key/value paradigm supports a multitude of search and analysis applications
  - Server-side programming framework (“iterator trees”) supports best-in-class aggregation, filtering, and complex query semantics

BASIC SCHEMA
Accumulo stores sorted key/value pairs (entries).

An Accumulo key is a 5-tuple, consisting of:
- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning

Keys are sorted:
- Hierarchically: Row first, then column family, and so on.
- Lexicographically: Compare first byte, then second, and so on.

Values are byte arrays.
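
A minimal sketch of the sort order described above, written in Python rather than Accumulo's Java API. The `Key` tuple and `sort_key` helper are hypothetical names used only for illustration, and the 20121101 entry is made-up sample data. Ordering timestamps newest-first within otherwise equal keys is an assumption of this sketch; the slide only says that the timestamp controls versioning.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Illustrative stand-in for the 5-tuple key (not the real Accumulo class)."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int


def sort_key(k: Key):
    # Tuples compare field by field (hierarchically) and bytes objects compare
    # byte by byte (lexicographically). Negating the timestamp places the
    # newest version of a cell first (an assumption of this sketch).
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


entries = [
    Key(b"John Doe", b"Test Results", b"Cholesterol", b"JD|PCP_JD", 20120912),
    Key(b"Jane Doe", b"Friends", b"John Doe", b"JD", 20121130),
    Key(b"John Doe", b"Friends", b"Jane Doe", b"JD", 20121101),  # made-up older version
    Key(b"John Doe", b"Friends", b"Jane Doe", b"JD", 20121201),
]
sorted_entries = sorted(entries, key=sort_key)
```

Because every row for "Jane Doe" sorts before every row for "John Doe", a scan over one row touches a contiguous run of entries.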

KEY/VALUE EXAMPLES
Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
Jane Doe  Friends       John Doe       JD           20121130
Jane Doe  PhoneNumber   555-1212                    20090115
John Doe  Friends       Jane Doe       JD           20121201
John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
John Doe  Test Results  Mental Health  PSYCH_JD     20120801   Crazy!
John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…

VISIBILITY SYNTAX & SEMANTICS
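
A rough sketch of how a label such as JD|PCP_JD from the key/value examples could be checked against a reader's authorizations. It is plain Python, not Accumulo's own evaluator; the function name `visible` is hypothetical. It assumes the operators `&` (and), `|` (or), and parentheses, that mixing `&` and `|` at one level requires parentheses, and that an empty label is visible to everyone.

```python
import re


def visible(expression: str, authorizations: set) -> bool:
    """Return True if the authorizations satisfy the visibility expression."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term() -> bool:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        # A bare token is satisfied when the reader holds that authorization.
        return token in authorizations

    # Assumption: an empty label places no restriction on readers.
    return parse_expr() if tokens else True


jd_sees_pass = visible("JD|PSYCH_JD", {"JD"})
jd_sees_note = visible("PSYCH_JD", {"JD"})
```

With only the JD authorization, a reader sees the "Pass" entry but not the entry labeled PSYCH_JD alone.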

TABLET ORGANIZATION
- Tablets are collections of entries from tables
- Tables are partitioned into Tablets
- Metadata tablets hold info about other tablets, forming a 3-level hierarchy
- A Tablet is a unit of work for a Tablet Server

Hierarchy (from the slide diagram):

Well-Known Location (ZooKeeper)
  Root Tablet: -∞ to ∞
    Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
    Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞
      Table: Adam’s Table
        Data Tablet: -∞ : thing
        Data Tablet: thing : ∞
      Table: Encyclopedia
        Data Tablet: -∞ : Ocelot
        Data Tablet: Ocelot : Yak
        Data Tablet: Yak : ∞
      Table: Foo
        Data Tablet: -∞ to ∞
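
Because tablets partition a table by sorted row ranges, finding the tablet for a row is a search over the split points rather than a hash. The sketch below illustrates this for the Encyclopedia table's split points (Ocelot, Yak). The server names and the `find_tablet` helper are hypothetical, and treating a split point as the inclusive end of the tablet before it is an assumption of the sketch.

```python
from bisect import bisect_left

# Split points taken from the slide; the last tablet runs to +infinity.
ENCYCLOPEDIA_SPLITS = ["Ocelot", "Yak"]
# Hypothetical tablet-server assignment, one entry per tablet.
ENCYCLOPEDIA_SERVERS = ["tserver1", "tserver2", "tserver3"]


def find_tablet(row: str, splits: list) -> int:
    """Return the index of the tablet whose row range contains `row`."""
    # bisect_left sends a row equal to a split point to the tablet that ends
    # at that split (inclusive end row, assumed here).
    return bisect_left(splits, row)


panda_server = ENCYCLOPEDIA_SERVERS[find_tablet("Panda", ENCYCLOPEDIA_SPLITS)]
```

The same kind of lookup can be repeated at each level: the root tablet locates a metadata tablet, which in turn locates the data tablet.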

ACCUMULO ARCHITECTURE
Components (from the slide diagram):
- ZooKeeper (three nodes shown): Delegate Authority, Configs
- Master: Assign/Balance Tablets across Tablet Servers
- Tablet Servers (three shown): each hosts Tablets
- Applications (three shown): Read/Write against Tablet Servers
- HDFS: Store/Replicate
- Garbage Collector: Scan, Delete

TABLET DATA FLOW
Writes
- Applied to the In-Memory Map
- Also recorded in the Write Ahead Log (For Recovery)

Minor Compaction
- In-Memory Map, through an Iterator Tree, to a new Sorted, Indexed File

Major Compaction (Merging)
- Several Sorted, Indexed Files, through an Iterator Tree, to one Sorted, Indexed File

Reads (Scan)
- In-Memory Map and Sorted, Indexed Files, merged through an Iterator Tree
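
The data flow for a single tablet can be sketched as a toy model in Python. The `Tablet` class and `merge_newest_first` helper are hypothetical illustrations, not Accumulo code: "files" are plain sorted lists, the flush threshold is arbitrary, and the log is never truncated.

```python
import heapq


def merge_newest_first(sources):
    """Merge sorted (key, value) lists; sources[0] is newest and wins on duplicates."""
    tagged = [
        [(key, age, value) for key, value in source]
        for age, source in enumerate(sources)
    ]
    previous = object()  # sentinel that equals no real key
    for key, _age, value in heapq.merge(*tagged):
        if key != previous:
            yield key, value
            previous = key


class Tablet:
    def __init__(self, flush_threshold=2):
        self.write_ahead_log = []  # appended on every write, for recovery
        self.memtable = {}         # in-memory map
        self.files = []            # sorted, immutable "files", oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.write_ahead_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.minor_compact()

    def minor_compact(self):
        # Flush the in-memory map to a new sorted file.
        self.files.append(sorted(self.memtable.items()))
        self.memtable = {}

    def major_compact(self):
        # Merge all files into one, keeping the newest value per key.
        self.files = [list(merge_newest_first(self.files[::-1]))]

    def scan(self):
        # Reads merge the in-memory map with every file, newest source first.
        return merge_newest_first([sorted(self.memtable.items())] + self.files[::-1])


tablet = Tablet(flush_threshold=2)
tablet.write("b", "1")
tablet.write("a", "2")  # second write triggers a minor compaction
tablet.write("b", "3")  # newer value for "b" stays in memory
before = list(tablet.scan())
tablet.write("c", "4")  # triggers another minor compaction
tablet.major_compact()
after = list(tablet.scan())
```

Scans return the same sorted view before and after compaction; compaction only reduces the number of files a read has to merge.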

ITERATOR FRAMEWORK
Iterator Operations:
- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
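
Several of the operations listed above compose naturally as stacked stream transformations over sorted entries. The sketch below uses Python generators to stand in for an iterator tree. The function names and sample data are hypothetical, and the real framework is a Java interface with seek support that this sketch does not model.

```python
from itertools import groupby

# Entries are (row, column, timestamp, value), already sorted by row and
# column, with the newest timestamp first within a cell. Sample data is made up.
source = [
    ("page1", "clicks", 3, 10),
    ("page1", "clicks", 2, 7),
    ("page1", "views", 1, 100),
    ("page2", "clicks", 5, 4),
]


def versioning(entries, max_versions=1):
    """Keep only the newest `max_versions` entries of each cell."""
    for _cell, versions in groupby(entries, key=lambda e: (e[0], e[1])):
        for index, entry in enumerate(versions):
            if index < max_versions:
                yield entry


def filtering(entries, predicate):
    """Pass through only the entries accepted by `predicate`."""
    return (entry for entry in entries if predicate(entry))


def summing(entries):
    """Aggregate values per row."""
    for row, group in groupby(entries, key=lambda e: e[0]):
        yield row, sum(entry[3] for entry in group)


# Stack the iterators: versioning feeds filtering, which feeds aggregation.
clicks_per_page = list(
    summing(filtering(versioning(source), lambda e: e[1] == "clicks"))
)
```

Each stage consumes a sorted stream and produces a sorted stream, which is what allows the stages to be stacked in any sensible order.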

CLIENT API
Instance
- new ZooKeeperInstance(...) or new MockInstance()
- getConnector(auth info...) returns a Connector

Connector
- TableOperations, InstanceOperations, SecurityOperations
- createScanner(...) returns a Scanner
- createBatchScanner(...) returns a BatchScanner
- createBatchWriter(...) returns a BatchWriter

Scanner / BatchScanner
- Configured with Range, IteratorOption, Authorizations
- iterator() yields Map.Entry of Key and Value

BatchWriter
- addMutation(...) accepts a Mutation

TABLE DESIGN
- No built-in secondary indices
- Sort order serves as the index
- Basic design pattern: forward and inverted index tables
- Additional table design patterns:
  - Graphs
  - Document-distributed indexing
  - Multi-dimensional index
  - Custom index

Table:             Forward Index   Inverted Index
Row:               <UUID>          <Term>
Column Family:     <Type>          <Type> + <Field>
Column Qualifier:  <Field>         <UUID>
Value:             <Term>          <Digest of Event>
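
The forward and inverted index layout can be sketched by generating both sets of entries from one record. This is plain Python with tuples standing in for Accumulo entries; the helper names, the "+" separator between type and field, and the truncated SHA-1 digest are assumptions made for illustration only.

```python
import hashlib


def event_digest(fields: dict) -> str:
    """Hypothetical digest of an event: a short hash over its sorted fields."""
    canonical = "|".join(f"{name}={value}" for name, value in sorted(fields.items()))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:8]


def index_entries(uuid: str, doc_type: str, fields: dict):
    """Return (forward, inverted) entries as (row, family, qualifier, value) tuples."""
    digest = event_digest(fields)
    forward, inverted = [], []
    for field, term in fields.items():
        # Forward index: look up a record's fields by its UUID.
        forward.append((uuid, doc_type, field, term))
        # Inverted index: look up UUIDs by term.
        inverted.append((term, f"{doc_type}+{field}", uuid, digest))
    return sorted(forward), sorted(inverted)


def uuids_for_term(inverted, term: str):
    """Scan the inverted index for one term and return the matching UUIDs."""
    return [qualifier for row, _family, qualifier, _value in inverted if row == term]


forward, inverted = index_entries(
    "uuid-1", "person", {"name": "Jane Doe", "phone": "555-1212"}
)
matches = uuids_for_term(inverted, "Jane Doe")
```

A query by term reads the inverted table to collect UUIDs, then reads the forward table by UUID to fetch the full records.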

DEMO TIME!

OPEN SOURCE PROJECT
Apache Software Foundation project since October 2011
site: http://accumulo.apache.org
jira: https://issues.apache.org/jira/browse/ACCUMULO
lists: http://accumulo.apache.org/mailing_list.html

CURRENT CONTRIBUTORS

CONTACT

Adam Fuchs
CTO
adam@sqrrl.com

John Vines
Director of Ecosystems
john@sqrrl.com

sqrrl data, Inc.
www.sqrrl.com
@sqrrl_inc
info@sqrrl.com

Accumulo meetup 20130109


Editor's Notes

  • #8 Sort order applies across all keys. Columns can differ across rows. Values can have many different types. An entry can convey information with an empty value. Each key has a visibility, so a given read will see only a subset of the data. Visibility is part of key uniqueness: PSYCH_JD is withholding information from JD.
  • #10 Tablet Servers have 4 primary functions: hosting RPCs (read, write, etc.); managing resources (RAM, CPU, file I/O, etc.); scheduling background tasks (compactions, caching, etc.); and handling key/value pairs (via Iterators). Because Accumulo doesn’t use hashing to assign key/value pairs to servers, we need to store the mapping of TabletServer-to-Tablet. This mapping is stored in another table in Accumulo called the Metadata table. A client need only scan a portion of the Metadata table to find which TabletServers have the Tablets it wants (a binary search through the metadata hierarchy; NEED input, is this correct?). The Metadata table’s TabletServer-to-Tablet assignments must also be stored somewhere. These are written to the first Tablet of the Metadata table, called the Root Tablet. However, you may notice that the Root Tablet itself is stored in Accumulo, which is somewhat of a circular dependency. That’s what we use ZooKeeper for: the location of the Root Tablet is always known to ZooKeeper.
  • #11 Apache Accumulo runs on top of Hadoop and ZooKeeper. It relies on HDFS for storage of data and on ZooKeeper for storage of config data (location of the metadata Root Tablet) and locking of tablets. ZooKeeper uses quorum consistency algorithms for high availability, to prevent a single point of failure. ZooKeeper itself is actually a key/value datastore, but holds very little data. Although we’ll be running ZK on the same machines as Accumulo and Hadoop, it’s recommended that the quorum of servers be separate.