Introduction to Apache Accumulo

Description of Apache Accumulo including data model, scaling and recovery features, API, security, and applications



  1. 1. Apache Accumulo Introduction
  2. 2. Introduction• Aaron Cordova • Founded Accumulo project with several others • Led development through release 1.0 • aaron@tetraconcepts.com
  3. 3. Agenda• Introduction• Data Model• API• Architecture - scaling, recovery• Security• Data-lifecycle• Applications
  4. 4. Introduction
  5. 5. History • Began writing in summer of 2008, after comparing design goals with the BigTable paper and the existing implementations HBase and Hypertable • Released internal version 1.0 in summer of 2009 • September 2011: accepted as an Apache Incubator project. Doug Cutting, founder of Hadoop, was the Champion Sponsor • Feb 2012: 1.4 released • March 2012: graduates to a top-level Apache project • v1.5 due out soon
  6. 6. Introduction• Accumulo is a sparse, distributed, sorted, multi-dimensional map• Modeled after Google’s BigTable design• Scales to trillions of records and 100s of Terabytes• Features automatic load balancing, high-availability, dynamic control over data layout
  7. 7. Data Model
  8. 8. Data Model - Key (Row ID, Column Family, Column Qualifier, Column Visibility, Timestamp) maps to Value
  9. 9. Data Model (Logical 2D table structure)
             attribute:age   attribute:phone   purchases:sneakers   returns:hat
     bill    49              555-1212          $100                 -
     george  38              -                 $80                  $30
  10. 10. Physical layout (sorted keys)
      row      col fam     col qual   col vis   time       value
      bill     attribute   age        public    Jun 2010   49
      bill     attribute   phone      private   Jun 2010   555-1212
      bill     purchases   sneakers   public    Apr 2010   $100
      george   attribute   age        private   Oct 2009   38
      george   purchases   sneakers   public    Nov 2009   $80
      george   returns     hat        public    Dec 2009   $30
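The physical layout above is just one big map sorted on the full key. A minimal Java sketch of that ordering (illustrative class name and key encoding, not the Accumulo API), using a TreeMap with the key components joined in sort order:

```java
import java.util.TreeMap;

public class SortedKeyModel {
    // Accumulo sorts keys by row, then column family, qualifier,
    // visibility, and finally timestamp. Joining the components with
    // a separator reproduces that ordering for this small data set.
    static String key(String row, String fam, String qual, String vis) {
        return row + "/" + fam + "/" + qual + "/" + vis;
    }

    public static TreeMap<String, String> table() {
        TreeMap<String, String> t = new TreeMap<>();
        t.put(key("bill", "attribute", "age", "public"), "49");
        t.put(key("bill", "attribute", "phone", "private"), "555-1212");
        t.put(key("bill", "purchases", "sneakers", "public"), "$100");
        t.put(key("george", "attribute", "age", "private"), "38");
        t.put(key("george", "purchases", "sneakers", "public"), "$80");
        t.put(key("george", "returns", "hat", "public"), "$30");
        return t;
    }

    public static void main(String[] args) {
        // Entries come back in sorted key order: bill's before george's,
        // and attribute columns before purchases within a row.
        table().forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```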
  11. 11. High-level API
  12. 12. Accumulo API • To use Accumulo, you must write an application using the Accumulo Java client library. There is no SQL (hence NoSQL) • Data is packaged into Mutation objects which are added to a BatchWriter, which sends them to TabletServers • Clients can scan a set of key-value pairs by specifying optional start and end keys (a Range) and obtaining a Scanner. Iterating over the Scanner returns sorted key-value pairs for that range. Each scan takes milliseconds to start • Can scan over a subset of the columns • Can send a set of Ranges to a BatchScanner and get the matching key-value pairs back, unsorted
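The scan semantics above can be sketched without a running cluster by treating the table as a sorted map: a Range becomes a subMap over start and end rows, and fetching a column filters the results. This is a model of the behavior only, with an illustrative key encoding; the real client classes are Mutation, BatchWriter, Scanner, and BatchScanner:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class ScanSketch {
    // Keys are "row/family/qualifier"; a real key also carries
    // visibility and a timestamp.
    static final TreeMap<String, String> TABLE = new TreeMap<>();
    static {
        TABLE.put("bill/attribute/age", "49");
        TABLE.put("bill/purchases/sneakers", "$100");
        TABLE.put("george/attribute/age", "38");
        TABLE.put("george/purchases/sneakers", "$80");
    }

    // Scan rows in [startRow, endRow], optionally restricted to one family.
    public static Map<String, String> scan(String startRow, String endRow, String family) {
        Map<String, String> out = new LinkedHashMap<>();
        // Appending "/\uffff" to the end row includes every key whose
        // row component equals endRow.
        for (Map.Entry<String, String> e :
                TABLE.subMap(startRow, true, endRow + "/\uffff", true).entrySet()) {
            if (family == null || e.getKey().split("/")[1].equals(family)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Range bill - will, fetching only the purchases family.
        System.out.println(scan("bill", "will", "purchases"));
    }
}
```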
  13. 13. Insert - the table holds five entries: bill's age and sneakers purchase, and george's age, sneakers purchase, and hat return
  14. 14. Insert - a new entry arrives: bill attribute phone private Jun 2010 555-1212
  15. 15. Insert - the new entry lands in sorted order, between bill's attribute:age and purchases:sneakers entries
  16. 16. Scan - Full key lookup: bill attribute phone private Jun 2010 returns 555-1212
  17. 17. Scan - Single row: bill returns all three of bill's entries
  18. 18. Scan - Multiple Rows: range bill - will returns all six entries
  19. 19. Scan - Multiple Rows, Selected Columns: range bill - will, fetch purchases, returns bill's and george's sneakers purchases
  20. 20. Architecture - Scaling and Recovery
  21. 21. Performance • Accumulo 'scales' because aggregate read and write performance increases as more machines are added, and because individual read/write performance remains very good even with trillions of key-value pairs already in the system • [Chart: thousands of writes per second (1-10,000, log scale) vs. number of machines (1-1024), comparing Accumulo, BigTable circa 2006, and Cassandra] • Sources: http://www.slideshare.net/acordova00/accumulo-on-ec2 ; http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html ; http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
  22. 22. Accumulo Prerequisites• One to hundreds of computers with local hard drives, connected via ethernet• Password-less SSH access• Local directory for write-ahead logs• Hadoop and ZooKeeper installed, configured, and running
  23. 23. Architecture Accumulo ZooKeeper HDFS MapReduce
  24. 24. Architecture: HDFS HDFS NameNode DataNodes File
  25. 25. Architecture: HDFS HDFS NameNode DataNodes Block 1 Block 2
  26. 26. Architecture: HDFS HDFS NameNode DataNodes
  27. 27. Architecture: Tables Accumulo Master Tablet Servers Table
  28. 28. Architecture: Tables Accumulo Master Tablet Servers P1 P2 P3
  29. 29. Architecture: Tables Accumulo Master Tablet Servers
  30. 30. Architecture: Writes P1 Mem Table File1 HDFS
  31. 31. Architecture: Writes P1 Mem Client Table Write-ahead Log File1 HDFS
  32. 32. Architecture: Writes P1 Mem Table Write-ahead Log File1 File 2 HDFS
  33. 33. Architecture: Writes P1 Mem Table X Write-ahead Log File1 File 2 HDFS
  34. 34. Architecture: Splits
      row      col fam     col qual   col vis   time       value
      bill     attribute   age        public    Jun 2010   49
      bill     attribute   phone      private   Jun 2010   555-1212
      bill     purchases   sneakers   public    Apr 2010   $100
      george   attribute   age        private   Oct 2009   38
      george   purchases   sneakers   public    Nov 2009   $80
      george   returns     hat        public    Dec 2009   $30
  35. 35. Architecture: Splits Accumulo Master Tablet Servers
  36. 36. Architecture: Splits Accumulo Master Tablet Servers
  37. 37. Architecture: Splits Accumulo Master Tablet Servers
  38. 38. Sorted keys - dynamic partitioning • Because keys are sorted, tables can be partitioned based on the data • partitions (tablets) are uniform in size, regardless of data distribution (as long as single rows are smaller than the partition size) • not based on the number of servers • Can add / remove / fail servers at any time; the system is always automatically balanced
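The size-based, row-boundary partitioning described above can be sketched as a single walk over sorted rows. All names and sizes here are illustrative; in a real deployment the split point is driven by configuration such as the table split threshold and the sizes of the underlying files:

```java
import java.util.ArrayList;
import java.util.List;

public class TabletSplit {
    // Walk sorted rows and cut a new tablet whenever the running size
    // reaches the threshold -- always on a row boundary, so a single
    // row is never split across tablets.
    public static List<List<String>> split(List<String> sortedRows,
                                           List<Integer> rowSizes,
                                           int maxTabletSize) {
        List<List<String>> tablets = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (int i = 0; i < sortedRows.size(); i++) {
            current.add(sortedRows.get(i));
            size += rowSizes.get(i);
            if (size >= maxTabletSize) {   // cut here: tablet is full
                tablets.add(current);
                current = new ArrayList<>();
                size = 0;
            }
        }
        if (!current.isEmpty()) tablets.add(current);
        return tablets;
    }

    public static void main(String[] args) {
        // Uniform row sizes give uniform tablets regardless of how many
        // servers exist -- the tablets are then assigned to servers.
        System.out.println(split(
            List.of("a", "b", "c", "d", "e", "f"),
            List.of(10, 10, 10, 10, 10, 10), 30));
    }
}
```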
  39. 39. Partitioning Contrast• Some relational databases allow partitioning. May require users to choose a field or two on which to partition. Hopefully that field is uniformly distributed• Hash-based systems (default Cassandra, CouchDB, Riak, Voldemort) avoid this problem, but at the cost of range scans. Some support range scans via other means.• Many systems couple partition storage with partition service, requiring data movement to rebalance partition service (MongoDB, Cassandra, etc)
  40. 40. Architecture: Reads P1 Mem Client Table Merge File1 File 2
  41. 41. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  42. 42. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  43. 43. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  44. 44. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  45. 45. Architecture: Recovery Accumulo Master Tablet Servers Master reassigns DataNodes NameNode
  46. 46. Architecture: Recovery Accumulo Master Tablet Servers Replay Write-ahead Log DataNodes NameNode
  47. 47. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  48. 48. Architecture: Recovery Accumulo Master Tablet Servers DataNodes NameNode
  49. 49. Metadata Hierarchy - the root tablet points to the metadata tablets (md1, md2, md3), which point to the tablets of the user tables (user1, user2, index1, index2)
  50. 50. Architecture: Lookup Accumulo Master Tablet Servers ZooKeeper Client
  51. 51. Architecture: Lookup - Client knows ZooKeeper, finds the root tablet
  52. 52. Architecture: Lookup - Scan the root tablet to find the metadata tablet that describes the user table we want
  53. 53. Architecture: Lookup - Read the location info of the user table's tablets and cache it
  54. 54. Architecture: Lookup - Read directly from the server holding the tablets we want
  55. 55. Architecture: Lookup - Find other tablets via cache lookups
  56. 56. Security
  57. 57. Security • Design and Guarantees • Data Labeling • Authentication • User Configuration
  58. 58. Data Security • Accumulo will only return cells whose visibility labels are satisfied by the user credentials presented at scan time • Two necessary conditions • Correctly labeling data on ingest • Presenting the right user credentials
  59. 59. Security Labels - an extension of the BigTable data model: the column gains a visibility element, so the key is (row ID, column family, column qualifier, column visibility, timestamp) mapping to a value
  60. 60. Column Visibility
      row      col fam     col qual   col vis   time       value
      bill     attribute   age        public    Jun 2010   49
      bill     attribute   phone      private   Jun 2010   555-1212
      bill     purchases   sneakers   public    Apr 2010   $100
      george   attribute   age        private   Oct 2009   38
      george   purchases   sneakers   public    Nov 2009   $80
      george   returns     hat        public    Dec 2009   $30
  61. 61. Security Label Syntax • A & B - both A and B required • A | B - must have either A or B • (A | B) & C - must have C and A or B • A | (B & C) - must have A or both B and C • A & (B | (C & D))
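The label syntax above can be checked with a small recursive-descent evaluator. This is a sketch of the semantics only (the real client parses labels with Accumulo's ColumnVisibility class); for simplicity the sketch gives & higher precedence than |, whereas Accumulo requires parentheses when mixing the two operators:

```java
import java.util.Set;

public class VisibilityEval {
    // Recursive-descent evaluator for expressions like "(A|B)&C".
    private final String expr;
    private int pos;

    private VisibilityEval(String expr) { this.expr = expr.replace(" ", ""); }

    public static boolean satisfies(String expression, Set<String> auths) {
        VisibilityEval p = new VisibilityEval(expression);
        boolean result = p.orExpr(auths);
        if (p.pos != p.expr.length())
            throw new IllegalArgumentException("trailing input: " + expression);
        return result;
    }

    // orExpr := andExpr ('|' andExpr)*
    private boolean orExpr(Set<String> auths) {
        boolean v = andExpr(auths);
        while (pos < expr.length() && expr.charAt(pos) == '|') {
            pos++;
            v |= andExpr(auths);   // always evaluate to advance pos
        }
        return v;
    }

    // andExpr := factor ('&' factor)*
    private boolean andExpr(Set<String> auths) {
        boolean v = factor(auths);
        while (pos < expr.length() && expr.charAt(pos) == '&') {
            pos++;
            v &= factor(auths);
        }
        return v;
    }

    // factor := '(' orExpr ')' | token, where a token is satisfied
    // when the user's authorizations contain it
    private boolean factor(Set<String> auths) {
        if (expr.charAt(pos) == '(') {
            pos++;                       // consume '('
            boolean v = orExpr(auths);
            pos++;                       // consume ')'
            return v;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) pos++;
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(satisfies("(A|B)&C", Set.of("B", "C")));  // true
        System.out.println(satisfies("A&B", Set.of("A")));           // false
    }
}
```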
  62. 62. Security Label Example • To drive: • license&over15 • To join the military: • (over17|(over16&parentConsent))&(greencard|USCitizen) • Access to classified data: • TS&SI&(USA|GBR|NZL|CAN|AUS)
  63. 63. Security Model - the Security Perimeter contains the Trusted Client, the Auth Service, and Accumulo. The user presents an ID, password, or cert; the Trusted Client verifies it against the Auth Service, obtains auths, passes the auths to Accumulo, and returns data to the user
  64. 64. Trusted Client Responsibility • Ensure that credentials belong to the user • Ensure that the user is authenticated
  65. 65. Application Authorization • Trusted Client applications must have their maximum authorizations set before any can be passed • The Trusted Client limits the set of authorizations by application
  66. 66. Application Authorization Example • Data may be labeled with any combination of the following: { personal, research, finance, diet, cancer } • We wish to limit certain applications to a subset
  67. 67. Example Table
      row    colF        colQ   col vis                        value
      row0   name        -      personal|finance               John
      row0   age         -      personal|research              49
      row0   phone       -      personal|finance               555-1212
      row0   owed        -      personal|finance               $5440
      row0   diagnosis   -      personal|(research & cancer)   melanoma
      row0   diagnosis   -      personal|(research & diet)     diabetes
  68. 68. Application Authorizations • Cancer Research: cancer diagnoses, age • Diabetes Research: diet info, age • Accounting System: balance, name, phone • Personal Records Management: all
  69. 69. Security Model - the Researcher presents an ID, password, or cert to the Cancer Research App
  70. 70. Security Model - the app verifies the researcher's identity with the Auth Service
  71. 71. Security Model - the Auth Service returns the researcher's authorizations: research, cancer, diabetes
  72. 72. Security Model - the app passes only research, cancer (limited to its own subset) to Accumulo
  73. 73. Security Model - Accumulo returns data whose labels are satisfied by research, cancer
  74. 74. Security Model - the app returns the data to the researcher
  75. 75. Data life-cycle
  76. 76. Data Model - Key (Row ID, Column Family, Column Qualifier, Column Visibility, Timestamp) maps to Value
  77. 77. Versions - What can we do with multiple versions of the same data?
      rowID   family   qualifier   timestamp   value
      row1    fam1     qual1       1005        2
      row1    fam1     qual1       1004        5
      row1    fam1     qual1       1003        3
      row1    fam1     qual1       1002        2
      row1    fam1     qual1       1001        7
  78. 78. Iterators• Mechanism for adding online functionality to tables • Aggregation (called Combiners) • Age-Off • Filtering (including by security label)
  79. 79. Versioning Iterators
      rowID   family   qualifier   timestamp   value
      row1    fam1     qual1       1005        2
      row1    fam1     qual1       1004        5
      row1    fam1     qual1       1003        3
      row1    fam1     qual1       1002        2
      row1    fam1     qual1       1001        7
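Because entries for one (row, family, qualifier) are stored newest-first, the VersioningIterator's behavior reduces to keeping the first maxVersions entries per key. A minimal sketch (illustrative names):

```java
import java.util.ArrayList;
import java.util.List;

public class VersioningSketch {
    // tsValuePairs: {timestamp, value} pairs for a single key, sorted
    // by timestamp descending (as Accumulo stores them). Keep only the
    // newest maxVersions entries.
    public static List<long[]> keepVersions(List<long[]> tsValuePairs, int maxVersions) {
        int n = Math.min(maxVersions, tsValuePairs.size());
        return new ArrayList<>(tsValuePairs.subList(0, n));
    }

    public static void main(String[] args) {
        // The example table: five versions of row1 fam1 qual1.
        List<long[]> entries = List.of(
            new long[]{1005, 2}, new long[]{1004, 5}, new long[]{1003, 3},
            new long[]{1002, 2}, new long[]{1001, 7});
        // With maxVersions = 1, only the newest (1005 -> 2) survives.
        for (long[] e : keepVersions(entries, 1))
            System.out.println(e[0] + " -> " + e[1]);
    }
}
```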
  80. 80. Filtering Iterators • Age Off • RegEx • Arbitrary filtering
  81. 81. Age Off• Can specify a particular date - e.g. delete everything older than July 1, 2007• Can specify a time period - e.g. delete everything older than 6 months
  82. 82. Age-Off - Current Time: 1103. The entries with timestamps 1001 and 1002 are more than 100 sec. old and are filtered out
  83. 83. Age-Off - Current Time: 1104. The entry with timestamp 1003 is now also more than 100 sec. old
  84. 84. Age-Off - Current Time: 1105. The entry with timestamp 1004 is now also more than 100 sec. old
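The age-off walkthrough above is a pure timestamp filter: anything more than a fixed age older than the current time is dropped. A minimal sketch (illustrative names; the real mechanism is Accumulo's AgeOffFilter iterator):

```java
import java.util.List;
import java.util.stream.Collectors;

public class AgeOffSketch {
    // Keep only entries whose age (now - timestamp) is at most maxAge;
    // older entries are filtered out at read and compaction time.
    public static List<long[]> ageOff(List<long[]> tsValuePairs, long now, long maxAge) {
        return tsValuePairs.stream()
                .filter(e -> now - e[0] <= maxAge)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<long[]> entries = List.of(
            new long[]{1005, 2}, new long[]{1004, 5}, new long[]{1003, 3},
            new long[]{1002, 2}, new long[]{1001, 7});
        // At time 1103 with a 100 sec. window, 1001 and 1002 age off.
        System.out.println(ageOff(entries, 1103, 100).size());
    }
}
```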
  85. 85. Manual Deletes • Can insert 'deletes'. They are inserted like other key-value pairs; any keys with an older timestamp are suppressed from reads • Compactions write non-deleted data to new files • Old files are then removed from HDFS • To ensure data is deleted from disk: • write deletes (the data is now absent from query results) • compact (can compact a particular range of a table if it's large)
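The suppression rule above can be sketched as a merge-time filter: a delete marker hides every entry for the same key at or before the marker's timestamp, and the marker itself never appears in query results. Names are illustrative; this models only the rule, not Accumulo's merging-read machinery:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeleteSketch {
    static class Entry {
        final String key; final long ts; final String value; final boolean delete;
        Entry(String key, long ts, String value, boolean delete) {
            this.key = key; this.ts = ts; this.value = value; this.delete = delete;
        }
    }

    // Suppress entries covered by a delete marker for the same key.
    public static List<Entry> applyDeletes(List<Entry> entries) {
        Map<String, Long> deleteTs = new HashMap<>();
        for (Entry e : entries)
            if (e.delete) deleteTs.merge(e.key, e.ts, Math::max);
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            Long d = deleteTs.get(e.key);
            // keep only non-delete entries written after any delete marker
            if (!e.delete && (d == null || e.ts > d)) out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> visible = applyDeletes(List.of(
            new Entry("bill/attribute/age", 100, "49", false),
            new Entry("bill/attribute/age", 150, null, true),    // delete marker
            new Entry("bill/attribute/age", 200, "50", false))); // written later
        System.out.println(visible.size());  // only the ts=200 entry survives
    }
}
```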
  86. 86. Garbage Collection• Garbage collector compares the files in HDFS with the set of files currently active• When files are no longer on the active list, GC waits for a while, then deletes from HDFS
  87. 87. Applications• Fast lookups / scan on extremely large tables with flexible schemas, varying security• Large index across heterogeneous data sets• Continuous Summary Analytics via Iterators• Secure Storage of key value pairs for MapReduce jobs
  88. 88. Where does your data come from? • BigTable was designed to store data for web applications serving millions of users, where the web application creates all the data. Many NoSQL databases are designed solely for this purpose. Accumulo can certainly support that • However, many organizations have lots of data from various sources, with different schemas and different security levels. Bringing them together for analysis is very valuable. Accumulo can support this too
  89. 89. Indexing and queries • The BigTable data model supports building a wide variety of indexes • Simple strings, numbers, geo points, IP addresses, etc. • Each has to be coupled with query code • New applications should examine their data access use cases; the indexes and query code to accomplish those can then be written • The best applications are constructed so each user request is a single scan, or a small number of scans
  90. 90. Compared to MapReduce• Hadoop’s HDFS stores simple files. Usually unsorted.• MapReduce is designed to process all or most of the files at once.• Accumulo maintains a set of sorted files in HDFS• Accumulo scans are designed to access a small portion of the data quickly.• Fairly complementary
  91. 91. Tough use case• Ran MapReduce on some input data set to create a large result set.• Now have a few new records, want to update the result set• MapReduce has to process all the data again, have to wait• Accumulo allows users to perform a limited set of operations to update a result set incrementally, using Iterators• Result sets are always up to date, immediately after insert
  92. 92. Combiners
      row    col fam   col qual       col vis   time      value
      bill   perf      June_calls     P         June 1    9
      bill   perf      June_calls     P         June 4    3
      bill   perf      July_calls     P         July 3    4
      bill   perf      July_calls     P         July 11   7
      bill   perf      August_calls   P         Aug 12    5
      bill   perf      August_calls   P         Aug 29    2
  93. 93. Combiners
      row    col fam   col qual       col vis   time   value
      bill   perf      June_calls     P         -      12
      bill   perf      July_calls     P         -      11
      bill   perf      August_calls   P         -      7
  94. 94. Combiners • Almost equivalent to Reduce of MapReduce except: • Cannot assume we have seen all the values for a particular key • Exactly equivalent to a Combiner function
  95. 95. Combiners• Useful Combiners: • Event count (StringSummation or LongSummation aggregator) • Event hour occurrence histogram (NumArraySummation aggregator) • Event duration histogram (NumArraySummation aggregator)
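A summing combiner reduces all the values seen so far for a key without assuming it has seen every value, exactly as the Combiner contrast above describes. A minimal sketch of the call-count example (illustrative names; the real classes are Accumulo's SummingCombiner and its relatives):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SummingCombinerSketch {
    // Collapse multiple values for the same key into their sum, as a
    // summing combiner does at scan and compaction time.
    public static Map<String, Long> combine(List<Map.Entry<String, Long>> entries) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : entries)
            out.merge(e.getKey(), e.getValue(), Long::sum);
        return out;
    }

    public static void main(String[] args) {
        // Per-event call counts in, monthly totals out.
        List<Map.Entry<String, Long>> raw = List.of(
            Map.entry("bill/perf/June_calls", 9L),
            Map.entry("bill/perf/June_calls", 3L),
            Map.entry("bill/perf/July_calls", 4L),
            Map.entry("bill/perf/July_calls", 7L),
            Map.entry("bill/perf/August_calls", 5L),
            Map.entry("bill/perf/August_calls", 2L));
        System.out.println(combine(raw));  // June 12, July 11, August 7
    }
}
```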
  96. 96. Conceptual Graph Representation [Diagram: directed graph over nodes a, b, c, d, e, f, g]
  97. 97. Edge table
      row   col fam   col qual   col vis   time   value
      a     edge      f                           1.0
      c     edge      b                           1.0
      c     edge      d                           1.0
      d     edge      b                           1.0
      d     edge      e                           1.0
      e     edge      d                           1.0
      f     edge      g                           1.0
      g     edge      e                           1.0
      g     edge      f                           1.0
  98. 98. Edge Weights• Summing Combiners are typically used to efficiently and incrementally update edge weights • See SummingCombiner
  99. 99. Edge table - Incoming: a, edge, f, 1.0 (a->f currently has weight 1.0)
  100. 100. Edge table - a->f is combined to weight 2.0
  101. 101. Edge table - Incoming: c, edge, b, 6.0 (c->b currently has weight 1.0)
  102. 102. Edge table - c->b is combined to weight 7.0
  103. 103. Edge table - Incoming: a, edge, f, 2.3 (a->f currently has weight 2.0)
  104. 104. Edge table - a->f is combined to weight 4.3
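The incremental weight updates above reduce to summing each incoming observation into a map keyed by (source, destination). A minimal sketch (illustrative names; in Accumulo this is a SummingCombiner configured on the edge table, so no read is needed before the write):

```java
import java.util.Map;
import java.util.TreeMap;

public class EdgeWeightSketch {
    // Edge weights keyed by "source->dest"; each incoming observation
    // is summed into the existing weight.
    static final Map<String, Double> EDGES = new TreeMap<>();

    public static double insert(String src, String dst, double weight) {
        return EDGES.merge(src + "->" + dst, weight, Double::sum);
    }

    public static void main(String[] args) {
        insert("a", "f", 1.0);   // initial edge
        insert("a", "f", 1.0);   // a->f combines to 2.0
        insert("c", "b", 1.0);
        insert("c", "b", 6.0);   // c->b combines to 7.0
        insert("a", "f", 2.3);   // a->f accumulates: 1.0 + 1.0 + 2.3
        System.out.println(EDGES);
    }
}
```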
  105. 105. Edge Table Applications• Graph Analytics - traversal, neighbors, connected components• Neighborhood = feature vector. Vector-based machine learning techniques. Nearest neighbor search, clustering, classification• Automated dossiers, fact accumulation - ‘tell me everything we know about X’ in a single scan• Find entities based on features - ‘show me everyone who has feature value > x’ or ‘with < 5 neighbors of type k’
  106. 106. RDF Triples
      row    col fam          col qual   col vis   time   value
      DC     is_capital_of    USA                         1.0
      Don    vacations_in     Arctic                      7.0
      Don    is_employed_by   MI6                         1.0
      Sean   has_status       "007"                       1.0
      Sean   starred_with     Ursula                      1.0
      Sean   starred_with     Anya                        0.7
      Sean   starred_with     Teresa                      0.3
  107. 107. RDF Triples - RYA • See the RYA project: http://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
  108. 108. Additional Training
  109. 109. Additional Training• Talked about the basics today• 3 days of developer training with hands on examples covering • installation, configuration, read / write API, MapReduce, security, table configuration, indexing specific types, querying index tables, combiners, custom iterators, table constraints, storing relational data, joins, high performance considerations, document-partitioned indexing (text search), machine learning, object persistence• 2 days of administrator training covering • hardware selection, process assignment, troubleshooting, maintenance, replication and high availability, cluster modification, failure handling
  110. 110. Next Scheduled Training Sessions• March 5-7 Columbia MD• April 9-11 Columbia MD• http://www.tetraconcepts.com/training• aaron@tetraconcepts.com• brian@tetraconcepts.com
