Apache Accumulo Overview
Bill Havanki
Solutions Architect, Cloudera Government Solutions
©2014 Cloudera, Inc. All rights reserved.
Agenda
• Quick History
• Storage Model
• Loading and Querying
• Daemons
• Getting Started, a.k.a., the Pitch
A Quick History
Google BigTable
Compressed, high-performance, scalable,
distributed sorted map
Google BigTable
• Began development in 2004
• Built on Google File System
• Non-relational
• Byte-oriented and schemaless
• Stores data in the petabyte range
• Research paper published in 2006
Child(ren) of BigTable
• Apache HBase (begun 2006, top-level 2010)
• Apache Cassandra (begun 2008-ish, top-level 2010)
• Apache Accumulo ...
From Cloudbase to Accumulo
• Started in 2008 as National Security Agency project
• Submitted to Apache Incubator in 2011 (and renamed)
• Top-level project in 2012
Storage Model
Key / Value Store
Accumulo stores tables of key / value pairs
Key / Value Store
A row is a sorted sequence of key / value pairs
Each pair is a cell
The Key
row
column
  family
  qualifier
  visibility
timestamp
An example key
row: bhavanki
column family: personal
column qualifier: middle
column visibility: PII
timestamp: 1401041295
Another example key
row: brees
column family: employment
column qualifier: salary
column visibility: FIN
timestamp: 1401041296
It’s all bytes
All key and value data are stored as bytes
except timestamp is a long
There are no built-in data types
but lexicoders help with common types
Key components are usually UTF-8 strings
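Because keys sort as raw bytes, the encoding of a value decides where it lands. As text, "10" sorts before "9", which is the kind of problem a lexicoder addresses. The following is a minimal sketch of the idea in plain Java, not Accumulo's own lexicoder classes; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; Accumulo ships its own lexicoders with a different API.
final class SketchLongEncoder {
    // Encode a signed long as 8 big-endian bytes with the sign bit flipped,
    // so that unsigned byte-wise comparison matches numeric order.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Reverse the encoding above.
    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    // Lexicographic comparison that treats each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] values = {-5L, 0L, 9L, 10L, 1401041295L};
        for (int i = 1; i < values.length; i++) {
            boolean ordered = compareBytes(encode(values[i - 1]), encode(values[i])) < 0;
            System.out.println(values[i - 1] + " sorts before " + values[i] + " as bytes: " + ordered);
        }
    }
}
```

The sign-bit flip is what lets negative numbers sort before positive ones when the bytes are compared as unsigned values.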
Some rows for you
row       cf        cq        cv       ts          value
bhavanki  job       employer           2013-09-01  Cloudera
bhavanki  personal  beer               2013-09-15  Omission
bhavanki  personal  house     NOMUGGL  2014-01-25  Ravenclaw
brees     job       employer           2013-10-01  White Cliffs
brees     personal  house     NOMUGGL  2014-01-01  Hufflepuff
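The rows above are listed in key order. A minimal sketch of that ordering in plain Java follows: row, family, qualifier, and visibility ascending, then timestamp descending so the newest version of a cell comes first. `SketchKey` is a hypothetical stand-in rather than Accumulo's own key class, and it compares strings where Accumulo compares bytes (the two agree for ASCII text).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an Accumulo key; not the real Key class.
final class SketchKey {
    final String row, family, qualifier, visibility;
    final long timestamp;

    SketchKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }

    // Ascending by row, family, qualifier, visibility; descending by timestamp.
    static final Comparator<SketchKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = a.visibility.compareTo(b.visibility);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp); // newest first
        return c;
    };

    // Return a sorted copy, leaving the input untouched.
    static List<SketchKey> sorted(List<SketchKey> keys) {
        List<SketchKey> copy = new ArrayList<>(keys);
        Collections.sort(copy, ORDER);
        return copy;
    }

    @Override
    public String toString() {
        return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
    }

    public static void main(String[] args) {
        // Timestamps here are arbitrary sample numbers.
        List<SketchKey> keys = Arrays.asList(
            new SketchKey("brees", "job", "employer", "", 1L),
            new SketchKey("bhavanki", "personal", "house", "NOMUGGL", 1L),
            new SketchKey("bhavanki", "job", "employer", "", 1L),
            new SketchKey("bhavanki", "job", "employer", "", 2L));
        for (SketchKey k : sorted(keys)) System.out.println(k);
    }
}
```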
Visibility Labels
Boolean expression
Specialist | (Management & SpecTraining)
Authorizations are provided in each scan
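To make the label check concrete, here is a minimal sketch in plain Java of evaluating a label expression against the authorizations supplied with a scan. It is an illustration only, not Accumulo's own parser; the class name and the simplified grammar (alphanumeric labels, `&`, `|`, parentheses) are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical evaluator for illustration; not Accumulo's own visibility parser.
final class SketchVisibility {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private SketchVisibility(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    // True when the authorizations satisfy the label expression.
    static boolean visible(String expr, Set<String> auths) {
        SketchVisibility p = new SketchVisibility(expr, auths);
        if (p.expr.isEmpty()) return true; // an unlabeled cell is visible to everyone
        boolean result = p.parseExpr();
        if (p.pos != p.expr.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return result;
    }

    // expr := term ('&' term)* | term ('|' term)*  (mixing operators needs parentheses)
    private boolean parseExpr() {
        boolean value = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && op != next) {
                throw new IllegalArgumentException("mixed & and | need parentheses");
            }
            op = next;
            boolean rhs = parseTerm();
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // term := label | '(' expr ')'
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean value = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing )");
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a label at " + pos);
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        String label = "Specialist | (Management & SpecTraining)";
        Set<String> auths = new HashSet<>(Arrays.asList("Management", "SpecTraining"));
        System.out.println(visible(label, auths));
    }
}
```

With the expression from the slide, a scan carrying only `Management` would not see the cell, while one carrying `Specialist`, or both `Management` and `SpecTraining`, would.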
Locality Groups
You can identify sets of one or more column families as
locality groups
Data in a locality group is stored together for improved
read performance
Tablets
A table is composed of one or more tablets
employees
  employees;H
  employees;S
  employees;~
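A minimal sketch of how a row could be matched to a tablet, assuming the suffix after `;` in the tablet names above is each tablet's inclusive end row. The class is hypothetical plain Java and does not reflect how Accumulo tracks tablets internally.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one table's tablets, keyed by end row.
final class SketchTabletLocator {
    // Maps each tablet's end row (inclusive) to the tablet's name.
    private final TreeMap<String, String> tabletsByEndRow = new TreeMap<>();

    void addTablet(String endRow, String name) {
        tabletsByEndRow.put(endRow, name);
    }

    // A row belongs to the tablet with the smallest end row that is >= the row.
    String locate(String row) {
        Map.Entry<String, String> e = tabletsByEndRow.ceilingEntry(row);
        if (e == null) throw new IllegalArgumentException("no tablet covers row " + row);
        return e.getValue();
    }

    public static void main(String[] args) {
        SketchTabletLocator employees = new SketchTabletLocator();
        employees.addTablet("H", "employees;H");
        employees.addTablet("S", "employees;S");
        employees.addTablet("~", "employees;~");
        for (String row : new String[] {"Alice", "Mallory", "bhavanki"}) {
            System.out.println(row + " -> " + employees.locate(row));
        }
    }
}
```

Note that uppercase letters sort before lowercase ones in byte order, so a lowercase row such as `bhavanki` falls after the `S` end row and lands in `employees;~`.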
Tablets
Tablets map to data files in HDFS
employees;H: rfile 1
employees;S: rfile 2
employees;~: rfile 3
Tablets
Data is also kept in write-ahead logs and a memtable
employees;H
  rfile 1
  walogs
  memtable
Loading and Querying
Java Client API
Java Client API
Read using scanners
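// Scan the "personal" column family for rows "alice" through "eve".
// Empty Authorizations: only cells without a visibility label are returned.
Scanner s = conn.createScanner("employees", new Authorizations());
s.setRange(new Range("alice", "eve"));
s.fetchColumnFamily(new Text("personal"));
for (Entry<Key, Value> e : s)
  employeeIds.add(e.getKey().getRow());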
Java Client API
Read access via iterator pattern
• server-side system iterators handle timestamps,
authorization checks, and lots more
• iterators almost always wrap other iterators, forming a
chain
• you can define your own, client-side or server-side
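The wrapping pattern can be sketched in plain Java with `java.util.Iterator`. This is an illustration of the chaining idea only: Accumulo's real iterator interface is different, and every class name and the sample data below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of chained iterators; Accumulo's real iterator interface differs.
final class SketchIteratorChain {
    static final class Cell {
        final String name; // "row family:qualifier"
        final long timestamp;
        final String value;

        Cell(String name, long timestamp, String value) {
            this.name = name;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Base class: wraps a source iterator and passes through only accepted cells.
    abstract static class Wrapping implements Iterator<Cell> {
        private final Iterator<Cell> source;
        private Cell pending;

        Wrapping(Iterator<Cell> source) {
            this.source = source;
        }

        abstract boolean accept(Cell c);

        public boolean hasNext() {
            while (pending == null && source.hasNext()) {
                Cell c = source.next();
                if (accept(c)) pending = c;
            }
            return pending != null;
        }

        public Cell next() {
            if (!hasNext()) throw new NoSuchElementException();
            Cell out = pending;
            pending = null;
            return out;
        }
    }

    // Keeps the first version of each cell; with newest-first input that is the latest one.
    static final class NewestVersionOnly extends Wrapping {
        private String lastName;

        NewestVersionOnly(Iterator<Cell> source) {
            super(source);
        }

        boolean accept(Cell c) {
            boolean first = !c.name.equals(lastName);
            lastName = c.name;
            return first;
        }
    }

    // Drops cells whose value is empty.
    static final class NonEmptyValue extends Wrapping {
        NonEmptyValue(Iterator<Cell> source) {
            super(source);
        }

        boolean accept(Cell c) {
            return !c.value.isEmpty();
        }
    }

    // Build the chain over sorted input and collect the surviving values.
    static List<String> scan(List<Cell> sortedCells) {
        Iterator<Cell> chain = new NonEmptyValue(new NewestVersionOnly(sortedCells.iterator()));
        List<String> values = new ArrayList<>();
        while (chain.hasNext()) values.add(chain.next().value);
        return values;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("bhavanki job:employer", 2L, "Cloudera"),
            new Cell("bhavanki job:employer", 1L, "PreviousCo"), // made-up older version
            new Cell("bhavanki personal:beer", 1L, ""),
            new Cell("brees job:employer", 1L, "White Cliffs"));
        System.out.println(scan(cells));
    }
}
```

Each stage sees only what the stage beneath it lets through, which is why the order of the chain matters.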
Java Client API
Scanners fetch sorted rows from one range
Batch scanners fetch unsorted rows from multiple
ranges in parallel
Isolated scanners ensure that you do not see a row
mid-change
MapReduce
AccumuloInputFormat
AccumuloOutputFormat
MapReduce
AccumuloRowInputFormat
AccumuloRowOutputFormat
Shell
Command-line / manual access to Accumulo data
• scan, insert, delete
• iterator management
• table management (creation, deletion, cloning)
• user and authorization management
• table splitting and merging
• ... more
Bulk Import
Got lots of data to import quickly?
• Use MR job to format data using
AccumuloFileOutputFormat
• Import files using shell
Trade off latency / availability for throughput
Daemons
Tablet Server
Serves tablets (table data)
• writes data to walog, memtable; deals with compaction
• serves data for reads from files, memtable
• handles recovery from walogs in case of server failure
Most client calls go to tablet servers
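The write and read path described above can be sketched with in-memory stand-ins. This is a heavily simplified, hypothetical model in plain Java: a list plays the write-ahead log, a `TreeMap` plays the memtable, and flushed `TreeMap`s play the data files. It is not how the tablet server is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one tablet's write and read path.
final class SketchTablet {
    private final List<String[]> walog = new ArrayList<>();                // {key, value} entries
    private TreeMap<String, String> memtable = new TreeMap<>();            // sorted in-memory map
    private final List<TreeMap<String, String>> files = new ArrayList<>(); // stand-ins for sorted files

    // Log the write first so it could be replayed after a failure, then apply it in memory.
    void write(String key, String value) {
        walog.add(new String[] {key, value});
        memtable.put(key, value);
    }

    // Turn the memtable into a new sorted "file"; flushed data no longer needs the log.
    void flush() {
        if (memtable.isEmpty()) return;
        files.add(memtable);
        memtable = new TreeMap<>();
        walog.clear();
    }

    // Reads consult the memtable first, then files from newest to oldest.
    String read(String key) {
        String v = memtable.get(key);
        if (v != null) return v;
        for (int i = files.size() - 1; i >= 0; i--) {
            v = files.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }

    // Copy of the log entries that have not been flushed yet.
    List<String[]> unflushedLog() {
        return new ArrayList<>(walog);
    }

    // Recovery: reapply logged writes that never reached a file.
    void replay(List<String[]> log) {
        for (String[] entry : log) write(entry[0], entry[1]);
    }

    public static void main(String[] args) {
        SketchTablet tablet = new SketchTablet();
        tablet.write("bhavanki", "Cloudera");
        tablet.flush();
        tablet.write("brees", "White Cliffs");
        System.out.println(tablet.read("bhavanki") + ", " + tablet.read("brees"));

        // Simulate a replacement server rebuilding the unflushed state from the log.
        SketchTablet recovered = new SketchTablet();
        recovered.replay(tablet.unflushedLog());
        System.out.println(recovered.read("brees"));
    }
}
```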
Master
• assigns tablets to tablet servers
• detects tablet server failures and reassigns tablets
• balances tablet assignments over time
• coordinates table operations
Multiple masters supported for failover, only one active
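A minimal sketch of the assignment and reassignment duties, in plain Java. The fewest-tablets-first rule is an assumption made for the illustration, ongoing balancing is not modeled, and the class is hypothetical; Accumulo's real master is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of tablet assignment across tablet servers.
final class SketchMaster {
    // server name -> tablets currently assigned to it
    private final Map<String, List<String>> assignments = new TreeMap<>();

    void addServer(String server) {
        assignments.putIfAbsent(server, new ArrayList<String>());
    }

    // Assign a tablet to the live server holding the fewest tablets (ties go to the first name).
    String assign(String tablet) {
        if (assignments.isEmpty()) throw new IllegalStateException("no live tablet servers");
        String target = null;
        for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
            if (target == null || e.getValue().size() < assignments.get(target).size()) {
                target = e.getKey();
            }
        }
        assignments.get(target).add(tablet);
        return target;
    }

    // On a detected failure, drop the server and reassign everything it was serving.
    void serverFailed(String server) {
        List<String> orphaned = assignments.remove(server);
        if (orphaned == null) return;
        for (String tablet : orphaned) assign(tablet);
    }

    // Copy of the tablets on a server; empty if the server is unknown.
    List<String> tabletsOn(String server) {
        List<String> tablets = assignments.get(server);
        return tablets == null ? new ArrayList<String>() : new ArrayList<>(tablets);
    }

    public static void main(String[] args) {
        SketchMaster master = new SketchMaster();
        master.addServer("ts1");
        master.addServer("ts2");
        master.assign("employees;H");
        master.assign("employees;S");
        master.assign("employees;~");
        master.serverFailed("ts1");
        System.out.println("ts2 now serves " + master.tabletsOn("ts2"));
    }
}
```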
Everybody Else in Accumulo
Garbage Collector (GC) - identifies and deletes files in
HDFS that are no longer needed
Tracer - listens for and stores distributed trace messages
using a special table
Everybody Else in Accumulo
• Monitor - collects and serves status information
  • server status
  • log inspection
  • performance data
  • table inspection
Everybody Else outside Accumulo
• HDFS (as part of Apache Hadoop)
  • stores tablet files
  • stores write-ahead logs (1.5+)
• MapReduce (Hadoop)
  • bulk import
  • batch processing
• Apache ZooKeeper
Getting Started
a.k.a. the Pitch
Easy as 1-2-3?
1. Install Hadoop (HDFS and MapReduce)
2. Install ZooKeeper
3. Install Accumulo!
Making Steps 1 and 2 Easier
Use a complete, pre-packaged Hadoop distribution
... like CDH!
a leading commercial distribution centered on Apache
Hadoop
• many ecosystem components
• configured / updated to work together
Making Steps 1 and 2 Easier
Cloudera Manager
• deployment
• configuration
• operation
• security
Making Step 3 Easier
Standard Apache Accumulo installation is via tarball
• no longer shipping RPM / DEB / ...
Using CDH/CM you can use:
• a tarball, RPM or DEB with Accumulo packaged for CDH
• a parcel (like RPM / ZIP) for easier upgrades
• 1.4.4 and 1.4.5 available now
• 1.6.0 soon
Where to Go for More
• http://accumulo.apache.org/
• http://www.cloudera.com/content/cloudera/en/products-and-service
• http://www.cloudera.com/content/cloudera/en/products-and-service
• http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/accumulo.html
Accumulo Summit
Join us on June 12
Quick Thanks
• My slide reviewers
• Sean Busbey
• Mike Drob
• Accumulo community
• You all for listening
Thank you!
Bill Havanki
bhavanki@clouderagovt.com
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 

Recently uploaded (20)

Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 

Apache Accumulo Overview

  • 1. Apache Accumulo Overview. Bill Havanki, Solutions Architect, Cloudera Government Solutions
  • 2. ©2014 Cloudera, Inc. All rights reserved. Agenda: Quick History; Storage Model; Loading and Querying; Daemons; Getting Started, a.k.a., the Pitch
  • 4. Google BigTable: Compressed, high-performance, scalable, distributed sorted map
  • 5. Google BigTable: Began development in 2004; built on Google File System; non-relational; byte-oriented and schemaless; stores data in the petabyte range; research paper published in 2006
  • 6. Child(ren) of BigTable: Apache HBase (begun 2006, top-level 2010); Apache Cassandra (begun 2008-ish, top-level 2010); Apache Accumulo ...
  • 7. From Cloudbase to Accumulo: Started in 2008 as a National Security Agency project; submitted to Apache Incubator in 2011 (and renamed); top-level project in 2012
  • 9. Key / Value Store: Accumulo stores tables of key / value pairs
  • 10. Key / Value Store: A row is a sorted sequence of key / value pairs. Each pair is a cell.
  • 11. The Key: row; column (family, qualifier, visibility); timestamp
  • 12. An example key: row bhavanki; column family personal, qualifier middle, visibility PII; timestamp 1401041295
  • 13. Another example key: row brees; column family employment, qualifier salary, visibility FIN; timestamp 1401041296
  • 14. It’s all bytes: All key and value data are stored as bytes, except the timestamp, which is a long. There are no built-in data types, but lexicoders help with common types. Key components are usually UTF-8 strings.
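The remark that lexicoders help with common types can be illustrated with a small sketch. Because everything is stored and sorted as bytes, a number has to be encoded so that byte order matches numeric order. The class below is a hypothetical, simplified stand-in (fixed-width, sign bit flipped) written against the Java standard library only; it is not Accumulo's lexicoder implementation, and the class and method names are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative only: a fixed-width, order-preserving encoding for signed longs. */
final class SortableLongCodec {

    /** Encodes a long as 8 big-endian bytes with the sign bit flipped. */
    static byte[] encode(long value) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto the
        // unsigned range, so unsigned byte-wise order matches numeric order.
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Reverses encode(). */
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] samples = {-5L, 0L, 7L, 1401041295L};
        for (int i = 1; i < samples.length; i++) {
            byte[] lower = encode(samples[i - 1]);
            byte[] higher = encode(samples[i]);
            // compareUnsigned compares bytes the way a byte-sorted store would.
            boolean ordered = Arrays.compareUnsigned(lower, higher) < 0;
            System.out.println(samples[i - 1] + " sorts before " + samples[i] + ": " + ordered);
        }
    }
}
```

Without the sign-bit flip, negative values would sort after positive ones, because their two's-complement encoding starts with a high bit.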
  • 15. Some rows for you
      row       cf        cq        cv       ts          value
      bhavanki  job       employer           2013-09-01  Cloudera
      bhavanki  personal  beer               2013-09-15  Omission
      bhavanki  personal  house     NOMUGGL  2014-01-25  Ravenclaw
      brees     job       employer           2013-10-01  White Cliffs
      brees     personal  house     NOMUGGL  2014-01-01  Hufflepuff
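As a rough model of the sorted key / value table shown in these rows, the sketch below keeps cells in a `TreeMap` ordered by row, column family, qualifier, and visibility, with newer timestamps first. It uses only the Java standard library; `SortedCellsSketch`, `CellKey`, and `KEY_ORDER` are hypothetical names, the dates are written as `yyyymmdd` longs for readability, and Java string comparison stands in for the byte-wise comparison a real store would use.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model of a sorted key / value table; this is not the Accumulo API. */
final class SortedCellsSketch {

    /** The key components named on the slides (row, cf, cq, cv, ts). */
    static final class CellKey {
        final String row;
        final String family;
        final String qualifier;
        final String visibility;
        final long timestamp;

        CellKey(String row, String family, String qualifier, String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
        }
    }

    /** Ascending by row, family, qualifier, visibility; newest timestamp first. */
    static final Comparator<CellKey> KEY_ORDER =
            Comparator.comparing((CellKey k) -> k.row)
                    .thenComparing(k -> k.family)
                    .thenComparing(k -> k.qualifier)
                    .thenComparing(k -> k.visibility)
                    .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    /** Builds the example rows from the slide, inserted out of order on purpose. */
    static TreeMap<CellKey, String> sampleTable() {
        TreeMap<CellKey, String> table = new TreeMap<>(KEY_ORDER);
        table.put(new CellKey("brees", "personal", "house", "NOMUGGL", 20140101L), "Hufflepuff");
        table.put(new CellKey("bhavanki", "personal", "beer", "", 20130915L), "Omission");
        table.put(new CellKey("brees", "job", "employer", "", 20131001L), "White Cliffs");
        table.put(new CellKey("bhavanki", "job", "employer", "", 20130901L), "Cloudera");
        table.put(new CellKey("bhavanki", "personal", "house", "NOMUGGL", 20140125L), "Ravenclaw");
        return table;
    }

    public static void main(String[] args) {
        // Iteration order is the sorted key order, regardless of insertion order.
        for (Map.Entry<CellKey, String> cell : sampleTable().entrySet()) {
            System.out.println(cell.getKey() + " -> " + cell.getValue());
        }
    }
}
```

Because all cells for a row are adjacent in this ordering, reading "a row" amounts to reading a contiguous run of entries.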
  • 16. Visibility Labels: A label is a Boolean expression, e.g. Specialist | (Management & SpecTraining). Authorizations are provided in each scan.
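To make the visibility-label idea concrete, here is a minimal sketch of evaluating a label expression such as the one on the slide against a set of authorizations. It is a hypothetical, simplified evaluator written with the Java standard library; it is not Accumulo's own visibility implementation. It assumes labels are alphanumeric, that `&` and `|` may only be mixed when parentheses make the grouping explicit, and that an empty label is visible to everyone.

```java
import java.util.Set;

/** Simplified evaluator for visibility expressions; illustrative only. */
final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    /** Returns true if the given authorizations satisfy the expression. */
    static boolean isVisible(String expression, Set<String> authorizations) {
        if (expression.trim().isEmpty()) {
            return true; // assumption: an empty label hides nothing
        }
        VisibilitySketch parser = new VisibilitySketch(expression, authorizations);
        boolean result = parser.parseExpression();
        if (parser.pos != parser.expr.length()) {
            throw new IllegalArgumentException("Unexpected character at " + parser.pos);
        }
        return result;
    }

    // expression := term (('&' | '|') term)*, with a single operator kind per level
    private boolean parseExpression() {
        boolean result = parseTerm();
        char operator = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (operator != 0 && operator != next) {
                throw new IllegalArgumentException("Mixing & and | requires parentheses");
            }
            operator = next;
            boolean right = parseTerm();
            result = (operator == '&') ? (result && right) : (result || right);
        }
        return result;
    }

    // term := label | '(' expression ')'
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpression();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a label at " + pos);
        }
        // A bare label is satisfied when the scan's authorizations include it.
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        String label = "Specialist | (Management & SpecTraining)";
        System.out.println(isVisible(label, Set.of("Specialist")));
        System.out.println(isVisible(label, Set.of("Management")));
        System.out.println(isVisible(label, Set.of("Management", "SpecTraining")));
    }
}
```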
  • 17. Locality Groups: You can identify sets of one or more column families as locality groups. Data in a locality group is stored together for improved read performance.
  • 18. Tablets: A table comprises one or more tablets (diagram: table employees with tablets employees;H, employees;S, employees;~)
  • 19. Tablets: Tablets map to data files in HDFS (diagram: tablets employees;H, employees;S, employees;~ and files rfile 1, rfile 2, rfile 3)
  • 20. Tablets: Data is also kept in write-ahead logs and the memtable (diagram: tablet employees;H with rfile 1, walogs, memtable)
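The tablet slides above can be read as a table being cut into contiguous row ranges. The sketch below models that with a `TreeMap` of split points, using only the Java standard library. It rests on stated assumptions: each tablet covers rows up to and including its end row, the tablet shown as employees;~ on the slides holds every row after the last split, and the sample row names other than bhavanki are hypothetical. `TabletLookupSketch` and its methods are illustrative names, not part of Accumulo.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative mapping from a row to the tablet that would hold it. */
final class TabletLookupSketch {
    private final TreeMap<String, String> tabletsByEndRow = new TreeMap<>();
    private final String lastTablet;

    TabletLookupSketch(String table, String... splitPoints) {
        for (String split : splitPoints) {
            // Name each tablet after its end row, as on the slides.
            tabletsByEndRow.put(split, table + ";" + split);
        }
        // Assumption: the final tablet has no upper bound on its rows.
        lastTablet = table + ";~";
    }

    /** Finds the tablet whose end row is the smallest one at or after the given row. */
    String tabletFor(String row) {
        Map.Entry<String, String> entry = tabletsByEndRow.ceilingEntry(row);
        return entry != null ? entry.getValue() : lastTablet;
    }

    public static void main(String[] args) {
        TabletLookupSketch employees = new TabletLookupSketch("employees", "H", "S");
        for (String row : new String[] {"Adams", "Jones", "Smith", "bhavanki"}) {
            System.out.println(row + " -> " + employees.tabletFor(row));
        }
    }
}
```

Note that lowercase row names such as bhavanki sort after every uppercase split point, so in this model they all land in the last tablet; choosing split points that match the real key distribution matters.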
  • 22. Java Client API
  • 23. Java Client API: Read using scanners. Scanner s = conn.createScanner("employees", new Authorizations()); s.setRange(new Range("alice", "eve")); s.fetchColumnFamily(new Text("personal")); for (Entry<Key, Value> e : s) employeeIds.add(e.getKey().getRow());
  • 24. Java Client API: Read access via iterator pattern. Server-side system iterators handle timestamps, authorization checks, and lots more; iterators almost always wrap other iterators, forming a chain; you can define your own, client-side or server-side.
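The "iterators wrap other iterators, forming a chain" point can be sketched with plain `java.util.Iterator` wrappers. This is a hypothetical illustration using only the Java standard library, not Accumulo's iterator interface: one wrapper drops cells the caller is not authorized to see, and another keeps only the newest version of each row and column. For brevity it assumes a visibility is a single label rather than an expression, and that the input is already sorted with newest versions first; all data in `main` is made up.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative chain of wrapping iterators; not the Accumulo iterator API. */
final class IteratorChainSketch {

    static final class Cell {
        final String row;
        final String column;
        final String visibility;
        final long timestamp;
        final String value;

        Cell(String row, String column, String visibility, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.visibility = visibility;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Wraps another iterator and passes through only the cells that match. */
    static final class FilteringIterator implements Iterator<Cell> {
        private final Iterator<Cell> source;
        private final Predicate<Cell> keep;
        private Cell next;

        FilteringIterator(Iterator<Cell> source, Predicate<Cell> keep) {
            this.source = source;
            this.keep = keep;
            advance();
        }

        // Pulls from the wrapped iterator until a matching cell is found.
        private void advance() {
            next = null;
            while (source.hasNext()) {
                Cell candidate = source.next();
                if (keep.test(candidate)) {
                    next = candidate;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public Cell next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            Cell result = next;
            advance();
            return result;
        }
    }

    /** Drops cells whose (single) label is not among the authorizations. */
    static Iterator<Cell> visibleTo(Iterator<Cell> source, Set<String> auths) {
        return new FilteringIterator(source,
                cell -> cell.visibility.isEmpty() || auths.contains(cell.visibility));
    }

    /** Keeps the first cell seen for each row and column; input must be sorted. */
    static Iterator<Cell> newestVersionOnly(Iterator<Cell> source) {
        String[] previous = {null};
        return new FilteringIterator(source, cell -> {
            String id = cell.row + "\t" + cell.column;
            boolean firstSeen = !id.equals(previous[0]);
            previous[0] = id;
            return firstSeen;
        });
    }

    /** Runs the chain and collects the surviving values. */
    static List<String> scan(List<Cell> sortedCells, Set<String> auths) {
        Iterator<Cell> chain = newestVersionOnly(visibleTo(sortedCells.iterator(), auths));
        List<String> values = new ArrayList<>();
        while (chain.hasNext()) {
            values.add(chain.next().value);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("row1", "cf:a", "", 2L, "newer value"),
                new Cell("row1", "cf:a", "", 1L, "older value"),
                new Cell("row2", "cf:a", "FIN", 1L, "restricted value"));
        System.out.println(scan(cells, Set.of()));
        System.out.println(scan(cells, Set.of("FIN")));
    }
}
```

Each wrapper only knows about the iterator directly beneath it, which is what lets new behavior be added by stacking another layer.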
  • 25. Java Client API: Scanners fetch sorted rows from one range. Batch scanners fetch unsorted rows from multiple ranges in parallel. Isolated scanners ensure that you do not see a row mid-change.
  • 26. MapReduce: AccumuloInputFormat, AccumuloOutputFormat
  • 27. MapReduce: AccumuloRowInputFormat, AccumuloRowOutputFormat
  • 28. Shell: Command-line / manual access to Accumulo data: scan, insert, delete; iterator management; table management (creation, deletion, cloning); user and authorization management; table splitting and merging; ... more
  • 29. Bulk Import: Got lots of data to import quickly? Use an MR job to format data using AccumuloFileOutputFormat, then import the files using the shell. Trade off latency / availability for throughput.
  • 31. Tablet Server: Serves tablets (table data): writes data to walog, memtable; deals with compaction; serves data for reads from files, memtable; handles recovery from walogs in case of server failure. Most client calls go to tablet servers.
  • 32. Master: assigns tablets to tablet servers; detects tablet server failures and reassigns tablets; balances tablet assignments over time; coordinates table operations. Multiple supported for failover, only one active.
  • 33. Everybody Else in Accumulo: Garbage Collector (GC) - identifies and deletes files in HDFS that are no longer needed. Tracer - listens for and stores distributed trace messages using a special table.
  • 34. Everybody Else in Accumulo: Monitor - collects and serves status information: server status; log inspection; performance data; table inspection
  • 35. Everybody Else outside Accumulo: HDFS (as part of Apache Hadoop): stores tablet files, stores write-ahead logs (1.5+). MapReduce (Hadoop): bulk import, batch processing. Apache ZooKeeper.
  • 37. Easy as 1-2-3? 1. Install Hadoop (HDFS and MapReduce) 2. Install ZooKeeper 3. Install Accumulo!
  • 38. Making Steps 1 and 2 Easier: Use a complete, pre-packaged Hadoop distribution ... like CDH! A leading commercial distribution centered on Apache Hadoop: many ecosystem components; configured / updated to work together
  • 39. Making Steps 1 and 2 Easier: Cloudera Manager: deployment; configuration; operation; security
  • 40. Making Step 3 Easier: Standard Apache Accumulo installation is via tarball (no longer shipping RPM / DEB / ...). Using CDH/CM you can use: a tarball, RPM or DEB with Accumulo packaged for CDH; a parcel (like RPM / ZIP) for easier upgrades. 1.4.4 and 1.4.5 available now; 1.6.0 soon.
  • 41. Where to Go for More: http://accumulo.apache.org/ • http://www.cloudera.com/content/cloudera/en/products-and-service • http://www.cloudera.com/content/cloudera/en/products-and-service • http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/accumulo.html
  • 42. Accumulo Summit: Join us on June 12
  • 43. Quick Thanks: My slide reviewers (Sean Busbey, Mike Drob); Accumulo community; You all for listening
  • 44. Thank you! Bill Havanki, bhavanki@clouderagovt.com