Apache Accumulo Overview
Bill Havanki
Software Engineer, Author at Segment
May. 30, 2014
An overview of the Apache Accumulo high-performance, scalable, distributed key/value store.
Apache Accumulo Overview
1.
Apache Accumulo Overview
Bill Havanki, Solutions Architect, Cloudera Government Solutions
2.
Agenda
• Quick History
• Storage Model
• Loading and Querying
• Daemons
• Getting Started, a.k.a. the Pitch
©2014 Cloudera, Inc. All rights reserved.
3.
A Quick History
4.
Google BigTable
Compressed, high-performance, scalable, distributed sorted map
5.
Google BigTable
• Began development in 2004
• Built on Google File System
• Non-relational
• Byte-oriented and schemaless
• Stores data in the petabyte range
• Research paper published in 2006
6.
Child(ren) of BigTable
• Apache HBase (begun 2006, top-level 2010)
• Apache Cassandra (begun 2008-ish, top-level 2010)
• Apache Accumulo ...
7.
From Cloudbase to Accumulo
• Started in 2008 as a National Security Agency project
• Submitted to the Apache Incubator in 2011 (and renamed)
• Top-level project in 2012
8.
Storage Model
9.
Key / Value Store
Accumulo stores tables of key / value pairs.
10.
Key / Value Store
A row is a sorted sequence of key / value pairs.
Each pair is a cell.
11.
The Key
row | column (family, qualifier, visibility) | timestamp
12.
An example key
row: bhavanki | column (family: personal, qualifier: middle, visibility: PII) | timestamp: 1401041295
13.
Another example key
row: brees | column (family: employment, qualifier: salary, visibility: FIN) | timestamp: 1401041296
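To make the "sorted sequence" idea concrete, here is a minimal plain-Java sketch of how keys with these components might be ordered. It assumes the ordering Accumulo documents for its keys (row, family, qualifier, and visibility ascending, then timestamp descending so the newest version comes first). `SketchKey` and `KeyOrderSketch` are hypothetical names for illustration, not Accumulo classes, and string comparison stands in for the byte-wise comparison a real store would use.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyOrderSketch {
    // Hypothetical stand-in for a key; this is not Accumulo's own Key class.
    static final class SketchKey {
        final String row, family, qualifier, visibility;
        final long timestamp;

        SketchKey(String row, String family, String qualifier, String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
        }
    }

    // Compare component by component; the timestamp comparison is reversed
    // so that the newest version of a cell sorts first.
    static int compare(SketchKey a, SketchKey b) {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = a.visibility.compareTo(b.visibility);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp);
        return c;
    }

    public static void main(String[] args) {
        List<SketchKey> keys = new ArrayList<>(List.of(
            new SketchKey("brees", "employment", "salary", "FIN", 1401041296L),
            new SketchKey("bhavanki", "personal", "middle", "PII", 1401041295L),
            new SketchKey("bhavanki", "personal", "middle", "PII", 1401041299L)));
        keys.sort(KeyOrderSketch::compare);
        keys.forEach(System.out::println);
    }
}
```

Running `main` lists both `bhavanki` keys before the `brees` key, with the later timestamp first. The third key is an invented extra version added only to show the timestamp ordering.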
14.
It’s all bytes
All key and value data are stored as bytes, except the timestamp, which is a long.
There are no built-in data types, but lexicoders help with common types.
Key components are usually UTF-8 strings.
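Because everything is compared as bytes, a typed value has to be encoded so that byte order matches the order you want. The sketch below illustrates that idea for signed longs with a fixed-width, sign-flipped, big-endian encoding. It is an assumption-laden illustration of what a lexicoder is for, not the encoding Accumulo's own lexicoders produce, and `LongOrderSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class LongOrderSketch {
    // Encode a signed long as 8 big-endian bytes with the sign bit flipped,
    // so that unsigned byte-wise comparison agrees with numeric order.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Reverse the encoding: read the big-endian long and flip the sign bit back.
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] values = {-5L, 0L, 42L, 1401041295L};
        for (int i = 1; i < values.length; i++) {
            // Arrays.compareUnsigned requires Java 9 or later.
            int c = Arrays.compareUnsigned(encode(values[i - 1]), encode(values[i]));
            System.out.println(values[i - 1] + " sorts before " + values[i] + ": " + (c < 0));
        }
    }
}
```

Without the sign flip, negative numbers would sort after positive ones, since their leading byte is larger when read as unsigned.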
15.
Some rows for you
row | cf | cq | cv | ts | value
bhavanki | job | employer | | 2013-09-01 | Cloudera
bhavanki | personal | beer | | 2013-09-15 | Omission
bhavanki | personal | house | NOMUGGL | 2014-01-25 | Ravenclaw
brees | job | employer | | 2013-10-01 | White Cliffs
brees | personal | house | NOMUGGL | 2014-01-01 | Hufflepuff
16.
Visibility Labels
A label is a Boolean expression, for example: Specialist | (Management & SpecTraining)
Authorizations are provided in each scan.
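As a rough illustration of how a label such as the one above could be checked against the authorizations supplied with a scan, here is a small recursive-descent evaluator in plain Java. It is a sketch under stated assumptions: tokens are letters, digits, and underscores; an empty label is visible to everyone; and mixing `&` and `|` at the same level without parentheses is rejected. `VisibilitySketch` is a hypothetical class, not Accumulo's own visibility implementation.

```java
import java.util.Set;

final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    // Returns true if the authorization set satisfies the label expression.
    static boolean visible(String label, Set<String> auths) {
        if (label.trim().isEmpty()) return true; // assumption: empty label means unrestricted
        VisibilitySketch parser = new VisibilitySketch(label, auths);
        boolean result = parser.parseExpr();
        if (parser.pos != parser.expr.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return result;
    }

    // expr := term ('&' term)* | term ('|' term)*
    private boolean parseExpr() {
        boolean result = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos++);
            if (op != 0 && c != op) {
                throw new IllegalArgumentException("mixing & and | requires parentheses");
            }
            op = c;
            boolean rhs = parseTerm(); // always parse, so the position keeps advancing
            result = (op == '&') ? (result && rhs) : (result || rhs);
        }
        return result;
    }

    // term := '(' expr ')' | token
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing closing parenthesis");
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length()
                && (Character.isLetterOrDigit(expr.charAt(pos)) || expr.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("expected a token at position " + pos);
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        String label = "Specialist | (Management & SpecTraining)";
        System.out.println(visible(label, Set.of("Management", "SpecTraining"))); // true
        System.out.println(visible(label, Set.of("Management")));                 // false
    }
}
```

A reader holding only `Management` fails the check because the parenthesized branch needs both authorizations and the `Specialist` branch is not satisfied.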
17.
Locality Groups
You can identify sets of one or more column families as locality groups.
Data in a locality group is stored together for improved read performance.
18.
Tablets
A table consists of one or more tablets.
[Diagram: table "employees" split into tablets employees;H, employees;S, and employees;~]
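One way to picture the split into tablets is as a sorted map from each tablet's end row to the tablet, where a row belongs to the first tablet whose end row is greater than or equal to it. The sketch below models that with a `TreeMap`. It assumes split points `H` and `S` as suggested by the tablet names in the diagram, treats end rows as inclusive, and reuses the diagram's `employees;~` label for the last tablet. `TabletLookupSketch` is a hypothetical class, not part of Accumulo.

```java
import java.util.Map;
import java.util.TreeMap;

final class TabletLookupSketch {
    // Tablets keyed by their (inclusive) end row.
    private final TreeMap<String, String> tabletsByEndRow = new TreeMap<>();
    // Catch-all tablet for rows after the last split point.
    private final String lastTablet;

    TabletLookupSketch(String table, String... splitPoints) {
        for (String split : splitPoints) {
            tabletsByEndRow.put(split, table + ";" + split);
        }
        lastTablet = table + ";~"; // label borrowed from the slide's diagram
    }

    // Find the tablet with the smallest end row that is >= the given row.
    String tabletFor(String row) {
        Map.Entry<String, String> entry = tabletsByEndRow.ceilingEntry(row);
        return entry != null ? entry.getValue() : lastTablet;
    }

    public static void main(String[] args) {
        TabletLookupSketch employees = new TabletLookupSketch("employees", "H", "S");
        for (String row : new String[] {"Brees", "Havanki", "Smith"}) {
            System.out.println(row + " -> " + employees.tabletFor(row));
        }
    }
}
```

The example rows are capitalized only so that they fall on both sides of the uppercase split points; with lowercase row IDs and these splits, every row would land in the last tablet.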
19.
Tablets
Tablets map to data files in HDFS.
[Diagram: tablets employees;H, employees;S, and employees;~ with data files rfile 1, rfile 2, and rfile 3]
20.
Tablets
Data is also kept in write-ahead logs and a memtable.
[Diagram: tablet employees;H with rfile 1, walogs, and memtable]
21.
Loading and Querying
22.
Java Client API
23.
Java Client API
Read using scanners:
    // conn is a Connector; employeeIds is a collection of Text row IDs
    Scanner s = conn.createScanner("employees", new Authorizations());
    s.setRange(new Range("alice", "eve"));       // rows "alice" through "eve", inclusive
    s.fetchColumnFamily(new Text("personal"));   // only the "personal" column family
    for (Entry<Key, Value> e : s)
        employeeIds.add(e.getKey().getRow());
24.
Java Client API
Read access via iterator pattern
• server-side system iterators handle timestamps, authorization checks, and lots more
• iterators almost always wrap other iterators, forming a chain
• you can define your own, client-side or server-side
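The chaining idea can be sketched with ordinary `java.util.Iterator` wrappers: each stage pulls from the stage below it and passes along only what it accepts. The example below is a simplified model, not Accumulo's iterator interface. `Cell`, `Wrapping`, `NewestVersionOnly`, and `NotOlderThan` are hypothetical names, the sample data is invented, and the input is assumed to be sorted by cell name with the newest timestamp first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class IteratorChainSketch {
    // Hypothetical cell: a name standing in for row + column, plus timestamp and value.
    static final class Cell {
        final String name;
        final long timestamp;
        final String value;

        Cell(String name, long timestamp, String value) {
            this.name = name;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Base wrapper: reads from a source iterator and passes on only accepted cells.
    abstract static class Wrapping implements Iterator<Cell> {
        private final Iterator<Cell> source;
        private Cell next;
        private boolean primed = false;

        Wrapping(Iterator<Cell> source) { this.source = source; }

        abstract boolean accept(Cell c);

        // Advance to the next accepted cell, or leave next == null at the end.
        private void advance() {
            next = null;
            while (next == null && source.hasNext()) {
                Cell c = source.next();
                if (accept(c)) next = c;
            }
            primed = true;
        }

        @Override public boolean hasNext() {
            if (!primed) advance();
            return next != null;
        }

        @Override public Cell next() {
            if (!hasNext()) throw new NoSuchElementException();
            primed = false;
            return next;
        }
    }

    // Keeps only the first (newest) cell seen for each name.
    static final class NewestVersionOnly extends Wrapping {
        private String lastName;

        NewestVersionOnly(Iterator<Cell> source) { super(source); }

        @Override boolean accept(Cell c) {
            boolean firstForName = !c.name.equals(lastName);
            lastName = c.name;
            return firstForName;
        }
    }

    // Drops cells whose timestamp is below a cutoff.
    static final class NotOlderThan extends Wrapping {
        private final long cutoff;

        NotOlderThan(Iterator<Cell> source, long cutoff) { super(source); this.cutoff = cutoff; }

        @Override boolean accept(Cell c) { return c.timestamp >= cutoff; }
    }

    // Build the chain: the age filter wraps the version filter, which wraps the data.
    static List<String> run(List<Cell> sortedCells, long cutoff) {
        Iterator<Cell> chain = new NotOlderThan(new NewestVersionOnly(sortedCells.iterator()), cutoff);
        List<String> values = new ArrayList<>();
        while (chain.hasNext()) values.add(chain.next().value);
        return values;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("bhavanki/job:employer", 2L, "Cloudera"),
            new Cell("bhavanki/job:employer", 1L, "Acme"),
            new Cell("bhavanki/personal:house", 1L, "Ravenclaw"));
        System.out.println(run(cells, 0L)); // [Cloudera, Ravenclaw]
        System.out.println(run(cells, 2L)); // [Cloudera]
    }
}
```

Each wrapper knows nothing about the others, which is what makes it possible to stack system-provided and user-defined stages in one chain.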
25.
Java Client API
Scanners fetch sorted rows from one range.
Batch scanners fetch unsorted rows from multiple ranges in parallel.
Isolated scanners ensure that you do not see a row mid-change.
26.
MapReduce
• AccumuloInputFormat
• AccumuloOutputFormat
27.
MapReduce
• AccumuloRowInputFormat
• AccumuloRowOutputFormat
28.
Shell
Command-line / manual access to Accumulo data
• scan, insert, delete
• iterator management
• table management (creation, deletion, cloning)
• user and authorization management
• table splitting and merging
• ... and more
29.
Bulk Import
Got lots of data to import quickly?
• Use an MR job to format data using AccumuloFileOutputFormat
• Import the files using the shell
Trade off latency / availability for throughput
30.
Daemons
31.
Tablet Server
Serves tablets (table data)
• writes data to walog, memtable; deals with compaction
• serves data for reads from files, memtable
• handles recovery from walogs in case of server failure
Most client calls go to tablet servers
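The tablet server duties listed above can be modeled in a few lines of plain Java: log each write before applying it to a sorted in-memory map, flush that map to an immutable "file", read from memory first and then from files, and replay the log after a simulated crash. This is a toy model under heavy simplification (single tablet, strings for keys and values, no deletes, no merging of files). `TabletWritePathSketch` and its methods are hypothetical and do not mirror Accumulo's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class TabletWritePathSketch {
    // Stand-in for the write-ahead log: entries not yet flushed to a file.
    private final List<String[]> walog = new ArrayList<>();
    // Sorted in-memory map holding recent writes.
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Stand-ins for immutable sorted data files, oldest first.
    private final List<TreeMap<String, String>> files = new ArrayList<>();

    // Log first, then apply in memory, so the write can be replayed after a failure.
    void write(String key, String value) {
        walog.add(new String[] {key, value});
        memtable.put(key, value);
    }

    // Flush the memtable to a new "file"; its log entries are no longer needed.
    void flush() {
        if (memtable.isEmpty()) return;
        files.add(memtable);
        memtable = new TreeMap<>();
        walog.clear();
    }

    // Check the memtable first, then files from newest to oldest.
    String read(String key) {
        String value = memtable.get(key);
        for (int i = files.size() - 1; value == null && i >= 0; i--) {
            value = files.get(i).get(key);
        }
        return value;
    }

    // Simulate a server failure: in-memory data is lost and rebuilt from the log.
    void crashAndRecover() {
        memtable = new TreeMap<>();
        for (String[] entry : walog) {
            memtable.put(entry[0], entry[1]);
        }
    }

    public static void main(String[] args) {
        TabletWritePathSketch tablet = new TabletWritePathSketch();
        tablet.write("bhavanki", "Cloudera");
        tablet.flush();                        // "bhavanki" now lives in a file
        tablet.write("brees", "White Cliffs"); // only in the memtable and the log
        tablet.crashAndRecover();
        System.out.println(tablet.read("bhavanki")); // Cloudera
        System.out.println(tablet.read("brees"));    // White Cliffs
    }
}
```

The point of the ordering in `write` is that anything acknowledged to a client is already in the log, so losing the memtable does not lose data.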
32.
Master
• assigns tablets to tablet servers
• detects tablet server failures and reassigns tablets
• balances tablet assignments over time
• coordinates table operations
Multiple masters are supported for failover, but only one is active at a time.
33.
Everybody Else in Accumulo
• Garbage Collector (GC) - identifies and deletes files in HDFS that are no longer needed
• Tracer - listens for and stores distributed trace messages using a special table
34.
Everybody Else in Accumulo
• Monitor - collects and serves status information
  • server status
  • log inspection
  • performance data
  • table inspection
35.
Everybody Else outside Accumulo
• HDFS (as part of Apache Hadoop)
  • stores tablet files
  • stores write-ahead logs (1.5+)
• MapReduce (Hadoop)
  • bulk import
  • batch processing
• Apache ZooKeeper
36.
Getting Started, a.k.a. the Pitch
37.
Easy as 1-2-3?
1. Install Hadoop (HDFS and MapReduce)
2. Install ZooKeeper
3. Install Accumulo!
38.
Making Steps 1 and 2 Easier
Use a complete, pre-packaged Hadoop distribution ... like CDH!
A leading commercial distribution centered on Apache Hadoop
• many ecosystem components
• configured / updated to work together
39.
Making Steps 1 and 2 Easier
Cloudera Manager
• deployment
• configuration
• operation
• security
40.
Making Step 3 Easier
Standard Apache Accumulo installation is via tarball
• no longer shipping RPM / DEB / ...
Using CDH/CM you can use:
• a tarball, RPM or DEB with Accumulo packaged for CDH
• a parcel (like RPM / ZIP) for easier upgrades
• 1.4.4 and 1.4.5 available now
• 1.6.0 soon
41.
Where to Go for More
• http://accumulo.apache.org/
• http://www.cloudera.com/content/cloudera/en/products-and-service
• http://www.cloudera.com/content/cloudera/en/products-and-service
• http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/accumulo.html
42.
Accumulo Summit
Join us on June 12
43.
Quick Thanks
• My slide reviewers
  • Sean Busbey
  • Mike Drob
• Accumulo community
• You all for listening
44.
Thank you!
Bill Havanki
bhavanki@clouderagovt.com