Accumulo Tutorial
Up and Running (or at Least Walking) in 90 Minutes
Background
people. technology. integrity
Welcome!
What is Accumulo?
● Accumulo is a sparse, distributed, sorted, multi-dimensional map
● Modeled after Google’s BigTable design, but with security features built in
● Designed to scale linearly to trillions of records and tens of petabytes
● Features automatic load balancing, high availability, fault tolerance, and a dynamic data model
15 October 2018
Why Accumulo?
● Data model features cell-level security
● Stands up to sustained, high-volume ingest
● Data lifecycle management
● Powerful server-side programming features for data manipulation (iterators)
Where is Accumulo used?
● Government agencies
● Telecommunications
● Health care
● Finance
Architecture
Data Model
● Stores data as key/value pairs, sorted in lexicographically ascending order by key
● All elements are represented as byte arrays, except the timestamp
● The timestamp is a long, sorted in descending order (newest version first)
● Tables consist of a set of sorted key/value pairs
Key: Row ID | Column (Family, Qualifier, Visibility) | Timestamp
Value
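The sort order described on this slide can be sketched with a plain Java comparator. This is a minimal model, not the Accumulo API: `KeyOrderSketch`, its `Key` class, and `KEY_ORDER` are hypothetical names, and Strings stand in for the byte arrays that real keys hold. One extra version of bob's height, with an invented timestamp, is included to show newest-first ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class KeyOrderSketch {
    // Hypothetical stand-in for a key; real Accumulo keys hold byte arrays, not Strings.
    static final class Key {
        final String row, family, qualifier, visibility;
        final long timestamp;

        Key(String row, String family, String qualifier, String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
        }
    }

    // Row, family, qualifier, and visibility ascending; timestamp descending.
    static final Comparator<Key> KEY_ORDER = Comparator
            .comparing((Key k) -> k.row)
            .thenComparing(k -> k.family)
            .thenComparing(k -> k.qualifier)
            .thenComparing(k -> k.visibility)
            .thenComparing((a, b) -> Long.compare(b.timestamp, a.timestamp));

    // Returns a sorted copy and leaves the input list untouched.
    static List<Key> sorted(List<Key> keys) {
        List<Key> copy = new ArrayList<>(keys);
        copy.sort(KEY_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(
                new Key("jane", "attribute", "surname", "public", 1539193309512L),
                new Key("bob", "attribute", "height", "public", 1539193309131L),
                // Invented newer version of the same cell, for illustration only.
                new Key("bob", "attribute", "height", "public", 1539193309999L),
                new Key("bob", "attribute", "dateOfBirth", "public", 1539193308999L));
        for (Key k : sorted(keys)) {
            System.out.println(k);
        }
    }
}
```

Because the timestamp sorts in descending order, the most recent version of a cell is the first one encountered when reading in key order.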
Data Layout
Row ID  Column Family  Column Qualifier  Column Visibility  Timestamp      Value
bob     attribute      dateOfBirth       public             1539193308999  May 9th, 1985
bob     attribute      height            public             1539193309131  71”
bob     insurance      dental            private            1539193309285  MetLife
jane    attribute      bloodType         public             1539193309362  ab-
jane    attribute      surname           public             1539193309512  doe
jane    contact        cellPhone         public             1539193309622  (808) 345-9876
jane    insurance      vision            private            1539193309635  VSP
john    allergy        major             private            1539193309650  amoxicillin
john    attribute      weight            public             1539193309673  180
john    contact        homeAddr          public             1539193309683  34 Baker LN
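Hierarchical Decomposition
Row ID: bob
  Column Family: attribute
    Column Qualifier: dateOfBirth   Value: May 9, 1985
    Column Qualifier: height        Value: 5’11”
  Column Family: insurance
    Column Qualifier: dental        Value: MetLife
Row ID: jane
  Column Family: attribute
    Column Qualifier: bloodType     Value: ab-
    Column Qualifier: surname       Value: doe
  Column Family: contact
    Column Qualifier: cellPhone     Value: (808) 345-9876
  Column Family: insurance
    Column Qualifier: vision        Value: VSP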
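One way to picture this decomposition is as nested sorted maps: row ID, then column family, then column qualifier, then value. The sketch below is an illustration using only the Java standard library; `DecompositionSketch` and `decompose` are hypothetical names, and visibility and timestamp are left out to keep it short.

```java
import java.util.Map;
import java.util.TreeMap;

public class DecompositionSketch {
    // Groups flat {row, family, qualifier, value} entries into
    // row ID -> column family -> column qualifier -> value, sorted at each level.
    static Map<String, Map<String, Map<String, String>>> decompose(String[][] entries) {
        Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();
        for (String[] e : entries) {
            rows.computeIfAbsent(e[0], r -> new TreeMap<>())
                .computeIfAbsent(e[1], f -> new TreeMap<>())
                .put(e[2], e[3]);
        }
        return rows;
    }

    public static void main(String[] args) {
        // A few of the flat records from the Data Layout slide.
        String[][] entries = {
            {"jane", "contact", "cellPhone", "(808) 345-9876"},
            {"bob", "attribute", "dateOfBirth", "May 9th, 1985"},
            {"bob", "attribute", "height", "71\""},
            {"bob", "insurance", "dental", "MetLife"},
            {"jane", "attribute", "surname", "doe"},
        };
        decompose(entries).forEach((row, families) -> {
            System.out.println(row);
            families.forEach((family, qualifiers) -> {
                System.out.println("  " + family);
                qualifiers.forEach((q, v) -> System.out.println("    " + q + " = " + v));
            });
        });
    }
}
```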
Accumulo Shell
Shell Commands
Command      Description
help         Provides information about the available commands
tables       Displays a list of all existing tables
table        Switches to the specified table
createtable  Creates a new table with the specified name
deletetable  Deletes the table with the specified name
insert       Inserts a record
scan         Scans the table and displays the resulting records
delete       Deletes a record from the table
getauths     Displays the maximum scan authorizations for a user
setauths     Sets the maximum scan authorizations for a user
quit         Exits the shell
Shell Access
● Start the shell
● Log in as root
Users/Tables
● List the users and tables
● Create some tables
● Select the table you want
● accumulo.* tables are managed and used internally by Accumulo and should not be altered directly by users
Inserting Records
● Insert records into the current table
● See the file containing insert statements on the VM desktop; copy and paste from it
● Usage: insert row column_family column_qualifier value -l column_visibility
Column Visibility Syntax
Label        Description
A & B        Both ‘A’ and ‘B’ are required
A | B        Either ‘A’ or ‘B’ is required
A & (C | B)  ‘A’ is required, along with either ‘C’ or ‘B’
A | (B & C)  Either ‘A’ alone or both ‘B’ and ‘C’ are required
● Combinations of visibility labels can be set on records by passing them into an insert with logical operators
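The table above can be checked with a small evaluator. This is a sketch under stated assumptions rather than Accumulo's own parser: `VisibilitySketch` and `isVisible` are hypothetical names, labels are limited to letters and digits, an empty expression is assumed to be visible to everyone, and mixing & and | at one level without parentheses is rejected.

```java
import java.util.Set;

public class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    // True when the given authorizations satisfy the visibility expression.
    static boolean isVisible(String expression, Set<String> auths) {
        if (expression.trim().isEmpty()) {
            return true; // assumption: an empty label is visible to everyone
        }
        VisibilitySketch p = new VisibilitySketch(expression, auths);
        boolean result = p.parseExpression();
        if (p.pos != p.expr.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return result;
    }

    // expression := term (('&' | '|') term)*, with a single operator per level
    private boolean parseExpression() {
        boolean result = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && op != next) {
                throw new IllegalArgumentException("mixing & and | requires parentheses");
            }
            op = next;
            boolean rhs = parseTerm(); // always parsed, so the position advances
            result = (op == '&') ? (result && rhs) : (result || rhs);
        }
        return result;
    }

    // term := label | '(' expression ')'
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpression();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing closing parenthesis");
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a label at position " + pos);
        }
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        Set<String> auths = Set.of("A", "C");
        for (String label : new String[] {"A & B", "A | B", "A & (C | B)", "A | (B & C)"}) {
            System.out.println(label + " visible with " + auths + "? " + isVisible(label, auths));
        }
    }
}
```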
Scans/Authorizations
● Try a scan
● Nothing comes up: no authorizations have been set on the current user
● Set authorizations on the root user
● A user’s authorizations are returned in sorted order
● Scan with authorizations set
● All inserted records show up
● Records are displayed as: row_id col_family:col_qualifier [visibility] value
● Limit which of a user’s authorizations are used for a scan
● Scanning with different authorizations returns different result sets
● Pass -st to show timestamps in a scan
● Clear authorizations for a user
Accumulo Monitor Web Interface
● Available at http://localhost:9995
● Interface for monitoring the health and status of Accumulo components
● Displays various metrics on the operation of the master server, tablet servers, and tables
Back to Architecture
Architecture
● Written in Java; uses the Hadoop Distributed File System (HDFS) to store table data and Apache ZooKeeper to store configuration data
● Supports fault tolerance through replication of tables
● Supports high availability through automatic failover
Tablet Writes
Minor Compaction
● Once the MemTable reaches a certain size, the tablet server writes out the sorted key/value pairs to a file in HDFS called an RFile
● Each Minor Compaction creates one new RFile
● After the RFile is written to HDFS, a new MemTable is created and the compaction is recorded in the Write-Ahead Log
Major Compaction
● Periodically, the tablet server combines a set of RFiles into one to reduce the number of RFiles tracked per tablet
● During the merge, any key/value pairs suppressed by a delete entry are omitted
● After a Major Compaction, the previous RFiles are marked for cleanup by the Garbage Collector
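The write path described on these slides can be modeled in a few lines. The following is a toy sketch, not Accumulo code: `TabletSketch`, `DELETE_MARKER`, and the entry-count limit are all assumptions made for illustration (a real MemTable is bounded by size in memory, RFiles live in HDFS, and the Write-Ahead Log is omitted). The major compaction here merges every file at once, which is what makes it safe to drop the delete markers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class TabletSketch {
    static final String DELETE_MARKER = "\u0000DELETED"; // hypothetical tombstone value

    private final int memTableLimit;                      // entry count, an assumption
    private TreeMap<String, String> memTable = new TreeMap<>();
    final List<TreeMap<String, String>> rFiles = new ArrayList<>(); // oldest first

    TabletSketch(int memTableLimit) {
        this.memTableLimit = memTableLimit;
    }

    void put(String key, String value) {
        memTable.put(key, value);
        if (memTable.size() >= memTableLimit) {
            minorCompact();
        }
    }

    // A delete is written as a marker that suppresses older values for the key.
    void delete(String key) {
        put(key, DELETE_MARKER);
    }

    // Minor compaction: the sorted MemTable becomes one new file, then a fresh MemTable starts.
    void minorCompact() {
        if (memTable.isEmpty()) {
            return;
        }
        rFiles.add(memTable);
        memTable = new TreeMap<>();
    }

    // Major compaction: merge all files into one; newer files win, deleted entries are dropped.
    void majorCompact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : rFiles) {
            merged.putAll(file); // later (newer) files overwrite older values
        }
        merged.values().removeIf(DELETE_MARKER::equals);
        rFiles.clear();
        if (!merged.isEmpty()) {
            rFiles.add(merged);
        }
    }

    // Reads check the MemTable first, then files from newest to oldest.
    String get(String key) {
        String v = memTable.get(key);
        for (int i = rFiles.size() - 1; v == null && i >= 0; i--) {
            v = rFiles.get(i).get(key);
        }
        return DELETE_MARKER.equals(v) ? null : v;
    }

    public static void main(String[] args) {
        TabletSketch tablet = new TabletSketch(2);
        tablet.put("bob", "71 in");
        tablet.put("jane", "doe");   // second entry triggers a minor compaction
        tablet.delete("bob");
        tablet.put("john", "180");   // triggers another minor compaction
        System.out.println("files before major compaction: " + tablet.rFiles.size());
        tablet.majorCompact();
        System.out.println("files after major compaction: " + tablet.rFiles.size());
        System.out.println("bob -> " + tablet.get("bob"));
    }
}
```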
Reading/Writing (Java Client API)
Connector
● All operations on Accumulo need a reference to the Accumulo Connector
● Provides a way to perform actions as a user
● Connects to a running instance of ZooKeeper
Mutation
● A mutation holds all changes to a table associated with a particular row ID
● Add desired writes to a mutation
BatchWriter
● BatchWriter takes mutations and writes them to Accumulo via the Connector
● Typically, multiple mutations are added to a BatchWriter at a time
○ A single mutation can be added using writer.addMutation(m)
● Batches data together to amortize network communication overhead
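The batching idea can be illustrated without a running cluster. This is a stdlib-only model, not the Accumulo client API: `BatchWriterSketch`, `RowMutation`, and the count-based `maxBuffered` threshold are hypothetical, and a "flush" only increments a counter where a real writer would send data over the network. The real BatchWriter is configured differently, so treat the threshold as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriterSketch {
    // Hypothetical stand-in for a mutation: all pending changes for one row ID.
    static class RowMutation {
        final String rowId;
        final List<String[]> updates = new ArrayList<>(); // {family, qualifier, value}

        RowMutation(String rowId) {
            this.rowId = rowId;
        }

        RowMutation put(String family, String qualifier, String value) {
            updates.add(new String[] {family, qualifier, value});
            return this;
        }
    }

    private final int maxBuffered;            // assumed count-based threshold
    private final List<RowMutation> buffer = new ArrayList<>();
    int flushCount = 0;                       // simulated network round trips
    int mutationsSent = 0;

    BatchWriterSketch(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    // Buffers the mutation and flushes once the buffer is full.
    void addMutation(RowMutation m) {
        buffer.add(m);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    // Sends everything buffered so far as one batch.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushCount++;
        mutationsSent += buffer.size();
        buffer.clear();
    }

    // Closing flushes whatever is still buffered.
    void close() {
        flush();
    }

    public static void main(String[] args) {
        BatchWriterSketch writer = new BatchWriterSketch(100);
        for (int i = 0; i < 1000; i++) {
            writer.addMutation(new RowMutation("tweet" + i).put("tweet", "text", "hello " + i));
        }
        writer.close();
        System.out.println(writer.mutationsSent + " mutations in " + writer.flushCount + " round trips");
    }
}
```

The point of the model is the ratio: many mutations share each round trip, which is the overhead the slide says batching amortizes.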
Scanner
● Use a Scanner to read from a table
● Returns an iterable of results that match the scan
● BatchScanner scans multiple ranges in parallel: faster for data spread across many servers, but it doesn’t guarantee the ordering of results
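The difference between the two scan styles can be sketched over an in-memory sorted map. This is a model only, not the Accumulo Scanner or BatchScanner API: `ScanSketch`, `scan`, and `batchScan` are hypothetical names, and the table is a `TreeMap` of row IDs. The batch version returns a `Set` to make the lack of an ordering guarantee explicit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ScanSketch {
    // Single-range scan: matching row IDs come back in sorted key order.
    static List<String> scan(TreeMap<String, String> table, String startRow, String endRow) {
        return new ArrayList<>(table.subMap(startRow, true, endRow, true).keySet());
    }

    // Batch scan: several {startRow, endRow} ranges are read in parallel.
    // The result is a Set, so callers cannot rely on any ordering.
    static Set<String> batchScan(TreeMap<String, String> table, List<String[]> ranges) {
        return ranges.parallelStream()
                .flatMap(r -> scan(table, r[0], r[1]).stream())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("bob", "attribute:height");
        table.put("jane", "attribute:surname");
        table.put("john", "attribute:weight");

        System.out.println("scan bob..jane: " + scan(table, "bob", "jane"));

        List<String[]> ranges = List.of(
                new String[] {"bob", "bob"},
                new String[] {"john", "john"});
        System.out.println("batch scan: " + batchScan(table, ranges));
    }
}
```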
Exercise 1
Accumulo Writer
● VM Desktop → Exercises/exercise-1
○ Your choice of IDE
● Tasks:
○ Add a Mutation for each piece of information we want to store from a tweet
○ Add mutations to BatchWriter
Exercise 2
Accumulo Reader
● VM Desktop → Exercises/exercise-2
● Tasks:
○ Configure a ZooKeeperInstance and Connector
○ Check to make sure the TwitterData table exists
○ Create and configure a Scanner
○ Iterate over the Scanner and display each tweet’s ID and text
Wrap Up
Recap
● Features
● Architecture
○ Data model
○ Authorizations
● Shell
● Reading & Writing
Additional Features
● Highly consistent
○ Tablets are assigned to exactly one tablet server at a time, which means updates are immediately reflected in reads
● Horizontally scalable
○ Machines added to the cluster begin participating almost immediately
● High failure tolerance
○ If a tablet server fails, the master reassigns its tablets to another node
● MapReduce
○ First-class support for MapReduce
● Iterators
○ Powerful server-side programming construct
● Data Lifecycle Management
○ Advanced data lifecycle management for compliance with legal/policy requirements
Take Home Exercises
● Tweet Indexer
○ MapReduce program to create an inverted index using the Twitter table from exercise 1
○ /home/clearedge/Desktop/Exercises/exercise-3
● Tweet Searcher
○ Find all tweets containing the words passed in as arguments, using the inverted index from exercise 3
○ /home/clearedge/Desktop/Exercises/exercise-4
Related Projects
● Datawave – an ingest/query framework that leverages Accumulo to provide fast, secure data access
○ https://code.nsa.gov/datawave/
● NiFi – automated data flow between systems with custom processing capabilities
○ https://nifi.apache.org/
● Fluo – large-scale incremental processing built atop Accumulo
○ https://fluo.apache.org
● Timely – time series database application that provides secure access to time series data
○ https://code.nsa.gov/timely/
