Introduction to Apache Accumulo

Presented at the Boulder/Denver BigData Meetup on March 21, 2012

Boulder/Denver BigData Meetup - March 21, 2012
Jared Winick
@jaredwinick
Introduction to Apache Accumulo

  • 1. Introduction to Apache Accumulo. Boulder/Denver BigData Meetup, March 21, 2012. Jared Winick, @jaredwinick
  • 2. Accumulo /əˈkjuˈmjʊˈlo/: 1. Sorted, distributed key/value store with cell-based access control and customizable server-side processing
  • 4. Jeff Dean: Designs, Lessons and Advice from Building Large Distributed Systems (annotation added) http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
  • 5. Enables interactive access to trillions of records and petabytes of indexed data across 100s-1000s of servers
  • 6. Short Accumulo History Lesson http://www.flickr.com/photos/mr_t_in_dc/4249886990/sizes/l/in/photostream/
  • 10. 2012
  • 11. Uses of BigTable and Kin
      BigTable:
        • Google Analytics [1]
        • Crawl [1]
        • AppEngine Datastore [2]
        • Many more [1]
      HBase:
        • Messages [3, 4, 6]
        • Insights [5, 6]
      Cassandra:
        • Rainbird (realtime analytics) [7]
      Accumulo:
        • ???
      Sources:
        1.) http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
        2.) http://code.google.com/appengine/articles/storage_breakdown.html
        3.) http://www.facebook.com/note.php?note_id=454991608919
        4.) http://mvdirona.com/jrh/TalksAndPapers/KannanMuthukkaruppan_StorageInfraBehindMessages.pdf
        5.) http://www.facebook.com/note.php?note_id=10150103900258920
        6.) http://borthakur.com/ftp/SIGMODRealtimeHadoopPresentation.pdf
        7.) http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
  • 12. Accumulo /əˈkjuˈmjʊˈlo/: 1. Sorted, distributed key/value store with cell-based access control and customizable server-side processing
  • 13. Multi-dimension Key. A Key is made up of Row ID, Column (Family, Qualifier, Visibility) and Timestamp; each Key maps to a Value. http://incubator.apache.org/accumulo/user_manual_1.4-incubating/Accumulo_Design.html
  • 14. Keys Sorted Lexicographically by Row ID, Column Family, Column Qualifier, Column Visibility, Timestamp. Everything is a byte[] except the Timestamp, which is a long.
  • 15. Physical Layout (the Key is every column except Value)
      Row ID  Col Fam     Col Qual  Col Vis  Time        Value
      Alice   properties  age       public   March 2011  31
      Alice   properties  phone     private  Feb 2011    555-1234
      Alice   purchases   Xbox      public   Feb 2011    $299
      Bob     properties  phone     private  March 2011  555-4321
      Bob     purchases   iPhone    Public   Feb 2011    $399
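The sort order from slides 13-15 can be sketched in a few lines of Python. This is only a toy model, not Accumulo code: the `Key` tuple, the integer encoding of the dates (201103 for March 2011) and the lowercase visibility strings are assumptions made for illustration. The descending timestamp order comes from Accumulo's documented behaviour rather than from the slide.

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int  # a long in Accumulo; every other field is a byte array


def sort_key(k: Key):
    # Byte fields compare lexicographically, ascending. Negating the timestamp
    # puts the newest version of a cell first (Accumulo's documented order,
    # not stated on the slide).
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


# The rows of slide 15, deliberately shuffled. Timestamps are a hypothetical
# YYYYMM integer encoding of the dates shown on the slide.
entries = [
    (Key(b"Bob", b"purchases", b"iPhone", b"public", 201102), b"$399"),
    (Key(b"Alice", b"purchases", b"Xbox", b"public", 201102), b"$299"),
    (Key(b"Alice", b"properties", b"phone", b"private", 201102), b"555-1234"),
    (Key(b"Bob", b"properties", b"phone", b"private", 201103), b"555-4321"),
    (Key(b"Alice", b"properties", b"age", b"public", 201103), b"31"),
]

ordered = sorted(entries, key=lambda kv: sort_key(kv[0]))
for key, value in ordered:
    print(key.row, key.family, key.qualifier, value)
```

Sorting the shuffled entries reproduces the row order of the physical layout table: all of Alice's cells first, grouped by column family, then Bob's.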
  • 16. Queries
      • By exact Key or range of Keys
      • Data is always returned in sorted order
      Query Requirements Drive Data Model Design
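Because the data is kept sorted, both query styles on slide 16 reduce to a binary search followed by a sequential read. The sketch below illustrates that idea with Python's `bisect` module over an in-memory list; `scan` and `lookup` are hypothetical helper names, and visibility and timestamp are left out of the key to keep it short.

```python
import bisect

# Sorted (row, column family, column qualifier) -> value pairs; a simplified
# stand-in for a table.
table = sorted([
    (("Alice", "properties", "age"), "31"),
    (("Alice", "properties", "phone"), "555-1234"),
    (("Alice", "purchases", "Xbox"), "$299"),
    (("Bob", "properties", "phone"), "555-4321"),
    (("Bob", "purchases", "iPhone"), "$399"),
])
keys = [k for k, _ in table]


def scan(start, end):
    """Return entries with start <= key < end, already in sorted order."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return table[lo:hi]


def lookup(key):
    """Exact-key query: binary search, then check that the key is present."""
    i = bisect.bisect_left(keys, key)
    return table[i][1] if i < len(keys) and keys[i] == key else None


# Everything in row "Alice": the range from the row itself up to the next
# possible row value.
print(scan(("Alice",), ("Alice\x00",)))
print(lookup(("Bob", "purchases", "iPhone")))
```

Anything that cannot be phrased as a key or a key range needs a full scan, which is why the slide says query requirements drive the data model.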
  • 18. (diagram) Clients read from and write to Accumulo; Hadoop MapReduce runs analytics against Accumulo; Accumulo uses Hadoop HDFS for storage and Zookeeper for configuration/state
  • 19. (diagram) A Table is divided into Tablets. Accumulo: Master and Tablet Servers, each Tablet Server hosting Tablets. Hadoop HDFS: Name Node and Data Nodes
  • 20. Tablet Server Failure (same diagram as slide 19)
      1.) Detect Failure
      2.) Reassign
  • 21. Writes (diagram: Client, Accumulo Tablet Server, Hadoop HDFS Data Nodes)
      1. Write-Ahead Log (WAL)
      2. MemTable of the Tablet
  • 22. Writes (diagram: Client, Accumulo Tablet Server, Hadoop HDFS Data Nodes)
      1. Write-Ahead Log (WAL)
      2. MemTable of the Tablet
      3. File 1
  • 23. Compactions
      Minor: The process of flushing a MemTable of a Tablet to a single file in HDFS
      Major: The process of combining multiple files into a single file
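The write path and the two kinds of compaction on slides 21-23 can be modelled with a small in-memory class. This is a sketch under stated assumptions, not Accumulo's implementation: the `Tablet` class, the `memtable_limit` threshold (counted in entries) and the use of a dict sorted at flush time instead of a sorted tree are all simplifications introduced here.

```python
class Tablet:
    """Toy model of one tablet's write path."""

    def __init__(self, memtable_limit=3):
        self.wal = []        # write-ahead log: append-only record of mutations
        self.memtable = {}   # in-memory map, sorted when it is flushed
        self.files = []      # each file is an immutable sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.wal.append((key, value))   # 1. log first, so the write can be recovered
        self.memtable[key] = value      # 2. then insert into the MemTable
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compact()        # 3. flush to a file when the MemTable is full

    def minor_compact(self):
        """Flush the MemTable to one new sorted file."""
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}
        # Assumption: log entries covered by a flush are no longer needed.
        self.wal.clear()

    def major_compact(self):
        """Combine all files into one sorted file; newer files win on duplicate keys."""
        merged = {}
        for f in self.files:            # oldest first, so later files overwrite
            merged.update(f)
        self.files = [sorted(merged.items())] if merged else []


t = Tablet(memtable_limit=2)
for k, v in [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("d", 5)]:
    t.write(k, v)
files_before = len(t.files)   # two minor compactions have happened
t.major_compact()
print(files_before, t.files, t.memtable)
```

Each flush writes an already sorted run, so the file write is sequential; the major compaction then folds those runs into one file so that reads have fewer files to consult.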
  • 24. Tablet Splits
      • Tablets are split when they reach a max size
      • Always split on row boundary
      • Master assigns a split Tablet to another Tablet server (no data is moved!)
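The "always split on row boundary" rule from slide 24 can be illustrated as follows. The functions `choose_split_row` and `split` are hypothetical names, size is approximated by entry count, and the midpoint heuristic is an assumption; the slide does not say how the split point is chosen. In the real system only the assignment changes and no data is moved, whereas this sketch copies lists for clarity.

```python
def choose_split_row(entries):
    """Pick a split row for a sorted list of ((row, column), value) entries.

    The cut is made near the midpoint, but only between two different rows,
    so every row stays whole inside one tablet. Returns None for a single row.
    """
    rows = [key[0] for key, _ in entries]
    mid = len(rows) // 2
    # Walk outward from the midpoint to the nearest place where the row changes.
    for offset in range(len(rows)):
        for i in (mid - offset, mid + offset):
            if 0 < i < len(rows) and rows[i - 1] != rows[i]:
                return rows[i - 1]   # last row of the left-hand tablet
    return None


def split(entries, max_entries):
    """Split one tablet into two if it exceeds max_entries."""
    if len(entries) <= max_entries:
        return [entries]
    split_row = choose_split_row(entries)
    if split_row is None:
        return [entries]             # a single row is never divided
    left = [e for e in entries if e[0][0] <= split_row]
    right = [e for e in entries if e[0][0] > split_row]
    return [left, right]


entries = [
    (("Alice", "properties:age"), "31"),
    (("Alice", "properties:phone"), "555-1234"),
    (("Alice", "purchases:Xbox"), "$299"),
    (("Bob", "properties:phone"), "555-4321"),
    (("Bob", "purchases:iPhone"), "$399"),
]
tablets = split(entries, max_entries=4)
print([sorted({key[0] for key, _ in tablet}) for tablet in tablets])
```

With the five entries above the exact midpoint falls inside row Alice, so the cut moves to the Alice/Bob boundary and each row ends up in exactly one tablet.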
  • 25. Reads (diagram) The Client reads from a Tablet on an Accumulo Tablet Server; the Tablet serves the read from its MemTable and its files
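A read has to present one sorted view over the MemTable and every file of the tablet. Since each source is already sorted, a k-way merge is enough. The sketch below uses `heapq.merge` from the Python standard library; the `read` function and the "newest source wins" rule for duplicate keys are illustrative assumptions, not Accumulo's actual versioning logic.

```python
import heapq


def read(memtable, files):
    """Yield (key, value) in sorted order, the newest source winning per key.

    `memtable` is a dict; `files` is a list of sorted (key, value) lists,
    oldest first.
    """
    sources = [sorted(memtable.items())] + list(reversed(files))  # newest first
    # Tag each entry with the age of its source so that, for equal keys,
    # the entry from the newest source sorts first.
    tagged = [
        [(key, age, value) for key, value in source]
        for age, source in enumerate(sources)
    ]
    last_key = object()   # sentinel that equals no real key
    for key, _, value in heapq.merge(*tagged):
        if key != last_key:   # the first hit per key is the newest version
            yield key, value
            last_key = key


memtable = {"d": 5, "a": 9}
files = [[("a", 2), ("b", 1)], [("a", 4), ("c", 3)]]
view = list(read(memtable, files))
print(view)
```

The merge never loads a whole file into memory; it only needs the next entry from each source, which is what keeps reads cheap when the sources are sorted.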
  • 26. Accumulo /əˈkjuˈmjʊˈlo/: 1. Sorted, distributed key/value store with cell-based access control and customizable server-side processing
  • 28. Iterators
      Can be run at:
        • Scan Time
        • Minor Compaction
        • Major Compaction
      Can do things like:
        • Aggregation (Combiners)
        • Age-Off
        • Filtering (access control)
        • Transformation
      Push Processing to the Data
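Iterators are stacked: each one consumes the sorted stream produced by the one below it. Python generators give a reasonable feel for this. The sketch below chains an age-off filter and a combiner; the function names, the (key, timestamp, value) entry shape and the numeric timestamps are made up for illustration and do not mirror the Java iterator interface.

```python
from itertools import groupby


def age_off(source, now, max_age):
    """Filter: drop entries whose timestamp is older than max_age."""
    for key, ts, value in source:
        if now - ts <= max_age:
            yield key, ts, value


def combine(source, reducer):
    """Combiner: merge all versions of a key into a single value."""
    for key, versions in groupby(source, key=lambda entry: entry[0]):
        versions = list(versions)
        newest_ts = max(ts for _, ts, _ in versions)
        yield key, newest_ts, reducer(v for _, _, v in versions)


# Sorted (key, timestamp, value) entries, newest version of each key first.
entries = [
    ("breakfast", 100, 3),
    ("breakfast", 90, 2),
    ("breakfast", 10, 7),   # too old; removed by the age-off filter
    ("lunch", 95, 4),
]

# Stack the iterators: age-off runs first, the summing combiner on top of it.
stack = combine(age_off(iter(entries), now=100, max_age=50), sum)
result = list(stack)
print(result)
```

Running such a stack on the server at scan or compaction time means only the combined, filtered entries travel to the client or get written back to disk.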
  • 29. Accumulo /əˈkjuˈmjʊˈlo/: 1. Sorted, distributed key/value store with cell-based access control and customizable server-side processing
  • 30. Access Control
      • Every key-value has a visibility label
      • Label is defined with boolean operators
      • Label is arbitrary and ad-hoc, for example:
          Public
          Private | Admin
          Finance | (HR & Manager)
      • Authorizations presented at scan time
      • Data is filtered out automatically by system-level Iterator
  • 31. Access Control – Typical Architecture (diagram: Trusted Zone, Web Server, Accumulo, Enterprise Identity Management)
      1.) Pass Credentials
      2.) Lookup User
      3.) Return Authorizations
      4.) Proxy Authorization
      5.) Return Visible Data
      6.) Return Data
  • 32. Access Control – Typical Architecture (diagram: Trusted Zone, Bob, Web Server, Accumulo, Enterprise Identity Management)
      Entries in Accumulo (visibility label, value):
        SECRET&PROJECT X, 6
        SECRET&PROJECT Y, 8
        SECRET&PROJECT Z, 3
      1.) PKI Cert
      2.) Lookup Bob
      3.) Auths: [SECRET, UNCLASSIFIED, PROJECT X, PROJECT Y]
      4.) Proxy Bob’s Auths
      5.) Return [6,8]
      6.) Return [6,8]
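The filtering in slides 30-32 comes down to evaluating each entry's visibility label against the authorizations presented with the scan. Below is a minimal sketch of such an evaluator. It is not Accumulo's parser: `tokenize` and `is_visible` are hypothetical names, operators are applied left to right with parentheses for grouping, labels may contain spaces (as in "PROJECT X" on the slide), and an empty label is treated as visible to everyone.

```python
import re


def tokenize(expression):
    # Split on the operators and parentheses; whatever lies between is a label.
    parts = re.split(r"([&|()])", expression)
    return [p.strip() for p in parts if p.strip()]


def is_visible(expression, authorizations):
    """Evaluate a visibility label against a set of authorization tokens."""
    tokens = tokenize(expression)
    if not tokens:
        return True                   # assumption: no label means no restriction
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expression()
            pos += 1                  # skip the closing ")"
            return result
        return token in authorizations

    def parse_expression():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            operator = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if operator == "&" else (result or right)
        return result

    return parse_expression()


# Bob's scan from slide 32.
bob = {"SECRET", "UNCLASSIFIED", "PROJECT X", "PROJECT Y"}
entries = [("SECRET&PROJECT X", 6), ("SECRET&PROJECT Y", 8), ("SECRET&PROJECT Z", 3)]
visible = [value for label, value in entries if is_visible(label, bob)]
print(visible)
```

Bob lacks PROJECT Z, so the third entry is dropped and the scan returns the values 6 and 8, matching steps 5 and 6 on the slide.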
  • 33. Demo
  • 34. Application Requirements. Build an application to analyze trends in Twitter messages.
      • Query for word/phrase and view real-time activity in a time series graph
      • View at different time ranges (1 day, 7 days, 30 days, etc.)
      • Allow multiple query terms to compare activity (e.g. Breakfast, Lunch)
      • Automatically extract daily trends for the user
  • 35. Demo Setup/Data
      • Twitter Streaming API
      • Only messages with US country codes
      • 1-, 2- and 3-grams built
      • Data since Dec 24 – Live
      • Running on average workstation, 1 SATA disk, 6 GB memory
      • 72 GB, 2.6 billion entries and counting
  • 37. Data Model
      • Tweets table
        – Row ID: n-gram
        – Column Family: Date Granularity (DAY, HOUR)
        – Column Qual: Date Value
        – Value: Count
        – SummingCombiner (Iterator) used to update Count
      Row ID     Col Fam  Col Qual    Value
      breakfast  DAY      20120318    31
      breakfast  DAY      20120319    56
      …          …        …           …
      lunch      HOUR     2012031801  3
      lunch      HOUR     2012031802  4
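To make the tweets table on slide 37 concrete, here is a sketch of how one message could be turned into entries. The `ngrams` and `mutations` helpers, the whitespace tokenization and the lowercasing are assumptions; the demo's real code may differ. A `defaultdict` stands in for the SummingCombiner: every write carries a count of 1, and values under the same key add up.

```python
from collections import defaultdict
from datetime import datetime


def ngrams(text, max_n=3):
    """Yield all 1-, 2- and 3-grams of a message (whitespace tokenization assumed)."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def mutations(text, when):
    """Yield ((row, family, qualifier), 1) per n-gram at DAY and HOUR granularity."""
    for gram in ngrams(text):
        yield (gram, "DAY", when.strftime("%Y%m%d")), 1
        yield (gram, "HOUR", when.strftime("%Y%m%d%H")), 1


# Stand-in for the SummingCombiner: counts written under the same key are summed.
table = defaultdict(int)
for tweet, when in [
    ("breakfast time", datetime(2012, 3, 18, 1)),
    ("late breakfast", datetime(2012, 3, 18, 2)),
]:
    for key, count in mutations(tweet, when):
        table[key] += count

for key in sorted(table):
    print(key, table[key])
```

Because the n-gram is the row ID and the date is the qualifier, the time series for one term is a single contiguous key range.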
  • 38. Data Model
      • Trends table
        – Row ID: (Date Granularity + Date Value)
        – Column Family: (Integer.MAX_VALUE – trendScore)
        – Column Qual: n-gram
        – Value: []
      Row ID        Col Fam     Col Qual     Value
      DAY:20120318  2147483145  church
      DAY:20120318  2147483316  hangover
      …             …           …            …
      DAY:20120319  2147476521  the broncos
      DAY:20120319  2147477704  tim tebow
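Storing Integer.MAX_VALUE minus the score in the column family is a way to get "highest score first" out of a store that only sorts ascending. The sketch below rebuilds the four rows of slide 38. The scores (502, 331, 7126, 5943) are derived here by subtracting the slide's values from 2147483647; the `trend_key` helper and the zero-padding to ten digits are assumptions, since the slide does not show how the number is encoded.

```python
INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE, 2147483647


def trend_key(granularity, date_value, score, ngram):
    """Build (row, family, qualifier) so that higher scores sort first within a row."""
    row = f"{granularity}:{date_value}"
    # Fixed width (assumed) so that string order matches numeric order.
    family = str(INT_MAX - score).zfill(10)
    return (row, family, ngram)


keys = sorted([
    trend_key("DAY", "20120318", 331, "hangover"),
    trend_key("DAY", "20120319", 5943, "tim tebow"),
    trend_key("DAY", "20120318", 502, "church"),
    trend_key("DAY", "20120319", 7126, "the broncos"),
])
for key in keys:
    print(key)
```

Reading the top trends for a day is then a scan over one row that stops after the first few entries.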
  • 39. MapReduce Analytics
      • Utilize MapReduce for building trends
      • AccumuloInputFormat reads from tweets table
      • AccumuloOutputFormat writes to trends table
      • AccumuloStorage LoadFunc for Pig available on github
  • 41. Summary
      • Accumulo exploits locality to enable interactive access to huge data sets while adding cell-level access control and server-side programming
      • Nothing in life is free. Accumulo comes with the complexity and responsibility of managing a distributed system and designing indexes on your data
  • 42. References
      • Documentation, Mailing Lists, Links: http://incubator.apache.org/accumulo/
      • HBase Shootout: http://www.slideshare.net/cloudera/h-base-and-accumulo-todd-lipcom-jan-25-2012
      • Trendulo: https://github.com/jaredwinick/trendulo

Editor's Notes

  1. Obviously most people’s data set isn’t this large. If you can fit your data into the memory of a single large server, Accumulo probably isn’t for you.
  2. 20 billion events per day for Insights. 6+ billion messages -> 75 billion read/write operations per day.
  3. Sparse, sorted
  4. A table is partitioned into tablets, which are logically assigned to tablet servers (they are physically in HDFS). A tablet is a range of keys.
  5. Tablets are only logically assigned to tablet servers by the Accumulo Master. They are physically stored in HDFS. A tablet is one or more files.
  6. Data is first written to the WAL (outside of HDFS, on a different machine), then inserted into the sorted MemTable (a balanced, sorted binary tree).
  7. When the MemTable is full, it gets flushed to a file which is stored in HDFS (minor compaction). Writes to disk are sequential because the MemTable is sorted.
  8. All of these files are always sorted!
  9. TabletServer merges key-values from all its files and its MemTable to present a complete sorted view of data
  10. One of the most powerful features of Accumulo – a lot to learn. Come back to aggregation in demo
  11. Example: Trendistic (http://trendistic.indextank.com)
  12. Documentation is a work in progress…