
Introduction to Apache Accumulo


Presented at the Boulder/Denver BigData Meetup on March 21, 2012

Published in: Technology


  1. Introduction to Apache Accumulo
     Boulder/Denver BigData Meetup, March 21, 2012
     Jared Winick, @jaredwinick
  2. Accumulo /əˈkjuˈmjʊˈlo/
     1. Sorted, distributed key/value store with cell-based access control and customizable server-side processing
  3. http://yourmotivational.com/uploads/8604.jpg
  4. Annotation Added
     Jeff Dean: Designs, Lessons and Advice from Building Large Distributed Systems
     http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
  5. Enables interactive access to trillions of records and petabytes of indexed data across 100s-1000s of servers
  6. Short Accumulo History Lesson
     http://www.flickr.com/photos/mr_t_in_dc/4249886990/sizes/l/in/photostream/
  7. 2006
  8. 2008
     http://upload.wikimedia.org/wikipedia/commons/8/84/National_Security_Agency_headquarters%2C_Fort_Meade%2C_Maryland.jpg
  9. 2011
  10. 2012
  11. Uses of BigTable and Kin
      BigTable:
      • Google Analytics [1]
      • Crawl [1]
      • AppEngine Datastore [2]
      • Many more [1]
      HBase:
      • Messages [3,4,6]
      • Insights [5,6]
      Cassandra:
      • Rainbird (realtime analytics) [7]
      Accumulo:
      • ???
      1.) http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
      2.) http://code.google.com/appengine/articles/storage_breakdown.html
      3.) http://www.facebook.com/note.php?note_id=454991608919
      4.) http://mvdirona.com/jrh/TalksAndPapers/KannanMuthukkaruppan_StorageInfraBehindMessages.pdf
      5.) http://www.facebook.com/note.php?note_id=10150103900258920
      6.) http://borthakur.com/ftp/SIGMODRealtimeHadoopPresentation.pdf
      7.) http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
  12. Accumulo /əˈkjuˈmjʊˈlo/
      1. Sorted, distributed key/value store with cell-based access control and customizable server-side processing
  13. Multi-dimension Key
      • Key: Row ID, Column, Timestamp
      • Column: Family, Qualifier, Visibility
      • Value
      http://incubator.apache.org/accumulo/user_manual_1.4-incubating/Accumulo_Design.html
  14. Keys Sorted Lexicographically
      Sort order: Row ID, Column Family, Column Qualifier, Column Visibility, Timestamp
      Everything is a byte[] except the Timestamp, which is a long
  15. Physical Layout
      Key                                                    Value
      Row ID   Col Fam      Col Qual   Col Vis   Time         Value
      Alice    properties   age        public    March 2011   31
      Alice    properties   phone      private   Feb 2011     555-1234
      Alice    purchases    Xbox       public    Feb 2011     $299
      Bob      properties   phone      private   March 2011   555-4321
      Bob      purchases    iPhone     Public    Feb 2011     $399
  16. Queries
      • By exact Key or range of Keys
      • Data is always returned in sorted order
      Query Requirements Drive Data Model Design
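
The sorted-key model and range queries on the slides above can be illustrated with a small sketch. This is a toy in-memory stand-in, not the Accumulo API: `ENTRIES` and `scan` are hypothetical names, keys are Python strings rather than byte arrays, and the timestamp component is left out for brevity. The rows are taken from the Physical Layout slide.

```python
import bisect

# Toy stand-in for a sorted table, not the Accumulo API.
# Key = (row ID, column family, column qualifier, column visibility).
# Sorting the tuples gives the component-by-component order from the slides.
ENTRIES = sorted([
    (("Bob", "purchases", "iPhone", "Public"), "$399"),
    (("Alice", "purchases", "Xbox", "public"), "$299"),
    (("Alice", "properties", "phone", "private"), "555-1234"),
    (("Bob", "properties", "phone", "private"), "555-4321"),
    (("Alice", "properties", "age", "public"), "31"),
])


def scan(entries, start_row, end_row):
    """Return all entries whose row ID lies in [start_row, end_row].

    Because the entries are sorted, a range query is two binary searches
    plus a contiguous slice; results come back in sorted order.
    """
    keys = [key for key, _ in entries]
    lo = bisect.bisect_left(keys, (start_row,))
    # end_row + "\x00" is the smallest string that sorts after end_row,
    # so every key with row ID == end_row is included.
    hi = bisect.bisect_left(keys, (end_row + "\x00",))
    return entries[lo:hi]


for key, value in scan(ENTRIES, "Alice", "Alice"):
    print(key, value)
```

The point of the sketch is the design consequence stated on the slide: only lookups that map to a contiguous key range are cheap, so the queries you need decide how you lay out the key.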
  17. http://incubator.apache.org/accumulo/user_manual_1.4-incubating/Accumulo_Design.html
  18. Diagram:
      • Clients: Read/Write (to Accumulo)
      • Hadoop MapReduce: Analytics (to Accumulo)
      • Hadoop HDFS: Storage (for Accumulo)
      • Zookeeper: Configuration/State (for Accumulo)
  19. Diagram:
      • Table, divided into Tablets
      • Accumulo: Tablet Server, Tablet Server, ..., Tablet Server, Master
      • Hadoop HDFS: Data Node, Data Node, ..., Data Node, Name Node
  20. Tablet Server Failure
      1.) Detect Failure
      2.) Reassign
      Diagram:
      • Table, divided into Tablets
      • Accumulo: Tablet Server, Tablet Server, ..., Tablet Server, Master
      • Hadoop HDFS: Data Node, Data Node, ..., Data Node, Name Node
  21. Writes
      Diagram: Client writes to an Accumulo Tablet Server
      1. Write-Ahead Log (WAL)
      2. MemTable (in the Tablet)
      Hadoop HDFS: Data Node, ..., Data Node, Data Node
  22. Writes
      Diagram: Client writes to an Accumulo Tablet Server
      1. Write-Ahead Log (WAL)
      2. MemTable (in the Tablet)
      3. File 1
      Hadoop HDFS: Data Node, ..., Data Node, Data Node
  23. Compactions
      • Minor: The process of flushing a MemTable of a Tablet to a single file in HDFS
      • Major: The process of combining multiple files into a single file
  24. Tablet Splits
      • Tablets are split when they reach a max size
      • Always split on row boundary
      • Master assigns a split Tablet to another Tablet server (no data is moved!)
  25. Reads
      Diagram: Client reads from an Accumulo Tablet Server; the Tablet holds a MemTable, File 1, File 1
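
The write, compaction, and read slides above can be tied together with a toy model. This is only a sketch under simplifying assumptions: `ToyTablet` is a hypothetical class, the "files" are in-memory sorted lists rather than files in HDFS, and reads merge through a dictionary instead of the merge of sorted streams a real tablet server would use. Clearing the log after a flush is also an assumption of the sketch.

```python
class ToyTablet:
    """Toy model of one tablet's write and read path (not Accumulo code)."""

    def __init__(self):
        self.wal = []       # write-ahead log: append-only record of mutations
        self.memtable = {}  # in-memory key -> value map
        self.files = []     # immutable sorted "files", oldest first

    def write(self, key, value):
        # Step 1: log the mutation, step 2: apply it to the MemTable.
        self.wal.append((key, value))
        self.memtable[key] = value

    def minor_compaction(self):
        # Step 3: flush the MemTable to a single new sorted file.
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.wal = []  # assumption: flushed mutations no longer need the log

    def major_compaction(self):
        # Combine multiple files into a single file; newer files win.
        merged = {}
        for sorted_file in self.files:
            merged.update(sorted_file)
        self.files = [sorted(merged.items())] if merged else []

    def scan(self):
        # A read sees the files plus the MemTable, newest value per key.
        merged = {}
        for sorted_file in self.files:
            merged.update(sorted_file)
        merged.update(self.memtable)
        return sorted(merged.items())


tablet = ToyTablet()
tablet.write("b", "1")
tablet.write("a", "2")
tablet.minor_compaction()
tablet.write("a", "3")
print(tablet.scan())
```

Note how the read has to consult every file as well as the MemTable, which is why major compactions that reduce the file count matter for read cost.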
  26. Accumulo /əˈkjuˈmjʊˈlo/
      1. Sorted, distributed key/value store with cell-based access control and customizable server-side processing
  27. Iterators: Server-side programming
      http://wiki.eeng.dcu.ie/ee557/287-EE/version/default/part/ImageData/data/server-side_intro.gif
  28. Iterators
      Can be run at:
      • Scan Time
      • Minor Compaction
      • Major Compaction
      Can do things like:
      • Aggregation (Combiners)
      • Age-Off
      • Filtering (access control)
      • Transformation
      Push Processing to the Data
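
One way to picture iterators is as stages chained over a sorted stream of entries. The sketch below uses Python generators for that idea; it does not use Accumulo's iterator interfaces, and `age_off`, `summing_combiner`, and the `(key, timestamp, value)` entry shape are all hypothetical choices made for illustration.

```python
from itertools import groupby


def age_off(entries, now, max_age):
    """Filter stage: drop entries whose timestamp is older than max_age."""
    for key, timestamp, value in entries:
        if now - timestamp <= max_age:
            yield key, timestamp, value


def summing_combiner(entries):
    """Aggregation stage: sum the values of adjacent entries sharing a key.

    Relies on the input being sorted by key, as a sorted store provides.
    The combined entry carries the newest timestamp of its group.
    """
    for key, group in groupby(entries, key=lambda entry: entry[0]):
        group = list(group)
        newest = max(timestamp for _, timestamp, _ in group)
        total = sum(value for _, _, value in group)
        yield key, newest, total


# Entries are (key, timestamp, value), already sorted by key.
stream = [
    ("breakfast", 100, 1),
    ("breakfast", 200, 2),
    ("lunch", 50, 5),
    ("lunch", 250, 1),
]

# Stages compose like a stack: age-off first, then aggregation.
for entry in summing_combiner(age_off(stream, now=300, max_age=220)):
    print(entry)
```

Because each stage only consumes and produces a sorted stream, the same stack could in principle run at scan time or during a compaction, which is the "push processing to the data" idea on the slide.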
  29. Accumulo /əˈkjuˈmjʊˈlo/
      1. Sorted, distributed key/value store with cell-based access control and customizable server-side processing
  30. Access Control
      • Every key-value has a visibility label
      • Label is defined with boolean operators
      • Label is arbitrary and ad-hoc, for example:
          Public
          Private | Admin
          Finance | (HR & Manager)
      • Authorizations presented at scan time
      • Data is filtered out automatically by system-level Iterator
  31. Access Control: Typical Architecture
      Components: Web Server, Accumulo, Enterprise Identity Management, Trusted Zone
      1.) Pass Credentials
      2.) Lookup User
      3.) Return Authorizations
      4.) Proxy Authorization
      5.) Return Visible Data
      6.) Return Data
  32. Access Control: Typical Architecture
      Components: Bob, Web Server, Accumulo, Enterprise Identity Management, Trusted Zone
      1.) PKI Cert
      2.) Lookup Bob
      3.) Auths: [SECRET, UNCLASSIFIED, PROJECT X, PROJECT Y]
      4.) Proxy Bob's Auths
      5.) Return [6,8]
      6.) Return [6,8]
      Entries in Accumulo (visibility label, value):
        SECRET&PROJECT X, 6
        SECRET&PROJECT Y, 8
        SECRET&PROJECT Z, 3
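
The filtering in the Bob example can be reproduced with a small evaluator for visibility labels. This is a sketch only: `is_visible` is a hypothetical function, not Accumulo's own label parser, and it makes assumptions of its own. Terms may contain spaces (as on the slide), `&` and `|` are evaluated left to right with parentheses for grouping, and an empty label is treated as visible to everyone.

```python
import re


def is_visible(label, authorizations):
    """Return True if the authorizations satisfy the visibility label."""
    # Split on operators and parentheses, keeping them as tokens.
    tokens = [t.strip() for t in re.split(r"([&|()])", label) if t.strip()]
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            operator = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if operator == "&" else (result or right)
        return result

    # Assumption of this sketch: an empty label is visible to everyone.
    return parse_expr() if tokens else True


bob = {"SECRET", "UNCLASSIFIED", "PROJECT X", "PROJECT Y"}
cells = [("SECRET&PROJECT X", 6), ("SECRET&PROJECT Y", 8), ("SECRET&PROJECT Z", 3)]
print([value for label, value in cells if is_visible(label, bob)])
```

With Bob's authorizations only the first two entries pass, matching the [6,8] returned on the slide; the entry labelled with PROJECT Z is filtered out because Bob lacks that authorization.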
  33. Demo
  34. Application Requirements
      Build an application to analyze trends in Twitter messages.
      • Query for word/phrase and view real-time activity in a time series graph
      • View at different time ranges (1 day, 7 days, 30 days, etc.)
      • Allow multiple query terms to compare activity (e.g. Breakfast, Lunch)
      • Automatically extract daily trends for the user
  35. Demo Setup/Data
      • Twitter Streaming API
      • Only messages with US country codes
      • 1,2,3-grams built
      • Data since Dec 24 (live)
      • Running on average workstation, 1 SATA disk, 6 GB memory
      • 72GB, 2.6 billion entries and counting
  36. Data Model
      • Tweets table
        – Row ID: n-gram
        – Column Family: Date Granularity (DAY, HOUR)
        – Column Qual: Date Value
        – Value: Count
        – SummingCombiner (Iterator) used to update Count
      Row ID      Col Fam   Col Qual     Value
      breakfast   DAY       20120318     31
      breakfast   DAY       20120319     56
      …           …         …            …
      lunch       HOUR      2012031801   3
      lunch       HOUR      2012031802   4
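
As a rough sketch of how a message might turn into entries in the tweets table: each n-gram yields a count of 1 under a DAY key and an HOUR key, and counts for the same key are summed. Everything here is illustrative: `ngrams`, `mutations`, and `ingest` are hypothetical helpers, the tokenization (lowercase, split on whitespace) is an assumption, and a `Counter` stands in for the table with its SummingCombiner.

```python
from collections import Counter
from datetime import datetime


def ngrams(text, max_n=3):
    """Build 1-, 2-, and 3-grams from whitespace-separated, lowercased words."""
    words = text.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


def mutations(text, when):
    """Yield ((row ID, column family, column qualifier), count) pairs."""
    day = when.strftime("%Y%m%d")
    hour = when.strftime("%Y%m%d%H")
    for gram in ngrams(text):
        yield (gram, "DAY", day), 1
        yield (gram, "HOUR", hour), 1


def ingest(table, text, when):
    """Apply a message to the table; the Counter sums counts per key."""
    for key, count in mutations(text, when):
        table[key] += count


table = Counter()
ingest(table, "Breakfast time", datetime(2012, 3, 18, 1))
ingest(table, "breakfast", datetime(2012, 3, 18, 2))
print(table[("breakfast", "DAY", "20120318")])
```

With this layout, a time series for one term is a single range scan over one row ID and one column family, which fits the query requirements listed for the demo.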
  37. Data Model
      • Trends table
        – Row ID: (Date Granularity + Date Value)
        – Column Family: (Integer.MAX_VALUE – trendScore)
        – Column Qual: n-gram
        – Value: []
      Row ID         Col Fam      Col Qual      Value
      DAY:20120318   2147483145   church
      DAY:20120318   2147483316   hangover
      …              …            …             …
      DAY:20120319   2147476521   the broncos
      DAY:20120319   2147477704   tim tebow
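
Storing Integer.MAX_VALUE minus the trend score makes the highest-scoring n-gram sort first within a row, so the top trends are simply the first entries of a scan. The sketch below shows that encoding. `trend_key` and `top_trends` are hypothetical helpers, the zero-padding to ten digits is an assumption added so that string order matches numeric order, and the scores are obtained by inverting the column families shown on the slide.

```python
INT_MAX = 2**31 - 1  # value of Java's Integer.MAX_VALUE


def trend_key(granularity, date_value, score, ngram):
    """Build (row ID, column family, column qualifier) for the trends table."""
    row = f"{granularity}:{date_value}"
    # Higher score -> smaller number -> sorts earlier.
    family = f"{INT_MAX - score:010d}"
    return (row, family, ngram)


def top_trends(keys, row, limit):
    """Return the first `limit` (n-gram, score) pairs for a row, in key order."""
    in_row = [key for key in sorted(keys) if key[0] == row]
    return [(ngram, INT_MAX - int(family)) for _, family, ngram in in_row[:limit]]


keys = [
    trend_key("DAY", "20120318", 331, "hangover"),
    trend_key("DAY", "20120318", 502, "church"),
    trend_key("DAY", "20120319", 5943, "tim tebow"),
    trend_key("DAY", "20120319", 7126, "the broncos"),
]
print(top_trends(keys, "DAY:20120319", 1))
```

The value can stay empty because all the information (date, score, n-gram) is already carried in the key.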
  38. MapReduce Analytics
      • Utilize MapReduce for building trends
      • AccumuloInputFormat reads from tweets table
      • AccumuloOutputFormat writes to trends table
      • AccumuloStorage LoadFunc for Pig available on github
  39. Summary
      • Accumulo exploits locality to enable interactive access to huge data sets while adding cell-level access control and server-side programming
      • Nothing in life is free. Accumulo comes with the complexity and responsibility of managing a distributed system and designing indexes on your data
  40. References
      • Documentation, Mailing Lists, Links: http://incubator.apache.org/accumulo/
      • HBase Shootout: http://www.slideshare.net/cloudera/h-base-and-accumulo-todd-lipcom-jan-25-2012
      • Trendulo: https://github.com/jaredwinick/trendulo
