
Introduction to Apache Accumulo

General overview of Apache Accumulo's use for scaling an interactive web application.


  1. © 2015 Cloudera, Inc. licensed CC-BY-SA 2.0
     How to use this presentation
     • Covered topics: Accumulo architecture, operational maintenance, fault handling
     • Intended audience: developers, supporters, and PMs who are conversant in multi-component systems, i.e. involved in web services
     • Presumes familiarity with RDBMS
     • Expected running time: 40 - 60 minutes
     • License: CC-BY-SA 2.0
     • Please let me know if you find it useful and what it could use: busbey@cloudera.com
  2. Introduction to Apache Accumulo: Scaling a web application made easier. Sean Busbey // Software Engineer
  3. Let’s talk about Apache Accumulo…
  4. But in the context of a specific use case
     • I really like technology that solves a problem.
     • Keep in mind that this won’t be exhaustive.
     • YMMV; proofs of concept with metrics are better than slides.
  5. Who am I?
     • Apache Accumulo PMC
     • Apache HBase committer
     • Software Engineer on Cloudera’s storage team
  6. That is to say, I work for a vendor and no longer have operational scale problems of my own.
  7. We’ll focus on an application that enables conversations centered on cute cats.
  9. Simple sharing model built with privacy controls
     • User defines a group that may see their posting
     • User posts a picture to a given group
     • Members of the group may write short messages
  10. Straightforward web architecture
  11. Relational Data Model
     • User table: will map user names to identifiers used elsewhere.
     • Group table: will track ownership and descriptive name.
     • Group membership table: will allow users to add and remove members.
  12. Relational Data Model
     • Topic table: tracks distribution group, owner, and topical image.
     • Comment table: individual comments from users.
  13. First growth: robustness
  14. First growth: robustness
  15. Second growth: application scale out
  16. Scaling reads: what goes into this page?
  17. Database reads eventually become a bottleneck
  18. Scale by de-normalizing in favor of reads
  19. Change to writes – original
  20. Change to writes – de-normalized
  21. Generally known as the fan-out pattern.
  22. The trick is to not get crushed by the writes
     • Each poster now does a write for each member of the group a post goes to.
     • Removing access is now a much larger delete query.
     • Most databases are geared toward few writes and many reads; are we screwed?
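The write amplification behind the fan-out pattern can be sketched in a few lines. This is only an illustration under assumed names: `fan_out_write` and the in-memory `inbox` are hypothetical and stand in for whatever the application's de-normalized table really is.

```python
from collections import defaultdict


def fan_out_write(inbox, group_members, post):
    """Store one copy of `post` per group member and return the number of writes."""
    for member in group_members:
        inbox[member].append(post)
    return len(group_members)


# Hypothetical data: one post shared with a three-member group.
inbox = defaultdict(list)
writes = fan_out_write(inbox, ["alice", "bob", "carol"], {"author": "dave", "text": "cute cat"})
```

A single post turns into one write per reader, which is why the write path, not the read path, becomes the thing to watch.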
  23. Recall our access pattern
  24. Basically one of these consumer boxes.
  25. Lines up very well with sharding
     • Divide the query space up by e.g. a hash of user id into n shards.
     • Store a copy of the table on each shard, but just for user ids that hash to that shard.
     • Reads and writes are spread across instances.
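A minimal sketch of hash-based shard selection, assuming string user ids. The function name `shard_for` and the choice of SHA-256 are illustrative assumptions, not something the slides prescribe.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user id to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shard = shard_for("alice", 4)
```

Python's built-in `hash()` is avoided here because it is randomized per process for strings. Note that with a plain modulo, changing `num_shards` generally sends most user ids to a different shard, which is one reason adding shards later is painful.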
  26. Database shards layout
  27. What were the nice-to-haves for the RDBMS again?
     • No longer leveraging the relational data model.
     • Now running, backing up, and failing over as many database instances as there are shards.
     • Robustness in a shard has to be managed.
     • Sharding is essentially static; adding more resources with growth is still painful.
  28. Now we have some context for Accumulo. Our goal is to end up with less operational overhead.
  29. “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  30. Accumulo-based App Layout
  31. “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  32. In Accumulo, you address cells rather than records: Key → Value
  33. Keys are multi-dimensional: Key (Row, Column, Time) → Value
  34. Keys are multi-dimensional: Key (Row, Column (Family, Qualifier, Visibility), Time) → Value
  35. Accumulo doesn’t assume a schema
     • All key and value components, save time, are byte[]
     • The application is responsible for serialization
     • It is common to use different serialization for the values in different columns.
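To make the schema-less model concrete, here is a small sketch of a key whose components are raw bytes, plus two application-chosen value encodings. It does not use the Accumulo client API; the `Key` tuple, `encode_comment`, and `encode_counter` are hypothetical names for illustration.

```python
import json
import struct
from typing import NamedTuple


class Key(NamedTuple):
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int  # the only component that is not a byte array


# Hypothetical per-column serializers; the store itself only sees bytes.
def encode_comment(author: str, text: str) -> bytes:
    return json.dumps({"author": author, "text": text}).encode("utf-8")


def encode_counter(count: int) -> bytes:
    return struct.pack(">q", count)  # 8-byte big-endian signed integer


key = Key(b"reader42", b"discussion7", b"0001", b"group9", 1420070400000)
value = encode_comment("dave", "cute cat")
```

The store never interprets `value`; the application must remember which encoding belongs to which column when it reads the bytes back.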
  36. Mapping records to cells
     • Treat a row as a database record
       • Essentially each column is a record field
     • Treat each cell as a database record
       • Need to uniquely identify each record
       • Useful if you generally need the whole row and not a subset of columns
       • Can then treat each row as a shard of database records.
  37. Let’s use a concrete example.
  38. Already know our reads are within a shard.
  39. Mapping our data into cells
     • Key (Row, Column Family, Column Qualifier, Visibility): reader id, discussion id, comment order, group id
     • Value: author, image url, and comment
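A sketch of how one comment could be turned into cells, one per reader. It assumes the positional reading of the slide (row = reader id, family = discussion id, qualifier = comment order, visibility = group id). The function `comment_cells` and the `|`-joined value are illustrative choices only.

```python
def comment_cells(readers, discussion_id, comment_order, group_id, author, image_url, comment):
    """Build one (key, value) cell per reader who should see the comment."""
    # Naive value encoding for illustration; a real application would pick a proper serializer.
    value = "|".join([author, image_url, comment]).encode("utf-8")
    # Zero-padding keeps byte-wise order the same as numeric comment order.
    qualifier = f"{comment_order:06d}".encode("utf-8")
    return [
        ((reader.encode("utf-8"), discussion_id.encode("utf-8"), qualifier, group_id.encode("utf-8")), value)
        for reader in readers
    ]


cells = comment_cells(["alice", "bob"], "d7", 3, "g9", "dave", "http://example.com/cat.jpg", "so fluffy")
```

Each reader gets their own row, so a page load for one reader touches only that reader's slice of the table.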
  40. We end up with something close to our original.
  41. Note the use of visibility
  42. Visibility enforcement
     • At scan time, our application will pass in the groups for the current user.
     • Accumulo will filter out any cells that don’t match those groups.
     • Group removal is once again a simple update in the group management system.
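The filtering idea can be sketched as follows. This is a deliberate simplification: each cell here carries a single group label, whereas real Accumulo visibility labels are boolean expressions evaluated on the server. `scan_visible` is a hypothetical name, not a client API call.

```python
def scan_visible(cells, authorizations):
    """Return only the cells whose visibility label is among the caller's authorizations."""
    return [(key, value) for key, value in cells if key[3] in authorizations]


# Keys are (row, family, qualifier, visibility); the data is made up.
cells = [
    (("alice", "d7", "000001", "g9"), "dave: so fluffy"),
    (("alice", "d8", "000001", "g2"), "erin: grumpy"),
]
visible = scan_visible(cells, {"g9"})
```

Because the check happens at read time, removing a user from a group only changes the set passed in at scan time; no stored cells need to be deleted.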
  43. Sparse column storage
     • We are creating lots of columns: per discussion per group member.
     • Accumulo only stores columns that exist in a given row.
  44. “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  45. All cells sorted according to key
     • Total ordering based on lexicographic sort of the raw byte arrays of the key components.
     • Time is sorted most-recent-first.
     • Reads are done on a contiguous range of cells.
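The ordering rule can be imitated with a sort key: byte-wise comparison of row, family, qualifier, and visibility, then timestamp descending. This is a sketch on plain tuples, not Accumulo's own comparator; `sort_key` is a hypothetical helper.

```python
def sort_key(cell_key):
    row, family, qualifier, visibility, timestamp = cell_key
    # Python compares bytes lexicographically; negating the timestamp puts newer versions first.
    return (row, family, qualifier, visibility, -timestamp)


keys = [
    (b"bob", b"d7", b"000001", b"g9", 100),
    (b"alice", b"d7", b"000002", b"g9", 100),
    (b"alice", b"d7", b"000001", b"g9", 100),
    (b"alice", b"d7", b"000001", b"g9", 200),
]
ordered = sorted(keys, key=sort_key)
```

After sorting, all of alice's cells are adjacent, and the newest version of each cell comes before older ones, so a page read is one contiguous range.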
  46. When sorted our data looks like this…
  47. And the scan for a page is roughly…
  48. Lexicoders
     • Turning different kinds of data into sortable bytes is painful
     • Accumulo ships implementations for several common Java types
     • Also for e.g. reversing the sort order and building compound keys.
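To show why sortable encodings take care, here is a sketch of an order-preserving encoding for signed 64-bit integers, plus a byte-complement trick to reverse the order. It illustrates the idea only and is not claimed to match the byte format of Accumulo's shipped Lexicoder classes; the function names are hypothetical.

```python
import struct

_OFFSET = 1 << 63


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer so byte-wise order matches numeric order."""
    # Adding 2**63 flips the sign bit, mapping the signed range onto unsigned big-endian bytes.
    return struct.pack(">Q", n + _OFFSET)


def decode_long(b: bytes) -> int:
    return struct.unpack(">Q", b)[0] - _OFFSET


def reverse_order(b: bytes) -> bytes:
    """Complement every byte so that fixed-length encodings sort in the opposite order."""
    return bytes(255 - x for x in b)
```

A plain two's-complement encoding would sort negative numbers after positive ones; flipping the sign bit avoids that. `reverse_order` is its own inverse, so applying it again recovers the original bytes.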
  49. Inefficiencies in our data model
     • Key (Row, Column Family, Column Qualifier, Visibility): reader id, discussion id, comment order, group id
     • Value: author, image url, and comment
  50. Two categories of data
     • Key (Row, Column Family, Column Qualifier, Visibility): reader id, discussion id, image, group id; Value: author, image url
     • Key (Row, Column Family, Column Qualifier, Visibility): reader id, discussion id, text, group id; Value: author, comment
  51. And now our data looks like this
  52. And the scan for a page covers less data
  53. “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  54. Our simplified diagram
  55. Slightly less simplified
  56. Back to the data model: Key (Row, Column (Family, Qualifier, Visibility), Time) → Value
  57. Back to the data model: Key (Row, Column (Family, Qualifier, Visibility), Time) → Value
  58. Rows are grouped into Tablets
     • A Tablet is defined by a start and end row
     • All cells for a given row must be in the same Tablet.
  59. Tablets are assigned to Tablet Servers
     • At any given point in time, a Tablet is serviced by a single Tablet Server
  60. Slightly less simplified
  61. Tablets are assigned to Tablet Servers
     • At any given point in time, a Tablet is serviced by a single Tablet Server
     • That server is responsible for client reads and writes to all of its hosted Tablets
     • Finding the proper server is handled by the Accumulo libraries
     • Proper key design means I/O load gets spread across multiple machines
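Locating the server for a row can be sketched as a binary search over tablet split points. This is a toy model of what the client libraries do for you; the split points, server names, and the assumption that a tablet's end row is inclusive are all illustrative.

```python
import bisect


def tablet_index(split_points, row):
    """Return the index of the tablet whose row range contains `row`.

    Assumes tablet i covers rows in (split_points[i-1], split_points[i]],
    with the last tablet unbounded above.
    """
    return bisect.bisect_left(split_points, row)


splits = [b"g", b"p"]  # three tablets: up to "g", after "g" up to "p", after "p"
assignment = {0: "tserver-a", 1: "tserver-b", 2: "tserver-a"}  # hypothetical server names


def server_for(row):
    """Each tablet is hosted by exactly one server at a time."""
    return assignment[tablet_index(splits, row)]
```

If every row key started with the same prefix, all traffic would land on one tablet and one server, which is why key design decides how well the load spreads.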
  62. “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  63. Tablet assignment is not static
     • Assignments tend to reach a steady state
     • But Tablets can move when new resources are added or a server fails
  64. Remember our RDBMS scaling?
  65. New RDBMS shard
     1. Provision hardware for service
     2. Rewrite data under new sharding
     3. Update application services
     • Doing this without an outage is hard work (and well paid if you can get it)
  66. New Accumulo Tablet Server
     1. Provision hardware for service
     2. Add server to cluster
     3. Tablets automatically migrate from busier nodes to the new node
     • No outage from the client perspective.
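The migration step can be pictured with a toy balancer that moves tablets from the busiest server to a newly added one until the counts are within one of each other. Accumulo's real balancing logic is more involved; `rebalance` and the server names are hypothetical.

```python
def rebalance(assignment, new_server):
    """Move tablets from the most loaded servers to `new_server` until counts are within one."""
    servers = {}
    for tablet, server in assignment.items():
        servers.setdefault(server, []).append(tablet)
    servers.setdefault(new_server, [])
    while True:
        busiest = max(servers, key=lambda s: len(servers[s]))
        if len(servers[busiest]) - len(servers[new_server]) <= 1:
            break
        servers[new_server].append(servers[busiest].pop())
    return {tablet: server for server, tablets in servers.items() for tablet in tablets}


# Six tablets on two servers, then a third server joins.
before = {f"tablet-{i}": ("tserver-a" if i % 2 == 0 else "tserver-b") for i in range(6)}
after = rebalance(before, "tserver-c")
```

Only the tablet-to-server mapping changes; no data is re-sharded and no key changes its tablet, which is the contrast with the RDBMS procedure.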
  67. “The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.” Accumulo PMC via https://accumulo.apache.org/
  68. All distributed systems have communication failures. In the face of such a failure you can either
     • remain available on remaining nodes to all clients
     • provide a consistent view of updates to a subset of clients
  69. Now you know the basics of CAP. Remember that you can’t give up partition tolerance.
  70. Remember our RDBMS robustness?
  71. Accumulo is a CP system
     • Tablet Servers ensure that updates have been written to a distributed write-ahead log before acknowledging
     • Tablet Server failures are automatically detected
     • Newly assigned hosts for recovered Tablets then replay edits up until the last ack before serving new requests
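The log-before-ack rule and replay on recovery can be sketched with a toy tablet server. A plain Python list stands in for the distributed write-ahead log; the class and method names are hypothetical and this is not how Accumulo is implemented internally.

```python
class TabletServer:
    """Toy tablet server that logs every update before acknowledging it."""

    def __init__(self, wal):
        self.wal = wal      # stands in for the distributed write-ahead log
        self.memory = {}    # in-memory state, lost if the server fails

    def write(self, key, value):
        self.wal.append((key, value))  # log the edit first
        self.memory[key] = value       # then apply it in memory
        return "ack"                   # acknowledge only after the log write

    @classmethod
    def recover(cls, wal):
        """Rebuild state on a newly assigned server by replaying logged edits."""
        server = cls(wal)
        for key, value in wal:
            server.memory[key] = value
        return server


wal = []
first = TabletServer(wal)
first.write("alice/d7/000001", "so fluffy")
first.write("alice/d7/000002", "agreed")
del first  # simulate the server failing
second = TabletServer.recover(wal)
```

Because every acknowledged write is in the log before the client hears back, the replacement server can serve the same data after replay; the longer the outstanding log, the longer that replay takes.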
  73. Client write
  74. Write goals
     • Low latency ack
     • Don’t lose acked writes in the face of node failure
  75. Client write: step 1
  76. Client write: steps 1, 2
  77. Client write: steps 1, 2, 3
  83. Recovery timing
     • Tunable time to detection – increases network load
     • Size of outstanding write-ahead logs
  84. Client write: steps 1, 2, 3, 4
  85. Accumulo-based App Layout
  86. What’s the catch?
  87. Gaps
     • Still requires application updates to use API – no interactive SQL bindings*
     • No Disaster Recovery – coming in next minor release
  88. Thank you. Mr. Mean photo from mockup is © 2004 Flickr user aznewbeginning; CC-BY-SA 2.0 https://flic.kr/p/4uzdRc
