Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Apache Accumulo, originally developed by the National Security Agency and now an Apache Software Foundation project, builds upon Google's Bigtable design to provide a scalable, lightly-structured database capability complementing the ubiquitous Hadoop environment. The core capabilities of Accumulo include cell-level security, flexible schemas, real-time analytics, bulk I/O, and linear scalability beyond trillions of entries and petabytes of data. These new capabilities lead to techniques that unlock the power of Big Data, but don't fit into traditional database design patterns. Learn about the advantages of Apache Accumulo and how it fits into the Hadoop and NoSQL ecosystem.

Presenter: Adam Fuchs, CTO, sqrrl

  • Tablet Servers have 4 primary functions: hosting RPCs (read, write, etc.); managing resources (RAM, CPU, file I/O, etc.); scheduling background tasks (compactions, caching, etc.); and handling key/value pairs (via Iterators).

    1. sqrrl data, INC. Secure. Scale. Adapt.
       Adam Fuchs, Chief Technology Officer
       info@sqrrl.com | @sqrrl_inc | 617.520.4375
       sqrrl data, INC., All Rights Reserved
    2. Who We Are
       sqrrl is the commercial provider of:
       • Mature Database Technology: Apache Accumulo
       • Fine-Grained Access Controls: data integration and sharing
       • Proven Performance: petabytes and beyond
       • Advanced Analytics: search, statistics, and graphs
    3. Contents: Core Philosophy; Technology; Techniques; Application APIs
    4. Apache Accumulo Perspective
       [Diagram: several data sets feeding several applications.]
       New demands: integration across multiple business lines, multiple data sets, multiple applications, and multiple security, privacy, legal, policy, regulatory, and compliance constraints.
    5. Accumulo Design Drivers
       1. Cell-Level Security
          • Express common security requirements in the infrastructure, not just in the application
          • A data-centric approach encourages secure sharing
       2. Scalability
          • Near-linear performance improvements at thousands of nodes
          • Durable and reliable under the increased failures that come with scale
       3. Diverse, Interactive Analytics
          • A sorted key/value core performs well in a diverse set of domains
          • Information retrieval, statistics, graph analysis, geo indexing, and more
       4. Flexible, Adaptive Schema
          • Start with universal structures and indexing
          • Refine the schema over time
    6. Contents: Core Philosophy; Technology; Techniques; Application APIs
    7. Accumulo Key Structure
       An Accumulo key is a 5-tuple, consisting of:
       • Row: controls atomicity
       • Column Family: controls locality
       • Column Qualifier: controls uniqueness
       • Visibility Label: controls access
       • Timestamp: controls versioning
       Accumulo Key/Value Example:
       Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
       John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
       John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
       John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
       John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
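A minimal sketch of how the five key parts interact, assuming (as we understand Accumulo's ordering) that keys sort ascending on row, column family, column qualifier, and visibility, and descending on timestamp so the newest version comes first. The `Key` class, `newest_versions` helper, and the older Cholesterol reading are hypothetical, for illustration only; `newest_versions` imitates a versioning policy that keeps only the latest entries per cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """The five-part key from the slide, modeled as a plain value (hypothetical)."""
    row: str
    column_family: str
    column_qualifier: str
    visibility: str
    timestamp: int

    def sort_key(self):
        # Ascending on row, family, qualifier, and visibility;
        # descending on timestamp so the newest version sorts first.
        return (self.row, self.column_family, self.column_qualifier,
                self.visibility, -self.timestamp)


def newest_versions(keys, max_versions=1):
    """Keep at most `max_versions` entries per (row, family, qualifier, visibility)."""
    kept, previous, count = [], None, 0
    for key in sorted(keys, key=Key.sort_key):
        coordinates = key.sort_key()[:4]
        count = count + 1 if coordinates == previous else 1
        previous = coordinates
        if count <= max_versions:
            kept.append(key)
    return kept


keys = [
    Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513),
    Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912),
    # Hypothetical older reading, added to show versioning.
    Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120601),
    Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912),
    Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801),
]
for key in newest_versions(keys):
    print(key.column_family, key.column_qualifier, key.timestamp)
```

Run as written, this prints the Notes/PCP entry first, then Cholesterol (20120912 only), Mental Health, and X-Ray: the older Cholesterol version sorts directly after the newer one and is dropped.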
    8. Visibility Syntax & Semantics
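The visibility labels in the key/value example (such as `JD|PCP_JD`) are boolean expressions over a reader's authorizations. A minimal evaluator sketch follows, assuming Accumulo-style syntax: `&` for AND, `|` for OR, parentheses for grouping, and mixing `&` and `|` at one level rejected unless parenthesized. Quoted labels are not supported here, and the function names are hypothetical, not Accumulo's API.

```python
import re

# Label characters accepted by this sketch (quoted labels are not supported).
TOKEN = re.compile(r"[A-Za-z0-9_\-.:/]+|[&|()]")


def tokenize(expression):
    tokens, pos = [], 0
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character {expression[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def is_visible(expression, authorizations):
    """True if the set `authorizations` satisfies the visibility `expression`."""
    if expression == "":
        return True  # an empty visibility label is readable by everyone
    tokens = tokenize(expression)
    pos = 0

    def parse_expression():
        nonlocal pos
        values = [parse_term()]
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is not None and tokens[pos] != operator:
                raise ValueError("mixing & and | requires parentheses")
            operator = tokens[pos]
            pos += 1
            values.append(parse_term())
        return all(values) if operator == "&" else any(values)

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ends unexpectedly")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in authorizations

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result
```

For example, a reader holding only `PCP_JD` can see the Cholesterol entry (`JD|PCP_JD`) but not the Mental Health entry (`JD|PSYCH_JD`).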
    9. Tablets
       • Collections of KV pairs form Tables.
       • Tables are partitioned into Tablets.
       • Metadata tablets hold info about other tablets, forming a 3-level hierarchy.
       • A Tablet is a unit of work for a Tablet Server.
       [Diagram: a well-known location (ZooKeeper) points to the Root Tablet (-∞ to ∞). The Root Tablet indexes Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞), which in turn index the data tablets of three tables: Adam’s Table (-∞ : thing, thing : ∞), Encyclopedia (-∞ : Ocelot, Ocelot : Yak, Yak : ∞), and Foo (-∞ to ∞).]
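Finding the tablet for a row is a search over sorted split points. The sketch below assumes the convention that a tablet covers (previous end row, end row], that is, the end row is inclusive. It reuses the slide's split points and its "Table:row" style metadata boundary; real Accumulo metadata rows use a different format, so `locate` and the two tables of splits are illustrative and hypothetical. The root level holds a single tablet, so every lookup starts there.

```python
from bisect import bisect_left


def tablet_index(split_points, row):
    """Index of the tablet whose range holds `row`.

    With sorted split points s0 < s1 < ..., tablet 0 covers (-inf, s0],
    tablet i covers (s[i-1], s[i]], and the last tablet covers (s[-1], +inf).
    """
    return bisect_left(split_points, row)


def extents(split_points):
    """(previous end row, end row) for each tablet; None stands for infinity."""
    bounds = [None, *split_points, None]
    return list(zip(bounds[:-1], bounds[1:]))


# Split points taken from the slide's example (hypothetical encoding).
TABLE_SPLITS = {"Adam's Table": ["thing"], "Encyclopedia": ["Ocelot", "Yak"], "Foo": []}
METADATA_SPLITS = ["Encyclopedia:Ocelot"]


def locate(table, row):
    """Walk the hierarchy: pick the metadata tablet, then the data tablet."""
    metadata_tablet = tablet_index(METADATA_SPLITS, f"{table}:{row}")
    data_tablet = tablet_index(TABLE_SPLITS[table], row)
    return metadata_tablet, data_tablet
```

For instance, `locate("Encyclopedia", "Zebra")` lands in Metadata Tablet 2 and the Encyclopedia's "Yak : ∞" tablet, returned as `(1, 2)`.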
    10. Accumulo Architecture
        [Diagram: Applications read from and write to Tablet Servers, each hosting several Tablets. The Master assigns Tablets to Tablet Servers and balances them. ZooKeeper delegates authority, and Hadoop stores and replicates the data.]
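To make the Master's "assign/balance" role concrete, here is a sketch of one naive balancing policy: repeatedly move a tablet from the most loaded Tablet Server to the least loaded one until their tablet counts differ by at most one. This even-count policy and the `balance` function are hypothetical illustrations, not Accumulo's own balancer.

```python
def balance(assignments):
    """Naive even-count balancing (hypothetical policy, not Accumulo's balancer).

    `assignments` maps a tablet server name to the list of tablet ids it hosts.
    Tablets are moved one at a time from the most loaded to the least loaded
    server until all counts differ by at most one. Mutates `assignments` and
    returns the moves as (tablet, from_server, to_server) tuples.
    """
    moves = []
    while assignments:
        busiest = max(assignments, key=lambda server: len(assignments[server]))
        idlest = min(assignments, key=lambda server: len(assignments[server]))
        if len(assignments[busiest]) - len(assignments[idlest]) <= 1:
            break
        tablet = assignments[busiest].pop()
        assignments[idlest].append(tablet)
        moves.append((tablet, busiest, idlest))
    return moves


servers = {"ts1": ["t1", "t2", "t3", "t4"], "ts2": [], "ts3": ["t5"]}
print(balance(servers))
```

Each move strictly reduces the imbalance, so the loop terminates; a production balancer would also weigh load, locality, and migration cost.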
    11. Tablet Data Flow
        [Diagram: Writes are recorded in a Write-Ahead Log (for recovery) and inserted into an In-Memory Map. A minor compaction passes the in-memory map through an iterator tree and writes a sorted, indexed file. Merging / major compactions combine several sorted, indexed files through another iterator tree. Scans (reads) pass through a scan iterator tree that merges the in-memory map with the sorted, indexed files.]
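The data flow above follows the log-structured merge pattern. The toy tablet below sketches it under simplifying assumptions: values are overwritten rather than versioned, deletes and iterators are not modeled, and "files" are in-memory sorted lists. All names (`Tablet`, `merge_newest_wins`, `memory_limit`) are hypothetical, not Accumulo's API.

```python
import heapq


def merge_newest_wins(sources):
    """Merge sorted (key, value) lists; `sources` is ordered newest first.

    Tagging each entry with its source's age makes the newest copy of a
    key come out of the merge first, so older copies can be skipped.
    """
    tagged = [[(key, age, value) for key, value in source]
              for age, source in enumerate(sources)]
    merged, last_key = [], object()
    for key, _age, value in heapq.merge(*tagged):
        if key != last_key:
            merged.append((key, value))
            last_key = key
    return merged


class Tablet:
    """Toy log-structured tablet; hypothetical names, not Accumulo's API."""

    def __init__(self, memory_limit=3):
        self.wal = []          # write-ahead log, replayed after a crash
        self.memtable = {}     # in-memory map of recent writes
        self.files = []        # immutable sorted files, oldest first
        self.memory_limit = memory_limit

    def write(self, key, value):
        self.wal.append((key, value))  # log first so the write is durable
        self.memtable[key] = value
        if len(self.memtable) >= self.memory_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Flush the in-memory map to a new sorted file and truncate the log."""
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable.clear()
        self.wal.clear()

    def major_compaction(self):
        """Rewrite all sorted files as one, keeping only the newest values."""
        self.files = [merge_newest_wins(self.files[::-1])]

    def scan(self):
        """Merged, sorted view of the in-memory map plus every file."""
        return merge_newest_wins([sorted(self.memtable.items())] + self.files[::-1])

    def recover(self):
        """Simulate a restart: the in-memory map is lost and rebuilt from the log."""
        self.memtable = {}
        for key, value in self.wal:
            self.memtable[key] = value
```

Scans never modify files; they merge all sources on the fly, which is why major compactions pay off by reducing the number of files each scan has to merge.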
    12. Contents: Core Philosophy; Technology; Techniques; Application APIs
    13. Hierarchical Decomposition
        Row: <person>
          Column Family: attribute
            Column Qualifier: age → Value: <age>
            Column Qualifier: discount → Value: <40%>
          Column Family: purchases
            Column Qualifier: sneakers → Value: <cost>
          Column Family: returns
            Column Qualifier: hat → Value: <cost>
    14. Materialized Table
        Row     Column Family  Column Qualifier  Value
        bill    attribute      age               49
        bill    attribute      discount          40%
        bill    purchases      sneakers          $100
        george  attribute      age               27
        george  purchases      sneakers          $83
        george  returns        hat               $42
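The step from the hierarchical decomposition to the materialized table is a flattening of nested records into sorted key/value entries. A minimal sketch, assuming records arrive as nested dictionaries of the form {row: {family: {qualifier: value}}}; the `decompose` helper is hypothetical.

```python
def decompose(records):
    """Flatten {row: {family: {qualifier: value}}} into sorted key/value tuples."""
    entries = [
        (row, family, qualifier, value)
        for row, families in records.items()
        for family, qualifiers in families.items()
        for qualifier, value in qualifiers.items()
    ]
    # Sort on the key parts only, as a sorted key/value store would.
    return sorted(entries, key=lambda entry: entry[:3])


records = {
    "george": {"returns": {"hat": "$42"}, "purchases": {"sneakers": "$83"},
               "attribute": {"age": "27"}},
    "bill": {"attribute": {"age": "49", "discount": "40%"},
             "purchases": {"sneakers": "$100"}},
}
for entry in decompose(records):
    print(*entry)
```

Whatever order the input arrives in, the output rows come back in the same order as the materialized table above, because each person's entries are contiguous under their row.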
    15. Forward and Inverted Index
        Table:             Forward Index   Inverted Index
        Row:               <UUID>          <Term>
        Column Family:     <Type>          <Type> + <Field>
        Column Qualifier:  <Field>         <UUID>
        Value:             <Term>          <Digest of Event>
    16. Forward and Inverted Index
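The two index layouts can be sketched as plain sorted tuples. This assumes `<Type> + <Field>` is joined with a colon and that the digest is whatever short summary the application wants returned with a hit; both choices, and the function names, are hypothetical. A term lookup becomes a range scan starting at the term's first entry.

```python
from bisect import bisect_left


def index_event(uuid, event_type, fields, digest):
    """Return (forward, inverted) entries for one event as sorted tuples.

    Forward:  (UUID, Type, Field, Term)
    Inverted: (Term, Type + Field, UUID, Digest)
    """
    forward = sorted((uuid, event_type, field, term) for field, term in fields.items())
    inverted = sorted((term, f"{event_type}:{field}", uuid, digest)
                      for field, term in fields.items())
    return forward, inverted


def lookup(inverted, term):
    """UUIDs of events containing `term`, via a range scan on sorted entries."""
    start = bisect_left(inverted, (term,))
    hits = []
    for entry_term, _family, uuid, _digest in inverted[start:]:
        if entry_term != term:
            break
        hits.append(uuid)
    return hits


f1, i1 = index_event("e1", "email", {"from": "alice", "to": "bob"}, "alice to bob")
f2, i2 = index_event("e2", "email", {"from": "bob", "to": "carol"}, "bob to carol")
inverted = sorted(i1 + i2)
print(lookup(inverted, "bob"))
```

The forward index answers "what is in event X" with one row scan; the inverted index answers "which events mention term T" with one row scan, at the cost of writing each field twice.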
    17. Graph Analysis
        Table: Graph Table
        Row:                        <Node ID>
        Column Family:              “Node Info”  “Out Edges”           “In Edges”
        Column Qualifier (tuples):  <Field>      <Node ID>, <Edge ID>  <Node ID>, <Edge ID>
        Value:                      <Value>      <Edge Info>           <Edge Info>
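With this layout, a node's neighbors are a single scan over one row and one column family. The sketch stores each edge twice (as an out-edge of its source and an in-edge of its target) in a sorted list, which stands in for the sorted table; the "Node Info" family is omitted, and the function names and edge-info strings are hypothetical.

```python
from bisect import bisect_left, insort


def add_edge(table, source, target, edge_id, edge_info):
    """Store an edge twice, so it can be found from either endpoint's row."""
    insort(table, (source, "Out Edges", (target, edge_id), edge_info))
    insort(table, (target, "In Edges", (source, edge_id), edge_info))


def neighbors(table, node, family):
    """Scan one row's "Out Edges" or "In Edges" family from the sorted table."""
    start = bisect_left(table, (node, family))
    found = []
    for row, fam, qualifier, _info in table[start:]:
        if (row, fam) != (node, family):
            break
        found.append(qualifier[0])  # qualifier is (other node ID, edge ID)
    return found


graph = []
add_edge(graph, "a", "b", "e1", "knows")
add_edge(graph, "a", "c", "e2", "knows")
add_edge(graph, "c", "a", "e3", "cites")
print(neighbors(graph, "a", "Out Edges"))
```

Storing both directions doubles the writes but makes incoming and outgoing traversals equally cheap, which matters for breadth-first expansion.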
    18. Geospatial Queries
        Latitude   10110101001
        Longitude  00111010010
        Depth      11010110110
        The three bit strings are interleaved, one bit from each in turn, into the GeoHash 101001110111010101011100001011100.
        Table: Geo Index
        Row: <GeoHash>
        Column Family: <Event Type>
        Column Qualifier: <UUID>
        Value: <Digest of Event>
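The slide's GeoHash is a bit interleaving (Z-order) of latitude, longitude, and depth, in that order; the sketch below reproduces the slide's 33-bit value exactly. It is plain interleaving as drawn, not necessarily the base-32 Geohash encoding. `quantize`, `geo_row`, the 11-bit resolution, and the depth range are hypothetical choices for illustration.

```python
def quantize(value, low, high, bits):
    """Map `value` in [low, high] to a `bits`-wide binary string (its grid cell)."""
    cells = 1 << bits
    cell = int((value - low) / (high - low) * cells)
    cell = min(max(cell, 0), cells - 1)  # clamp the upper edge into the last cell
    return format(cell, f"0{bits}b")


def interleave(bit_strings):
    """Z-order key: take one bit from each dimension in turn."""
    if len({len(bits) for bits in bit_strings}) != 1:
        raise ValueError("every dimension needs the same number of bits")
    return "".join("".join(column) for column in zip(*bit_strings))


def geo_row(latitude, longitude, depth, bits=11, max_depth=11000.0):
    """Hypothetical row key for the Geo Index table; ranges are assumptions."""
    return interleave([
        quantize(latitude, -90.0, 90.0, bits),
        quantize(longitude, -180.0, 180.0, bits),
        quantize(depth, 0.0, max_depth, bits),
    ])


print(interleave(["10110101001", "00111010010", "11010110110"]))
```

Because the high-order bits of every dimension come first, points in the same coarse cell share a row prefix, so a bounding-box query can be answered with a small number of row-range scans.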
    19. Document Partitioning
        Table: Shard Table
        Row:                        <Partition ID>
        Column Family:              “Docs”           “Inv. Index”    “Field Index”         “Geo”
        Column Qualifier (tuples):  <UUID>, <Field>  <Term>, <UUID>  <Field:Term>, <UUID>  <Hash>, <UUID>
        Value:                      <Value>
    20. Document Partitioning
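The key property of the shard table is that a document and all of its index entries share one row (partition), so any partition can be searched on its own. A sketch under stated assumptions: the partition is chosen from a stable hash of the UUID modulo a fixed partition count, index entries carry an empty value, and the "Geo" family is omitted. `partition_for` and `shard_entries` are hypothetical names.

```python
import hashlib


def partition_for(uuid, num_partitions):
    """Hypothetical: pick a partition from a stable hash of the document UUID."""
    digest = hashlib.md5(uuid.encode("utf-8")).hexdigest()
    return f"p{int(digest, 16) % num_partitions:03d}"


def shard_entries(uuid, fields, num_partitions):
    """Emit (row, family, qualifier, value) tuples for one document, all in one partition."""
    row = partition_for(uuid, num_partitions)
    entries = []
    for field, term in fields.items():
        entries.append((row, "Docs", (uuid, field), term))
        entries.append((row, "Inv. Index", (term, uuid), ""))
        entries.append((row, "Field Index", (f"{field}:{term}", uuid), ""))
    return sorted(entries)


for entry in shard_entries("doc-1", {"title": "ocelot", "body": "cat"}, 8):
    print(entry)
```

Hashing spreads documents evenly across partitions, so a query fans out to every partition in parallel instead of hot-spotting on a few popular terms.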
    21. Intersecting Iterator
        Query: ‘foo’ and (‘bar’ or ‘baz’)
        Row: <Partition ID>
        “Docs”: column qualifier <UUID>, <Field>; value <Value>
        “Inv. Index”: column qualifier <Term>, <UUID>
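Because each partition holds both documents and their inverted index, the query ‘foo’ and (‘bar’ or ‘baz’) can be evaluated inside every partition independently by merging sorted UUID lists. The sketch models each partition's "Inv. Index" as an in-memory {term: sorted UUIDs} map; it illustrates the idea only and is not Accumulo's iterator code. All names and sample data are hypothetical.

```python
def intersect_sorted(left, right):
    """Merge-style intersection of two sorted UUID lists."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


def union_sorted(left, right):
    """Sorted union of two sorted UUID lists, without duplicates."""
    return sorted(set(left) | set(right))


def query_and_any(partitions, required, any_of):
    """Evaluate `required AND (any_of[0] OR any_of[1] ...)` in every partition.

    `partitions` maps a partition ID to that partition's "Inv. Index", modeled
    here as {term: sorted list of UUIDs}. Each partition is evaluated on its
    own, the way a shard-local iterator would, and the hits are concatenated.
    """
    hits = []
    for partition_id in sorted(partitions):
        index = partitions[partition_id]
        alternatives = []
        for term in any_of:
            alternatives = union_sorted(alternatives, index.get(term, []))
        for uuid in intersect_sorted(index.get(required, []), alternatives):
            hits.append((partition_id, uuid))
    return hits


partitions = {
    "p0": {"foo": ["d1", "d3"], "bar": ["d3"], "baz": ["d2"]},
    "p1": {"foo": ["d4", "d5"], "baz": ["d5"]},
}
print(query_and_any(partitions, "foo", ["bar", "baz"]))
```

Since no posting list ever crosses a partition boundary, there is no network shuffle between the intersection and union steps; only the final hits leave each partition.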
    22. Contents: Core Philosophy; Technology; Techniques; Application APIs
    23. acorn
        • Key/Value pairs are great! But: how do I construct a document partitioning key again?
        • Techniques should be built into an API.
        • Let the people have polyglot: Lucene, SQL, SPARQL, JAQL, Matlab (not just Key, Value, Range).
    24. Combined IR + Graph Search
    25. Schema-less Stats
    26. Get Involved
        http://accumulo.apache.org
        Help us make Accumulo even better!
    27. Contact
        Adam Fuchs, CTO, sqrrl data, Inc.
        617-520-4375 | www.sqrrl.com | @sqrrl_inc | info@sqrrl.com
