Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
 


Apache Accumulo, originally developed by the National Security Agency and now an Apache Software Foundation project, builds upon Google's Bigtable design to provide a scalable, lightly-structured database capability complementing the ubiquitous Hadoop environment. The core capabilities of Accumulo include cell-level security, flexible schemas, real-time analytics, bulk I/O, and linear scalability beyond trillions of entries and petabytes of data. These new capabilities lead to techniques that unlock the power of Big Data, but don't fit into traditional database design patterns. Learn about the advantages of Apache Accumulo and how it fits into the Hadoop and NoSQL ecosystem.

Presenter: Adam Fuchs, CTO, sqrrl

  • Tablet Servers have 4 primary functions:
    • Hosting RPCs (read, write, etc.)
    • Managing resources (RAM, CPU, File I/O, etc.)
    • Scheduling background tasks (compactions, caching, etc.)
    • Handling key/value pairs (via Iterators)

Presentation Transcript

  • sqrrl data, INC. Secure. Scale. Adapt.
    Adam Fuchs, Chief Technology Officer
    info@sqrrl.com | @sqrrl_inc | 617.520.4375
    sqrrl data, INC., All Rights Reserved
  • Who We Are
    sqrrl is the commercial provider of:
    • Mature Database Technology - Apache Accumulo
    • Fine-Grained Access Controls - Data Integration and Sharing
    • Proven Performance - Petabytes and Beyond
    • Advanced Analytics - Search, Statistics, and Graphs
  • Contents
    • Core Philosophy
    • Technology
    • Techniques
    • Application APIs
  • Apache Accumulo Perspective
    [Diagram labels: Data, Data, Data; Application, Application, Application]
    Integration across:
    • Multiple business lines
    • Multiple data sets
    • Multiple applications
    • Multiple security, privacy, legal, policy, regulatory, and compliance constraints
    • New demands
  • Accumulo Design Drivers
    1. Cell-Level Security
      • Express common security requirements in the infrastructure, not just in the application
      • Data-centric approach encourages secure sharing
    2. Scalability
      • Near-linear performance improvements at thousands of nodes
      • Durable and reliable under increased failures that come with scale
    3. Diverse, Interactive Analytics
      • Sorted key/value core performs well in a diverse set of domains
      • Information retrieval, statistics, graph analysis, geo indexing, and more
    4. Flexible, Adaptive Schema
      • Start with universal structures and indexing
      • Refine the schema over time
  • Accumulo Key Structure
    An Accumulo key is a 5-tuple, consisting of:
    • Row: Controls Atomicity
    • Column Family: Controls Locality
    • Column Qualifier: Controls Uniqueness
    • Visibility Label: Controls Access
    • Timestamp: Controls Versioning
    Accumulo Key/Value Example:
    Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
    John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
    John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
    John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
    John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
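
The key structure above can be illustrated with a minimal sketch of how such entries might be ordered. The `Key` tuple and `sort_order` function are hypothetical names, not Accumulo's API. The sketch assumes ascending order on the four text fields and descending order on the timestamp, so that the newest version of a cell sorts first.

```python
from collections import namedtuple

# Hypothetical stand-in for an Accumulo key; field names follow the slide.
Key = namedtuple("Key", "row family qualifier visibility timestamp")


def sort_order(key):
    # Ascending on the four text fields, descending on timestamp
    # (assumed here so the newest version of a cell sorts first).
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


# The four entries from the slide's example table, deliberately out of order.
entries = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100…"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
]

ordered = sorted(entries, key=lambda kv: sort_order(kv[0]))
for key, value in ordered:
    print(key.family, "/", key.qualifier, "=>", value)
```

Because all entries for one row and column family sort next to each other, a scan over "John Doe" / "Test Results" touches one contiguous run of keys.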
  • Visibility Syntax & Semantics
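
Visibility labels such as `JD|PCP_JD` in the key/value example are boolean expressions over authorization tokens. The following is a rough sketch of how such an expression could be evaluated against a reader's tokens. The function `is_visible` is hypothetical and is not Accumulo's implementation. It supports `&`, `|`, and parentheses, gives `&` precedence over `|` as a simplification, and assumes an empty label is visible to everyone.

```python
import re


def is_visible(expression, authorizations):
    """Return True if the set of tokens satisfies the visibility expression."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    if not tokens:
        return True  # assumption: an empty label restricts nothing
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            right = parse_and()  # always parse, then combine
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    return parse_or()


# A reader holding only PCP_JD can see the cholesterol result,
# but not the mental health result.
print(is_visible("JD|PCP_JD", {"PCP_JD"}))
print(is_visible("JD|PSYCH_JD", {"PCP_JD"}))
```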
  • Tablets
    • Collections of KV pairs form Tables
    • Tables are partitioned into Tablets
    • Metadata tablets hold info about other tablets, forming a 3-level hierarchy
    • A Tablet is a unit of work for a Tablet Server
    [Diagram, top to bottom:]
    • Well-Known Location (zookeeper)
    • Root Tablet: -∞ to ∞
    • Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
    • Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞
    • Table: Adam’s Table, Data Tablets: -∞ : thing, thing : ∞
    • Table: Encyclopedia, Data Tablets: -∞ : Ocelot, Ocelot : Yak, Yak : ∞
    • Table: Foo, Data Tablet: -∞ to ∞
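
As an illustration of how a table partitioned into tablets might route a row to the tablet that owns it, here is a small sketch using the Encyclopedia split points from the slide. The function `tablet_for` is hypothetical. The sketch assumes each tablet covers a range that excludes its lower bound and includes its upper bound.

```python
from bisect import bisect_left

# Split points taken from the slide's Encyclopedia table.
splits = ["Ocelot", "Yak"]


def tablet_for(row, split_points):
    """Return (index, lower, upper) of the tablet whose range holds the row.

    None stands for -infinity as a lower bound and +infinity as an upper bound.
    """
    i = bisect_left(split_points, row)
    lower = split_points[i - 1] if i > 0 else None
    upper = split_points[i] if i < len(split_points) else None
    return i, lower, upper


for row in ["Aardvark", "Ocelot", "Penguin", "Zebra"]:
    print(row, "->", tablet_for(row, splits))
```

The same lookup could in principle be repeated at each level of the hierarchy (root tablet, metadata tablets, data tablets), since each level is itself a sorted range map.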
  • Accumulo Architecture
    [Diagram components: Zookeeper (x3), Master, Tablet Servers (each hosting Tablets), Applications, Hadoop]
    [Diagram edge labels: Delegate Authority, Read/Write, Assign/Balance, Store/Replicate]
  • Tablet Data Flow
    [Diagram labels: Writes, Write Ahead Log (For Recovery), In-Memory Map, Minor Compaction, Sorted, Indexed File (x3), Merging / Major Compaction, Iterator Tree (x3), Scan, Reads]
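
The data flow named in the diagram labels (writes to a log and an in-memory map, minor compaction to sorted files, major compaction merging files, scans reading across all of them) can be sketched roughly as below. `TabletSketch` and its `memory_limit` parameter are hypothetical, and the sketch leaves out iterators, indexing, and log truncation entirely.

```python
class TabletSketch:
    """Toy model of one tablet's write and read paths."""

    def __init__(self, memory_limit=2):
        self.write_ahead_log = []  # appended to on every write, for recovery
        self.in_memory_map = {}    # most recent writes
        self.files = []            # each file is a sorted list of (key, value)
        self.memory_limit = memory_limit

    def write(self, key, value):
        self.write_ahead_log.append((key, value))
        self.in_memory_map[key] = value
        if len(self.in_memory_map) >= self.memory_limit:
            self.minor_compact()

    def minor_compact(self):
        # Flush the in-memory map to a new sorted file.
        self.files.append(sorted(self.in_memory_map.items()))
        self.in_memory_map = {}

    def major_compact(self):
        # Merge all files into one; later (newer) files overwrite older ones.
        merged = {}
        for sorted_file in self.files:
            merged.update(sorted_file)
        self.files = [sorted(merged.items())]

    def scan(self):
        # Merged, sorted view over files and memory; newer sources win.
        merged = {}
        for sorted_file in self.files:
            merged.update(sorted_file)
        merged.update(self.in_memory_map)
        return sorted(merged.items())


tablet = TabletSketch(memory_limit=2)
tablet.write("b", "1")
tablet.write("a", "2")  # reaches the limit and triggers a minor compaction
tablet.write("a", "3")  # newer value stays in memory for now
print(tablet.scan())
```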
  • Hierarchical Decomposition
    Row: <person>
    Column Family: attribute, purchases, returns
    Column Qualifier: age, discount, sneakers, hat
    Value: <age>, <40%>, <cost>, <cost>
  • Materialized Table
    [Diagram label: Key/Value Pair]
    Row: bill; george
    Column Family: attribute, purchases; attribute, purchases, returns
    Column Qualifier: age, discount, sneakers; age, sneakers, hat
    Value: 49, 40%, $100; 27, $83, $42
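
To make the step from the hierarchy to the materialized table concrete, here is a small sketch that flattens a nested structure into sorted key/value pairs. The function `materialize` is hypothetical. The data uses only the entries whose column family is unambiguous in the transcript, so bill's discount is left out.

```python
def materialize(people):
    """Flatten {row: {family: {qualifier: value}}} into sorted key/value pairs."""
    pairs = []
    for row, families in people.items():
        for family, qualifiers in families.items():
            for qualifier, value in qualifiers.items():
                pairs.append(((row, family, qualifier), value))
    return sorted(pairs)


people = {
    "george": {
        "attribute": {"age": "27"},
        "purchases": {"sneakers": "$83"},
        "returns": {"hat": "$42"},
    },
    "bill": {
        "attribute": {"age": "49"},
        "purchases": {"sneakers": "$100"},
    },
}

table = materialize(people)
for key, value in table:
    print(key, "=>", value)
```

Each level of the hierarchy becomes one component of the key, so everything about one person ends up in one contiguous, sorted run.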
  • Forward and Inverted Index
    Table:             Forward Index   Inverted Index
    Row:               <UUID>          <Term>
    Column Family:     <Type>          <Type> + <Field>
    Column Qualifier:  <Field>         <UUID>
    Value:             <Term>          <Digest of Event>
  • Forward and Inverted Index
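
A minimal sketch of the forward and inverted index layout follows. The helper names `index_event` and `lookup`, the ":" separator joining type and field, and the sample events are all assumptions made for illustration. The digest is passed in as an opaque string because the slides do not say how it is computed.

```python
def index_event(uuid, event_type, fields, digest):
    """Build forward and inverted index entries for one event."""
    forward, inverted = [], []
    for field, term in fields.items():
        # Forward index: row=<UUID>, family=<Type>, qualifier=<Field>, value=<Term>
        forward.append(((uuid, event_type, field), term))
        # Inverted index: row=<Term>, family=<Type>+<Field>, qualifier=<UUID>
        inverted.append(((term, f"{event_type}:{field}", uuid), digest))
    return forward, inverted


def lookup(inverted_table, term):
    """Return the UUIDs of events that contain the term."""
    return [key[2] for key, _ in inverted_table if key[0] == term]


forward_table, inverted_table = [], []
events = [
    ("u1", "email", {"from": "alice", "to": "bob"}, "digest-1"),
    ("u2", "email", {"from": "bob"}, "digest-2"),
]
for uuid, event_type, fields, digest in events:
    fwd, inv = index_event(uuid, event_type, fields, digest)
    forward_table.extend(fwd)
    inverted_table.extend(inv)
forward_table.sort()
inverted_table.sort()

print(lookup(inverted_table, "bob"))
```

In a sorted store the lookup would be a range scan over the rows equal to the term rather than the linear filter used here.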
  • Graph Analysis
    Table: Graph Table
    Row: <Node ID>
    Column Family:              “Node Info”   “Out Edges”           “In Edges”
    Column Qualifier (Tuples):  <Field>       <Node ID>, <Edge ID>  <Node ID>, <Edge ID>
    Value:                      <Value>       <Edge Info>           <Edge Info>
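
The graph table layout can be sketched as follows, with a plain dictionary standing in for the table. The helpers `add_edge` and `neighbors` and the sample node and edge IDs are hypothetical. Each edge is written twice, once under its source's "Out Edges" and once under its target's "In Edges", which is what the slide's layout suggests.

```python
def add_edge(table, edge_id, source, target, info):
    """Store one edge under both endpoints so either side can find it."""
    table[(source, "Out Edges", (target, edge_id))] = info
    table[(target, "In Edges", (source, edge_id))] = info


def neighbors(table, node, direction="Out Edges"):
    """Return the sorted node IDs adjacent to the node in the given direction."""
    return sorted(
        qualifier[0]
        for (row, family, qualifier) in table
        if row == node and family == direction
    )


graph = {}
graph[("a", "Node Info", "name")] = "Node A"
add_edge(graph, "e1", "a", "b", "follows")
add_edge(graph, "e2", "a", "c", "follows")
add_edge(graph, "e3", "b", "c", "follows")

print(neighbors(graph, "a"))
print(neighbors(graph, "c", "In Edges"))
```

With the node ID as the row, all of a node's information and edges would be read with a single row scan; the dictionary scan here is only a stand-in for that.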
  • Geospatial Queries
    Table: Geo Index
    Latitude: 10110101001
    Longitude: 00111010010
    Depth: 11010110110
    Row: <GeoHash> 101001110111010101011100001011100
    Column Family: <Event Type>
    Column Qualifier: <UUID>
    Value: <Digest of Event>
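
The row value on this slide appears to be the latitude, longitude, and depth bit strings interleaved one bit at a time. The sketch below reproduces it from the three inputs shown. The function name `interleave_bits` is hypothetical, and how the bit strings are derived from real coordinates is not covered by the slide.

```python
def interleave_bits(*components):
    """Interleave equal-length bit strings, taking one bit from each in turn."""
    if len({len(c) for c in components}) != 1:
        raise ValueError("all components must have the same number of bits")
    return "".join("".join(bits) for bits in zip(*components))


latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"

geohash = interleave_bits(latitude, longitude, depth)
print(geohash)
```

Interleaving means that points sharing high-order bits in every dimension also share a row prefix, so a bounded region can be covered by a small number of range scans over the sorted rows.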
  • Document Partitioning
    Table: Shard Table
    Row: <Partition ID>
    Column Family:              “Docs”            “Inv. Index”     “Field Index”         “Geo”
    Column Qualifier (Tuples):  <UUID>, <Field>   <Term>, <UUID>   <Field:Term>, <UUID>  <Hash>, <UUID>
    Value:                      <Value>
  • Document Partitioning
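
The shard table can be sketched as below: each document is assigned to one partition, and its content plus its index entries are all written under that partition's row. `NUM_PARTITIONS`, the use of `zlib.crc32` as the hash, the whitespace tokenizer, and the helper names are assumptions; the slides do not specify any of them. The "Geo" column family is omitted.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_id(uuid):
    # crc32 serves as a stable hash here (an assumption, not from the slides).
    return zlib.crc32(uuid.encode("utf-8")) % NUM_PARTITIONS


def shard_entries(uuid, fields):
    """Return all shard-table entries for one document, under a single row."""
    row = partition_id(uuid)
    entries = []
    for field, value in fields.items():
        entries.append(((row, "Docs", (uuid, field)), value))
        for term in value.lower().split():
            entries.append(((row, "Inv. Index", (term, uuid)), ""))
            entries.append(((row, "Field Index", (f"{field}:{term}", uuid)), ""))
    return entries


entries = shard_entries("doc-1", {"title": "Big Data"})
for key, value in entries:
    print(key, "=>", repr(value))
```

Keeping a document and its index entries in the same row means a query can be evaluated entirely inside one partition, and partitions can be searched in parallel.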
  • Intersecting Iterator
    Query: ‘foo’ and (‘bar’ or ‘baz’)
    Row: <Partition ID>
    Column Family:     “Docs”            “Inv. Index”
    Column Qualifier:  <UUID>, <Field>   <Term>, <UUID>
    Value:             <Value>
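
The query on this slide can be evaluated within one partition by combining the sorted UUID lists that the inverted index holds for each term. The following is a simplified sketch of that idea, not the actual iterator. The sample index contents and the helper names `intersect_sorted` and `union_sorted` are hypothetical.

```python
def intersect_sorted(left, right):
    """Intersect two sorted UUID lists by advancing whichever cursor is behind."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


def union_sorted(left, right):
    """Union two UUID lists, returning a sorted list without duplicates."""
    return sorted(set(left) | set(right))


# Hypothetical inverted index of one partition: term -> sorted UUIDs.
inv_index = {
    "foo": ["d1", "d2", "d4"],
    "bar": ["d2", "d3"],
    "baz": ["d4", "d5"],
}

# 'foo' and ('bar' or 'baz')
matches = intersect_sorted(
    inv_index["foo"],
    union_sorted(inv_index["bar"], inv_index["baz"]),
)
print(matches)
```

Because the inverted index entries are already sorted by term and then UUID, the intersection can be computed in a single forward pass without materializing every document.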
  • acorn
    • Key/Value pairs are great!
    • How do I construct a document partitioning key again?
    • Techniques should be built into an API
    • Let the people have polyglot: Lucene, SQL, SPARQL, JAQL, Matlab (not just Key, Value, Range)
  • Combined IR + Graph Search
  • Schema-less Stats
  • Get Involved
    http://accumulo.apache.org
    Help us make Accumulo even better!
  • Contact
    Adam Fuchs, CTO
    sqrrl data, Inc.
    617-520-4375
    www.sqrrl.com
    @sqrrl_inc
    info@sqrrl.com