sqrrl data, INC.
                                                        Secure. Scale. Adapt.


                                                                        Adam Fuchs, Chief Technology Officer




info@sqrrl.com | @sqrrl_inc | 617.520.4375   sqrrl data, INC., All Rights Reserved
Who We Are



                                 sqrrl data, Inc. is the commercial
                                         provider of:


                Mature Database Technology - Apache Accumulo
                Fine-Grained Access Controls - Data Integration and Sharing
                Proven Performance - Petabytes and Beyond
                Advanced Analytics - Search, Statistics, and Graphs


Contents


                    Core Philosophy
                    Technology
                    Techniques
                    Application APIs




Apache Accumulo Perspective

     [Figure: multiple Data sources feeding multiple Applications]

     Integration across:
         Multiple business lines
         Multiple data sets
         Multiple applications
         Multiple security, privacy, legal, policy, regulatory, and compliance constraints
         New demands




Accumulo Design Drivers

                       Cell-Level Security
       1                Express common security requirements in the infrastructure, not just in the application
                        Data-centric approach encourages secure sharing



                      Scalability
       2               Near-linear performance scaling to thousands of nodes
                       Durable and reliable despite the higher failure rates that come with scale



                      Diverse, Interactive Analytics
       3               Sorted key/value core performs well in a diverse set of domains
                       Information retrieval, statistics, graph analysis, geo indexing, and more


                      Flexible, Adaptive Schema
       4               Start with universal structures and indexing
                       Refine the schema over time


Contents


                    Core Philosophy
                    Technology
                    Techniques
                    Application APIs




Accumulo Key Structure

      An Accumulo key is a 5-tuple, consisting of:
           Row: Controls Atomicity
           Column Family: Controls Locality
           Column Qualifier: Controls Uniqueness
           Visibility Label: Controls Access
           Timestamp: Controls Versioning


      Row        Col. Fam.      Col. Qual.      Visibility     Timestamp   Value
      John Doe   Notes          PCP             PCP_JD         20120912    Patient suffers from an acute …
      John Doe   Test Results   Cholesterol     JD|PCP_JD      20120912    183
      John Doe   Test Results   Mental Health   JD|PSYCH_JD    20120801    Pass
      John Doe   Test Results   X-Ray           JD|PHYS_JD     20120513    1010110110100…

                                              Accumulo Key/Value Example
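To make the five-part key concrete, here is a minimal Python sketch (not Accumulo's Java `Key` class) that models the example rows above and sorts them the way Accumulo orders keys: row, column family, column qualifier, and visibility ascending, then timestamp descending, so the newest version of a cell comes first. The `Key` dataclass and its `sort_order` method are hypothetical names for illustration; the truncated values are kept exactly as shown in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Hypothetical model of the 5-tuple Accumulo key."""
    row: str
    col_fam: str
    col_qual: str
    visibility: str
    timestamp: int

    def sort_order(self):
        # Row, family, qualifier, and visibility sort ascending;
        # negating the timestamp makes newer versions sort first.
        return (self.row, self.col_fam, self.col_qual,
                self.visibility, -self.timestamp)


entries = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100…"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
]

# Sort the example cells into the order a scan would return them.
sorted_entries = sorted(entries, key=lambda kv: kv[0].sort_order())
for key, value in sorted_entries:
    print(key.col_fam, "/", key.col_qual, "->", value)
```

Because the row is the leading sort component, every cell for "John Doe" is stored contiguously, which is what lets the row control atomicity and makes a single-row scan cheap.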

Visibility Syntax & Semantics
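The key/value example uses labels such as `JD|PCP_JD`, where `|` means "or". Accumulo visibility expressions also support `&` ("and") and parentheses, and they reject expressions that mix `&` and `|` at the same level without parentheses. The following is a minimal Python sketch of evaluating such an expression against a user's set of authorizations. It is not Accumulo's own evaluator: it assumes terms contain only letters, digits, and `_-.:/`, it does not handle quoted terms, and the `visible` function name is hypothetical.

```python
import re

# Hypothetical token grammar: label terms, or one of & | ( )
_TOKEN = re.compile(r"[A-Za-z0-9_\-.:/]+|[&|()]")


def _tokenize(expr):
    tokens, pos = [], 0
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens


def visible(expr, auths):
    """Return True if a user holding `auths` may see a cell labeled `expr`."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # an empty label is visible to everyone
    pos = 0

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parse, so the cursor advances
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing )")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


# A primary care physician can read the cholesterol result but not the psych result.
print(visible("JD|PCP_JD", {"PCP_JD"}), visible("JD|PSYCH_JD", {"PCP_JD"}))
```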




Tablets
         Collections of KV pairs form Tables
         Tables are partitioned into Tablets
         Metadata tablets hold info about other tablets, forming a 3-level hierarchy
         A Tablet is a unit of work for a Tablet Server

         [Figure: tablet hierarchy]
         Well-Known Location (zookeeper)
           Root Tablet: -∞ to ∞
             Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
             Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞
               Table: Adam’s Table   Data Tablets: -∞ : thing; thing : ∞
               Table: Encyclopedia   Data Tablets: -∞ : Ocelot; Ocelot : Yak; Yak : ∞
               Table: Foo            Data Tablet:  -∞ to ∞
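Because each table is split into contiguous row ranges, finding the tablet for a row is a binary search over that table's split points. Here is a minimal Python sketch using the split points from the figure. It assumes, as in Accumulo, that a tablet's end row is inclusive (so row `Ocelot` belongs to the `-∞ : Ocelot` tablet). The `SPLITS` dictionary and `find_tablet` function are hypothetical; in a real cluster, clients locate tablets by reading the root and metadata tablets, which this sketch does not model.

```python
from bisect import bisect_left

# Hypothetical split points per table, taken from the figure above.
SPLITS = {
    "Adam's Table": ["thing"],
    "Encyclopedia": ["Ocelot", "Yak"],
    "Foo": [],
}


def find_tablet(table, row):
    """Return (prev_end_row, end_row) of the tablet holding `row`.

    None stands for -inf (prev_end_row) or +inf (end_row). Each tablet
    covers rows in (prev_end_row, end_row], so a row equal to a split
    point falls in the tablet that ends there.
    """
    splits = SPLITS[table]
    i = bisect_left(splits, row)
    prev_end = splits[i - 1] if i > 0 else None
    end = splits[i] if i < len(splits) else None
    return prev_end, end


print(find_tablet("Encyclopedia", "Penguin"))  # ('Ocelot', 'Yak')
```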

Accumulo Architecture
         [Figure: Accumulo architecture]
         Zookeeper (three instances): delegates authority to the Master and the Tablet Servers
         Master: assigns and balances Tablets across Tablet Servers
         Tablet Servers: each hosts Tablets and serves Read/Write requests from Applications
         Hadoop: stores and replicates Tablet data
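The Master's assign/balance role can be illustrated with a toy policy that gives each unassigned tablet to whichever tablet server currently hosts the fewest tablets. This is a hypothetical Python sketch of the idea only: Accumulo's balancers are pluggable, and this does not reproduce any of them. The `assign_tablets` name and the server and tablet identifiers are made up.

```python
import heapq


def assign_tablets(servers, tablets, current=None):
    """Assign each tablet to the least-loaded server (toy policy).

    servers: list of server names
    tablets: list of tablet ids that need a home
    current: optional dict of server -> number of tablets already hosted
    Returns a dict of tablet -> server, in the order tablets were given.
    """
    current = current or {}
    # Min-heap of (load, server); ties are broken by server name.
    heap = [(current.get(s, 0), s) for s in servers]
    heapq.heapify(heap)
    assignment = {}
    for tablet in tablets:
        load, server = heapq.heappop(heap)
        assignment[tablet] = server
        heapq.heappush(heap, (load + 1, server))
    return assignment


tablets = ["Encyclopedia (-inf, Ocelot]", "Encyclopedia (Ocelot, Yak]",
           "Encyclopedia (Yak, +inf)", "Foo (-inf, +inf)"]
print(assign_tablets(["ts1", "ts2", "ts3"], tablets, {"ts1": 2}))
```

Starting from a server that already hosts two tablets, the new tablets go to the idle servers until all three carry the same load.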


Tablet Data Flow


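         [Figure: data flow within a tablet]
         Writes → Write Ahead Log (for recovery) and In-Memory Map
         In-Memory Map → Minor Compaction (through an Iterator Tree) → Sorted, Indexed File
         Sorted, Indexed Files → Merging / Major Compaction (through an Iterator Tree) → Sorted, Indexed File
         Reads ← Scan Iterator Tree over the In-Memory Map and the Sorted, Indexed Files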
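The write and read paths above follow a log-structured merge pattern. Here is a minimal, single-process Python sketch of that flow under simplifying assumptions: the write-ahead log is a Python list, "files" are in-memory sorted lists, keys are plain (row, column, timestamp) tuples, and the only iterator in the tree is a versioning filter that keeps the newest value per cell (Accumulo tables apply a versioning iterator by default, though it is configurable). The `Tablet` class and its methods are hypothetical names, not Accumulo APIs.

```python
import heapq
from itertools import groupby


def sort_key(entry):
    # entry = ((row, column, timestamp), value); newest timestamp first.
    (row, column, ts), _ = entry
    return (row, column, -ts)


def latest_version(entries):
    """Iterator that keeps only the newest version of each (row, column)."""
    for _, versions in groupby(entries, key=lambda e: e[0][:2]):
        yield next(versions)


class Tablet:
    def __init__(self):
        self.wal = []     # write-ahead log, replayed on recovery
        self.memory = {}  # in-memory map: key -> value
        self.files = []   # immutable sorted "files"

    def write(self, row, column, ts, value):
        self.wal.append((row, column, ts, value))  # log first for durability
        self.memory[(row, column, ts)] = value

    def minor_compaction(self):
        # Flush the in-memory map to a new sorted file; the flushed
        # entries no longer need the log.
        self.files.append(sorted(self.memory.items(), key=sort_key))
        self.memory.clear()
        self.wal.clear()

    def major_compaction(self):
        # Merge all files into one, dropping old versions along the way.
        merged = heapq.merge(*self.files, key=sort_key)
        self.files = [list(latest_version(merged))]

    def scan(self):
        # Merge memory and files in sorted order, then apply the iterator.
        sources = [sorted(self.memory.items(), key=sort_key)] + self.files
        return list(latest_version(heapq.merge(*sources, key=sort_key)))


t = Tablet()
t.write("row1", "cf:a", 1, "old")
t.minor_compaction()
t.write("row1", "cf:a", 2, "new")
t.write("row0", "cf:b", 1, "x")
print(t.scan())
```

Reads never modify files; they merge every sorted source on the fly, which is why compactions matter for read performance: fewer files means fewer sources to merge.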




Contents


                    Core Philosophy
                    Technology
                    Techniques
                    Application APIs




Hierarchical Decomposition

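      Row: <person>
        Column Family: attribute
          Column Qualifier: age          Value: <age>
          Column Qualifier: discount     Value: <40%>
        Column Family: purchases
          Column Qualifier: sneakers     Value: <cost>
        Column Family: returns
          Column Qualifier: hat          Value: <cost>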

Materialized Table
      Key/Value Pairs:

      Row      Column Family   Column Qualifier   Value
      bill     attribute       age                49
      bill     attribute       discount           40%
      bill     purchases       sneakers           $100
      george   attribute       age                27
      george   purchases       sneakers           $83
      george   returns         hat                $42
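Flattening a nested record into key/value pairs is mechanical: each path from the row down to a leaf value becomes one key. Here is a minimal Python sketch, assuming records arrive as nested dictionaries keyed by row, then column family, then column qualifier; the `materialize` helper is hypothetical.

```python
def materialize(records):
    """Flatten {row: {family: {qualifier: value}}} into sorted KV pairs."""
    pairs = []
    for row, families in records.items():
        for family, qualifiers in families.items():
            for qualifier, value in qualifiers.items():
                pairs.append(((row, family, qualifier), value))
    # Accumulo keeps keys sorted, so sort the flattened pairs the same way.
    return sorted(pairs)


people = {
    "bill": {"attribute": {"age": "49", "discount": "40%"},
             "purchases": {"sneakers": "$100"}},
    "george": {"attribute": {"age": "27"},
               "purchases": {"sneakers": "$83"},
               "returns": {"hat": "$42"}},
}

for (row, family, qualifier), value in materialize(people):
    print(row, family, qualifier, value)
```

Sparse records cost nothing extra: george has a `returns` family and bill does not, and no empty cells are stored for the missing combinations.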

Forward and Inverted Index

                        Table:               Forward Index                            Inverted Index

                          Row:                      <UUID>                               <Term>


      Column Family:                                <Type>                           <Type> + <Field>


 Column Qualifier:                                  <Field>                              <UUID>


                        Value:                      <Term>                           <Digest of Event>
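To illustrate the two layouts, the sketch below generates both the forward-index and the inverted-index entries for one event. It is a hypothetical Python example with several assumptions: events are dictionaries of field to term, the digest is a truncated SHA-256 of the event's fields (the slide does not say how the digest is computed), and `<Type> + <Field>` in the inverted index's column family is joined with a `\x00` separator. The `index_entries` name is made up.

```python
import hashlib
import json


def index_entries(uuid, event_type, fields):
    """Return (forward, inverted) lists of ((row, cf, cq), value) entries."""
    # Hypothetical digest: short hash of the event's canonical JSON form.
    digest = hashlib.sha256(
        json.dumps(fields, sort_keys=True).encode("utf-8")).hexdigest()[:16]
    forward, inverted = [], []
    for field, term in sorted(fields.items()):
        # Forward index: look up an event's fields by its UUID.
        forward.append(((uuid, event_type, field), term))
        # Inverted index: look up events by term, grouped by type and field.
        inverted.append(((term, event_type + "\x00" + field, uuid), digest))
    return forward, inverted


fwd, inv = index_entries("e-001", "email", {"from": "alice", "to": "bob"})
for entry in fwd + inv:
    print(entry)
```

Every event is written twice: once under its UUID, for retrieval, and once per term, so a term query is a single-row scan that returns matching UUIDs along with a digest good enough to display without fetching the full event.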

Forward and Inverted Index




Graph Analysis

      Table: Graph Table

      Row:                         <Node ID>
      Column Family:               “Node Info”     “Out Edges”              “In Edges”
      Column Qualifier (Tuples):   <Field>         <Node ID>, <Edge ID>     <Node ID>, <Edge ID>
      Value:                       <Value>         <Edge Info>              <Edge Info>
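The following shows one way edges might be written to the graph table layout above. Each edge is stored twice, as an out-edge of its source and as an in-edge of its target, so neighbors in either direction come from a single-row scan. This is a Python sketch with a plain sorted list standing in for the table; `put`, `add_edge`, `neighbors`, and the `\x00` separator inside tuple qualifiers are all hypothetical.

```python
from bisect import bisect_left

table = []  # sorted list of ((row, family, qualifier), value)


def put(row, family, qualifier, value):
    entry = ((row, family, qualifier), value)
    table.insert(bisect_left(table, entry), entry)


def add_edge(src, dst, edge_id, info):
    # Tuple qualifiers <Node ID>, <Edge ID> joined with a separator.
    put(src, "Out Edges", dst + "\x00" + edge_id, info)
    put(dst, "In Edges", src + "\x00" + edge_id, info)


def neighbors(node, family="Out Edges"):
    """Scan one row and column family; return the adjacent node IDs."""
    start = bisect_left(table, ((node, family, ""),))
    result = []
    for (row, fam, qual), _ in table[start:]:
        if (row, fam) != (node, family):
            break  # left the row/family: the scan is done
        result.append(qual.split("\x00")[0])
    return result


put("A", "Node Info", "name", "Alice")
add_edge("A", "B", "e1", "knows")
add_edge("A", "C", "e2", "knows")
add_edge("C", "A", "e3", "follows")
print(neighbors("A"), neighbors("A", "In Edges"))
```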

Geospatial Queries
      Table: Geo Index

      Row:                <GeoHash>
      Column Family:      <Event Type>
      Column Qualifier:   <UUID>
      Value:              <Digest of Event>

      GeoHash bit interleaving example:
        Latitude:      10110101001
        Longitude:     00111010010
        Depth:         11010110110
        Interleaved:   101001110111010101011100001011100
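The interleaved bit string in the example can be reproduced by taking one bit from each dimension in turn (latitude, longitude, depth). This is a Z-order (Morton) code: nearby points tend to sort near each other, so a bounding box can be covered by a small set of row ranges. The Python sketch below reproduces the example. It omits how coordinates are quantized into bits, and the standard Geohash format differs (it interleaves longitude first and base32-encodes the result). The `interleave` name is hypothetical.

```python
def interleave(*bit_strings):
    """Interleave equal-length bit strings one bit at a time (Z-order)."""
    if len({len(b) for b in bit_strings}) != 1:
        raise ValueError("all dimensions must have the same number of bits")
    # zip yields one bit from each dimension per position.
    return "".join("".join(bits) for bits in zip(*bit_strings))


latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"
print(interleave(latitude, longitude, depth))
# 101001110111010101011100001011100
```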

Document Partitioning

      Table: Shard Table

      Row:                         <Partition ID>
      Column Family:               “Docs”             “Inv. Index”      “Field Index”           “Geo”
      Column Qualifier (Tuples):   <UUID>, <Field>    <Term>, <UUID>    <Field:Term>, <UUID>    <Hash>, <UUID>
      Value:                       <Value>
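Here is a sketch of writing one document into the shard table. The partition is chosen by hashing the document UUID into a fixed number of partitions, and the document's fields, inverted-index terms, and field-index entries all land in that same partition row, so a query can be answered partition by partition. This Python example is hypothetical: the partition count, the CRC32 hash, whitespace tokenization, the `shard_entries` name, and the `\x00` separator in tuple qualifiers are assumptions, and the "Geo" family is omitted.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical; a real deployment would tune this


def partition_for(uuid):
    # Stable hash, so a document always maps to the same partition.
    return "p%03d" % (zlib.crc32(uuid.encode("utf-8")) % NUM_PARTITIONS)


def shard_entries(uuid, fields):
    """Return sorted ((row, cf, cq), value) entries for one document."""
    row = partition_for(uuid)
    entries = []
    for field, text in fields.items():
        entries.append(((row, "Docs", uuid + "\x00" + field), text))
        for term in set(text.lower().split()):
            entries.append(((row, "Inv. Index", term + "\x00" + uuid), ""))
            entries.append(
                ((row, "Field Index", field + ":" + term + "\x00" + uuid), ""))
    return sorted(entries)


for entry in shard_entries("doc-1", {"title": "Foo bar"}):
    print(entry)
```

Co-locating a document with its index entries is the key design choice: a tablet server can evaluate a query against one partition without contacting any other server, and partitions are queried in parallel.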

Document Partitioning




Intersecting Iterator
      Query: ‘foo’ and (‘bar’ or ‘baz’)

      Row: <Partition ID>
        “Docs”:         <UUID>, <Field>  →  <Value>
        “Inv. Index”:   <Term>, <UUID>
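Within a single partition, the inverted-index entries for each term form a sorted list of UUIDs. The query ‘foo’ and (‘bar’ or ‘baz’) can therefore be answered by merging the lists for `bar` and `baz` and intersecting the result with the list for `foo`. The sketch below does the intersection the way a seeking iterator would: it repeatedly jumps each cursor forward to the largest UUID seen so far instead of stepping through every entry. It is a Python illustration of the technique, not Accumulo's `IntersectingIterator` code, and the `postings` data is hypothetical.

```python
import heapq
from bisect import bisect_left


def union(*lists):
    """Merge sorted UUID lists into one sorted list without duplicates."""
    out = []
    for uuid in heapq.merge(*lists):
        if not out or out[-1] != uuid:
            out.append(uuid)
    return out


def intersect(*lists):
    """Leapfrog intersection of sorted, duplicate-free UUID lists."""
    if not lists or any(not lst for lst in lists):
        return []
    positions = [0] * len(lists)
    result = []
    target = max(lst[0] for lst in lists)
    while True:
        all_match = True
        for i, lst in enumerate(lists):
            # Seek this cursor to the first UUID >= target.
            positions[i] = bisect_left(lst, target, positions[i])
            if positions[i] == len(lst):
                return result  # one list is exhausted: no more matches
            if lst[positions[i]] != target:
                target = lst[positions[i]]  # jump ahead, restart the round
                all_match = False
                break
        if all_match:
            result.append(target)
            positions[0] += 1
            if positions[0] == len(lists[0]):
                return result
            target = lists[0][positions[0]]


postings = {  # hypothetical inverted-index contents for one partition
    "foo": ["d1", "d3", "d4", "d7"],
    "bar": ["d3", "d5"],
    "baz": ["d2", "d7"],
}
print(intersect(postings["foo"], union(postings["bar"], postings["baz"])))
```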




Contents


                    Core Philosophy
                    Technology
                    Techniques
                    Application APIs




acorn

    Key/Value pairs are great!
    How do I construct a document
    partitioning key again?
           Techniques should be built into an API
           Let the people have polyglot access:
           Lucene, SQL, SPARQL, JAQL, Matlab
           (not just Key, Value, Range)
Combined IR + Graph Search




Schema-less Stats




Get Involved

                           http://accumulo.apache.org
                Help us make Accumulo even better!




Contact




                                              Adam Fuchs, CTO

                                                  sqrrl data, Inc.
                                                  617-520-4375
                                                 www.sqrrl.com
                                                    @sqrrl_inc
                                                 info@sqrrl.com


Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data


Editor's Notes

  • #10 Tablet Servers have four primary functions:
      Hosting RPCs (read, write, etc.)
      Managing resources (RAM, CPU, file I/O, etc.)
      Scheduling background tasks (compactions, caching, etc.)
      Handling key/value pairs (via Iterators)