Sqrrl real time_big_data_20130411

Sqrrl CTO Adam Fuchs discusses Sqrrl and Accumulo at the April 2013 Boston Hadoop User Group.

Presentation Transcript

  • Slide 1: sqrrl. Secure. Scale. Adapt. Adam Fuchs, CTO. 11 April 2013. Sqrrl Data, Inc. All Rights Reserved.
  • Slide 2: Who We Are
      - Management: Mark Terenzoni (sqrrl CEO, F5), Adam Fuchs (sqrrl CTO, NSA), Ely Kahn (sqrrl VP BizDev, White House)
      - Investors
      - 20+ years of combined Apache Accumulo engineering expertise
      - Founded July 2012
      - Funded August 2012
      - Team includes former Tech Director of Accumulo at NSA and 6 committers/contributors
  • Slide 3: Our Mission. Security, Adaptivity, Scalability.
  • Slide 4: Apache Accumulo
      - Sorted, distributed key/value store
      - Based on Google's Bigtable design
      - Built on top of Apache Hadoop and Apache ZooKeeper
      - Augments and integrates with the Hadoop ecosystem
      - Originally developed at the National Security Agency, now an Apache Software Foundation project
  • Slide 5: Sqrrl Enterprise Architecture
      - Applications
      - Analytics APIs: Search, Statistics, Graph, Lucene, SQL, Custom Extensions
      - Security & Access Controls: IAM, Encryption, DAM, Secure Code
      - Data Integration: ETL, Hadoop
      - Accumulo
  • Slide 6: Big Data Lessons Learned
      - Start small, but design for scalability
          - One application first, then grow to hundreds
          - One gigabyte first, then grow to petabytes
      - Iterative schema refinement
          - Initially, let the data define the schema
          - Refine the schema in bulk as you better understand the data
          - Middle ground between flat files and complete ontologies
      - Discovery analytics as application building blocks
          - Universal search: structured and unstructured data, across data sets, low latency
          - Basic statistics: aggregations of query results, parallelized, low latency, to support big-picture analysis
          - Graphs: scalable graph analytics for analyzing how everything is connected
      - Data-centric security
          - Separate modeling of security and analysis
          - Simplifies multi-tenancy and application accreditation
  • Slide 7: Schema Discovery
  • Slide 8: Lightweight Apps
      The future of Big Data innovation is Apps, built on:
      - Universal Search
      - Schema-less Statistics
      - Graphs
      - Intuitive Languages
      - Secure, Scalable, and Adaptable platforms
  • Slide 9: Targeted Analysis
  • Slide 10: Big-Picture Analytics
  • Slide 11: Data-Centric Security
      Definition: A form of security in which data carries with it the elements of provenance that are required to make policy decisions on its releasability.
      - Separate data modeling for Security and Analysis
      - Reusability of applications across security domains
      - Distributed development of ingest and query applications
      - Supported by Accumulo's cell-level security
  • Slide 12: Cell-Level Security
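To make the cell-level security idea concrete, here is a minimal Python sketch of checking a visibility label such as `JD|PCP_JD` against a user's authorizations. It is an illustration only, not Accumulo's implementation: the function names `tokenize` and `is_visible` are hypothetical, and the grammar (labels joined by `&` and `|`, parentheses required when mixing the two, empty label visible to everyone) is an assumption of this sketch.

```python
import re

# A token is either a label (letters, digits, _ - . :) or one of & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_\-.:]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into labels and operators."""
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at position {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if the authorization set `auths` satisfies `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an empty label is visible to everyone
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare label is true if the user holds it

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()  # always parse, then combine
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


# A primary-care physician holding only the PCP_JD authorization
pcp_auths = {"PCP_JD"}
can_read_cholesterol = is_visible("JD|PCP_JD", pcp_auths)
can_read_mental_health = is_visible("JD|PSYCH_JD", pcp_auths)
```

Because the check depends only on the label stored with each cell and on the caller's authorizations, the same query code can serve users with different access rights.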
  • Slide 13: Scalable Data-Centric Security
      Diagram components: Data, Labeler, Accumulo, Apps, End Users, User Attributes, Audits, Policies, Auth. Service, Policy Engine, HDFS, Zookeeper
  • Slide 14: Accumulo's Strengths
      - Security
          - Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use
          - IAM and encryption tie into enterprise security standards
      - Scalability
          - Proven reliability and performance at the multi-petabyte scale
          - High-performance parallel I/O library
      - Adaptivity
          - Flexible schema support to quickly ingest new data sources
          - Sorted key/value paradigm supports a multitude of search and analysis applications
          - Server-side programming framework ("iterator trees") supports best-in-class aggregation, filtering, and complex query semantics
  • Slide 15: Accumulo Key Structure
      An Accumulo key is a 5-tuple, consisting of:
      - Row: controls atomicity
      - Column Family: controls locality
      - Column Qualifier: controls uniqueness
      - Visibility Label: controls access
      - Timestamp: controls versioning
      Accumulo Key/Value Example:
      Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
      John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
      John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
      John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
      John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
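The 5-tuple key can be modeled as a small Python sketch. The `Key` class and `sort_order` function are hypothetical names, not the Accumulo API. The ordering shown (row, column family, column qualifier and visibility ascending, timestamp descending so the newest version comes first) follows Accumulo's documented sort order. The second cholesterol reading is invented purely to show versioning.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Illustrative model of the 5-tuple key from the slide."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_order(key):
    # Ascending on the first four parts, descending on timestamp,
    # so the most recent version of a cell sorts first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


entries = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100…"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
    # Hypothetical older version of the cholesterol cell (not on the slide)
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20110301), "190"),
]

ordered = sorted(entries, key=lambda entry: sort_order(entry[0]))
for key, value in ordered:
    print(key, "->", value)
```

All cells of one row sort next to each other, and within a cell the newest timestamp comes first, which is what lets a scan return the current version without consulting a separate index.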
  • Slide 16: Accumulo Architecture
      Diagram components: Applications, Tablet Servers (each hosting Tablets), Master, Zookeeper nodes, HDFS
      Diagram arrow labels: Read/Write, Store/Replicate, Assign/Balance, Delegate Authority
  • Slide 17: Tablet Data Flow
      Diagram components: Writes, Reads, Scan, Iterator Trees, In-Memory Map, Write Ahead Log (For Recovery), Sorted, Indexed Files
      Diagram arrow labels: Minor Compaction, Merging / Major Compaction
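The tablet data flow on this slide resembles a log-structured merge design, which the following Python sketch illustrates. It is a simplification with assumptions of its own, not Accumulo code: the `Tablet` class, the `flush_threshold` parameter and the method names are hypothetical, "files" are in-memory sorted lists, deletes are not modeled, and the iterator trees are reduced to a single merge step.

```python
import heapq


class Tablet:
    """Toy tablet: write-ahead log, in-memory map, sorted immutable files."""

    def __init__(self, flush_threshold=3):
        self.wal = []      # write-ahead log, kept only for recovery
        self.memory = {}   # in-memory map of recent writes
        self.files = []    # sorted "files", oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.wal.append((key, value))  # log first, then apply
        self.memory[key] = value
        if len(self.memory) >= self.flush_threshold:
            self.minor_compact()

    def minor_compact(self):
        """Flush the in-memory map to a new sorted file."""
        if self.memory:
            self.files.append(sorted(self.memory.items()))
            self.memory = {}
            self.wal = []  # flushed data no longer needs the log

    def major_compact(self):
        """Merge all sorted files into one, keeping the newest value per key."""
        if self.files:
            self.files = [list(self._merged(self.files))]

    @staticmethod
    def _merged(sources):
        # Tag each entry with the negated age of its source so that, for
        # equal keys, the entry from the newest source is seen first.
        tagged = [
            [(key, -age, value) for key, value in source]
            for age, source in enumerate(sources)
        ]
        last = object()
        for key, _, value in heapq.merge(*tagged):
            if key != last:
                yield key, value
                last = key

    def scan(self, start=None, stop=None):
        """Merged, sorted view over files and memory for start <= key < stop."""
        sources = self.files + [sorted(self.memory.items())]
        for key, value in self._merged(sources):
            if (start is None or key >= start) and (stop is None or key < stop):
                yield key, value


tablet = Tablet(flush_threshold=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    tablet.write(key, value)

before = list(tablet.scan())
tablet.major_compact()
after = list(tablet.scan())
```

A scan sees the same result before and after compaction. Compaction changes only how many sorted sources have to be merged at read time.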
  • Slide 18: Iterator Framework
      Iterator operations:
      - File Reads
      - Block Caching
      - Merging
      - Deletion
      - Isolation
      - Locality Groups
      - Range Selection
      - Column Selection
      - Cell-level Security
      - Versioning
      - Filtering
      - Aggregation
      - Partitioned Joins
      Contact: info@sqrrl.com | @sqrrl_inc | 617.520.4375
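Several of the iterator operations listed above (merging, versioning, filtering, aggregation) can be pictured as small stages stacked over sorted streams. The Python sketch below uses generators for this. It is an analogy to the iterator framework rather than its actual API: the function names, the entry layout `((row, column), -timestamp, value)` and the sample data are all assumptions.

```python
import heapq
from itertools import groupby


def merging(*sources):
    """Merge already-sorted sources into one sorted stream."""
    return heapq.merge(*sources)


def versioning(entries, max_versions=1):
    """Keep only the newest `max_versions` entries of each cell."""
    for _, group in groupby(entries, key=lambda entry: entry[0]):
        for index, entry in enumerate(group):
            if index < max_versions:
                yield entry


def filtering(entries, predicate):
    """Pass through only the entries that satisfy `predicate`."""
    return (entry for entry in entries if predicate(entry))


def summing(entries):
    """Aggregate all versions of each cell into a single sum."""
    for cell, group in groupby(entries, key=lambda entry: entry[0]):
        yield cell, sum(value for _, _, value in group)


# Hypothetical sorted "files"; negated timestamps put newer versions first.
file_a = [(("page1", "clicks"), -2, 5), (("page2", "clicks"), -1, 7)]
file_b = [
    (("page1", "clicks"), -1, 3),
    (("page1", "views"), -1, 40),
    (("page2", "views"), -3, 9),
]

# Stack 1: merge, then keep only the latest version of each cell.
latest = list(versioning(merging(file_a, file_b)))

# Stack 2: merge, keep the "clicks" column, then sum across versions.
click_totals = list(
    summing(filtering(merging(file_a, file_b), lambda entry: entry[0][1] == "clicks"))
)
```

Each stage consumes a sorted stream and produces a sorted stream, so stages can be reordered or combined without changing the sources underneath.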
  • Slide 19: Table Design
      - No built-in secondary indices
      - Sort order ⇔ index
      - Balance between ingest and query
      - Avoid introducing bottlenecks
      - Preserve cell-level security and scalability
      Table           Row     Column Family     Column Qualifier  Value
      Forward Index   <UUID>  <Type>            <Field>           <Term>
      Inverted Index  <Term>  <Type> + <Field>  <UUID>            <Digest of Event>
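The forward/inverted table layout above can be sketched in Python to show how sort order serves as the index. Everything beyond the column layout is an assumption for illustration: the function names, the `:` separator joining type and field, treating each field value as a single term, the caller-supplied digest string, and the sample events.

```python
from bisect import bisect_left


def index_entries(uuid, event_type, fields, digest):
    """Produce forward and inverted entries for one event.

    Forward:  (UUID, Type, Field)          -> Term
    Inverted: (Term, Type + Field, UUID)   -> Digest of Event
    """
    forward = [((uuid, event_type, field), term) for field, term in fields.items()]
    inverted = [
        ((term, f"{event_type}:{field}", uuid), digest)
        for field, term in fields.items()
    ]
    return forward, inverted


def build_tables(events):
    """Index every event and return both tables in sorted key order."""
    forward_table, inverted_table = [], []
    for uuid, event_type, fields, digest in events:
        forward, inverted = index_entries(uuid, event_type, fields, digest)
        forward_table.extend(forward)
        inverted_table.extend(inverted)
    return sorted(forward_table), sorted(inverted_table)


def lookup(inverted_table, term):
    """Range scan over the sorted inverted table for rows equal to `term`."""
    keys = [key for key, _ in inverted_table]
    position = bisect_left(keys, (term, "", ""))  # seek to first possible key
    hits = []
    while position < len(keys) and keys[position][0] == term:
        hits.append(keys[position][2])  # the UUID is the column qualifier
        position += 1
    return hits


# Hypothetical events: (uuid, type, fields, digest)
events = [
    ("e1", "login", {"user": "alice", "host": "web01"}, "alice@web01"),
    ("e2", "login", {"user": "bob", "host": "web01"}, "bob@web01"),
]
forward_table, inverted_table = build_tables(events)
web01_events = lookup(inverted_table, "web01")
```

A term query becomes a seek plus a short sequential read on the inverted table, and the returned UUIDs are row keys into the forward table. The cost is that each event is written twice at ingest, which is the balance between ingest and query that the slide mentions.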
  • Slide 20: Ecosystem Architecture
      Diagram components: Apache HDFS, Apache Accumulo, Sqrrl Enterprise, Custom Ingester, Web Server, Custom Analytic, Map/Reduce Task
      Interfaces:
      - Sqrrl API over Apache Thrift RPC: Hierarchical Documents + Graphs, Lucene + SQL + more
      - Accumulo RPC: Sorted Key/Value I/O
      - Hadoop RPC: File I/O
  • Slide 21: Contact
      sqrrl data, inc., 275 Third St., Cambridge, MA 02142
      617-902-0784 | www.sqrrl.com | @sqrrl_inc | info@sqrrl.com