sqrrl data, Inc.
Secure. Scale. Adapt.

info@sqrrl.com | @sqrrl_inc | 617.520.4375 | sqrrl data, Inc., All Rights Reserved
Adam Fuchs, Chief Technology Officer

Who We Are
sqrrl data, Inc. is the commercial provider of:
- Mature Database Technology: Apache Accumulo
- Fine-Grained Access Controls: Data Integration and Sharing
- Proven Performance: Petabytes and Beyond
- Advanced Analytics: Search, Statistics, and Graphs

Contents
- Core Philosophy
- Technology
- Techniques
- Application APIs

Apache Accumulo Perspective
Integration across:
- Multiple business lines
- Multiple data sets
- Multiple applications
- Multiple security, privacy, legal, policy, regulatory, and compliance constraints
- New demands
[Figure: three Application boxes and three Data boxes]
  
Accumulo Design Drivers
1. Scalability
   - Near-linear performance improvements at thousands of nodes
   - Durable and reliable despite the increased failures that come with scale
2. Diverse, Interactive Analytics
   - Sorted key/value core performs well in a diverse set of domains
   - Information retrieval, statistics, graph analysis, geo indexing, and more
3. Cell-Level Security
   - Express common security requirements in the infrastructure, not just in the application
   - A data-centric approach encourages secure sharing
  
4. Flexible, Adaptive Schema
   - Start with universal structures and indexing
   - Refine the schema over time
Contents
- Core Philosophy
- Technology
- Techniques
- Application APIs

Accumulo Key Structure
An Accumulo key is a 5-tuple consisting of:
- Row: controls atomicity
- Column Family: controls locality
- Column Qualifier: controls uniqueness
- Visibility Label: controls access
- Timestamp: controls versioning

Accumulo Key/Value Example

Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
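The ordering behind this key structure can be modeled in a few lines. The sketch below is a hypothetical Python model, not Accumulo's Java `Key` class. It sorts cells by row, column family, column qualifier, and visibility in ascending order and by timestamp in descending order, so the newest version of each cell is read first. To our understanding this mirrors Accumulo's ordering, but treat the exact tie-breaking as an assumption.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical model of the 5-tuple Accumulo key."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(key: Key):
    # Ascending on the first four fields, descending on timestamp,
    # so the newest version of a cell sorts first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def latest_versions(cells):
    """Keep only the newest value for each (row, family, qualifier, visibility)."""
    seen, result = set(), []
    for key, value in sorted(cells, key=lambda kv: sort_key(kv[0])):
        column = key[:4]  # every field except the timestamp
        if column not in seen:
            seen.add(column)
            result.append((key, value))
    return result


cells = [
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100…"),
]
for key, value in sorted(cells, key=lambda kv: sort_key(kv[0])):
    print(key.family, key.qualifier, value)
```

Sorting newest-first means a versioning step can keep the first cell it sees for each column and discard the rest, which is all `latest_versions` does.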
  
Visibility Syntax & Semantics
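The details of this slide did not survive extraction. The sketch below illustrates the semantics implied by labels such as `JD|PCP_JD` in the key/value example. It evaluates a visibility expression against a user's authorizations, where `&` requires every term, `|` requires any term, and parentheses group subexpressions. As in Accumulo, mixing `&` and `|` without parentheses is rejected. This is a simplified Python model, not Accumulo's `ColumnVisibility` parser. Quoted terms and other parts of the real grammar are omitted, and the helper names are hypothetical.

```python
import re

# Simplified term syntax; the real grammar also allows quoted terms.
TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def tokenize(expression):
    tokens, pos = [], 0
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos} in {expression!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.i += 1
        return token

    def expr(self):
        # expr := atom (op atom)*, where every op at one nesting level must match
        nodes, op = [self.atom()], None
        while self.peek() in ("&", "|"):
            token = self.take()
            if op is not None and token != op:
                raise ValueError("mixing & and | requires parentheses")
            op = token
            nodes.append(self.atom())
        return nodes[0] if op is None else (op, nodes)

    def atom(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise ValueError("missing ')'")
            return node
        if token is None or token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token  # a single authorization label


def _evaluate(node, auths):
    if isinstance(node, str):
        return node in auths
    op, children = node
    results = (_evaluate(child, auths) for child in children)
    return all(results) if op == "&" else any(results)


def can_see(expression, auths):
    """Return True if a user holding `auths` may read a cell labeled `expression`."""
    if expression == "":
        return True  # an empty label is visible to every user
    parser = _Parser(tokenize(expression))
    tree = parser.expr()
    if parser.peek() is not None:
        raise ValueError("unexpected trailing tokens")
    return _evaluate(tree, set(auths))
```

For example, a user holding only `PCP_JD` would see the Notes and Cholesterol cells from the key/value example (`PCP_JD`, `JD|PCP_JD`), but not the Mental Health or X-Ray cells.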
Tablets
- Collections of KV pairs form Tables
- Tables are partitioned into Tablets
- Metadata tablets hold info about other tablets, forming a 3-level hierarchy
- A Tablet is a unit of work for a Tablet Server
[Figure: three-level tablet hierarchy. The Root Tablet (-∞ to ∞) sits at a well-known location (ZooKeeper) and points to Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞). These index the data tablets of three tables: Adam’s Table (-∞ : thing, thing : ∞), Encyclopedia (-∞ : Ocelot, Ocelot : Yak, Yak : ∞), and Foo (-∞ to ∞).]
Accumulo Architecture
[Figure: Applications read/write through three Tablet Servers, each hosting Tablets; Tablet Servers store/replicate data in Hadoop; the Master assigns/balances tablets; a three-node ZooKeeper ensemble delegates authority.]
Tablet Data Flow
[Figure: Writes go to a Write-Ahead Log (for recovery) and to an In-Memory Map. A Minor Compaction passes the map through an Iterator Tree into a Sorted, Indexed File; Merging/Major Compaction combines Sorted, Indexed Files through an Iterator Tree. Reads (Scans) pass the In-Memory Map and the files through an Iterator Tree.]
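The flow in this figure is a log-structured merge design. The toy Python class below sketches it under several simplifying assumptions: one value per key, with no timestamps, deletes, or iterator trees; an in-memory list standing in for the write-ahead log; and hypothetical names throughout. It is not Accumulo's implementation.

```python
import heapq


def merge_newest_wins(sources):
    """Merge sorted (key, value) lists; sources[0] is newest and wins on equal keys."""
    tagged = [[(key, age, value) for key, value in src] for age, src in enumerate(sources)]
    merged, last = [], object()
    for key, _age, value in heapq.merge(*tagged):
        if key != last:  # the first occurrence comes from the newest source
            merged.append((key, value))
            last = key
    return merged


class ToyTablet:
    """Toy log-structured tablet; a sketch, not Accumulo's implementation."""

    def __init__(self, memory_limit=3):
        self.wal = []        # write-ahead log, replayed to rebuild the map after a crash
        self.memtable = {}   # in-memory map
        self.files = []      # immutable sorted files, oldest first
        self.memory_limit = memory_limit

    def write(self, key, value):
        self.wal.append((key, value))  # log first, for recovery
        self.memtable[key] = value
        if len(self.memtable) >= self.memory_limit:
            self.minor_compact()

    def minor_compact(self):
        # Flush the map to a new sorted file; its log entries are no longer needed.
        self.files.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.wal.clear()

    def major_compact(self):
        # Merge all files into one, so scans have fewer sources to combine.
        if self.files:
            self.files = [merge_newest_wins(self.files[::-1])]

    def scan(self):
        # Reads merge the in-memory map (newest) with every file, newest first.
        return merge_newest_wins([sorted(self.memtable.items())] + self.files[::-1])
```

Writes never modify existing files. Newer data only shadows older data during merges, which keeps both minor and major compactions simple sequential rewrites.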
  
Iterator Framework
Iterator Operations:
- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
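Accumulo composes these operations as a stack of server-side iterators, written in Java against its `SortedKeyValueIterator` interface. As a language-neutral analogy only, the sketch below chains Python generators over sorted cells to show how versioning, column selection, and aggregation layer on top of one another. The cell layout and sample data are hypothetical.

```python
from itertools import groupby, islice

# Hypothetical cell layout: ((row, family, qualifier, timestamp), value).


def column(cell):
    return cell[0][:3]


def source(cells):
    """Emit cells sorted by column, newest timestamp first (the file-read and merge layer)."""
    return iter(sorted(cells, key=lambda cell: (column(cell), -cell[0][3])))


def versioning(cells, max_versions=1):
    """Keep the newest max_versions cells of each column."""
    for _column, group in groupby(cells, key=column):
        yield from islice(group, max_versions)


def family_filter(cells, family):
    """Column selection: pass only cells in one column family."""
    return (cell for cell in cells if cell[0][1] == family)


def summing_combiner(cells):
    """Aggregation: collapse each column to one cell holding the sum of its values."""
    for col, group in groupby(cells, key=column):
        group = list(group)
        newest_timestamp = group[0][0][3]
        yield col + (newest_timestamp,), sum(int(value) for _key, value in group)


data = [  # hypothetical sample cells
    (("row1", "count", "clicks", 1), "2"),
    (("row1", "count", "clicks", 3), "5"),
    (("row1", "name", "first", 2), "Ann"),
    (("row1", "name", "first", 4), "Anne"),
]
print(list(versioning(source(data))))
print(list(summing_combiner(family_filter(source(data), "count"))))
```

Each layer consumes and produces a sorted stream, so layers can be reordered or swapped without changing the others. This is the property that lets one framework cover operations as different as deletion and partitioned joins.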
  
  
Testing Procedures
Frameworks
- Unit: Verify correct functioning of each module separately
- System: Perform correctness and performance tests on a small running instance
- Load/Scale: Generate high loads at scale; measure performance and correctness
- Random Walk: Randomly, repeatedly, and concurrently execute a variety of test modules representative of user activity
- Simulation: Evaluate a model for correctness and performance
Testing Complexity
- The scope of a test may include or exclude server-side, client-side, and dependent processes, etc.
- Simulating failures of distributed components
- Strange failure modes (hardware/physics related)
- Static vs. dynamic analysis
FATE: Fault-Tolerant Executor
- Logically atomic distributed operations
- No single point of failure
- Builds on the quorum consistency model of ZooKeeper
- All-or-nothing transactional semantics to handle failures (operations must be idempotent and reversible)
- Used by administrative operations and bulk table operations to survive our toughest tests
Verified State Models
Many of Accumulo’s subsystems use models that are mathematically proven correct.
[Figure: FATE states]
  
Contents
- Core Philosophy
- Technology
- Techniques
- Application APIs

Hierarchical Decomposition
[Figure: a Row (<person>) decomposes into Column Families (attribute, purchases, returns), which hold Column Qualifiers (such as age, discount, hat, sneakers) mapping to Values (such as <age>, <cost>, <40%>).]
Materialized Table
[Figure: the decomposition materialized for two rows, george and bill, under the column families attribute, purchases, and returns, with qualifiers such as age, discount, hat, and sneakers and values such as 27, 49, $42, $83, $100, and 40%. Each cell is one Key/Value Pair.]
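Each leaf of the hierarchy becomes one key/value pair, and the nested view can be rebuilt by grouping cells on row and column family. The Python sketch below shows that round trip. The record and its values are hypothetical, and visibility and timestamp are omitted for brevity.

```python
def decompose(row, document):
    """Flatten {family: {qualifier: value}} into sorted ((row, family, qualifier), value) cells."""
    cells = [
        ((row, family, qualifier), value)
        for family, columns in document.items()
        for qualifier, value in columns.items()
    ]
    return sorted(cells)


def materialize(cells):
    """Rebuild the nested per-row view from key/value cells."""
    table = {}
    for (row, family, qualifier), value in cells:
        table.setdefault(row, {}).setdefault(family, {})[qualifier] = value
    return table


# Hypothetical record shaped like the figure's <person> row.
person = {
    "attribute": {"age": "34"},
    "purchases": {"hat": "$20"},
}
for cell in decompose("person1", person):
    print(cell)
```

New column families or qualifiers need no schema change: they simply become new cells. This is what makes the "start universal, refine over time" approach from the design drivers practical.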
  
Forward and Inverted Index
Table           Row       Column Family     Column Qualifier  Value
Forward Index   <UUID>    <Type>            <Field>           <Term>
Inverted Index  <Term>    <Type> + <Field>  <UUID>            <Digest of Event>
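Writing each event into both tables makes two lookups cheap: the forward index fetches an event's fields by UUID, and the inverted index finds events by term. The Python sketch below builds both sets of cells for one event. The event format, the `+` join for `<Type> + <Field>`, and the truncated SHA-256 used as the event digest are all assumptions for illustration.

```python
import hashlib


def digest(event):
    """Hypothetical event digest: a short hash of the event's sorted fields."""
    text = "|".join(f"{k}={v}" for k, v in sorted(event["fields"].items()))
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def index_event(event):
    """Produce (forward, inverted) index cells for one event."""
    uuid, event_type = event["uuid"], event["type"]
    forward, inverted = [], []
    for field, term in event["fields"].items():
        # Forward Index: row <UUID>, family <Type>, qualifier <Field>, value <Term>
        forward.append(((uuid, event_type, field), term))
        # Inverted Index: row <Term>, family <Type> + <Field>, qualifier <UUID>, value digest
        inverted.append(((term, f"{event_type}+{field}", uuid), digest(event)))
    return sorted(forward), sorted(inverted)


def lookup(inverted, term):
    """UUIDs of events containing term; stands in for a scan of one inverted-index row."""
    return sorted({uuid for (row, _family, uuid), _value in inverted if row == term})


events = [  # hypothetical events
    {"uuid": "e1", "type": "email", "fields": {"from": "alice", "subject": "budget"}},
    {"uuid": "e2", "type": "email", "fields": {"from": "bob", "subject": "budget"}},
]
inverted = []
for event in events:
    inverted += index_event(event)[1]
inverted.sort()
print(lookup(inverted, "budget"))
```

Storing a digest as the inverted-index value lets a query show a summary of each matching event without a second lookup in the forward index.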
  
Graph Analysis
Graph Table
Row        Column Family  Column Qualifier (Tuples)  Value
<Node ID>  “Node Info”    <Field>                    <Value>
<Node ID>  “Out Edges”    <Node ID>, <Edge ID>       <Edge Info>
<Node ID>  “In Edges”     <Node ID>, <Edge ID>       <Edge Info>
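Each edge is stored twice: once under the source node's “Out Edges” and once under the target node's “In Edges”. A neighbor query in either direction therefore touches a single row. The Python sketch below uses a set of tuples as a hypothetical stand-in for the sorted table. In Accumulo, each neighbor query would be a scan over one row and one column family.

```python
def add_edge(cells, src, dst, edge_id, info):
    """Write both directions so either endpoint's row answers neighbor queries."""
    cells.add((src, "Out Edges", (dst, edge_id), info))
    cells.add((dst, "In Edges", (src, edge_id), info))


def neighbors(cells, node, family="Out Edges"):
    """All (neighbor, edge id) pairs in one row and column family, in sorted order."""
    return sorted(q for row, fam, q, _info in cells if row == node and fam == family)


def bfs(cells, start):
    """Breadth-first traversal following out edges; one row lookup per node."""
    seen, frontier, order = {start}, [start], []
    while frontier:
        next_frontier = []
        for node in frontier:
            order.append(node)
            for nbr, _edge in neighbors(cells, node):
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return order


cells = set()  # hypothetical graph
cells.add(("a", "Node Info", "label", "start"))
add_edge(cells, "a", "b", "e1", "knows")
add_edge(cells, "a", "c", "e2", "knows")
add_edge(cells, "b", "d", "e3", "knows")
print(bfs(cells, "a"))
```

The cost is doubled storage for edges. In exchange, traversals in either direction never need a full-table scan.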
  
Geospatial Queries
Table      Row        Column Family  Column Qualifier  Value
Geo Index  <GeoHash>  <Event Type>   <UUID>            <Digest of Event>
[Figure: bit interleaving of three dimensions. Latitude 10110101001, Longitude 00111010010, and Depth 11010110110 interleave into 101001110111010101011100001011100.]
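The figure's bit strings follow a simple rule: take one bit from each dimension in turn (latitude, longitude, depth), which produces a Z-order (Morton) key. Points that are close in space tend to share long key prefixes, so a region query can become a small number of row-range scans. The Python sketch below reproduces the figure's interleaving. The `quantize` helper and its linear scaling are assumptions about how coordinates become bits. Standard geohash also differs in detail: it interleaves longitude first and encodes the result in base 32.

```python
def interleave(*bit_strings):
    """Interleave equal-length bit strings one bit at a time (a Z-order / Morton code)."""
    if len({len(bits) for bits in bit_strings}) != 1:
        raise ValueError("all dimensions need the same number of bits")
    return "".join("".join(column) for column in zip(*bit_strings))


def quantize(value, low, high, bits):
    """Map value in [low, high] onto a bits-wide binary string (linear scaling assumed)."""
    cell = int((value - low) / (high - low) * (1 << bits))
    return format(min(cell, (1 << bits) - 1), f"0{bits}b")


latitude, longitude, depth = "10110101001", "00111010010", "11010110110"
print(interleave(latitude, longitude, depth))  # 101001110111010101011100001011100, as in the figure
```

For a real coordinate, each dimension would first be quantized, for example `quantize(lat, -90, 90, 11)`, and then the results interleaved.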
  
Document Partitioning
Shard Table
- Row: <Partition ID>
- Column Families: “Docs”, “Inv. Index”, “Field Index”, and “Geo”
- Column Qualifiers (tuples) built from <UUID>, <Field>, <Term>, <Field:Term>, and <Hash>
- Value: <Value>
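Document partitioning hashes each document into a partition (a row) and stores the document together with its index entries in that same row. One tablet can then answer a query for its own documents without a cross-machine join. The Python sketch below is minimal and assumes CRC32 hashing. It reuses the slide's column family names, but the qualifier and value layout (including putting the field name in the “Inv. Index” value) is a guess for illustration.

```python
import zlib


def partition_for(uuid, num_partitions):
    """Stable hash so a document always lands in the same partition (row)."""
    return f"p{zlib.crc32(uuid.encode()) % num_partitions:03d}"


def shard_cells(uuid, fields, num_partitions):
    """All cells for one document share one row: the document and its index entries."""
    row = partition_for(uuid, num_partitions)
    cells = []
    for field, value in fields.items():
        cells.append((row, "Docs", (uuid, field), value))
        for term in value.lower().split():
            cells.append((row, "Inv. Index", (term, uuid), field))
            cells.append((row, "Field Index", (f"{field}:{term}", uuid), ""))
    return sorted(cells)


for cell in shard_cells("doc-1", {"title": "Foo Bar"}, num_partitions=8):
    print(cell)
```

Hashing spreads documents evenly across partitions, so ingest and query load are balanced across tablet servers. The trade-off is that every query must visit every partition.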
  
Intersecting Iterator
Example query: ‘foo’ and (‘bar’ or ‘baz’)
[Figure: the query evaluated within each <Partition ID> row, using the “Docs” entries (<UUID>, <Value>) and the “Inv. Index” entries (<Term>, <UUID>, <Field>).]
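Within each partition, the query ‘foo’ and (‘bar’ or ‘baz’) reduces to merging sorted lists of document UUIDs taken from the inverted index: a union for the `or`, then a merge intersection for the `and`. Each partition holds both its index and its documents, so partitions can be evaluated independently and their results combined. The Python sketch below uses hypothetical in-memory indexes. Accumulo's `IntersectingIterator` performs intersection server-side during the scan. How this sketch handles `or` is an assumption.

```python
def intersect(a, b):
    """Merge-intersect two sorted UUID lists, advancing whichever side is behind."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    return sorted(set(a) | set(b))


def query_partition(index):
    """Evaluate 'foo' and ('bar' or 'baz') inside one partition's inverted index."""
    return intersect(index.get("foo", []), union(index.get("bar", []), index.get("baz", [])))


def query(partitions):
    """Each partition is answered locally (in parallel on a real cluster); results are combined."""
    results = []
    for index in partitions.values():
        results.extend(query_partition(index))
    return sorted(results)


# Hypothetical per-partition inverted indexes: term -> sorted UUIDs.
partitions = {
    "p000": {"foo": ["d1", "d3", "d5"], "bar": ["d3"], "baz": ["d5", "d7"]},
    "p001": {"foo": ["d2", "d4"], "baz": ["d4"]},
    "p002": {"bar": ["d6"], "baz": ["d6"]},
}
print(query(partitions))
```

The merge intersection only ever moves forward through each list. On a sorted store, this corresponds to seeking ahead in each term's range instead of reading it in full.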
  
Contents
- Core Philosophy
- Technology
- Techniques
- Application APIs

acorn
Key/Value pairs are great! How do I construct a document partitioning key again?
- Techniques should be built into an API
- Let the people have polyglot access:
  - Lucene, SQL, SPARQL, JAQL, Matlab (not just Key, Value, Range)
  
Combined IR + Graph Search
  
Schema-less Stats
  
Get Involved
http://accumulo.apache.org
Help us make Accumulo even better!
  
Contact
Adam Fuchs, CTO
sqrrl data, Inc.
617-520-4375
www.sqrrl.com
@sqrrl_inc
info@sqrrl.com
  
