Apache Accumulo and Cloudera

Published in: Technology
  1. Apache Accumulo and Cloudera
     Hadoop-DC, July 2013
     Joey Echeverria | Director, Federal FTS
     joey@cloudera.com | @fwiffo
     ©2013 Cloudera, Inc. All Rights Reserved.
  2. HADOOP 101
  3. Operating Systems
     • Manage and schedule machine resources
       • CPU
       • RAM
       • Memory
     • Provide abstractions and APIs
       • Files = stream of bytes
       • Process = instructions + private memory space
  4. Distributed Operating System
     • Same thing, but over a cluster of networked servers
     • Additional concerns:
       • Inter-process and inter-machine communication
       • Data locality
       • Data availability
       • Data processing availability
  5. Hadoop
     • De facto distributed operating system
     • Apache HDFS
     • Apache MapReduce and Apache YARN
  6. Ecosystem
     • Key Value Stores
     • High Level Batch Languages
     • Low Latency SQL Engine
     • Graph Processing
  7. Cloudera
  8. CDH History
     • CDH1: *HDFS, *MR, *Hive, *Pig
     • CDH2: *HDFS, *MR, *Hive, *Pig
     • CDH3: *HDFS, *MR, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro
     • CDH4: *HDFS, *MR, *YARN, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro, DataFu, HCatalog, Impala, *Solr, *BigTop, Sentry
  9. ACCUMULO 101 AND 201
  10. BigTable
  11. Accumulo Data Model
      • Multi-dimensional sorted map
      row id -> [ family -> [ qualifier -> [ visibility -> [ timestamp -> value ] ] ] ]
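The nested map on this slide can be modeled with plain dictionaries. This is only a toy sketch in Python, not Accumulo's API: the helper names `put` and `flatten` are hypothetical, and every level is sorted ascending for simplicity, which may not match how a real table orders timestamps.

```python
# Toy model of: row id -> family -> qualifier -> visibility -> timestamp -> value.
# `put` and `flatten` are illustrative helpers, not part of any Accumulo client.

def put(table, row, family, qualifier, visibility, timestamp, value):
    """Store a value under the five nested key components."""
    node = table
    for part in (row, family, qualifier, visibility):
        node = node.setdefault(part, {})
    node[timestamp] = value


def flatten(node, prefix=()):
    """Walk the nested map in sorted key order, yielding flat tuples."""
    if not isinstance(node, dict):
        yield prefix + (node,)
        return
    for key in sorted(node):
        yield from flatten(node[key], prefix + (key,))


table = {}
put(table, "row2", "fam", "qual", "public", 1, "b")
put(table, "row1", "fam", "qual", "public", 2, "a")
cells = list(flatten(table))
```

Each tuple in `cells` holds the five key components followed by the value, with `row1` listed before `row2` because the rows are visited in sorted order.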
  12. Accumulo Storage Model
      • key -> value
      • key = <row id><column><timestamp>
      • column = <family><qualifier><visibility>
      Layout: Key ( Row ID | Column ( Family | Qualifier | Visibility ) | Timestamp ) -> Value
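The flat key layout can be sketched as a tuple that sorts component by component. The `Key` class and `sort_key` function below are hypothetical names for illustration only. The sketch assumes timestamps sort newest first within otherwise identical keys, which is how Accumulo is generally documented to behave; treat that as an assumption here rather than something the slide states.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Illustrative flat key: <row id><family><qualifier><visibility><timestamp>."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(key):
    # Assumption: ascending on every component except timestamp,
    # which is negated so that newer versions come first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


entries = {
    Key("row1", "fam", "qual", "", 5): "old",
    Key("row1", "fam", "qual", "", 9): "new",
    Key("row0", "fam", "qual", "", 1): "first",
}
ordered = sorted(entries.items(), key=lambda kv: sort_key(kv[0]))
values = [value for _, value in ordered]
```

With this ordering, a reader scanning `ordered` sees `row0` first and then the two versions of the `row1` cell, newest before oldest.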
  13. [No text on this slide]
  14. Other Concerns
      • Write-ahead log
      • Tablet server failure handling
      • Versioning
      • Iterators
      • Cell-level security
  15. PROJECT HISTORY
  16. Pre-Apache
  17. Apache
  18. Relationship to Hadoop Releases
      • 1.3.x -> Hadoop 0.20.2
      • 1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
      • 1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha
  19. Accumulo and Cloudera Releases
      • Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
      • Accumulo 1.5.x should work with CDH4…
        • Limited testing
  20. ANNOUNCEMENT
  21. CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4
  22. DEMO
  23. System Logs
      • Id
        • Unique id for an action
      • Timestamp
        • Time the action occurred
      • Actor
        • User or system performing the action
      • Action
        • The action taken
      • Object
        • The object of the action
      • Info
        • Free-form information (e.g. success/failure, attribute value, etc.)
  24. Actions
      • created_user
      • deleted_user
      • set_password
      • logged_in
      • logged_out
      • read
      • modified
  25. Roles
      • system
        • Any user on the system
      • admin
        • Administrators
      • audit
        • Auditors
  26. Accumulo Data Model
      Key component      | Content
      Row ID             | <ts>-<id>
      Column family      | <actor>
      Column qualifier   | <action>:<object>
      Column visibility  |
      Timestamp          |
      Value              | <info>
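The mapping from a log record to a table entry can be sketched as a small function. This is a hedged illustration only: `to_entry` and its dictionary layout are hypothetical, and the visibility label is passed in by the caller because the slide leaves that cell blank.

```python
def to_entry(ts, log_id, actor, action, obj, info, visibility):
    """Map one system-log record onto the key/value layout from the slide.

    Row ID is <ts>-<id>, family is <actor>, qualifier is <action>:<object>,
    and the value is <info>. The visibility argument is an assumption of
    this sketch, since the slide does not say how it is derived.
    """
    return {
        "row": f"{ts}-{log_id}",
        "family": actor,
        "qualifier": f"{action}:{obj}",
        "visibility": visibility,
        "value": info,
    }


entry = to_entry("201307241535", 1, "root", "created_user", "sean",
                 "succeeded", "audit")
```

Because the row ID starts with the timestamp, entries built this way sort chronologically in a sorted table.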
  27. DEMO
  28. Logs Demo
      Row key         | Column                  | Visibility  | Value
      201307241535-1  | root:created_user:sean  | audit       | succeeded
      201307241535-1  | root:set_password:sean  | admin&audit | password
      201307241537-2  | sean:logged_in:host     | system      | succeeded
      201307241538-3  | sean:read:/tmp/a        | audit       | succeeded
      201307241539-4  | sean:modified:/tmp/a    | audit       | failed
      201307241540-5  | sean:logged_out:host    | system      | succeeded
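The effect of the visibility column on the demo rows can be approximated in a few lines of Python. This is a simplified sketch, not Accumulo's implementation: the hypothetical `visible` helper only understands labels joined with `&`, which is all the demo rows use, and does not handle `|` or parentheses.

```python
# Demo rows as (row key, column, visibility, value), copied from the slide.
ROWS = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]


def visible(expression, authorizations):
    """True if every '&'-joined label in the expression is held by the reader."""
    return all(label in authorizations for label in expression.split("&"))


def scan(authorizations):
    """Return only the rows whose visibility the reader satisfies."""
    return [row for row in ROWS if visible(row[2], authorizations)]


auditor_view = scan({"audit"})
admin_auditor_view = scan({"admin", "audit"})
system_view = scan({"system"})
```

In this model a reader holding only `audit` does not see the `set_password` row, because its `admin&audit` label requires both authorizations.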
  29. VERSIONS REDUX
  30. Recap
      • Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
      • Accumulo 1.5.x should work with CDH4
  31. Cloudera Support
      • Naturally, Cloudera has tested and packaged Accumulo 1.5…
      • But 1.5 is rather bleeding edge…
      • So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3
  32. ECOSYSTEM INTEGRATION
  33. Apache Nutch
  34. Apache Pig
  35. DEMO
  36. NEXT STEPS
  37. Recap
      • What’s available today
        • Beta release of Accumulo 1.4.3 on CDH4.3
        • Beta release of Accumulo 1.4.3 Pig integration
      • Semi-private beta
        • Contact me (joey@cloudera.com) if you’re interested in trying out the bits
  38. Future Ideas (not promises ;)
      • Cloudera Manager integration
      • Flume integration
      • Sqoop integration
      • Hive integration
      • Impala integration
  39. What next?
      • Download Hadoop!
      • CDH available at www.cloudera.com
      • Cloudera provides pre-loaded VMs
        • https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
      • Reach out to me (joey@cloudera.com) if you want to try out the Accumulo beta
      • Instructions to replicate the demos pending
  40. My personal preference
      • Cloudera Manager
        • https://ccp.cloudera.com/display/SUPPORT/Downloads
        • Free up to unlimited nodes!
  41. Shout Out
      • Jason Trost
        • @jason_trost
      • covert.io blog posts
        • http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
        • http://www.covert.io/post/18605091231/accumulo-and-pig
  42. Questions?
      • Contact me!
        • Joey Echeverria
        • joey@cloudera.com
        • @fwiffo
      • We’re hiring!