Apache Accumulo and Cloudera
Published in: Technology


Transcript

  • 1. Apache Accumulo and Cloudera
       Hadoop-DC, July 2013
       Joey Echeverria | Director, Federal FTS
       joey@cloudera.com | @fwiffo
       ©2013 Cloudera, Inc. All Rights Reserved.
  • 2. HADOOP 101
  • 3. Operating Systems
       • Manage and schedule machine resources
           • CPU
           • RAM
           • Memory
       • Provide abstractions and APIs
           • Files = stream of bytes
           • Process = instructions + private memory space
  • 4. Distributed Operating System
       • Same thing, but over a cluster of networked servers
       • Additional concerns:
           • Inter-process and inter-machine communication
           • Data locality
           • Data availability
           • Data processing availability
  • 5. Hadoop
       • De facto Distributed Operating System
       • Apache HDFS
       • Apache MapReduce and Apache YARN
  • 6. Ecosystem
       • Key Value Stores
       • High Level Batch Languages
       • Low Latency SQL Engine
       • Graph Processing
  • 7. Cloudera
  • 8. CDH History
       CDH1: *HDFS, *MR, *Hive, *Pig
       CDH2: *HDFS, *MR, *Hive, *Pig
       CDH3: *HDFS, *MR, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro
       CDH4: *HDFS, *MR, *YARN, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro, DataFu, HCatalog, Impala, *Solr, *BigTop, Sentry
  • 9. ACCUMULO 101 AND 201
  • 10. BigTable
  • 11. Accumulo Data Model
       • Multi-dimensional sorted map
         row id -> [ family -> [ qualifier -> [ visibility -> [ timestamp -> value ] ] ] ]
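The nested mapping on this slide can be pictured with ordinary dictionaries. The following is a minimal in-memory sketch, not Accumulo code: the helper names `put`, `latest`, and `rows` are hypothetical and exist only for illustration.

```python
def put(table, row, family, qualifier, visibility, timestamp, value):
    """Store a value under row -> family -> qualifier -> visibility -> timestamp."""
    versions = (table.setdefault(row, {})
                     .setdefault(family, {})
                     .setdefault(qualifier, {})
                     .setdefault(visibility, {}))
    versions[timestamp] = value


def latest(table, row, family, qualifier, visibility):
    """Return the value with the highest timestamp for one cell."""
    versions = table[row][family][qualifier][visibility]
    return versions[max(versions)]


def rows(table):
    """Return row ids in sorted order, the order a scan would visit them."""
    return sorted(table)


if __name__ == "__main__":
    table = {}
    put(table, "row2", "fam", "qual", "", 1, "c")
    put(table, "row1", "fam", "qual", "", 1, "a")
    put(table, "row1", "fam", "qual", "", 2, "b")
    print(rows(table))
    print(latest(table, "row1", "fam", "qual", ""))
```

Plain dictionaries are not sorted, so the sketch sorts on read; a real sorted map keeps entries ordered as they are written.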
  • 12. Accumulo Storage Model
       • key -> value
       • key = <row id><column><timestamp>
       • column = <family><qualifier><visibility>

       | Key                                                 | Value |
       | Row ID | Column                         | Timestamp |       |
       |        | Family | Qualifier | Visibility |           |       |
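As a rough illustration of the flattened key -> value layout, the sketch below models a key as a tuple and sorts entries the way Accumulo orders them: ascending by row, family, qualifier, and visibility, with newer timestamps first. The `Key` class and `sorted_entries` helper are hypothetical names for this example, not part of the Accumulo API, and strings stand in for the byte arrays a real key holds.

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(key):
    """Ascending on every field except timestamp, which sorts newest first."""
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def sorted_entries(entries):
    """Order (Key, value) pairs as they would appear in a scan."""
    return sorted(entries, key=lambda entry: sort_key(entry[0]))


if __name__ == "__main__":
    entries = [
        (Key("r1", "f", "q", "", 1), "old"),
        (Key("r1", "f", "q", "", 5), "new"),
        (Key("r0", "f", "q", "", 1), "first"),
    ]
    for key, value in sorted_entries(entries):
        print(key, value)
```

Sorting newest-first means the most recent version of a cell is the first one a reader encounters.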
  • 13.
  • 14. Other Concerns
       • Write-ahead log
       • Tablet server failure handling
       • Versioning
       • Iterators
       • Cell-level security
  • 15. PROJECT HISTORY
  • 16. Pre-Apache
  • 17. Apache
  • 18. Relationship to Hadoop Releases
       • 1.3.x -> Hadoop 0.20.2
       • 1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
       • 1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha
  • 19. Accumulo and Cloudera Releases
       • Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
       • Accumulo 1.5.x should work with CDH4…
           • Limited testing
  • 20. ANNOUNCEMENT
  • 21. CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4
  • 22. DEMO
  • 23. System Logs
       • Id
           • Unique id for an action
       • Timestamp
           • Time the action occurred
       • Actor
           • User or system performing the action
       • Action
           • The action taken
       • Object
           • The object of the action
       • Info
           • Free form information (e.g. success/failure, attribute value, etc.)
  • 24. Actions
       • created_user
       • deleted_user
       • set_password
       • logged_in
       • logged_out
       • read
       • modified
  • 25. Roles
       • system
           • Any user on the system
       • admin
           • Administrators
       • audit
           • Auditors
  • 26. Accumulo Data Model

       | Key component     | Content           |
       | Row ID            | <ts>-<id>         |
       | Column Family     | <actor>           |
       | Column Qualifier  | <action>:<object> |
       | Column Visibility | (blank)           |
       | Timestamp         | (blank)           |
       | Value             | <info>            |
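The mapping from a system log record to a key and value can be sketched as follows. This is an illustrative sketch only: `LogRecord` and `to_entry` are hypothetical names, and passing the visibility label in as an argument is an assumption, since the slide leaves the visibility column blank.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    id: int
    timestamp: str  # e.g. "201307241535"
    actor: str
    action: str
    object: str
    info: str


def to_entry(record, visibility):
    """Build ((row, family, qualifier, visibility), value) from a log record."""
    row = f"{record.timestamp}-{record.id}"           # <ts>-<id>
    family = record.actor                             # <actor>
    qualifier = f"{record.action}:{record.object}"    # <action>:<object>
    return (row, family, qualifier, visibility), record.info


if __name__ == "__main__":
    record = LogRecord(1, "201307241535", "root", "created_user", "sean", "succeeded")
    print(to_entry(record, "audit"))
```

Leading the row id with the timestamp keeps log entries in chronological order under the table's sorted layout.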
  • 27. DEMO
  • 28. Logs Demo

       | Row key        | Column                 | Visibility  | Value     |
       | 201307241535-1 | root:created_user:sean | audit       | succeeded |
       | 201307241535-1 | root:set_password:sean | admin&audit | password  |
       | 201307241537-2 | sean:logged_in:host    | system      | succeeded |
       | 201307241538-3 | sean:read:/tmp/a       | audit       | succeeded |
       | 201307241539-4 | sean:modified:/tmp/a   | audit       | failed    |
       | 201307241540-5 | sean:logged_out:host   | system      | succeeded |
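To see how the visibility column affects what a reader gets back, here is a simplified sketch that filters the demo rows by a set of authorizations. It is not Accumulo's implementation: the `visible` and `scan` helpers are hypothetical, and the evaluator handles only `&` and `|` without parentheses, whereas real visibility expressions also allow parenthesized grouping.

```python
# Rows from the Logs Demo slide: (row key, column, visibility, value).
LOG_ENTRIES = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]


def visible(expression, authorizations):
    """Return True if the authorizations satisfy the visibility expression.

    Simplification: '|' separates alternatives and '&' joins required
    labels; parentheses are not supported in this sketch.
    """
    if not expression:
        return True  # an empty visibility is readable by everyone
    return any(
        all(label in authorizations for label in clause.split("&"))
        for clause in expression.split("|")
    )


def scan(entries, authorizations):
    """Return only the entries the given authorizations are allowed to see."""
    return [entry for entry in entries if visible(entry[2], authorizations)]


if __name__ == "__main__":
    for entry in scan(LOG_ENTRIES, {"audit"}):
        print(entry)
```

With only the `audit` authorization, the `set_password` row stays hidden because its label requires both `admin` and `audit`.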
  • 29. VERSIONS REDUX
  • 30. Recap
       • Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
       • Accumulo 1.5.x should work with CDH4
  • 31. Cloudera Support
       • Naturally, Cloudera has tested and packaged Accumulo 1.5…
       • But 1.5 is rather bleeding edge…
       • So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3
  • 32. ECOSYSTEM INTEGRATION
  • 33. Apache Nutch
  • 34. Apache Pig
  • 35. DEMO
  • 36. NEXT STEPS
  • 37. Recap
       • What’s available today
           • Beta release of Accumulo 1.4.3 on CDH4.3
           • Beta release of Accumulo 1.4.3 Pig integration
       • Semi-private beta
           • Contact me (joey@cloudera.com) if you’re interested in trying out the bits
  • 38. Future Ideas (not promises ;)
       • Cloudera Manager integration
       • Flume integration
       • Sqoop integration
       • Hive integration
       • Impala integration
  • 39. What next?
       • Download Hadoop!
           • CDH available at www.cloudera.com
       • Cloudera provides pre-loaded VMs
           • https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
       • Reach out to me (joey@cloudera.com) if you want to try out the Accumulo beta
       • Instructions to replicate the demos pending
  • 40. My personal preference
       • Cloudera Manager
           • https://ccp.cloudera.com/display/SUPPORT/Downloads
       • Free up to unlimited nodes!
  • 41. Shout Out
       • Jason Trost
           • @jason_trost
       • covert.io blog posts
           • http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
           • http://www.covert.io/post/18605091231/accumulo-and-pig
  • 42. Questions?
       • Contact me!
           • Joey Echeverria
           • joey@cloudera.com
           • @fwiffo
       • We’re hiring!