Apache Accumulo and Cloudera

Published in: Technology

Transcript

  • 1. Apache Accumulo and Cloudera
      Hadoop-DC, July 2013
      Joey Echeverria | Director, Federal FTS
      joey@cloudera.com | @fwiffo
      ©2013 Cloudera, Inc. All Rights Reserved.
  • 2. HADOOP 101
  • 3. Operating Systems
      • Manage and schedule machine resources
          • CPU
          • RAM
          • Memory
      • Provide abstractions and APIs
          • Files = stream of bytes
          • Process = instructions + private memory space
  • 4. Distributed Operating System
      • Same thing, but over a cluster of networked servers
      • Additional concerns:
          • Inter-process and inter-machine communication
          • Data locality
          • Data availability
          • Data processing availability
  • 5. Hadoop
      • De facto Distributed Operating System
      • Apache HDFS
      • Apache MapReduce and Apache YARN
  • 6. Ecosystem
      • Key Value Stores
      • High Level Batch Languages
      • Low Latency SQL Engine
      • Graph Processing
  • 7. Cloudera
  • 8. CDH History
      • CDH1: *HDFS, *MR, *Hive, *Pig
      • CDH2: *HDFS, *MR, *Hive, *Pig
      • CDH3: *HDFS, *MR, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro
      • CDH4: *HDFS, *MR, *YARN, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro, DataFu, HCatalog, Impala, *Solr, *BigTop, Sentry
  • 9. ACCUMULO 101 AND 201
  • 10. BigTable
  • 11. Accumulo Data Model
      • Multi-dimensional sorted map
      row id -> [ family -> [ qualifier -> [ visibility -> [ timestamp -> value ] ] ] ]
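The nested map on slide 11 can be sketched as plain Python dictionaries. This is only an illustrative in-memory model, not the Accumulo client API: the `put` and `latest` helpers and the sample row, family, and visibility values are hypothetical names chosen for the example.

```python
# Hypothetical in-memory model of the nested map; not the Accumulo API.

def put(table, row, family, qualifier, visibility, timestamp, value):
    """Store a value at row -> family -> qualifier -> visibility -> timestamp."""
    versions = (table.setdefault(row, {})
                     .setdefault(family, {})
                     .setdefault(qualifier, {})
                     .setdefault(visibility, {}))
    versions[timestamp] = value


def latest(table, row, family, qualifier, visibility):
    """Return the value stored under the largest timestamp for one cell."""
    versions = table[row][family][qualifier][visibility]
    return versions[max(versions)]


table = {}
put(table, "row1", "attr", "color", "public", 1, "red")
put(table, "row1", "attr", "color", "public", 2, "blue")
print(latest(table, "row1", "attr", "color", "public"))
```

Each level of nesting corresponds to one dimension of the key, and the innermost level holds the versions of a single cell.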
  • 12. Accumulo Storage Model
      • key -> value
      • key = <row id><column><timestamp>
      • column = <family><qualifier><visibility>

      | Key                                                   | Value |
      | Row ID | Column                          | Timestamp |       |
      |        | Family | Qualifier | Visibility |           |       |
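As a rough sketch of the flattened storage model, the key can be written as a tuple and the entries sorted by it. The `Key` type and `sort_key` function are hypothetical names for this example. The descending timestamp order (newest version first) reflects how Accumulo is documented to order keys, but it is not stated on the slide, so treat it as an assumption here.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Flattened key: <row id><column><timestamp>, column = family/qualifier/visibility."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(key):
    # Ascending on the string parts, descending on the timestamp
    # (assumption: newest version of a cell sorts first).
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


entries = {
    Key("row2", "f", "q", "", 5): "c",
    Key("row1", "f", "q", "", 1): "a",
    Key("row1", "f", "q", "", 9): "b",
}

ordered = sorted(entries, key=sort_key)
for key in ordered:
    print(key, "->", entries[key])
```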
  • 13.
  • 14. Other Concerns
      • Write-ahead log
      • Tablet server failure handling
      • Versioning
      • Iterators
      • Cell-level security
  • 15. PROJECT HISTORY
  • 16. Pre-Apache
  • 17. Apache
  • 18. Relationship to Hadoop Releases
      • 1.3.x -> Hadoop 0.20.2
      • 1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
      • 1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha
  • 19. Accumulo and Cloudera Releases
      • Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
      • Accumulo 1.5.x should work with CDH4…
          • Limited testing
  • 20. ANNOUNCEMENT
  • 21. CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4
  • 22. DEMO
  • 23. System Logs
      • Id
          • Unique id for an action
      • Timestamp
          • Time the action occurred
      • Actor
          • User or system performing the action
      • Action
          • The action taken
      • Object
          • The object of the action
      • Info
          • Free form information (e.g. success/failure, attribute value, etc.)
  • 24. Actions
      • created_user
      • deleted_user
      • set_password
      • logged_in
      • logged_out
      • read
      • modified
  • 25. Roles
      • system
          • Any user on the system
      • admin
          • Administrators
      • audit
          • Auditors
  • 26. Accumulo Data Model

      | Key                                                            | Value  |
      | Row ID    | Column                                 | Timestamp |        |
      |           | Family  | Qualifier         | Visibility |           |        |
      | <ts>-<id> | <actor> | <action>:<object> |            |           | <info> |
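The mapping on slide 26 can be sketched as a small function that turns one log record into a key and a value. `LogRecord` and `to_entry` are hypothetical names for this example. The slide leaves the visibility column blank, so this sketch assumes the caller supplies it.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One system log record with the fields listed on slide 23."""
    id: int
    timestamp: str
    actor: str
    action: str
    obj: str
    info: str


def to_entry(record, visibility):
    """Map a log record onto (row id, family, qualifier, visibility) -> value."""
    row = f"{record.timestamp}-{record.id}"        # <ts>-<id>
    family = record.actor                          # <actor>
    qualifier = f"{record.action}:{record.obj}"    # <action>:<object>
    # Assumption: the visibility label is chosen by the caller.
    return (row, family, qualifier, visibility), record.info


record = LogRecord(3, "201307241538", "sean", "read", "/tmp/a", "succeeded")
key, value = to_entry(record, "audit")
print(key, "->", value)
```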
  • 27. DEMO
  • 28. Logs Demo

      | Row key        | Column                 | Visibility  | Value     |
      | 201307241535-1 | root:created_user:sean | audit       | succeeded |
      | 201307241535-1 | root:set_password:sean | admin&audit | password  |
      | 201307241537-2 | sean:logged_in:host    | system      | succeeded |
      | 201307241538-3 | sean:read:/tmp/a       | audit       | succeeded |
      | 201307241539-4 | sean:modified:/tmp/a   | audit       | failed    |
      | 201307241540-5 | sean:logged_out:host   | system      | succeeded |
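To illustrate how the visibility column in the demo table could limit what a scan returns, here is a simplified sketch. It is not Accumulo's own visibility evaluation: the `is_visible` and `scan` helpers are hypothetical, the evaluator handles only `&` and `|` without parentheses, and it treats `a&b|c` as `(a&b)|c`, which is an assumption made for brevity.

```python
def is_visible(expression, authorizations):
    """Return True if the authorizations satisfy a simple visibility expression."""
    if not expression:
        return True  # assumption: an empty label is visible to everyone
    return any(
        all(term in authorizations for term in clause.split("&"))
        for clause in expression.split("|")
    )


# Rows from the Logs Demo slide: (row key, column, visibility, value).
ROWS = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]


def scan(rows, authorizations):
    """Keep only the rows whose visibility label the authorizations satisfy."""
    return [row for row in rows if is_visible(row[2], authorizations)]


for row in scan(ROWS, {"audit"}):
    print(row)
```

With only the `audit` authorization, the `set_password` row is filtered out because its label requires both `admin` and `audit`.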
  • 29. VERSIONS REDUX
  • 30. Recap
      • Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
      • Accumulo 1.5.x should work with CDH4
  • 31. Cloudera Support
      • Naturally, Cloudera has tested and packaged Accumulo 1.5…
      • But 1.5 is rather bleeding edge…
      • So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3
  • 32. ECOSYSTEM INTEGRATION
  • 33. Apache Nutch
  • 34. Apache Pig
  • 35. DEMO
  • 36. NEXT STEPS
  • 37. Recap
      • What’s available today
          • Beta release of Accumulo 1.4.3 on CDH4.3
          • Beta release of Accumulo 1.4.3 Pig integration
      • Semi-private beta
          • Contact me (joey@cloudera.com) if you’re interested in trying out the bits
  • 38. Future Ideas (not promises ;)
      • Cloudera Manager integration
      • Flume integration
      • Sqoop integration
      • Hive integration
      • Impala integration
  • 39. What next?
      • Download Hadoop!
      • CDH available at www.cloudera.com
      • Cloudera provides pre-loaded VMs
          • https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
      • Reach out to me (joey@cloudera.com) if you want to try out the Accumulo beta
      • Instructions to replicate the demos pending
  • 40. My personal preference
      • Cloudera Manager
          • https://ccp.cloudera.com/display/SUPPORT/Downloads
      • Free up to unlimited nodes!
  • 41. Shout Out
      • Jason Trost
          • @jason_trost
      • covert.io blog posts
          • http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
          • http://www.covert.io/post/18605091231/accumulo-and-pig
  • 42. Questions?
      • Contact me!
          • Joey Echeverria
          • joey@cloudera.com
          • @fwiffo
      • We’re hiring!
  • 43. ©2013 Cloudera, Inc. All Rights Reserved.
