April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Processing

At Yahoo!, Hadoop plays a central role in providing personalized experiences for our users and creating value for our advertisers. In this talk, we will discuss the convergence of low-latency processing and the Hadoop platform. Through a collection of use cases, we will explain how Yahoo! delivers a personalized user experience through Hadoop and Storm. We have developed Storm-on-YARN so that Storm streaming/micro-batch applications and Hadoop batch applications can be hosted on a single cluster. Storm applications can leverage YARN for resource management and apply Hadoop-style security to Hadoop datasets on HDFS and HBase. Yahoo! has recently released our Storm enhancements as open source.

Presenter(s):
Andy Feng, Distinguished Architect, Cloud Engineering Group, Yahoo!
Bobby Evans, Tech Yahoo!, Apache Hadoop PMC and Committer

  1. Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing
     Andy Feng (afeng@yahoo-inc.com), Robert Evans (evans@yahoo-inc.com), Yahoo! Inc.
  2. Yahoo!: Personalized Web
  3. Delivering Personalized Web
  4. Convergence: Big Data + Low Latency
  5. Storm: Distributed Stream Processing
     https://github.com/nathanmarz/storm
     Example of Streams
     • User activities
     • Ad beacons
     • Content feeds
     • Social feeds
     • …
     (Diagram labels: spout, bolt, bolt, bolt)
  6. Storm API: Illustrated

         public class DoubleAndTripleBolt extends BaseRichBolt {
             private OutputCollectorBase _collector;
             …
             public void execute(Tuple input) {
                 int val = input.getInteger(0);
                 _collector.emit(input, new Values(val*2, val*3));
                 _collector.ack(input);
             }
         }

  7. Storm on Grid @ Yahoo!
  8. Storm Dashboard on Grid
     • Container level
     • Application level
  9. Hadoop YARN: MapReduce & Beyond
     http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/YARN.html
     Yahoo! has deployed Hadoop YARN on over 40k machines in production.
  10. Storm Enhancement by Yahoo!
     • YARN Integration
       – Enable Storm topologies to leverage Hadoop resources
         • Coming soon at github.com/yahoo/storm-yarn
     • Storm enhancement
       • Contributed to Storm via pull requests
       – Security
         • Authentication, Authorization, Audit (Pull #469, #511, #528)
         • Serialization (Pull #461, #472, #473)
         • UI (Pull #488)
       – Message Transport
         • 0MQ replacement (Pull #518)
       – Reliability
         • Zookeeper client exponential back-off (Pull #471)
         • Bug fix (Pull #476)
         • Many test cases
  11. Storm-on-YARN: Set Up Cluster
  12. Storm-on-YARN: Launch Storm App
  13. Storm-on-YARN: Expand Cluster
  14. Authentication/Authorization/Audit
     • Authentication plugins
       – Kerberos (soon)
       – Digest
       – None
       – Bring your own
     • Authorization plugins
       – Accept all
       – Limited operations only
       – User whitelist
       – Bring your own
     • Audit
       – Access log on Nimbus/DRPC servers
  15. Tuple Serialization & Transport
     • Tuple serialization plugins
       – Default serializer
       – Blowfish serializer (encryption)
       – Bring your own
     • Transport Plugins
       – 0MQ
       – Java NIO.2
       – Bring your own
     (Diagram label: Tuple)
  16. Conclusion
     • Yahoo! is leading the emergence of big-data & low-latency processing via open source collaboration.
       – Join us at Hadoop Summit 2013 for an update
  17. We Are Hiring! Please reach out to Michael Grossmann <grossman@yahoo-inc.com>
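
Slide 6 shows only the `execute` method of the bolt. As a dependency-free sketch of what that method does per tuple, the plain-Java model below doubles and triples the input value, records the input as the anchor of the emitted tuple, and then acks it. It does not use the Storm API: `DoubleAndTripleModel` and `Emitted` are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the slide's bolt logic. This is NOT the Storm API:
// DoubleAndTripleModel and Emitted are hypothetical names for illustration.
final class DoubleAndTripleModel {
    // One emitted tuple: the input it is anchored to plus its two output values.
    static final class Emitted {
        final int anchor;
        final int doubled;
        final int tripled;

        Emitted(int anchor, int doubled, int tripled) {
            this.anchor = anchor;
            this.doubled = doubled;
            this.tripled = tripled;
        }
    }

    final List<Emitted> emitted = new ArrayList<>();
    final List<Integer> acked = new ArrayList<>();

    // Mirrors execute(Tuple) on the slide: emit (val*2, val*3) anchored
    // to the input, then acknowledge the input.
    void execute(int val) {
        emitted.add(new Emitted(val, val * 2, val * 3));
        acked.add(val);
    }

    public static void main(String[] args) {
        DoubleAndTripleModel bolt = new DoubleAndTripleModel();
        for (int v : new int[] {1, 2, 5}) {
            bolt.execute(v);
        }
        for (Emitted e : bolt.emitted) {
            System.out.println(e.anchor + " -> (" + e.doubled + ", " + e.tripled + ")");
        }
    }
}
```

In the real bolt, passing `input` as the first argument to `emit` is what anchors the new tuple to its source, so a downstream failure can be traced back to the original input.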
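
Slide 10 lists "Zookeeper client exponential back-off" among the reliability contributions but gives no detail. The sketch below shows the general technique only: each retry waits twice as long as the previous one, up to a cap. The class name, method name, and the base and cap values are assumptions for illustration and are not taken from the pull request.

```java
// Generic exponential back-off schedule. The base delay, cap, and names here
// are assumptions for illustration; they are not taken from Storm pull #471.
final class BackoffSketch {
    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
    static long delayMillis(long baseMillis, long capMillis, int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Guard the shift so very large attempt counts cannot overflow.
        if (attempt >= 62) {
            return capMillis;
        }
        long factor = 1L << attempt;
        // Compare by division so base * factor is never computed if it would
        // exceed the cap (or overflow).
        if (baseMillis > capMillis / factor) {
            return capMillis;
        }
        return baseMillis * factor;
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt <= 7; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + delayMillis(100, 5000, attempt) + " ms");
        }
    }
}
```

Production clients often add random jitter to each delay so that many workers reconnecting at once do not retry in lockstep; that is omitted here for brevity.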
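
Slide 14 names three built-in authorization policies (accept all, limited operations only, user whitelist) plus "bring your own". One way to picture a pluggable design like this is a single interface with one implementation per policy. In the sketch below, the `Authorizer` interface, its `permit` signature, and the operation names are all hypothetical; they are not Storm's actual plugin interfaces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical plugin shape for the three built-in policies on the slide.
// Authorizer, its permit(...) signature, and the operation names are
// assumptions for illustration, not Storm's actual interfaces.
final class AuthorizerSketch {
    interface Authorizer {
        boolean permit(String user, String operation);
    }

    // "Accept all": every request is allowed.
    static Authorizer acceptAll() {
        return (user, operation) -> true;
    }

    // "Limited operations only": allow a fixed set of operations for anyone.
    static Authorizer limitedOperations(Set<String> allowedOperations) {
        return (user, operation) -> allowedOperations.contains(operation);
    }

    // "User whitelist": allow any operation, but only for listed users.
    static Authorizer userWhitelist(Set<String> allowedUsers) {
        return (user, operation) -> allowedUsers.contains(user);
    }

    public static void main(String[] args) {
        Authorizer readOnly =
                limitedOperations(new HashSet<>(Arrays.asList("getTopologyInfo")));
        Authorizer admins = userWhitelist(new HashSet<>(Arrays.asList("alice")));
        System.out.println("bob kill under readOnly: "
                + readOnly.permit("bob", "killTopology"));
        System.out.println("alice kill under admins: "
                + admins.permit("alice", "killTopology"));
    }
}
```

Under this shape, "bring your own" is simply another implementation of the same interface, selected by configuration.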
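
Slide 15 mentions a Blowfish serializer that encrypts tuples. The slide does not describe how it is built, so the following is only a sketch of the underlying idea: encrypt the already-serialized bytes with Blowfish through the standard `javax.crypto` API. The class name, the fixed demo key, and the choice of ECB mode (used here only to keep the example short; it is not recommended for real data) are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Sketch: wrap serialized tuple bytes in Blowfish encryption using the JDK's
// javax.crypto API. The class name, key handling, and cipher mode are
// assumptions for illustration, not Storm's Blowfish serializer.
final class BlowfishTupleSketch {
    // ECB keeps the example short; a real deployment would prefer a mode with an IV.
    private static final String TRANSFORMATION = "Blowfish/ECB/PKCS5Padding";

    private final SecretKeySpec key;

    BlowfishTupleSketch(byte[] secret) {
        this.key = new SecretKeySpec(secret, "Blowfish");
    }

    byte[] encrypt(byte[] serialized) {
        return run(Cipher.ENCRYPT_MODE, serialized);
    }

    byte[] decrypt(byte[] encrypted) {
        return run(Cipher.DECRYPT_MODE, encrypted);
    }

    private byte[] run(int mode, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, key);
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Blowfish operation failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] secret = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        BlowfishTupleSketch codec = new BlowfishTupleSketch(secret);
        byte[] wire = codec.encrypt("user=42,click".getBytes(StandardCharsets.UTF_8));
        String back = new String(codec.decrypt(wire), StandardCharsets.UTF_8);
        System.out.println(wire.length + " encrypted bytes, round trip: " + back);
    }
}
```

Both the sending and receiving workers need the same secret, so in practice the key would come from topology configuration rather than a literal in code.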
