
Storm-on-YARN: Convergence of Low-Latency and Big-Data

Hadoop plays a central role at Yahoo! in providing personalized experiences for our users and creating value for our advertisers. In this talk, we will discuss the convergence of low-latency processing and the Hadoop platform. To enable this convergence, we have developed Storm-on-YARN, which lets Storm streaming/microbatch applications and Hadoop batch applications be hosted in a single cluster. Storm applications can leverage YARN for resource management and apply Hadoop-style security to Hadoop datasets on HDFS and HBase. In Storm-on-YARN, YARN is used to launch the Storm application master (Nimbus) and to enable Nimbus to request resources for Storm workers (Supervisors). The YARN resource manager and the Storm scheduler work together to support multi-tenancy and high availability. HDFS enables Storm to achieve higher availability of Nimbus itself. We are introducing Hadoop-style security into Storm through JAAS authentication (Kerberos and Digest). Storm servers (Nimbus and DRPC) will be configured with authorization plugins for access control and audit. The security context enables Storm applications to access authorized datasets only (including those created by Hadoop applications). Yahoo! is making our contributions to Storm and YARN available as open source. We will work with industry partners to foster the convergence of low-latency processing and big data.

Published in: Technology, Business
  • Hi Feng! How do I deploy storm-yarn? When I run the launch command, I get:
    [hd@master bin]$ ./storm-yarn launch
    storm is not installed

  1. Storm-on-YARN: Convergence of Low-Latency and Big-Data • Andrew Feng
  2. Self Introduction • Current – Distinguished Architect, Yahoo! Hadoop Team – Core contributor at Storm project • Past – Online advertisement – Personalization – Serving containers – Cloud services – NoSQL database – Application server
  3. Agenda • Business motivation • Technical overview • Open source
  4. Yahoo!: Personalized Web
  5. Personalization w/ Hadoop • Understand user & content/ads • Select relevant content & ads
  6. Personalization w/ Low-Latency • Latest content per current interests
  7. Big Data + Low Latency: Design Pattern • Personalization • Ad targeting • Reporting • Ad budgeting • Fraud detection • Trending topics
  8. Agenda • Business motivation • Technical overview • Open source
  9. Hadoop YARN: MapReduce & Beyond • Yahoo! deployed YARN into 30k+ nodes in production. • YARN Apps … MapReduce, Storm, etc.
  10. Storm: Distributed Stream Processing https://github.com/nathanmarz/storm X Streams • User activities • Ad beacons • Content feeds • Social feeds • …
  11. Storm Clusters on Hadoop Grid
  12. Storm-YARN: Launch Cluster • storm-yarn launch <conf> – Initial number of supervisors – Memory size of the allocated container • Result: <appID> of the newly launched Storm master
  13. Storm-YARN: Manage Cluster
      1. addSupervisors <appID> <count>
      2. getStormConfig <appID>
      3. setStormConfig <appID>
      4. startNimbus <appID>
      5. stopNimbus <appID>
      6. startUI <appID>
      7. stopUI <appID>
      8. startSupervisors <appID>
      9. stopSupervisors <appID>
  14. Storm-YARN: Deploy Apps • storm jar <appJar>
  15. Authentication/Authorization/Audit • Authentication plugins – Digest – Kerberos (soon) – None – Bring your own • Authorization plugins – Accept all – Limited operations only – User whitelist – Bring your own • Audit – Access log
  16. Agenda • Business motivation • Technical overview • Open source
  17. Storm-YARN: Open Source • Code released for early access – under the Apache 2.0 License – move to apache.org later • Welcome contribution! – Submit proposals – Sign Apache style CLA – Submit git pull requests https://github.com/yahoo/storm-yarn
  18. Storm-YARN: mvn test
      1. storm-yarn launch ./conf/storm.yaml --stormZip lib/storm.zip --appname storm-on-yarn-test --output target/appId.txt
      2. storm-yarn getStormConfig ./conf/storm.yaml --appId application_1372121842369_0001 --output ./lib/storm/storm.yaml
      3. storm jar lib/storm-starter-0.0.1-SNAPSHOT.jar storm.starter.WordCountTopology word-count-topology
      4. storm kill word-count-topology
      5. storm-yarn shutdown ./conf/storm.yaml --appId application_1372121842369_0001
  19. Storm-YARN: Deployment
      Install Storm S/W
      1. hadoop fs -put storm.zip /lib/storm/<version>/storm.zip
      Apply Storm-YARN
      2. storm-yarn launch → <appID>
      3. storm-yarn getStormConfig <appID> → <storm.yaml>
      4. storm jar <appJar>
  20. Conclusion • YARN empowers the emergence of big-data & low-latency processing • Yahoo! open source: – Storm-yarn @ github/yahoo – Spark-yarn @ spark-project.org
  21. Questions?
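
The launch, submit, and teardown sequence on the "mvn test" slide can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the option spelling (`--stormZip`, `--appname`, `--appId`, `--output`) is reconstructed from the slide text and may differ in a given storm-yarn build, and the helper names `run`, `app_id`, and `workflow` are hypothetical. By default the script prints each command instead of executing it.

```shell
#!/bin/sh
# Dry-run sketch of the sequence on the "mvn test" slide.
# ASSUMPTION: flag spelling is reconstructed from the slide and may not
# match your storm-yarn build; check the tool's own help before running.

DRY_RUN="${DRY_RUN:-1}"                          # set DRY_RUN=0 to really execute
CONF="${CONF:-./conf/storm.yaml}"                # config path shown on the slide
APP_ID_FILE="${APP_ID_FILE:-target/appId.txt}"   # file passed to launch via --output

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read the application ID from the launch output file, or fall back to
# the slide's <appID> placeholder when no cluster has been launched yet.
app_id() {
    if [ -r "$APP_ID_FILE" ]; then
        cat "$APP_ID_FILE"
    else
        echo "<appID>"
    fi
}

# The five steps from the slide, in order.
workflow() {
    run storm-yarn launch "$CONF" --stormZip lib/storm.zip \
        --appname storm-on-yarn-test --output "$APP_ID_FILE"
    app="$(app_id)"
    run storm-yarn getStormConfig "$CONF" --appId "$app" \
        --output ./lib/storm/storm.yaml
    run storm jar lib/storm-starter-0.0.1-SNAPSHOT.jar \
        storm.starter.WordCountTopology word-count-topology
    run storm kill word-count-topology
    run storm-yarn shutdown "$CONF" --appId "$app"
}

workflow
```

Reading the application ID back from the launch output file, rather than hard-coding it, avoids copying an ID such as application_1372121842369_0001 by hand between steps.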
