Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee

Published in: Data & Analytics

  1. Data science lifecycle with Apache Zeppelin and Spark. 2015 Spark Summit Amsterdam. Moon (moon@nflabs.com), NFLabs, www.nflabs.com
  2. Data science lifecycle
  3. Data science: process (https://en.wikipedia.org/wiki/Data_analysis)
  4. Data science: tools (MLlib)
  5. Data science: people (engineer, data scientist, DevOps, business; illustrations by http://aarondavis.design/)
  6. Hadoop landscape: Cloudera-ML, ML-base, MRQL, Shark, ?
  7. Project timeline: 12.2012 commercial product for data analysis; 10.2013 open-sourced a single feature; 08.2014 started getting adoption; 12.2014 ASF incubation (http://zeppelin.incubator.apache.org)
  8. Commercial product, 12.2012
  9. Zeppelin, 10.2013
  10. Zeppelin, 10.2013
  11. Zeppelin, 08.2014
  12. Zeppelin, 08.2014
  13. Third-party products, 10.2014
  14. Apache incubation proposal, 11.2014
  15. Acceptance by the Incubator, 23.12.2014
  16. Current status: 1 release, 71 contributors worldwide, 766 stars on GitHub, 300/900 emails on the users/dev lists @i.a.o
  17. Interactive notebooks
  18. Interactive visualization
  19. Multiple backends
  20. Interpreter: http://zeppelin.incubator.apache.org/docs/development/writingzeppelininterpreter.html
  21. Writing an interpreter: public abstract void open(); public abstract void close(); public abstract InterpreterResult interpret(String st, InterpreterContext context); public abstract void cancel(InterpreterContext context); public abstract int getProgress(InterpreterContext context); public abstract List<String> completion(String buf, int cursor); public abstract FormType getFormType(); public Scheduler getScheduler(); (grouped on the slide under "Must have", "Good to have" and "Advanced")
  22. Display system: Zeppelin Server, Spark Interpreter, other interpreters and the Zeppelin webapp, connected via WebSocket and REST; display types: Text, HTML, Table, Angular
  23. Display system: select the display system through the output
  24. Built-in scheduler: runs your notebook on a cron expression
  25. Flexible layout
  26. DEMO
  27. Zeppelin & friends: Z-Manager, ZeppelinHub, … (collaboration/sharing; packaging & deployment; Zeppelin + full stack on a cloud; packages; backend integration)
  28. Z-Manager installer
  29. Deployment: https://github.com/hortonworks-gallery/ambari-zeppelin-service
  30. Deployment
  31. As a service
  32. AWS EMR: /aws.amazon.com/blogs/aws/amazon-emr-release-4-1-0-spark-1-5-0-hue-3-7-1-hdfs-encryption-presto-oozie-zeppelin-improved-resizing
  33. Online viewer
  34. Zeppelin for organizations
  35. An engineer (engineer icon by http://aarondavis.design/)
  36. A team (engineer icon by http://aarondavis.design/)
  37. An organization (engineer icon by http://aarondavis.design/)
  38. That’s too many! (engineer icon by http://aarondavis.design/)
  39. What is the problem? Too much to install, too much to configure, too many cluster resources
  40. Solution? Containers + reverse proxy
  41. Z-Manager: http://github.com/NFLabs/z-manager, Apache 2.0 License. Containerized deployment per user, reverse proxy, single binary, simple web application. Z-Manager SGA to ASF coming*
  42. Z-Manager: auto-update; a Z-Manager process on a Linux box; built with Go + React :) (engineer icon by http://aarondavis.design/)
  43. Z-Manager
  44. ZeppelinHub: https://www.zeppelinhub.com, sharing notebooks with access control
  45. Zeppelin: ZeppelinHub shares notebooks; Z-Manager provides a multi-tenant environment (illustrations by http://aarondavis.design/)
  46. Data science: people (engineer, data scientist, DevOps, business; illustrations by http://aarondavis.design/)
  47. Before: Cloudera-ML, ML-base, MRQL, Shark, ?
  48. After: Cloudera-ML, ML-base, MRQL, Shark
  49. Project roadmap
  50. Helium
  51. People do similar work with different data: new visualizations, models & algorithms, data processing pipelines (engineer icon by http://aarondavis.design/)
  52. Package and distribute that work (new visualizations, models & algorithms, data processing pipelines) through a package repo (engineer icon by http://aarondavis.design/)
  53. Helium (https://s.apache.org/helium): a platform for data analytics applications on top of Apache Zeppelin
  54. Helium application = View + Algorithm, using resources provided by Zeppelin
  55. Resources (data and computing): any Java object, for example the result of the last execution; a JDBC connection (from the JDBC Interpreter)*; a SparkContext (from SparkInterpreter); a Flink environment (from FlinkInterpreter)*; objects provided by a user-created interpreter; objects provided by a user-created Helium application
  56. Application examples (data, computing, visualization): get git commit log data (https://github.com/Leemoonsoo/zeppelin-gitcommitdata); run CPU usage monitoring code across a Spark cluster using SparkContext (https://github.com/Leemoonsoo/zeppelin-sparkmon); display result data as a word cloud (https://github.com/Leemoonsoo/zeppelin-wordcloud)
  57. How it works: Zeppelin Server; web browser (View); interpreter process (Algorithm); resource pools. Resource pools are connected: “Algorithm runs where resource exists”
  58. API: class YourApplication extends org.apache.zeppelin.helium.Application { @Override public void run(ApplicationArgument arg, InterpreterContext context) { ….. } } Easy API: just extend helium.Application
  59. Application spec: { mavenArtifact : "groupId:artifactId:version", className : "your.helium.application.Class", icon : "fa fa-cloud", name : "My app name", description : "some description", consume : [ "org.apache.spark.SparkContext" ] } Simple: writing a spec file allows Zeppelin to load the application
  60. Deploy: public or private repository (handy, private, public). Packaged as a JAR and distributed through Maven; downloaded on the fly and run when the user selects it
  61. Thank you. Q & A. Moon (moon@nflabs.com), NFLabs, www.nflabs.com, http://zeppelin.incubator.apache.org/
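The "Writing an Interpreter" slide lists the methods an interpreter implements. To make that lifecycle concrete, here is a minimal sketch under clearly stated assumptions: SketchInterpreter, SketchContext, SketchResult and AddInterpreter are hypothetical stand-ins defined only for this example (they are not Zeppelin's Interpreter, InterpreterContext or InterpreterResult classes), and getFormType() and getScheduler() are left out. The toy interpreter evaluates one "a + b" integer sum per line and supports progress reporting and cancellation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for InterpreterContext; NOT the real Zeppelin class.
final class SketchContext {
    final String paragraphId;
    SketchContext(String paragraphId) { this.paragraphId = paragraphId; }
}

// Hypothetical stand-in for InterpreterResult: a status code plus output text.
final class SketchResult {
    enum Code { SUCCESS, ERROR }
    final Code code;
    final String message;
    SketchResult(Code code, String message) { this.code = code; this.message = message; }
}

// Mirrors the slide's method list, minus getFormType() and getScheduler().
abstract class SketchInterpreter {
    public abstract void open();
    public abstract void close();
    public abstract SketchResult interpret(String st, SketchContext context);
    public abstract void cancel(SketchContext context);
    public abstract int getProgress(SketchContext context);
    public abstract List<String> completion(String buf, int cursor);
}

// Hypothetical toy interpreter: evaluates one "a + b" integer sum per line.
class AddInterpreter extends SketchInterpreter {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private volatile int progress = 0;
    private volatile boolean opened = false;

    @Override public void open() { opened = true; }
    @Override public void close() { opened = false; }

    @Override
    public SketchResult interpret(String st, SketchContext context) {
        if (!opened) {
            return new SketchResult(SketchResult.Code.ERROR, "interpreter not opened");
        }
        cancelled.set(false);
        progress = 0;
        String[] lines = st.split("\n");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            // cancel() may be called from another thread while this loop runs
            if (cancelled.get()) {
                return new SketchResult(SketchResult.Code.ERROR, "cancelled");
            }
            String line = lines[i].strip();
            if (!line.isEmpty()) {
                String[] parts = line.split("\\+");
                if (parts.length != 2) {
                    return new SketchResult(SketchResult.Code.ERROR, "cannot parse: " + line);
                }
                try {
                    long sum = Long.parseLong(parts[0].strip()) + Long.parseLong(parts[1].strip());
                    out.append(sum).append('\n');
                } catch (NumberFormatException e) {
                    return new SketchResult(SketchResult.Code.ERROR, "not an integer: " + line);
                }
            }
            progress = (i + 1) * 100 / lines.length; // percentage of lines processed
        }
        return new SketchResult(SketchResult.Code.SUCCESS, out.toString());
    }

    @Override public void cancel(SketchContext context) { cancelled.set(true); }
    @Override public int getProgress(SketchContext context) { return progress; }

    @Override
    public List<String> completion(String buf, int cursor) {
        return new ArrayList<>(); // this toy interpreter offers no completions
    }
}
```

The cancel flag is an AtomicBoolean and progress is volatile so that cancel() and getProgress() can safely be called from another thread while interpret() is running.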
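The "Display System" slides say an interpreter selects how its result is shown through its output. The sketch below assumes the output may begin with a directive named after one of the slide's display types (%text, %html, %table, %angular) and that table output uses tab-separated cells and newline-separated rows with a header row first; DisplayOutput is a hypothetical class, not Zeppelin's own parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of output-driven display selection; not Zeppelin's parser.
final class DisplayOutput {
    // The display types named on the "Display System" slide.
    enum Type { TEXT, HTML, TABLE, ANGULAR }

    final Type type;
    final String body;

    private DisplayOutput(Type type, String body) {
        this.type = type;
        this.body = body;
    }

    // Detects an optional leading "%text", "%html", "%table" or "%angular"
    // directive followed by a space or newline; anything else is plain text.
    static DisplayOutput parse(String output) {
        for (Type t : Type.values()) {
            String magic = "%" + t.name().toLowerCase(Locale.ROOT);
            if (output.startsWith(magic + " ") || output.startsWith(magic + "\n")) {
                return new DisplayOutput(t, output.substring(magic.length() + 1));
            }
        }
        return new DisplayOutput(Type.TEXT, output);
    }

    // Assumed table format: rows separated by newlines, cells by tabs,
    // first row is the header. Empty lines are skipped.
    List<List<String>> tableRows() {
        if (type != Type.TABLE) {
            throw new IllegalStateException("not table output: " + type);
        }
        List<List<String>> rows = new ArrayList<>();
        for (String line : body.split("\n")) {
            if (!line.isEmpty()) {
                rows.add(Arrays.asList(line.split("\t", -1)));
            }
        }
        return rows;
    }
}
```

Output without a recognised directive falls back to plain text, so an interpreter that knows nothing about the display system still works.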
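The "How it works" slide summarises Helium's execution model as “Algorithm runs where resource exists”. One way to model that in memory, assuming each pool stands for one process and resources are looked up by name. ResourcePool, ConnectedPools and Dispatch are hypothetical names, and the "dispatch" here is only a lookup followed by a local call rather than real inter-process communication.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical model: one pool per process (Zeppelin server, interpreter process, ...).
final class ResourcePool {
    final String processId;
    private final Map<String, Object> resources = new LinkedHashMap<>();

    ResourcePool(String processId) { this.processId = processId; }

    void put(String name, Object value) { resources.put(name, value); }

    Optional<Object> get(String name) { return Optional.ofNullable(resources.get(name)); }
}

// Where an algorithm ran and what it returned.
final class Dispatch<R> {
    final String processId;
    final R result;

    Dispatch(String processId, R result) {
        this.processId = processId;
        this.result = result;
    }
}

// "Connected" pools: a lookup sees every pool, and the algorithm is applied
// in the pool that owns the resource. In this in-memory sketch that is just
// a local call; across real processes the algorithm would have to be shipped.
final class ConnectedPools {
    private final List<ResourcePool> pools;

    ConnectedPools(List<ResourcePool> pools) { this.pools = List.copyOf(pools); }

    Optional<ResourcePool> locate(String name) {
        return pools.stream().filter(p -> p.get(name).isPresent()).findFirst();
    }

    <R> Dispatch<R> runWhereResourceExists(String name, Function<Object, R> algorithm) {
        ResourcePool owner = locate(name)
                .orElseThrow(() -> new IllegalArgumentException("no pool holds " + name));
        Object resource = owner.get(name).orElseThrow();
        return new Dispatch<>(owner.processId, algorithm.apply(resource));
    }
}
```

Returning the owning process id alongside the result makes the placement decision visible: the resource (for example a SparkContext) stays where it is and the work moves to it.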
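The application spec slide pairs a mavenArtifact coordinate with a consume list of resource classes. A minimal sketch of how a loader might use those two fields, assuming an application is offered only when every class it consumes is available as a resource. AppSpec and its methods are hypothetical, and any artifact coordinate used with it here is made up.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the spec fields that matter for loading;
// icon, name and description from the slide are omitted.
final class AppSpec {
    final String mavenArtifact;  // "groupId:artifactId:version"
    final String className;      // class that extends the Helium Application
    final List<String> consume;  // resource class names the application needs

    AppSpec(String mavenArtifact, String className, List<String> consume) {
        this.mavenArtifact = mavenArtifact;
        this.className = className;
        this.consume = List.copyOf(consume);
    }

    // Splits the Maven coordinate into groupId, artifactId and version.
    String[] artifactCoordinates() {
        String[] parts = mavenArtifact.split(":", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected groupId:artifactId:version, got " + mavenArtifact);
        }
        for (String part : parts) {
            if (part.isBlank()) {
                throw new IllegalArgumentException("empty coordinate in " + mavenArtifact);
            }
        }
        return parts;
    }

    // Assumption: an application is offered only when every class it
    // consumes is present among the available resources.
    boolean canRunWith(Set<String> availableResourceClasses) {
        return availableResourceClasses.containsAll(consume);
    }
}
```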
