
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin

This talk shows how to use Apache Flink and Apache Zeppelin for interactive data analysis. The examples use FlinkML to solve a linear regression problem and a classification problem.

Published in: Technology

  1. Till Rohrmann, Flink PMC member (trohrmann@apache.org, @stsffap): Interactive Data Analysis with Apache Flink
  2. Data Analysis
  3. Exploratory Data Analysis
     - Visualize data
     - Calculate main characteristics
     - Understand data and find possibly new hypotheses
  4. Data Analysts
  5. Read-Evaluate-Print Loop
     - New Scala shell offers REPL
     - Interactive queries
     - Lets you explore data quickly
  6. Scala Shell
  7. Simple Scala Shell Example
  8. Problems
     - No visualization
     - No saving or replaying of written code
     - No assistance → Bad IDE
  9. Notebooks
     - Web-based interactive computation environment
     - Combines rich text, execution code, plots and rich media
     - Storytelling
  10. Apache Zeppelin
     - Web-based REPL with pluggable interpreters
     - Since 2014 in the Apache Incubator
     - Supported interpreters:
        • Flink
        • Spark
        • Python
        • Markdown
        • Many more …
  11. Word Count with Zeppelin
     - Find the 10 most frequent words with more than 4 letters in the King James version of the Bible.
  16. Linear regression
     - Let’s predict the influence of advertisement spending on sales
     - Input data set: http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
     - Features:
        • TV advertisement money
        • Radio advertisement money
        • Newspaper advertisement money
     - Response:
        • Sales
  26. Classification
     - Let’s build a classifier for insult detection
     - Kaggle challenge: https://www.kaggle.com/c/detecting-insults-in-social-commentary
     - Label: 1 – Insult, 0 – No insult
     - Feature: Comment text
  29. Conclusion
     - Interactive data analysis is really easy with Apache Flink
     - Apache Zeppelin is a great interactive notebook
     - Zeppelin and Flink play well together to solve machine learning tasks and more
  31. flink.apache.org @ApacheFlink
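
The word count task on slide 11 (the 10 most frequent words with more than 4 letters) can be illustrated with a minimal sketch in plain Python, one of the interpreters listed on slide 10. This is not the Flink code from the talk, which the transcript does not preserve. The function name `top_words`, its parameters, and the two-verse sample text are assumptions made for this example.

```python
import re
from collections import Counter


def top_words(text, n=10, min_letters=4):
    """Return the n most frequent words with MORE than min_letters letters."""
    # Lowercase the text and split it into runs of letters.
    words = re.findall(r"[a-z]+", text.lower())
    # Keep only the long words and count how often each one occurs.
    counts = Counter(w for w in words if len(w) > min_letters)
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(n)


# Assumption: a tiny stand-in for the full King James text.
sample = (
    "In the beginning God created the heaven and the earth. "
    "And the earth was without form, and void; "
    "and darkness was upon the face of the deep."
)

for word, count in top_words(sample):
    print(word, count)
```

On the full text the same three steps apply: tokenize, filter by length, count and rank. In Flink these steps would typically map to a flatMap, a filter, and a grouped sum.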

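The linear regression setup on slide 16 (three advertising budgets as features, sales as the response) can be sketched as follows. This is a plain Python illustration under stated assumptions, not the FlinkML code from the talk: the solver is ordinary batch gradient descent, the function name `fit_linear_regression` and its parameters are made up for this example, and the eight data rows are synthetic values, not rows from Advertising.csv.

```python
def fit_linear_regression(features, targets, step_size=0.1, iterations=5000):
    """Fit targets ~ intercept + weights . features with batch gradient descent."""
    n = len(features)
    dim = len(features[0])
    weights = [0.0] * dim
    intercept = 0.0
    for _ in range(iterations):
        # Prediction error of the current model on every row.
        errors = [
            intercept + sum(w * x for w, x in zip(weights, row)) - y
            for row, y in zip(features, targets)
        ]
        # Gradient step on the halved mean squared error.
        intercept -= step_size * sum(errors) / n
        for j in range(dim):
            grad_j = sum(e * row[j] for e, row in zip(errors, features)) / n
            weights[j] -= step_size * grad_j
    return intercept, weights


# Assumption: synthetic rows (tv, radio, newspaper), generated from
# sales = 1 + 2*tv + 3*radio + 0*newspaper. Not the real data set.
features = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0),
]
targets = [1.0, 3.0, 4.0, 1.0, 6.0, 3.0, 4.0, 6.0]

intercept, weights = fit_linear_regression(features, targets)
print("intercept:", round(intercept, 3))
print("weights (tv, radio, newspaper):", [round(w, 3) for w in weights])
```

The fitted weights indicate how strongly each budget influences sales. A weight near zero, as for the synthetic newspaper column here, suggests that feature contributes little.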
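
The classification task on slide 26 (label 1 for insult, 0 for no insult, with the comment text as the only feature) can be sketched in the same spirit. The transcript does not show which classifier the talk used, so a simple perceptron over bag-of-words counts is swapped in here as an assumption. The names `tokenize`, `train_perceptron`, and `predict`, and the four training comments, are hypothetical and not taken from the Kaggle data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a comment into lowercase word tokens (bag-of-words features)."""
    return re.findall(r"[a-z']+", text.lower())


def predict(model, text):
    """Return 1 (insult) if the linear score is positive, else 0."""
    weights, bias = model
    score = bias + sum(weights[token] for token in tokenize(text))
    return 1 if score > 0 else 0


def train_perceptron(examples, epochs=10):
    """Train a perceptron: adjust weights only on misclassified comments."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            update = label - predict((weights, bias), text)
            if update != 0:
                # Move the score toward the correct label.
                for token in tokenize(text):
                    weights[token] += update
                bias += update
    return weights, bias


# Assumption: made-up comments in the (text, label) shape of the challenge.
training = [
    ("you are an idiot", 1),
    ("what a stupid fool", 1),
    ("thanks for the helpful answer", 0),
    ("have a nice day", 0),
]

model = train_perceptron(training)
print(predict(model, "you stupid idiot"))
print(predict(model, "thanks, have a nice day"))
```

A real run would need far more training data and a held-out test set. The point here is only the shape of the pipeline: text to token counts, token counts to a linear score, score to a 0/1 label.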