Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin

This talk shows how we can use Apache Flink and Apache Zeppelin to do interactive data analysis. The examples show the usage of FlinkML to solve a linear regression and classification problem.

  1. Till Rohrmann, Flink PMC member (trohrmann@apache.org, @stsffap): Interactive Data Analysis with Apache Flink
  2. Data Analysis
  3. Exploratory Data Analysis
     • Visualize data
     • Calculate main characteristics
     • Understand data and find possibly new hypotheses
  4. Data Analysts
  5. Read-Evaluate-Print Loop
     • New Scala shell offers REPL
     • Interactive queries
     • Lets you explore data quickly
  6. Scala Shell
  7. Simple Scala Shell Example
  8. Problems
     • No visualization
     • No saving or replaying of written code
     • No assistance → bad IDE
  9. Notebooks
     • Web-based interactive computation environment
     • Combines rich text, execution code, plots and rich media
     • Storytelling
  10. Apache Zeppelin
      • Web-based REPL with pluggable interpreters
      • Since 2014 in the Apache Incubator
      • Supported interpreters: Flink, Spark, Python, Markdown, and many more
  11. Word Count with Zeppelin
      • Find the 10 most frequent words with more than 4 letters in the King James Version of the Bible.
  12-15. (no text on these slides)
  16. Linear regression
      • Let’s predict the influence of advertisement spending on sales
      • Input data set: http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
      • Features: TV advertisement money, radio advertisement money, newspaper advertisement money
      • Response: sales
  17-25. (no text on these slides)
  26. Classification
      • Let’s build a classifier for insult detection
      • Kaggle challenge: https://www.kaggle.com/c/detecting-insults-in-social-commentary
      • Label: 1 = insult, 0 = no insult
      • Feature: comment text
  27-28. (no text on these slides)
  29. Conclusion
      • Interactive data analysis is really easy with Apache Flink
      • Apache Zeppelin is a great interactive notebook
      • Zeppelin and Flink play well together to solve machine learning tasks and more
  30. (no text on this slide)
  31. flink.apache.org, @ApacheFlink
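
The word-count task on slide 11 (the 10 most frequent words with more than 4 letters) can be sketched in plain Python, one of the interpreters slide 10 lists for Zeppelin. This is not the Flink code shown in the talk. The function name `top_words` and its parameters are assumptions made for illustration, and the sample text is only the opening of Genesis, not the full Bible.

```python
import re
from collections import Counter


def top_words(text, longer_than=4, n=10):
    """Return the n most frequent words that have more than `longer_than` letters."""
    # Lowercase the text and keep only runs of ASCII letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    # Count only the words that pass the length filter.
    counts = Counter(w for w in words if len(w) > longer_than)
    return counts.most_common(n)


# Tiny stand-in input; a real run would read the whole text file instead.
sample = (
    "In the beginning God created the heaven and the earth. "
    "And the earth was without form, and void;"
)
print(top_words(sample, n=3))
```

In a distributed setting the same steps would map onto a tokenizing flat-map, a filter, a grouped count, and a final sort; the sketch above only shows the logic on a single machine.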
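
The regression task on slide 16 (predicting sales from TV, radio and newspaper spending) amounts to fitting one intercept and three weights. The talk used FlinkML for this; the sketch below swaps in a closed-form least-squares solve via the normal equations in plain Python, so it is not FlinkML's solver. The names `fit_linear_regression` and `predict` are hypothetical, and the five data rows are made-up values generated from a known rule, not rows from Advertising.csv.

```python
def fit_linear_regression(rows, targets):
    """Least-squares fit of target ~ intercept + weights . row.

    Solves the normal equations (X^T X) b = X^T y and assumes X^T X is invertible.
    """
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Build the augmented system [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coef, row):
    """Apply the fitted intercept and weights to one feature row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


# Made-up (tv, radio, newspaper) rows; sales follow 1 + 0.5*tv + 2*radio exactly.
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0)]
sales = [1 + 0.5 * tv + 2 * radio for tv, radio, _ in rows]

coef = fit_linear_regression(rows, sales)
print([round(c, 6) for c in coef])
```

Because the toy targets are exactly linear, the fit should recover the generating rule up to floating-point error; real advertising data would leave a residual.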
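
For the classification task on slide 26 (comment text as the feature, label 1 for insult and 0 for no insult), the transcript does not say which classifier the talk trained. As a rough illustration only, the sketch below uses a bag-of-words perceptron in plain Python. The function names and the four training comments are invented for this example and do not come from the Kaggle data set.

```python
import re
from collections import defaultdict


def tokens(comment):
    """Split a comment into lowercase word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def train_perceptron(examples, epochs=10):
    """Train on (comment_text, label) pairs, where label 1 = insult, 0 = no insult."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            toks = tokens(text)
            score = bias + sum(weights[t] for t in toks)
            predicted = 1 if score > 0 else 0
            error = label - predicted  # -1, 0, or +1
            if error:
                # Move the bias and every token weight toward the correct label.
                bias += error
                for t in toks:
                    weights[t] += error
    return weights, bias


def classify(model, comment):
    """Return 1 if the comment scores as an insult, else 0."""
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in tokens(comment))
    return 1 if score > 0 else 0


# Invented toy training set, far too small for a real model.
examples = [
    ("you are an idiot", 1),
    ("thanks for the helpful answer", 0),
    ("what a stupid idiot", 1),
    ("great talk, thanks", 0),
]
model = train_perceptron(examples)
print(classify(model, "you idiot"), classify(model, "thanks for the talk"))
```

A real pipeline would need far more data, held-out evaluation, and probably a stronger model; the point here is only the shape of the text-to-label mapping.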