© 2018 KNIME AG. All Right Reserved.
From Raw Data to Deployment
Paolo Tamagnini: paolo.tamagnini@knime.com
Maarit Widmann: maarit.widmann@knime.com
Rosaria Silipo: rosaria.silipo@knime.com
@KNIME
© 2018 KNIME AG. All Rights Reserved.
Do you recognize this?
2
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
© 2018 KNIME AG. All Rights Reserved.
Let’s unroll it!
It always starts
with some data …
3
Data
Preparation
Model
Training
Model
Optimization
Deployment
Data Manipulation
Data Blending
Missing Values Handling
Feature Generation
Dimensionality Reduction
Feature Selection
Outlier Removal
Normalization
Partitioning
…
Model Training
Bag of Models
Model Selection
Ensemble Models
Own Ensemble Model
External Models
Import Existing Models
Model Factory
…
Parameter Tuning
Parameter Optimization
Regularization
Model Size
No. Iterations
…
Performance Measures
Accuracy
ROC Curve
Cross-Validation
…
Files & DBs
Dashboards
REST API
SQL Code Export
Reporting
…
Model
Evaluation
© 2018 KNIME AG. All Rights Reserved.
The many Lives of a Dataset
4
Data
Preparation
Model
Training
Model
Optimization
Model
Evaluation
Deployment
Partitioning:
• Training Set
• Validation Set
• Test Set
Training Set Validation Set Test Set New Data from Real
World Applications
Original Data
Set with Past
Observations
© 2018 KNIME AG. All Rights Reserved.
Data Exploration
• Sometimes in between Data Access and Data
Preparation there is a Data Exploration phase
• The Data Exploration phase is useful to get to
know the data
• KNIME offers a few visualization nodes to build
dashboards to explore the data
5
© 2018 KNIME AG. All Rights Reserved.
What about Big Data?
• Big Data serves Scalability
• The whole Analytics Process is no different on
Big Data
• You need:
– a Big Data Platform
– The KNIME Big Data (Spark & Hive) Extension
6
© 2018 KNIME AG. All Rights Reserved.
One Example for Every Need
The KNIME EXAMPLES Server
7
50_Applications
© 2018 KNIME AG. All Rights Reserved.
Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge:
Predict Departure Delays
If on original airline dataset, only flights from airport ORD
Output Class = “delay” if depdelay > 15min
otherwise “no delay”
Available features: date, dep time, arr time, carrier, destination, cancelled, …
8
© 2018 KNIME AG. All Rights Reserved.
Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2018.knar into your workspace
9
© 2018 KNIME AG. All Rights Reserved.
Group 1. Data Access and Data Preparation
10
© 2018 KNIME AG. All Rights Reserved.
Group 2. Model Training & Optimization
11
© 2018 KNIME AG. All Rights Reserved.
Group 3. Deployment
12
© 2018 KNIME AG. All Rights Reserved.
KNIME Fall Summit 2018
November 6 – 9 at AT&T Executive Education and Conference Center,
Austin, Texas
• Tuesday & Wednesday: One-day courses
• Thursday & Friday: Summit sessions
Use the code
LEARNATHON
for 10% off tickets!
Register at:
knime.com/fall-summit2018
13
© 2018 KNIME AG. All Rights Reserved.
Save the Date: KNIME Spring Summit 2019
March 18 – 22 at bcc Berlin Congress Center, Berlin
• Monday & Tuesday: One-day courses
• Wednesday & Thursday: Summit sessions
• Friday: Workshops
Use the code
LEARNATHON
for 10% off tickets!
Tickets available soon at
www.knime.com
© 2018 KNIME AG. All Rights Reserved.
KNIME Beginner’s Luck
Free Copy of KNIME Beginner’s Luck Book from KNIME Press
https://www.knime.com/knimepress
with this code: HAMBURG-0918
15
© 2018 KNIME AG. All Rights Reserved.
You can find KNIMers here!
16
• KNIME (www.knime.com)
• BLOG for news, tips and tricks(www.knime.com/blog)
• FORUM for questions and answers (tech.knime.com/forum)
• EXAMPLE SERVER for example workflows
• LEARNING HUB (www.knime.com/learning-hub)
• KNIME TV channel on
• KNIME on @KNIME
• KNIME on https://www.facebook.com/KNIMEanalytics
• On
© 2018 KNIME AG. All Rights Reserved. 17
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH,
and are registered in the United States. KNIME® is also registered in Germany.
Thank You!

KNIME Data Science Learnathon: From Raw Data To Deployment

  • 1.
    © 2018 KNIMEAG. All Right Reserved. From Raw Data to Deployment Paolo Tamagnini: paolo.tamagnini@knime.com Maarit Widmann: maarit.widmann@knime.com Rosaria Silipo: rosaria.silipo@knime.com @KNIME
  • 2.
    © 2018 KNIMEAG. All Rights Reserved. Do you recognize this? 2 https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
  • 3.
    © 2018 KNIMEAG. All Rights Reserved. Let’s unroll it! It always starts with some data … 3 Data Preparation Model Training Model Optimization Deployment Data Manipulation Data Blending Missing Values Handling Feature Generation Dimensionality Reduction Feature Selection Outlier Removal Normalization Partitioning … Model Training Bag of Models Model Selection Ensemble Models Own Ensemble Model External Models Import Existing Models Model Factory … Parameter Tuning Parameter Optimization Regularization Model Size No. Iterations … Performance Measures Accuracy ROC Curve Cross-Validation … Files & DBs Dashboards REST API SQL Code Export Reporting … Model Evaluation
  • 4.
    © 2018 KNIMEAG. All Rights Reserved. The many Lives of a Dataset 4 Data Preparation Model Training Model Optimization Model Evaluation Deployment Partitioning: • Training Set • Validation Set • Test Set Training Set Validation Set Test Set New Data from Real World Applications Original Data Set with Past Observations
  • 5.
    © 2018 KNIMEAG. All Rights Reserved. Data Exploration • Sometimes in between Data Access and Data Preparation there is a Data Exploration phase • The Data Exploration phase is useful to get to know the data • KNIME offers a few visualization nodes to build dashboards to explore the data 5
  • 6.
    © 2018 KNIMEAG. All Rights Reserved. What about Big Data? • Big Data serves Scalability • The whole Analytics Process is no different on Big Data • You need: – a Big Data Platform – The KNIME Big Data (Spark & Hive) Extension 6
  • 7.
    © 2018 KNIMEAG. All Rights Reserved. One Example for Every Need The KNIME EXAMPLES Server 7 50_Applications
  • 8.
    © 2018 KNIMEAG. All Rights Reserved. Classification Problem & Data Set • Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html • Smaller dataset (Jan 2007) (AirlineDataset.table) • Challenge: Predict Departure Delays If on original airline dataset, only flights from airport ORD Output Class = “delay” if depdelay > 15min otherwise “no delay” Available features: date, dep time, arr time, carrier, destination, cancelled, … 8
  • 9.
    © 2018 KNIMEAG. All Rights Reserved. Challenges • Group 1. Data Access and Data Preparation • Group 2. ML Model Training • Group 3. Model Deployment • Import file Learnathon_2018.knar into your workspace 9
  • 10.
    © 2018 KNIMEAG. All Rights Reserved. Group 1. Data Access and Data Preparation 10
  • 11.
    © 2018 KNIMEAG. All Rights Reserved. Group 2. Model Training & Optimization 11
  • 12.
    © 2018 KNIMEAG. All Rights Reserved. Group 3. Deployment 12
  • 13.
    © 2018 KNIMEAG. All Rights Reserved. KNIME Fall Summit 2018 November 6 – 9 at AT&T Executive Education and Conference Center, Austin, Texas • Tuesday & Wednesday: One-day courses • Thursday & Friday: Summit sessions Use the code LEARNATHON for 10% off tickets! Register at: knime.com/fall-summit2018 13
  • 14.
    © 2018 KNIMEAG. All Rights Reserved. Save the Date: KNIME Spring Summit 2019 March 18 – 22 at bcc Berlin Congress Center, Berlin • Monday & Tuesday: One-day courses • Wednesday & Thursday: Summit sessions • Friday: Workshops Use the code LEARNATHON for 10% off tickets! Tickets available soon at www.knime.com
  • 15.
    © 2018 KNIMEAG. All Rights Reserved. KNIME Beginner’s Luck Free Copy of KNIME Beginner’s Luck Book from KNIME Press https://www.knime.com/knimepress with this code: HAMBURG-0918 15
  • 16.
    © 2018 KNIMEAG. All Rights Reserved. You can find KNIMers here! 16 • KNIME (www.knime.com) • BLOG for news, tips and tricks(www.knime.com/blog) • FORUM for questions and answers (tech.knime.com/forum) • EXAMPLE SERVER for example workflows • LEARNING HUB (www.knime.com/learning-hub) • KNIME TV channel on • KNIME on @KNIME • KNIME on https://www.facebook.com/KNIMEanalytics • On
  • 17.
    © 2018 KNIMEAG. All Rights Reserved. 17 The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany. Thank You!