© 2018 KNIME AG. All Rights Reserved.
From Raw Data to Deployment
Scott.Fincher@knime.com
Jeanette.Prinz@knime.com
Kathrin.Melcher@knime.com
@KNIME #KNIMERoadshow
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Let’s unroll it!
It always starts with some data …
Data Preparation
• Data Manipulation
• Data Blending
• Missing Values Handling
• Feature Generation
• Dimensionality Reduction
• Feature Selection
• Outlier Removal
• Normalization
• Partitioning
• …
Model Training
• Model Training
• Bag of Models
• Model Selection
• Ensemble Models
• Own Ensemble Model
• External Models
• Import Existing Models
• Model Factory
• …
Model Optimization
• Parameter Tuning
• Parameter Optimization
• Regularization
• Model Size
• No. of Iterations
• …
Model Evaluation
• Performance Measures
• Accuracy
• ROC Curve
• Cross-Validation
• …
Deployment
• Files & DBs
• Dashboards
• REST API
• SQL Code Export
• Reporting
• …
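The Accuracy and Cross-Validation entries listed under Model Evaluation can be illustrated with a small Python sketch. This is not how KNIME implements these steps; the function names, the toy labels, and the majority-class "model" are all assumptions chosen only to keep the example self-contained.

```python
def accuracy(actual, predicted):
    """Fraction of rows where the predicted class equals the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs so that every row is tested exactly once."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


def majority_class(labels):
    """Hypothetical stand-in for a trained model: always predict the most frequent class."""
    return max(sorted(set(labels)), key=labels.count)


# Toy labels (made up for illustration): 7 x "no delay" followed by 3 x "delay".
labels = ["no delay"] * 7 + ["delay"] * 3

fold_scores = []
for train_idx, test_idx in k_fold_indices(len(labels), k=5):
    prediction = majority_class([labels[i] for i in train_idx])
    actual = [labels[i] for i in test_idx]
    fold_scores.append(accuracy(actual, [prediction] * len(actual)))

mean_accuracy = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_accuracy)
```

The rows are deliberately left unshuffled so the folds are easy to follow by hand; a real evaluation would normally shuffle or stratify the rows first.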
The many Lives of a Dataset
• Data Preparation: the original data set with past observations is partitioned into a training set, a validation set, and a test set
• Model Training: uses the training set
• Model Optimization: uses the validation set
• Model Evaluation: uses the test set
• Deployment: applies the model to new data from real-world applications
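The three-way partitioning can be sketched in plain Python as follows. In KNIME this step is done with nodes rather than code, so the function name, the 60/20/20 split, and the fixed seed below are illustrative assumptions, not values taken from the workflow.

```python
import random


def partition(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the rows and split them into training, validation, and test sets."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held back for evaluation
    return train, val, test


# Stand-in for the original data set with past observations.
past_observations = list(range(100))
train_set, val_set, test_set = partition(past_observations)
print(len(train_set), len(val_set), len(test_set))
```

The test set is touched only once, at evaluation time, so that the reported performance is not biased by the choices made during training and optimization.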
Data Exploration
• Sometimes in between Data Access and Data Preparation there is a Data Exploration phase
• The Data Exploration phase is useful to get to know the data
• KNIME offers a few visualization nodes to build dashboards to explore the data
What about Big Data?
• Big Data serves scalability
• The whole analytics process is no different on Big Data
• You need:
– a Big Data platform
– the KNIME Big Data (Spark & Hive) Extension
One Example for Every Need
The KNIME EXAMPLES Server
50_Applications/27_FromRawDataToDeployment
Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge: predict departure delays
– If working on the original airline dataset, use only flights from airport ORD
– Output class = “delay” if depdelay > 15 min, otherwise “no delay”
– Input features: everything that is available, and more if you can find it!
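The class definition above can be written down as a small Python sketch. The field names `Origin` and `DepDelay`, the sample rows, and the choice to skip rows with a missing delay are assumptions made for illustration; the slides only fix the ORD filter and the 15-minute threshold.

```python
DELAY_THRESHOLD_MIN = 15  # from the challenge: "delay" if depdelay > 15 min


def label_departure(dep_delay_minutes):
    """Map a departure delay in minutes to the output class."""
    return "delay" if dep_delay_minutes > DELAY_THRESHOLD_MIN else "no delay"


def prepare(flights, origin="ORD"):
    """Keep flights from the given airport and attach the output class."""
    prepared = []
    for flight in flights:
        # Assumption: rows without a recorded departure delay are dropped.
        if flight["Origin"] != origin or flight["DepDelay"] is None:
            continue
        prepared.append({**flight, "Class": label_departure(flight["DepDelay"])})
    return prepared


# Made-up sample rows, not taken from the airline dataset.
flights = [
    {"Origin": "ORD", "DepDelay": 42},
    {"Origin": "ORD", "DepDelay": 15},
    {"Origin": "SFO", "DepDelay": 90},
    {"Origin": "ORD", "DepDelay": None},
]
labelled = prepare(flights)
print([f["Class"] for f in labelled])
```

Note that a delay of exactly 15 minutes falls into “no delay”, because the condition is strictly greater than 15.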
Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2018.knar into your workspace
Group 1. Data Access and Data Preparation
Group 2. Model Training & Optimization
Group 3. Deployment
One Week of KNIME Courses in Austin
• Course for KNIME Analytics Platform, April 23-24, 2018
• Course for KNIME Server, April 25, 2018
• Text Mining Course for KNIME Analytics Platform, April 26, 2018
• Big Data Course for KNIME Analytics Platform, April 27, 2018
KNIME Fall Summit 2018
November 6 – 9 at AT&T Executive Education and Conference Center, Austin, Texas
• Tuesday & Wednesday: One-day courses
• Thursday & Friday: Summit sessions
Use the code US-ROADSHOW for 10% off tickets!
Register at www.KNIME.com
KNIME Beginner’s Luck Book
Free copy of the KNIME Beginner’s Luck book at KNIME Press: https://www.knime.org/knimepress
Promotion Code: KNIME_Learnathon_2018
You can find KNIMers here!
• KNIME (www.knime.com)
• BLOG for news, tips and tricks (www.knime.com/blog)
• FORUM for questions and answers (tech.knime.com/forum)
• EXAMPLE SERVER for example workflows
• LEARNING HUB (www.knime.com/learning-hub)
• KNIME TV channel on
• KNIME on @KNIME
• KNIME on https://www.facebook.com/KNIMEanalytics
© 2017 KNIME AG. All Rights Reserved.
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH,
and are registered in the United States. KNIME® is also registered in Germany.
Thank You!
