© 2017 KNIME AG. All Rights Reserved.
Your Flight is Boarding Now!
Rosaria Silipo
KNIME
This is the last Call. Your Flight is Boarding Now!
1. Which conditions are more likely to cause delays?
2. Given these conditions, can we predict delays?
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
The KNIME® Analytics Platform
Visual KNIME Workflows
• Nodes perform tasks on data
• Workflows combine nodes to model data flow
Node ports: Input(s), Outputs
Node status: Not Configured, Idle, Executed, Error
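The node and workflow idea on this slide can be illustrated with a small toy model. This is only a sketch in plain Python, not KNIME's API: the `Node` class, the `run_workflow` helper, and the sample delay values are all hypothetical. Only the four status names come from the slide.

```python
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    # The four states named on the slide.
    NOT_CONFIGURED = "Not Configured"
    IDLE = "Idle"
    EXECUTED = "Executed"
    ERROR = "Error"


class Node:
    """A toy node (hypothetical): one task applied to the rows it receives."""

    def __init__(self, name: str, task: Optional[Callable[[list], list]] = None):
        self.name = name
        self.task = task
        # A node without a task cannot run yet.
        self.status = Status.IDLE if task else Status.NOT_CONFIGURED

    def execute(self, rows: list) -> list:
        if self.task is None:
            raise RuntimeError(f"{self.name} is not configured")
        try:
            out = self.task(rows)
        except Exception:
            self.status = Status.ERROR
            raise
        self.status = Status.EXECUTED
        return out


def run_workflow(nodes: List[Node], rows: list) -> list:
    """Chain the nodes: the output of one is the input of the next."""
    for node in nodes:
        rows = node.execute(rows)
    return rows


# Hypothetical departure delays in minutes; None marks a missing value.
delays = [5, None, 42, 0, 17]
workflow = [
    Node("Drop missing", lambda rows: [r for r in rows if r is not None]),
    Node("Flag delays > 15 min", lambda rows: [r > 15 for r in rows]),
]
flags = run_workflow(workflow, delays)
```

After the run, both nodes report `Status.EXECUTED`; a node created without a task stays `Status.NOT_CONFIGURED` and refuses to execute.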
Over 1500 native and embedded nodes included:
• Analysis & Mining: Statistics, Machine Learning, Data Mining, Web Analytics, Text Mining, Network Analysis, Social Media Analysis, R, Weka, Python, Community / 3rd party, ...
• Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers, Industry Specific, Community / 3rd party, ...
• Transformation: Row, Column, Matrix; Text, Image, Networks, Time Series, Java, Python, Community / 3rd party, ...
• Visualization: R, Python, JFreeChart, JavaScript, Community / 3rd party, ...
• Deployment: via BIRT; PMML, XML, JSON; Databases, Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd party, ...
• Big Data: Hive, Impala, HDFS, Vertica, Teradata/Aster, Spark, MLlib, Community / 3rd party, ...
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
This is the last Call. Your Flight is Boarding Now!
Predict Flight Departure Delay in US Airports
Selected Airport: ORD
Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
US Weather Information: https://www.ncdc.noaa.gov/cdo-web/datasets/
Airport Codes & Cities: https://www.world-airport-codes.com/
US Holiday Calendar: https://www.calendar-365.com/2007-calendar.html
Aircraft Maintenance Data (Web Crawling): http://registry.faa.gov/aircraftinquiry/NNum_Results.aspx
Radar Images: http://vortex.plymouth.edu/rcm-u.html
Storm Watch Database: https://www.ncdc.noaa.gov/stormevents/
Twitter (Sentiment Analysis): https://twitter.com/
CSV (shown three times on the slide)
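Blending several of these sources usually comes down to joining them on shared keys such as airport code and date. The sketch below shows one way to do that in plain Python. All rows, column names (`dep_delay`, `snow_mm`, `is_holiday`), and the `blend` function are hypothetical and are not taken from the actual datasets or from the workflow shown in the talk.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sample rows; the real sources have many more columns.
flights = [
    {"date": "2007-01-01", "origin": "ORD", "dep_delay": 35},
    {"date": "2007-01-02", "origin": "ORD", "dep_delay": 0},
    {"date": "2007-01-03", "origin": "ORD", "dep_delay": 12},
]
weather = [
    {"date": "2007-01-01", "airport": "ORD", "snow_mm": 20},
    {"date": "2007-01-02", "airport": "ORD", "snow_mm": 0},
]
holidays: Set[str] = {"2007-01-01"}


def blend(flights: List[dict], weather: List[dict], holidays: Set[str]) -> List[dict]:
    """Left-join weather onto flights by (airport, date) and add a holiday flag."""
    by_key: Dict[Tuple[str, str], dict] = {
        (w["airport"], w["date"]): w for w in weather
    }
    blended = []
    for f in flights:
        w = by_key.get((f["origin"], f["date"]))
        row = dict(f)
        # Flights without a matching weather record keep a missing value.
        row["snow_mm"] = w["snow_mm"] if w else None
        row["is_holiday"] = f["date"] in holidays
        blended.append(row)
    return blended


rows = blend(flights, weather, holidays)
```

The left join keeps every flight, so the third flight, which has no weather record, ends up with a missing `snow_mm` value that a later missing-value handling step would have to deal with.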
Parallel Coordinates and Network Graph
Arrival Delay
Geographic Map
Sunburst Chart
CRISP-DM Data Science Cycle
Let’s unroll it!
It always starts with some data …
• Data Preparation: Data Manipulation, Data Blending, Missing Values Handling, Feature Generation, Dimensionality Reduction, Feature Selection, Outlier Removal, Normalization, Partitioning, …
• Model Training: Model Training, Bag of Models, Model Selection, Ensemble Models, Own Ensemble Model, External Models, Import Existing Models, Model Factory, …
• Model Optimization: Parameter Tuning, Parameter Optimization, Regularization, Model Size, No. of Iterations, …
• Model Evaluation: Performance Measures, Accuracy, ROC Curve, Cross-Validation, …
• Deployment: Files & DBs, Dashboards, REST API, SQL Code Export, Reporting, …
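Two of the Data Preparation steps named on this slide, Normalization and Partitioning, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: min-max scaling and a seeded random 80/20 split are chosen here as examples, and the function names and the wind-speed values are hypothetical. The slide does not say which variants the talk used.

```python
import random
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def partition(rows: Sequence, train_fraction: float = 0.8,
              seed: int = 42) -> Tuple[list, list]:
    """Shuffle with a fixed seed, then split into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical wind speeds for four flights.
wind_speed = [0.0, 5.0, 10.0, 20.0]
scaled = min_max_normalize(wind_speed)

# Ten row ids split 80/20 into training and test sets.
train, test = partition(range(10), train_fraction=0.8)
```

Fixing the seed keeps the split reproducible, which matters when several models from a bag are later compared on the same test set.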
Data Science Cycle in Action
Bag of Models
Model Evaluation & Selection
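As a rough illustration of evaluating and selecting from a bag of models, the sketch below computes accuracy and the area under the ROC curve for two sets of predicted scores and keeps the model with the higher AUC. The labels, the scores, the model names, and the choice of AUC as the selection criterion are all assumptions made for this example, not results or settings from the talk.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test labels (1 = delayed) and scores from two models.
y_true = [1, 0, 1, 0, 1, 0]
bag: Dict[str, List[float]] = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.6, 0.1],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.8, 0.3],
}

aucs = {name: roc_auc(y_true, scores) for name, scores in bag.items()}
best = max(aucs, key=aucs.get)

# Accuracy needs hard labels, so scores are thresholded at 0.5 here.
acc_b = accuracy(y_true, [int(s >= 0.5) for s in bag["model_b"]])
```

The pairwise AUC computation is quadratic in the number of examples, which is fine for a sketch but would be replaced by a rank-based formula on larger test sets.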
To the Big Data Platform!
Spark, Hive
CRISP-DM Data Science Cycle
Next time!
Summary
• Data visualization matters!
Data Visualization and Interactive Exploration with KNIME
• Most common Techniques for Data Preparation, Analytics, and Model Evaluation
KNIME Analytics: A Review
• Most common Techniques for Data Preparation, Analytics, and Model Evaluation on Big Data
Scaling Analytics with Big Data
Thank You!
Free Copy of KNIME Beginner’s Luck Book at KNIME Press
https://www.knime.org/knimepress
Promotion Code: BigDataLDN
Thank you!
On behalf of KNIME we thank you for flying with us today!
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH,
and are registered in the United States. KNIME® is also registered in Germany.

Big Data LDN 2017: Your flight is boarding now!
