© 2019 KNIME AG. All rights reserved.
From Raw Data to Deployment
KNIMEr: Kathrin.Melcher@knime.com
KNIMEr: Maarit.Widmann@knime.com
@KNIME
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Let’s unroll it!
It always starts with some data …
• Data Preparation: Data Manipulation, Data Blending, Missing Values Handling, Feature Generation, Dimensionality Reduction, Feature Selection, Outlier Removal, Normalization, Partitioning, …
• Model Training: Model Training, Bag of Models, Model Selection, Ensemble Models, Own Ensemble Model, External Models, Import Existing Models, Model Factory, …
• Model Optimization: Parameter Tuning, Parameter Optimization, Regularization, Model Size, No. Iterations, …
• Model Evaluation: Performance Measures, Accuracy, ROC Curve, Cross-Validation, …
• Deployment: Files & DBs, Dashboards, REST API, SQL Code Export, Reporting, …
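Two of the performance measures named in the process overview, accuracy and cross-validation, can be sketched in plain Python. This is only an illustration of the concepts, not how the KNIME nodes are implemented; the function names `accuracy` and `k_fold_indices` are hypothetical.

```python
from typing import List, Sequence, Tuple


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of rows where the predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def k_fold_indices(n_rows: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n_rows-1 into k (train, validation) pairs.

    Every row appears in exactly one validation fold.
    """
    if not 1 < k <= n_rows:
        raise ValueError("k must satisfy 1 < k <= n_rows")
    # Fold i holds every k-th row index, starting at i.
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    pairs = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((train, validation))
    return pairs
```

In a cross-validation run, a model would be trained on each `train` index list and scored with `accuracy` on the matching `validation` list; the scores are then averaged.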
The many Lives of a Dataset
Data Preparation → Model Training → Model Optimization → Model Evaluation → Deployment
Partitioning:
• Training Set
• Validation Set
• Test Set
Original Data Set with Past Observations: Training Set, Validation Set, Test Set
New Data from Real World Applications
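The partitioning step on this slide can be sketched as a shuffled three-way split. This is a minimal sketch under assumptions: the function name `partition`, the 60/20/20 default fractions, and the fixed seed are illustrative choices, not values taken from the slides.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(
    rows: Sequence[T],
    train_frac: float = 0.6,  # assumed default, not from the slides
    val_frac: float = 0.2,    # assumed default, not from the slides
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows and split them into training, validation and test sets."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # everything that is left
    return train, validation, test
```

A common reading of the split is that the training set fits the model, the validation set guides optimization, and the test set is held back for the final evaluation.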
Data Exploration
• A Data Exploration phase sometimes comes between Data Access and Data Preparation
• The Data Exploration phase is useful for getting to know the data
• KNIME offers visualization nodes that can be used to build dashboards for exploring the data
What about Big Data?
• Big Data serves Scalability
• The whole Analytics Process is no different on Big Data
• You need:
– a Big Data Platform
– the KNIME Big Data (Spark & Hive) Extension
One Example for Every Need – on KNIME EXAMPLES Server
The KNIME EXAMPLES Server
50_Applications
Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge: Predict Departure Delays
– If working on the original airline dataset, use only flights from airport ORD
– Output Class = “delay” if depdelay > 15min, otherwise “no delay”
– Available features: date, dep time, arr time, carrier, destination, cancelled, …
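The output class rule above can be written down in a few lines of Python. This is a sketch outside KNIME, under assumptions: the field names `Origin` and `DepDelay`, the `DelayClass` output field, and the handling of missing delays are illustrative and not specified on the slide.

```python
from typing import Dict, List, Optional

DELAY_THRESHOLD_MIN = 15  # threshold from the slide: depdelay > 15min


def delay_class(dep_delay_min: Optional[float]) -> Optional[str]:
    """Map a departure delay in minutes to the output class.

    Assumption: a missing delay (for example a cancelled flight) yields None
    so the row can be handled separately; the slide does not specify this.
    """
    if dep_delay_min is None:
        return None
    return "delay" if dep_delay_min > DELAY_THRESHOLD_MIN else "no delay"


def label_ord_flights(flights: List[Dict]) -> List[Dict]:
    """Keep flights departing from ORD and attach a 'DelayClass' field."""
    return [
        {**f, "DelayClass": delay_class(f.get("DepDelay"))}
        for f in flights
        if f.get("Origin") == "ORD"  # assumed field name for the departure airport
    ]
```

Note that a delay of exactly 15 minutes falls into “no delay”, because the rule uses a strict greater-than comparison.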
Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2019.knar into your workspace
Group 1. Data Access and Data Preparation
Group 2. Model Training & Optimization
Group 3. Deployment
• Deployment Options – Multiple challenges:
– Workflow deployment to KNIME Server
– Remote/Scheduled execution from KNIME Server
– KNIME RESTful Web Services
– Build a Composite Interactive Dashboard and make it available on KNIME Web Portal
– Generate a report with BIRT
– Write Prediction Results to a Database
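As an illustration of the last option, writing prediction results to a database, here is a minimal sketch using Python's built-in `sqlite3` module. In the Learnathon this step is built with KNIME nodes; the table name `predictions`, its columns, and the function `write_predictions` are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def write_predictions(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> int:
    """Store (flight_id, predicted_class) pairs and return the table's row count."""
    # Hypothetical schema: one row per flight, keyed by a flight identifier.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "flight_id TEXT PRIMARY KEY, predicted_class TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO predictions (flight_id, predicted_class) "
            "VALUES (?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
```

Using `INSERT OR REPLACE` means that re-scoring the same flight overwrites its earlier prediction instead of creating a duplicate row.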
KNIME Fall Summit 2019
November 5 – 8 at AT&T Executive Education and Conference Center, Austin, Texas
• Tuesday & Wednesday: One-day courses
• Thursday & Friday: Summit sessions
Register by October 1 for a 10% Early Bird Discount with this code: LEARNATHON-DUBLIN
Register at knime.com/summits
KNIME Beginner’s Luck
Free Copy of KNIME Beginner’s Luck Book from KNIME Press
https://www.knime.com/knimepress
with this code: DUBLIN-0619
Stay connected with KNIME
Blog: knime.com/blog
Forum: forum.knime.com
KNIME Hub: hub.knime.com
Follow us on social media:
KNIME E-Learning Course:
www.knime.com/e-learning-course
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH,
and are registered in the United States. KNIME® is also registered in Germany.
Thank You!
#KNIME
#Learnathon

SOCRadar Research Team: Latest Activities of IntelBroker
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 

KNIME Data Science Learnathon: From Raw Data To Deployment - Dublin - June 2019

  • 1. © 2019 KNIME AG. All rights reserved. From Raw Data to Deployment KNIMEr: Kathrin.Melcher@knime.com KNIMEr: Maarit.Widmann@knime.com @KNIME
  • 2. Do you recognize this? https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
  • 3. Let’s unroll it! It always starts with some data …
      – Data Preparation: Data Manipulation, Data Blending, Missing Values Handling, Feature Generation, Dimensionality Reduction, Feature Selection, Outlier Removal, Normalization, Partitioning, …
      – Model Training: Model Training, Bag of Models, Model Selection, Ensemble Models, Own Ensemble Model, External Models, Import Existing Models, Model Factory, …
      – Model Optimization: Parameter Tuning, Parameter Optimization, Regularization, Model Size, No. Iterations, …
      – Model Evaluation: Performance Measures, Accuracy, ROC Curve, Cross-Validation, …
      – Deployment: Files & DBs, Dashboards, REST API, SQL Code Export, Reporting, …
  • 4. The many Lives of a Dataset
      – Phases: Data Preparation, Model Training, Model Optimization, Model Evaluation, Deployment
      – Original Data Set with Past Observations, partitioned into: Training Set, Validation Set, Test Set
      – New Data from Real World Applications
  • 5. Data Exploration
      – Sometimes in between Data Access and Data Preparation there is a Data Exploration phase
      – The Data Exploration phase is useful to get to know the data
      – KNIME offers a few visualization nodes to build dashboards to explore the data
  • 6. What about Big Data?
      – Big Data serves Scalability
      – The whole Analytics Process is no different on Big Data
      – You need: a Big Data Platform and the KNIME Big Data (Spark & Hive) Extension
  • 7. One Example for Every Need – on KNIME EXAMPLES Server. The KNIME EXAMPLES Server: 50_Applications
  • 8. Classification Problem & Data Set
      – Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
      – Smaller dataset (Jan 2007) (AirlineDataset.table)
      – Challenge: Predict Departure Delays
      – If working on the original airline dataset, use only flights from airport ORD
      – Output Class = “delay” if depdelay > 15 min, otherwise “no delay”
      – Available features: date, dep time, arr time, carrier, destination, cancelled, …
  • 9. Challenges
      – Group 1. Data Access and Data Preparation
      – Group 2. ML Model Training
      – Group 3. Model Deployment
      – Import file Learnathon_2019.knar into your workspace
  • 10. Group 1. Data Access and Data Preparation
  • 11. Group 2. Model Training & Optimization
  • 12. Group 3. Deployment
      Deployment options (multiple challenges):
      – Workflow deployment to KNIME Server
      – Remote/Scheduled execution from KNIME Server
      – KNIME RESTful Web Services
      – Build a Composite Interactive Dashboard and make it available on KNIME Web Portal
      – Generate a report with BIRT
      – Write Prediction Results to a Database
  • 13. KNIME Fall Summit 2019: November 5 – 8 at AT&T Executive Education and Conference Center, Austin, Texas
      – Tuesday & Wednesday: One-day courses
      – Thursday & Friday: Summit sessions
      – Register by October 1 for 10% Early Bird Discount with this code: LEARNATHON-DUBLIN
      – Register at knime.com/summits
  • 14. KNIME Beginner’s Luck: Free Copy of KNIME Beginner’s Luck Book from KNIME Press https://www.knime.com/knimepress with this code: DUBLIN-0619
  • 15. Stay connected with KNIME
      – Blog: knime.com/blog
      – Forum: forum.knime.com
      – KNIME Hub: hub.knime.com
      – KNIME E-Learning Course: www.knime.com/e-learning-course
      – Follow us on social media
  • 16. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany. Thank You! #KNIME #Learnathon
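
As a minimal sketch of the classification task on slide 8 and the partitioning on slide 4, the Python snippet below labels flights as “delay” or “no delay” using the 15-minute threshold and then splits them into training, validation, and test sets. It is not the KNIME workflow used in the Learnathon. The record fields, the 60/20/20 split fractions, the sample values, and the helper names `label_delay` and `partition` are assumptions made for illustration.

```python
import random

DELAY_THRESHOLD_MIN = 15  # slide 8: "delay" if depdelay > 15 min


def label_delay(dep_delay_min):
    """Return the output class for one flight's departure delay in minutes."""
    return "delay" if dep_delay_min > DELAY_THRESHOLD_MIN else "no delay"


def partition(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions are illustrative assumptions; the slides do not state them.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Hypothetical sample records; field names are assumptions, not the real schema.
flights = [
    {"origin": "ORD", "depdelay": d}
    for d in (0, 5, 15, 16, 30, -3, 45, 12, 20, 7)
]
for flight in flights:
    flight["label"] = label_delay(flight["depdelay"])

train_set, val_set, test_set = partition(flights)
print(len(train_set), len(val_set), len(test_set))
```

In the usual reading of slide 4, the training set fits the model, the validation set guides optimization, and the test set is held back for the final evaluation.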