© 2018 KNIME AG. All Rights Reserved.
From Raw Data to Deployment
Kilian.Thiel@knime.com
Marten.Pfannenschmidt@knime.com
Kathrin.Melcher@knime.com
KNIME
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Let’s unroll it!
It always starts with some data …
Data Preparation
• Data Manipulation
• Data Blending
• Missing Values Handling
• Feature Generation
• Dimensionality Reduction
• Feature Selection
• Outlier Removal
• Normalization
• Partitioning
• …
Model Training
• Model Training
• Bag of Models
• Model Selection
• Ensemble Models
• Own Ensemble Model
• External Models
• Import Existing Models
• Model Factory
• …
Model Optimization
• Parameter Tuning
• Parameter Optimization
• Regularization
• Model Size
• Number of Iterations
• …
Model Evaluation
• Performance Measures
• Accuracy
• ROC Curve
• Cross-Validation
• …
Deployment
• Files & DBs
• Dashboards
• REST API
• SQL Code Export
• Reporting
• …
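Two of the performance measures named above, accuracy and cross-validation, can be illustrated with a minimal plain-Python sketch. It is not part of the KNIME workflow; the function names `accuracy` and `k_fold_indices` are hypothetical and chosen only for this illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs so that each row is held out exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the folds reproducible
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [
            row
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for row in fold
        ]
        yield train_idx, test_idx
```

In a cross-validation loop, a model would be trained on `train_idx`, scored with `accuracy` on `test_idx`, and the k scores averaged.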
The Many Lives of a Dataset
Partitioning:
• Training Set
• Validation Set
• Test Set
Data used in each phase:
• Data Preparation: Original Data Set with Past Observations
• Model Training: Training Set
• Model Optimization: Validation Set
• Model Evaluation: Test Set
• Deployment: New Data from Real World Applications
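The partitioning into training, validation, and test sets can be sketched in plain Python as follows. This is only an illustration of the idea, not the KNIME workflow itself; the function name `partition` and the 60/20/20 split fractions are assumptions made for the example.

```python
import random


def partition(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split them into (training, validation, test) lists.

    The fractions are illustrative defaults; the test set receives
    whatever remains after the training and validation sets are cut.
    """
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return training, validation, test


if __name__ == "__main__":
    training, validation, test = partition(range(100))
    print(len(training), len(validation), len(test))
```

The training set is used to fit the model, the validation set to tune it, and the test set is touched only once for the final evaluation.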
Data Exploration
• Sometimes there is a Data Exploration phase between Data Access and Data Preparation
• The Data Exploration phase is useful to get to know the data
• KNIME offers a few visualization nodes to build dashboards to explore the data
One Example for Every Need
The KNIME EXAMPLES Server
50_Applications/27_FromRawDataToDeployment
Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge:
Predict Departure Delays
If working on the original airline dataset, use only flights from airport ORD
Output Class = “delay” if depdelay > 15 min, otherwise “no delay”
Input features: everything that is available, and more if you can find it!
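The class definition above (more than 15 minutes of departure delay means "delay") and the ORD filter can be sketched in plain Python. This is a hedged illustration, not the workflow used in the session: the column names `Origin` and `DepDelay`, the output column `DelayClass`, and the choice to skip rows with a missing delay are all assumptions made for the example.

```python
DELAY_THRESHOLD_MIN = 15  # from the slide: "delay" if depdelay > 15 min


def label_delay(dep_delay_minutes):
    """Map a departure delay in minutes to the output class."""
    return "delay" if dep_delay_minutes > DELAY_THRESHOLD_MIN else "no delay"


def prepare_flights(flights, origin="ORD"):
    """Keep flights from one origin airport and attach the output class.

    Each flight is a dict; "Origin" and "DepDelay" are assumed column names.
    Rows with a missing departure delay are skipped (an assumption).
    """
    prepared = []
    for flight in flights:
        if flight.get("Origin") != origin:
            continue
        dep_delay = flight.get("DepDelay")
        if dep_delay is None:
            continue
        prepared.append({**flight, "DelayClass": label_delay(dep_delay)})
    return prepared


if __name__ == "__main__":
    sample = [
        {"Origin": "ORD", "DepDelay": 42},
        {"Origin": "ORD", "DepDelay": 15},
        {"Origin": "SFO", "DepDelay": 90},
    ]
    for row in prepare_flights(sample):
        print(row)
```

Note that a delay of exactly 15 minutes falls into "no delay", because the slide uses a strict "greater than" comparison.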
Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2018.knar into your workspace
Group 1. Data Access and Data Preparation
Group 2. Model Training & Optimization
Group 3. Deployment
KNIME Spring Summit 2018
March 5–9 at Hotel Berlin, Berlin, Germany
• Monday & Tuesday: One-day courses
• Wednesday & Thursday: Summit sessions
• Friday: Workshops
Use the code LEARNATHON for 10% off tickets!
Register at www.KNIME.com
KNIME Beginner’s Luck Book
Free copy of the KNIME Beginner’s Luck book at KNIME Press: https://www.knime.org/knimepress
Promotion Code: KNIME_Learnathon_2018
You can find KNIMers here!
• KNIME (www.knime.org)
• BLOG for news, tips and tricks (www.knime.org/blog)
• FORUM for questions and answers (tech.knime.org/forum)
• EXAMPLE SERVER for example workflows
• LEARNING HUB (www.knime.org/learning-hub)
• KNIME TV channel on
• KNIME on @KNIME
• KNIME on https://www.facebook.com/KNIMEanalytics
• KNIME User Group UK on https://www.meetup.com/KNIME-User-Group-UK/
© 2017 KNIME AG. All Rights Reserved.
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.
Thank You!

More Related Content

What's hot

Anomaly Detection - Discover unknown Frauds and Anomalies using Machine Learning
Anomaly Detection - Discover unknown Frauds and Anomalies using Machine LearningAnomaly Detection - Discover unknown Frauds and Anomalies using Machine Learning
Anomaly Detection - Discover unknown Frauds and Anomalies using Machine Learning
KNIMESlides
 
AWS reInvent 2019 Trip Report
AWS reInvent 2019 Trip ReportAWS reInvent 2019 Trip Report
AWS reInvent 2019 Trip Report
Craig Milroy
 
#AI + #Cloud = #DigitalTransformation
#AI + #Cloud = #DigitalTransformation#AI + #Cloud = #DigitalTransformation
#AI + #Cloud = #DigitalTransformation
Craig Milroy
 
Cloud Governance within The Climate Corporation
Cloud Governance within The Climate CorporationCloud Governance within The Climate Corporation
Cloud Governance within The Climate Corporation
Mohamed Ahmed
 
Esri in AWS Cloud
Esri in AWS CloudEsri in AWS Cloud
[Cisco Connect 2018 - Vietnam] Joseph yap journey to the multi cloud
[Cisco Connect 2018 - Vietnam] Joseph yap journey to the multi cloud[Cisco Connect 2018 - Vietnam] Joseph yap journey to the multi cloud
[Cisco Connect 2018 - Vietnam] Joseph yap journey to the multi cloud
Nur Shiqim Chok
 
Unlock Your CAD Data for Real-Time Development (Unity+PiXYZ) - AEC
Unlock Your CAD Data for Real-Time Development (Unity+PiXYZ) - AECUnlock Your CAD Data for Real-Time Development (Unity+PiXYZ) - AEC
Unlock Your CAD Data for Real-Time Development (Unity+PiXYZ) - AEC
Unity Technologies
 
Melodic Keynote presentation at OW2con'19, June 12-13, Paris.
Melodic Keynote presentation at OW2con'19, June 12-13, Paris. Melodic Keynote presentation at OW2con'19, June 12-13, Paris.
Melodic Keynote presentation at OW2con'19, June 12-13, Paris.
OW2
 
Hosting For Your Startup, Side Project, or Big Dollar App - Minnebar 12
Hosting For Your Startup, Side Project, or Big Dollar App - Minnebar 12Hosting For Your Startup, Side Project, or Big Dollar App - Minnebar 12
Hosting For Your Startup, Side Project, or Big Dollar App - Minnebar 12
Keith Resar
 
From Interactive to Automatic CAD Data Prep
From Interactive to Automatic CAD Data PrepFrom Interactive to Automatic CAD Data Prep
From Interactive to Automatic CAD Data Prep
Unity Technologies
 
Get Your Aircraft Spare Parts Inventory Management Off the Ground
Get Your Aircraft Spare Parts Inventory Management Off the GroundGet Your Aircraft Spare Parts Inventory Management Off the Ground
Get Your Aircraft Spare Parts Inventory Management Off the Ground
PTC
 
IPv6 and Cloud Hosting
IPv6 and Cloud HostingIPv6 and Cloud Hosting
IPv6 and Cloud Hosting
RIPE NCC
 
Amberix Energy Efficient Facilities
Amberix Energy Efficient FacilitiesAmberix Energy Efficient Facilities
Amberix Energy Efficient Facilities
gueste5667f2
 
What is Capability Analysis?
What is Capability Analysis?What is Capability Analysis?
What is Capability Analysis?
Jay Arthur
 
Creating a GraphQL API in Python: from Django to fully asynchronous
Creating a GraphQL API in Python: from Django to fully asynchronousCreating a GraphQL API in Python: from Django to fully asynchronous
Creating a GraphQL API in Python: from Django to fully asynchronous
Mirumee Software
 
Optimise Energy Usage Using Amazon SageMaker Reinforcement Learning and Publi...
Optimise Energy Usage Using Amazon SageMaker Reinforcement Learning and Publi...Optimise Energy Usage Using Amazon SageMaker Reinforcement Learning and Publi...
Optimise Energy Usage Using Amazon SageMaker Reinforcement Learning and Publi...
Amazon Web Services
 
PlaatEnergy Design
PlaatEnergy DesignPlaatEnergy Design
PlaatEnergy Designwplaat
 
Summer 2017
Summer 2017Summer 2017
Summer 2017
sabativi
 
Sentry: Baselining, cloud-scale monitoring and auto-remediation with app mon ...
Sentry: Baselining, cloud-scale monitoring and auto-remediation with app mon ...Sentry: Baselining, cloud-scale monitoring and auto-remediation with app mon ...
Sentry: Baselining, cloud-scale monitoring and auto-remediation with app mon ...
Dynatrace
 
AppSphere 15 - Monitoring Cloud & Asynchronous Applications
AppSphere 15 - Monitoring Cloud & Asynchronous ApplicationsAppSphere 15 - Monitoring Cloud & Asynchronous Applications
AppSphere 15 - Monitoring Cloud & Asynchronous Applications
AppDynamics
 

What's hot (20)

Anomaly Detection - Discover unknown Frauds and Anomalies using Machine Learning
Anomaly Detection - Discover unknown Frauds and Anomalies using Machine LearningAnomaly Detection - Discover unknown Frauds and Anomalies using Machine Learning
Anomaly Detection - Discover unknown Frauds and Anomalies using Machine Learning
 
AWS reInvent 2019 Trip Report
AWS reInvent 2019 Trip ReportAWS reInvent 2019 Trip Report
AWS reInvent 2019 Trip Report
 
#AI + #Cloud = #DigitalTransformation
#AI + #Cloud = #DigitalTransformation#AI + #Cloud = #DigitalTransformation
#AI + #Cloud = #DigitalTransformation
 
Cloud Governance within The Climate Corporation
Cloud Governance within The Climate CorporationCloud Governance within The Climate Corporation
Cloud Governance within The Climate Corporation
 
Esri in AWS Cloud
Esri in AWS CloudEsri in AWS Cloud
Esri in AWS Cloud
 
[Cisco Connect 2018 - Vietnam] Joseph yap journey to the multi cloud
[Cisco Connect 2018 - Vietnam] Joseph yap journey to the multi cloud[Cisco Connect 2018 - Vietnam] Joseph yap journey to the multi cloud
[Cisco Connect 2018 - Vietnam] Joseph yap journey to the multi cloud
 
Unlock Your CAD Data for Real-Time Development (Unity+PiXYZ) - AEC
Unlock Your CAD Data for Real-Time Development (Unity+PiXYZ) - AECUnlock Your CAD Data for Real-Time Development (Unity+PiXYZ) - AEC
Unlock Your CAD Data for Real-Time Development (Unity+PiXYZ) - AEC
 
Melodic Keynote presentation at OW2con'19, June 12-13, Paris.
Melodic Keynote presentation at OW2con'19, June 12-13, Paris. Melodic Keynote presentation at OW2con'19, June 12-13, Paris.
Melodic Keynote presentation at OW2con'19, June 12-13, Paris.
 
Hosting For Your Startup, Side Project, or Big Dollar App - Minnebar 12
Hosting For Your Startup, Side Project, or Big Dollar App - Minnebar 12Hosting For Your Startup, Side Project, or Big Dollar App - Minnebar 12
Hosting For Your Startup, Side Project, or Big Dollar App - Minnebar 12
 
From Interactive to Automatic CAD Data Prep
From Interactive to Automatic CAD Data PrepFrom Interactive to Automatic CAD Data Prep
From Interactive to Automatic CAD Data Prep
 
Get Your Aircraft Spare Parts Inventory Management Off the Ground
Get Your Aircraft Spare Parts Inventory Management Off the GroundGet Your Aircraft Spare Parts Inventory Management Off the Ground
Get Your Aircraft Spare Parts Inventory Management Off the Ground
 
IPv6 and Cloud Hosting
IPv6 and Cloud HostingIPv6 and Cloud Hosting
IPv6 and Cloud Hosting
 
Amberix Energy Efficient Facilities
Amberix Energy Efficient FacilitiesAmberix Energy Efficient Facilities
Amberix Energy Efficient Facilities
 
What is Capability Analysis?
What is Capability Analysis?What is Capability Analysis?
What is Capability Analysis?
 
Creating a GraphQL API in Python: from Django to fully asynchronous
Creating a GraphQL API in Python: from Django to fully asynchronousCreating a GraphQL API in Python: from Django to fully asynchronous
Creating a GraphQL API in Python: from Django to fully asynchronous
 
Optimise Energy Usage Using Amazon SageMaker Reinforcement Learning and Publi...
Optimise Energy Usage Using Amazon SageMaker Reinforcement Learning and Publi...Optimise Energy Usage Using Amazon SageMaker Reinforcement Learning and Publi...
Optimise Energy Usage Using Amazon SageMaker Reinforcement Learning and Publi...
 
PlaatEnergy Design
PlaatEnergy DesignPlaatEnergy Design
PlaatEnergy Design
 
Summer 2017
Summer 2017Summer 2017
Summer 2017
 
Sentry: Baselining, cloud-scale monitoring and auto-remediation with app mon ...
Sentry: Baselining, cloud-scale monitoring and auto-remediation with app mon ...Sentry: Baselining, cloud-scale monitoring and auto-remediation with app mon ...
Sentry: Baselining, cloud-scale monitoring and auto-remediation with app mon ...
 
AppSphere 15 - Monitoring Cloud & Asynchronous Applications
AppSphere 15 - Monitoring Cloud & Asynchronous ApplicationsAppSphere 15 - Monitoring Cloud & Asynchronous Applications
AppSphere 15 - Monitoring Cloud & Asynchronous Applications
 

Similar to From raw data to deployment

KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIMESlides
 
From Raw Data to Deployment
From Raw Data to DeploymentFrom Raw Data to Deployment
From Raw Data to Deployment
KNIMESlides
 
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them? How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
Greg Landrum
 
AI/ML is a Means to Digital Transformation, Not an End Itself
AI/ML is a Means to Digital Transformation, Not an End ItselfAI/ML is a Means to Digital Transformation, Not an End Itself
AI/ML is a Means to Digital Transformation, Not an End Itself
BESPIN GLOBAL
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
SnapLogic
 
From notebook to production with Amazon Sagemaker
From notebook to production with Amazon SagemakerFrom notebook to production with Amazon Sagemaker
From notebook to production with Amazon Sagemaker
Amazon Web Services
 
An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)
Julien SIMON
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
Alok Singh
 
Amazon SageMaker
Amazon SageMakerAmazon SageMaker
Amazon SageMaker
Amazon Web Services
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon Web Services
 
Work with Machine Learning in Amazon SageMaker - BDA203 - Atlanta AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Atlanta AWS SummitWork with Machine Learning in Amazon SageMaker - BDA203 - Atlanta AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Atlanta AWS Summit
Amazon Web Services
 
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Amazon Web Services
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
Miroslaw Staron
 
Building, Training, and Deploying fast.ai Models Using Amazon SageMaker (AIM4...
Building, Training, and Deploying fast.ai Models Using Amazon SageMaker (AIM4...Building, Training, and Deploying fast.ai Models Using Amazon SageMaker (AIM4...
Building, Training, and Deploying fast.ai Models Using Amazon SageMaker (AIM4...
Amazon Web Services
 
Introducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech TalksIntroducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech Talks
Amazon Web Services
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
Ning Jiang
 
Machine Learning at the Edge (AIM302) - AWS re:Invent 2018
Machine Learning at the Edge (AIM302) - AWS re:Invent 2018Machine Learning at the Edge (AIM302) - AWS re:Invent 2018
Machine Learning at the Edge (AIM302) - AWS re:Invent 2018
Amazon Web Services
 
Predictive Analytics - Big Data Warehousing Meetup, Zementis
Predictive Analytics - Big Data Warehousing Meetup, ZementisPredictive Analytics - Big Data Warehousing Meetup, Zementis
Predictive Analytics - Big Data Warehousing Meetup, Zementis
Caserta
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Amazon Web Services
 
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
NETWAYS
 

Similar to From raw data to deployment (20)

KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
 
From Raw Data to Deployment
From Raw Data to DeploymentFrom Raw Data to Deployment
From Raw Data to Deployment
 
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them? How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
 
AI/ML is a Means to Digital Transformation, Not an End Itself
AI/ML is a Means to Digital Transformation, Not an End ItselfAI/ML is a Means to Digital Transformation, Not an End Itself
AI/ML is a Means to Digital Transformation, Not an End Itself
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
 
From notebook to production with Amazon Sagemaker
From notebook to production with Amazon SagemakerFrom notebook to production with Amazon Sagemaker
From notebook to production with Amazon Sagemaker
 
An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
Amazon SageMaker
Amazon SageMakerAmazon SageMaker
Amazon SageMaker
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
 
Work with Machine Learning in Amazon SageMaker - BDA203 - Atlanta AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Atlanta AWS SummitWork with Machine Learning in Amazon SageMaker - BDA203 - Atlanta AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Atlanta AWS Summit
 
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
Building, Training, and Deploying fast.ai Models Using Amazon SageMaker (AIM4...
Building, Training, and Deploying fast.ai Models Using Amazon SageMaker (AIM4...Building, Training, and Deploying fast.ai Models Using Amazon SageMaker (AIM4...
Building, Training, and Deploying fast.ai Models Using Amazon SageMaker (AIM4...
 
Introducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech TalksIntroducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech Talks
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
 
Machine Learning at the Edge (AIM302) - AWS re:Invent 2018
Machine Learning at the Edge (AIM302) - AWS re:Invent 2018Machine Learning at the Edge (AIM302) - AWS re:Invent 2018
Machine Learning at the Edge (AIM302) - AWS re:Invent 2018
 
Predictive Analytics - Big Data Warehousing Meetup, Zementis
Predictive Analytics - Big Data Warehousing Meetup, ZementisPredictive Analytics - Big Data Warehousing Meetup, Zementis
Predictive Analytics - Big Data Warehousing Meetup, Zementis
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
 
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
 

More from KNIMESlides

What's New in KNIME Analytics Platform 4.1
What's New in KNIME Analytics Platform 4.1What's New in KNIME Analytics Platform 4.1
What's New in KNIME Analytics Platform 4.1
KNIMESlides
 
Codeless Deep Learning for Language Modeling and Image Classification
Codeless Deep Learning for Language Modeling and Image ClassificationCodeless Deep Learning for Language Modeling and Image Classification
Codeless Deep Learning for Language Modeling and Image Classification
KNIMESlides
 
Automating Inferences out of Financial Data
Automating Inferences out of Financial DataAutomating Inferences out of Financial Data
Automating Inferences out of Financial Data
KNIMESlides
 
Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020
Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020
Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020
KNIMESlides
 
Credit Card Fraud Detection Tutorial
Credit Card Fraud Detection TutorialCredit Card Fraud Detection Tutorial
Credit Card Fraud Detection Tutorial
KNIMESlides
 
Practicing Data Science: A Collection of Case Studies
Practicing Data Science: A Collection of Case StudiesPracticing Data Science: A Collection of Case Studies
Practicing Data Science: A Collection of Case Studies
KNIMESlides
 
What's New in KNIME Analytics Platform 4.0 and KNIME Server 4.9
What's New in KNIME Analytics Platform 4.0 and KNIME Server 4.9What's New in KNIME Analytics Platform 4.0 and KNIME Server 4.9
What's New in KNIME Analytics Platform 4.0 and KNIME Server 4.9
KNIMESlides
 
Scoring Metrics for Classification Models
Scoring Metrics for Classification ModelsScoring Metrics for Classification Models
Scoring Metrics for Classification Models
KNIMESlides
 
Sentiment Analysis with KNIME Analytics Platform
Sentiment Analysis with KNIME Analytics PlatformSentiment Analysis with KNIME Analytics Platform
Sentiment Analysis with KNIME Analytics Platform
KNIMESlides
 
Chemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics PlatformChemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics Platform
KNIMESlides
 
Sentiment Analysis with Deep Learning, Machine Learning or Lexicon based
Sentiment Analysis with Deep Learning, Machine Learning or Lexicon basedSentiment Analysis with Deep Learning, Machine Learning or Lexicon based
Sentiment Analysis with Deep Learning, Machine Learning or Lexicon based
KNIMESlides
 
KNIME Software Overview
KNIME Software OverviewKNIME Software Overview
KNIME Software Overview
KNIMESlides
 
Heterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkHeterogeneous Data Mining with Spark
Heterogeneous Data Mining with Spark
KNIMESlides
 
Knime customer intelligence on social media: Text Analytics vs. Network Mining
Knime customer intelligence on social media: Text Analytics vs. Network MiningKnime customer intelligence on social media: Text Analytics vs. Network Mining
Knime customer intelligence on social media: Text Analytics vs. Network Mining
KNIMESlides
 
Text Processing with KNIME
Text Processing with KNIMEText Processing with KNIME
Text Processing with KNIME
KNIMESlides
 
Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!
KNIMESlides
 

More from KNIMESlides (16)

What's New in KNIME Analytics Platform 4.1
What's New in KNIME Analytics Platform 4.1What's New in KNIME Analytics Platform 4.1
What's New in KNIME Analytics Platform 4.1
 
Codeless Deep Learning for Language Modeling and Image Classification
Codeless Deep Learning for Language Modeling and Image ClassificationCodeless Deep Learning for Language Modeling and Image Classification
Codeless Deep Learning for Language Modeling and Image Classification
 
Automating Inferences out of Financial Data
Automating Inferences out of Financial DataAutomating Inferences out of Financial Data
Automating Inferences out of Financial Data
 
Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020
Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020
Credit Card Fraud Detection Tutorial - KNIME Meetup Berlin 2020
 
Credit Card Fraud Detection Tutorial
Credit Card Fraud Detection TutorialCredit Card Fraud Detection Tutorial
Credit Card Fraud Detection Tutorial
 
Practicing Data Science: A Collection of Case Studies
Practicing Data Science: A Collection of Case StudiesPracticing Data Science: A Collection of Case Studies
Practicing Data Science: A Collection of Case Studies
 
What's New in KNIME Analytics Platform 4.0 and KNIME Server 4.9
What's New in KNIME Analytics Platform 4.0 and KNIME Server 4.9What's New in KNIME Analytics Platform 4.0 and KNIME Server 4.9
What's New in KNIME Analytics Platform 4.0 and KNIME Server 4.9
 
Scoring Metrics for Classification Models
Scoring Metrics for Classification ModelsScoring Metrics for Classification Models
Scoring Metrics for Classification Models
 
Sentiment Analysis with KNIME Analytics Platform
Sentiment Analysis with KNIME Analytics PlatformSentiment Analysis with KNIME Analytics Platform
Sentiment Analysis with KNIME Analytics Platform
 
Chemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics PlatformChemistry Data Basics with KNIME Analytics Platform
Chemistry Data Basics with KNIME Analytics Platform
 
Sentiment Analysis with Deep Learning, Machine Learning or Lexicon based
Sentiment Analysis with Deep Learning, Machine Learning or Lexicon basedSentiment Analysis with Deep Learning, Machine Learning or Lexicon based
Sentiment Analysis with Deep Learning, Machine Learning or Lexicon based
 
KNIME Software Overview
KNIME Software OverviewKNIME Software Overview
KNIME Software Overview
 
Heterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkHeterogeneous Data Mining with Spark
Heterogeneous Data Mining with Spark
 
Knime customer intelligence on social media: Text Analytics vs. Network Mining
Knime customer intelligence on social media: Text Analytics vs. Network MiningKnime customer intelligence on social media: Text Analytics vs. Network Mining
Knime customer intelligence on social media: Text Analytics vs. Network Mining
 
Text Processing with KNIME
Text Processing with KNIMEText Processing with KNIME
Text Processing with KNIME
 
Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!
 

Recently uploaded

一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
My burning issue is homelessness K.C.M.O.
From raw data to deployment

  • 1. © 2018 KNIME AG. All Rights Reserved. From Raw Data to Deployment. Kilian.Thiel@knime.com, Marten.Pfannenschmidt@knime.com, Kathrin.Melcher@knime.com. KNIME
  • 2. Do you recognize this? https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
  • 3. Let’s unroll it! It always starts with some data …
      • Data Preparation: Data Manipulation, Data Blending, Missing Values Handling, Feature Generation, Dimensionality Reduction, Feature Selection, Outlier Removal, Normalization, Partitioning, …
      • Model Training: Model Training, Bag of Models, Model Selection, Ensemble Models, Own Ensemble Model, External Models, Import Existing Models, Model Factory, …
      • Model Optimization: Parameter Tuning, Parameter Optimization, Regularization, Model Size, No Iterations, …
      • Model Evaluation: Performance Measures, Accuracy, ROC Curve, Cross-Validation, …
      • Deployment: Files & DBs, Dashboards, REST API, SQL Code Export, Reporting, …
  • 4. The many Lives of a Dataset
      • Phases: Data Preparation, Model Training, Model Optimization, Model Evaluation, Deployment
      • Partitioning: Training Set, Validation Set, Test Set
      • Data sources: Original Data Set with Past Observations (Training Set, Validation Set, Test Set); New Data from Real World Applications
  • 5. Data Exploration
      • Sometimes in between Data Access and Data Preparation there is a Data Exploration phase
      • The Data Exploration phase is useful to get to know the data
      • KNIME offers a few visualization nodes to build dashboards to explore the data
  • 6. One Example for Every Need: The KNIME EXAMPLES Server. 50_Applications/27_FromRawDataToDeployment
  • 7. Classification Problem & Data Set
      • Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
      • Smaller dataset (Jan 2007) (AirlineDataset.table)
      • Challenge: Predict Departure Delays
      • If working on the original airline dataset, use only flights from airport ORD
      • Output class = “delay” if depdelay > 15 min, otherwise “no delay”
      • Input features: everything that is available, and more if you can find it!
  • 8. Challenges
      • Group 1. Data Access and Data Preparation
      • Group 2. ML Model Training
      • Group 3. Model Deployment
      • Import file Learnathon_2018.knar into your workspace
  • 9. Group 1. Data Access and Data Preparation
  • 10. Group 2. Model Training & Optimization
  • 11. Group 3. Deployment
  • 12. KNIME Spring Summit 2018, March 5 – 9 at Hotel Berlin, Berlin, Germany
      • Monday & Tuesday: One-day courses
      • Wednesday & Thursday: Summit sessions
      • Friday: Workshops
      • Use the code LEARNATHON for 10% off tickets! Register at www.KNIME.com
  • 13. KNIME Beginner’s Luck Book: free copy of the KNIME Beginner’s Luck book at KNIME Press, https://www.knime.org/knimepress. Promotion Code: KNIME_Learnathon_2018
  • 14. You can find KNIMers here!
      • KNIME (www.knime.org)
      • BLOG for news, tips and tricks (www.knime.org/blog)
      • FORUM for questions and answers (tech.knime.org/forum)
      • EXAMPLE SERVER for example workflows
      • LEARNING HUB (www.knime.org/learning-hub)
      • KNIME TV channel on YouTube
      • KNIME on Twitter: @KNIME
      • KNIME on Facebook: https://www.facebook.com/KNIMEanalytics
      • KNIME User Group UK on Meetup: https://www.meetup.com/KNIME-User-Group-UK/
  • 15. © 2017 KNIME AG. All Rights Reserved. Thank You! The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.
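
The deck builds these steps with KNIME nodes, not code. Purely as an illustration, the output class from slide 7 and the partitioning from slide 4 could be sketched in plain Python as follows. The column names `Origin` and `DepDelay`, the sample records, the 60/20/20 split and the helper names `label_delay` and `partition` are all assumptions made for this sketch; only the 15-minute threshold and the ORD filter come from the slides.

```python
import random

DELAY_THRESHOLD_MIN = 15  # slide 7: "delay" if depdelay > 15 min


def label_delay(dep_delay_minutes):
    """Return the output class for a single flight."""
    return "delay" if dep_delay_minutes > DELAY_THRESHOLD_MIN else "no delay"


def partition(rows, train_frac=0.6, validation_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation and test sets.

    The fractions are assumed values; the slides do not specify them.
    Whatever is left after training and validation becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


# Hypothetical flight records standing in for the airline dataset.
flights = [
    {"Origin": "ORD", "DepDelay": 3},
    {"Origin": "ORD", "DepDelay": 42},
    {"Origin": "SFO", "DepDelay": 20},
    {"Origin": "ORD", "DepDelay": 15},
    {"Origin": "ORD", "DepDelay": 16},
    {"Origin": "JFK", "DepDelay": 0},
    {"Origin": "ORD", "DepDelay": -5},
    {"Origin": "ORD", "DepDelay": 120},
]

# Keep only flights departing from ORD, then attach the output class.
ord_flights = [f for f in flights if f["Origin"] == "ORD"]
labeled = [dict(f, DelayClass=label_delay(f["DepDelay"])) for f in ord_flights]

train, validation, test = partition(labeled)
```

Fixing the seed keeps the split reproducible, so a model tuned on the validation set is later evaluated on the same held-out test rows.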