© 2018 KNIME AG. All rights reserved.
Leveraging H2O Machine Learning
with KNIME Analytics Platform
Christian Dietz
KNIME
H2O Distributed Machine Learning Algorithms
Supervised Learning
Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Ensembles
• Distributed Random Forest: Classification or regression models
• Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations
Deep Neural Networks
• Deep learning: Creates multi-layer feed-forward neural networks, starting with an input layer followed by multiple layers of nonlinear transformations
Unsupervised Learning
Clustering
• K-means: Partitions observations into k clusters/groups of the same spatial size. Can automatically detect the optimal k
Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables into uncorrelated components
• Generalized Low Rank Models: Extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data
Anomaly Detection
• Autoencoders: Find outliers via nonlinear dimensionality reduction using deep learning
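To make the Gradient Boosting Machine bullet concrete, here is a minimal sketch of boosting in plain Python. It is not H2O's implementation: it assumes one numeric feature, squared-error loss, and one-split trees ("stumps"), and the function names and sample numbers are made up for illustration. Each round fits a stump to the current residuals, so the ensemble's approximation is refined step by step.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Pick the split on x that minimizes the squared error of the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbm(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit stumps to residuals one after another; return the model and the training MSE per round."""
    base = mean(ys)
    predictions = [base] * len(xs)
    stumps, errors = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
        errors.append(mean([(y - p) ** 2 for y, p in zip(ys, predictions)]))
    return base, stumps, errors


def predict_gbm(model, x, learning_rate=0.5):
    base, stumps, _ = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical toy data, only to exercise the functions above.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 3, 3, 5, 8, 9, 12, 13]
model = fit_gbm(xs, ys)
```

The per-round errors in `model[2]` never increase on the training data, which is the "increasingly refined approximations" behaviour in miniature. A production GBM adds deeper trees, row and column sampling, and other loss functions.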
Platforms with H2O Integration
H2O + KNIME Talk
at KNIME Summit
March 2018
KNIME®
• KNIME AG founded in 2008
• Offices in Zurich (HQ), Konstanz, Berlin, and Austin
• Maintainer of the Open Source KNIME Analytics Platform
– comprehensive platform for data loading, processing, analysis, and modeling
– visual frontend
– open to all sorts of data, other tools (R, Python, etc.), and various user personas
– 20+ open source releases since 2006
• KNIME Server
– 14 commercial product releases since 2008
• KNIME cloud offerings
KNIME® Software
KNIME® Analytics Platform
Analysis & Mining
• Statistics
• Data Mining
• Machine Learning
• Deep Learning
• Web Analytics
• Text Mining
• Network Analysis
• Social Media Analysis
• R, Weka, Python, H2O
• Community / 3rd
Data Access
• MySQL, Oracle, ...
• SAS, SPSS, ...
• Excel, Flat, ...
• Hive, Impala, ...
• XML, JSON, PMML
• Text, Doc, Image, ...
• Web Crawlers
• Industry Specific
• Community / 3rd
Transformation
• Row, Column, Matrix
• Text, Image
• Time Series
• Java
• Python
• Community / 3rd
Visualization
• R
• Python
• JavaScript
• Community / 3rd
Deployment
• via BIRT
• PMML
• XML, JSON
• Databases
• Excel, Flat, etc.
• Text, Doc, Image
• Industry Specific
• Community / 3rd
Over 2000 Native and Embedded Nodes Included
KNIME H2O Machine Learning Integration
• Offer our users high-performance machine learning algorithms from H2O in KNIME
• Allow users to mix and match with other KNIME functionality
– KNIME Analytics Platform data wrangling functionality
– KNIME Big Data Connectors
– Text Mining, Image Processing, Cheminformatics, …
– and more!
The Data
Historical data (visitor counts known):
Date        Store ID          Visitors
2016-01-01  ba937bf13d40fb24  28
…           …                 …
2017-04-22  324f7c39a8410e7c  216
Forecast period (visitor counts unknown):
Date        Store ID          Visitors
2017-04-23  e8ed9335d0c38333  ?
…           …                 …
2017-05-31  8f13ef0f5e8c64dd  ?
Provided data:
• Number of visitors
• Reservations
• Store information
• Calendar date info
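The two tables suggest a date-based split: rows up to 2017-04-22 carry known visitor counts, and later rows are to be forecast. The sketch below shows one way to make that split and derive simple calendar features in plain Python. The field names (`date`, `store_id`, `visitors`) and the chosen features are assumptions for illustration, not the columns used in the talk, and in KNIME this step would be built from nodes rather than code.

```python
from datetime import date

# Last date with known visitor counts, taken from the slide.
TRAIN_END = date(2017, 4, 22)


def calendar_features(iso_date):
    """Derive simple calendar features from an ISO date string (YYYY-MM-DD)."""
    d = date.fromisoformat(iso_date)
    return {
        "day_of_week": d.weekday(),      # Monday = 0 ... Sunday = 6
        "month": d.month,
        "is_weekend": d.weekday() >= 5,
    }


def split_by_date(rows, train_end=TRAIN_END):
    """Split rows into training and forecast sets by date and attach calendar features."""
    train, forecast = [], []
    for row in rows:
        target = train if date.fromisoformat(row["date"]) <= train_end else forecast
        target.append({**row, **calendar_features(row["date"])})
    return train, forecast


# The four example rows shown on the slide; None stands for the unknown "?" counts.
rows = [
    {"date": "2016-01-01", "store_id": "ba937bf13d40fb24", "visitors": 28},
    {"date": "2017-04-22", "store_id": "324f7c39a8410e7c", "visitors": 216},
    {"date": "2017-04-23", "store_id": "e8ed9335d0c38333", "visitors": None},
    {"date": "2017-05-31", "store_id": "8f13ef0f5e8c64dd", "visitors": None},
]
train, forecast = split_by_date(rows)
```

Splitting by date rather than at random keeps future information out of the training set, which matters for any forecasting task.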
Visitor Forecasting
• Data preparation
• Model training
• Model optimization
• Model evaluation
• Deployment
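The training, optimization, and evaluation stages can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the "model" is a moving average rather than an H2O algorithm, the tuned parameter is the window length, the metric is RMSE, and the visitor series is made up. Only the shape of the loop (train, score on held-out data, pick the best setting) carries over to a real workflow.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def moving_average_forecast(history, window):
    """Toy model: predict the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def evaluate(series, window, start):
    """Score one-step-ahead forecasts for series[start:], each using only earlier values."""
    predictions = [moving_average_forecast(series[:i], window)
                   for i in range(start, len(series))]
    return rmse(series[start:], predictions)


def optimize_window(series, candidates, start):
    """Model optimization as a grid search: keep the window with the lowest held-out RMSE."""
    scores = {w: evaluate(series, w, start) for w in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical daily visitor counts, not taken from the talk's data set.
visitors = [20, 22, 21, 40, 42, 41, 43, 44, 39, 45]
best_window, scores = optimize_window(visitors, candidates=[1, 2, 3], start=4)
```

The same pattern applies when the candidates are hyperparameters of a gradient boosting or deep learning model: the search only ever sees held-out scores, never the forecast period.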
Data Preparation with KNIME Nodes
Modeling with the H2O Nodes
Blend H2O with…
• Python, Java and R Scripting
• Image Processing
• Deep Learning
• Text Processing
• Databases
• a growing Big Data Integration
H2O Sparkling Water in KNIME
Scoring with H2O MOJOs on Apache Spark
Thank You!
www.knime.com
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by
KNIME AG under license from KNIME GmbH, and are registered in the United States.
KNIME® is also registered in Germany.

Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 

H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI World London

  • 1. © 2018 KNIME AG. All rights reserved. Leveraging H2O Machine Learning with KNIME Analytics Platform Christian Dietz KNIME
  • 2. H2O Distributed Machine Learning Algorithms
      Supervised Learning
        Statistical Analysis
          • Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
          • Naïve Bayes
        Ensembles
          • Distributed Random Forest: Classification or regression models
          • Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations
        Deep Neural Networks
          • Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
      Unsupervised Learning
        Clustering
          • K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detects optimal k
        Dimensionality Reduction
          • Principal Component Analysis: Linearly transforms correlated variables to independent components
          • Generalized Low Rank Models: Extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data
        Anomaly Detection
          • Autoencoders: Find outliers using nonlinear dimensionality reduction with deep learning
  • 3. Platforms with H2O Integration H2O + KNIME Talk at KNIME Summit March 2018
  • 4. KNIME®
      • KNIME AG founded in 2008
      • Offices in Zurich (HQ), Konstanz, Berlin, and Austin
      • Maintainer of the Open Source KNIME Analytics Platform
        – comprehensive data loading, processing, analysis, modeling platform
        – visual frontend
        – open: to all sorts of data, other tools (R and Python, etc.), various user personas
        – 20+ open source releases since 2006
        – open source.
      • KNIME Server
        – 14 commercial product releases since 2008
      • KNIME cloud offerings
  • 5. KNIME® Software
  • 6. KNIME® Analytics Platform
  • 7. Over 2000 Native and Embedded Nodes Included
      Analysis & Mining: Statistics; Data Mining; Machine Learning; Deep Learning; Web Analytics; Text Mining; Network Analysis; Social Media Analysis; R, Weka, Python, H2O; Community / 3rd
      Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd
      Transformation: Row, Column; Matrix; Text, Image; Time Series; Java; Python; Community / 3rd
      Visualization: R; Python; JavaScript; Community / 3rd
      Deployment: via BIRT; PMML; XML, JSON; Databases; Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd
  • 8. KNIME H2O Machine Learning Integration
      • Offer our users high-performance machine learning algorithms from H2O in KNIME
      • Allow users to mix & match with other KNIME functionality
        – Data wrangling KNIME Analytics Platform functionality
        – KNIME Big-Data Connectors
        – Text Mining, Image Processing, Cheminformatics, …
        – and more!
  • 9. KNIME H2O Machine Learning Integration
  • 10. The Data
      Training data (visitors known):
        Date         Store ID           Visitors
        2016-01-01   ba937bf13d40fb24   28
        …            …                  …
        2017-04-22   324f7c39a8410e7c   216
      Test data (visitors to be predicted):
        Date         Store ID           Visitors
        2017-04-23   e8ed9335d0c38333   ?
        …            …                  …
        2017-05-31   8f13ef0f5e8c64dd   ?
      Provided data:
        • Number of visitors
        • Reservations
        • Store information
        • Calendar date info
  • 11. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 12. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 13. Data Preparation with KNIME Nodes
  • 14. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 15. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 16. Modeling with the H2O Nodes
  • 17. Modeling with the H2O Nodes
  • 18. Modeling with the H2O Nodes
  • 19. Modeling with the H2O Nodes
  • 20. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 21. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 22. Blend H2O with… Python, Java and R Scripting…
  • 23. …Image Processing...
  • 24. …Deep Learning...
  • 25. ...Text Processing...
  • 26. ...Databases...
  • 27. ...a growing Big Data Integration.
  • 28. H2O Sparkling Water in KNIME
  • 29. H2O Sparkling Water in KNIME
  • 30. H2O Sparkling Water in KNIME
  • 31. H2O Sparkling Water in KNIME
  • 32. H2O Sparkling Water in KNIME
  • 33. Scoring with H2O MOJOs on Apache Spark
  • 34. Thank You! www.knime.com
  • 35. © 2018 KNIME AG. All rights reserved. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.

Editor's Notes

  1. This competition was published by a Japanese restaurant chain. They wanted to know the number of future visitors for their different stores. Let's see what kind of data they provided us to solve this problem.
  2. This is the top-level workflow we used to solve the problem. It will guide us through the major steps, from reading in the data up to doing the prediction, and it showcases the interaction of the native KNIME nodes and the H2O nodes.
  3. Let's jump right into our data preparation.
  4. This is the data preparation part of the workflow. We'll not discuss it in too much detail.
  5. In the end we get two datasets: the training set with information about the number of visitors (the target variable), which we will use to build our model in the next steps, and the test dataset without the number of visitors. These have to be predicted by our model and submitted to Kaggle later on.
  6. We just did the data preparation. Before we jump right into the modeling, we have to create a local H2O context and convert our KNIME table into an H2O frame. This frame will be used to build our models.
  7. At the moment there are three H2O models implemented in KNIME which are capable of solving such a regression task: Random Forest, Generalized Linear Model and Gradient Boosting Machine. Let's have a look at one of those to see how we trained, optimized and evaluated our models.
  8. The actual learning of a model happens in one single node: the H2O Random Forest Learner takes the H2O frame with the training data and builds a model. In the configuration dialog you choose the target variable you want to predict (here it is visitors) and enter some model-specific parameters, e.g. the number of levels of a single tree and the number of tree models in the forest. Next we use the H2O Predictor to apply the just created model and predict the visitors for our test set. Afterwards the score of the model is computed with the H2O Regression Scorer. As performance measure we used the root mean squared logarithmic error (RMSLE), as this measure is also used on Kaggle to evaluate the final submissions.
  9. To avoid overfitting we use the H2O cross validation loop, which partitions the data and trains one model for each partition of the data. [TODO: insert table with the mean of the cross validation scores]
  10. With one machine learning algorithm, here e.g. random forest, you can solve different problems. With its parameters, for a random forest e.g. the number of trees and the tree depth, one can adapt it to a specific problem with respect to the objective function. Here we are looking for parameters that minimize the error of our model validations. We did it with a grid search that performs one iteration of the loop for every possible combination of parameters. At the end we have a table with all parameter combinations and their respective scores.
  11. At the end of the loop we've got all parameter combinations with their respective scores. We selected the parameters that lead to the best result and trained a new model on the complete public dataset. As you can see, we've got a nested loop here. Luckily the new H2O nodes are really fast, so this is not going to be a performance issue.
  12. The steps I just showed you happen for all three learner nodes. Afterwards we select the model which scored best.
  13. We then convert the best model into an H2O MOJO, which is a model object that is optimized to be embedded in any Java environment. By doing this we are able to use our just created model outside of an H2O context. We can, for example, do our prediction for the submission dataset from Kaggle, or we can deploy it wherever we want, so we just stored it somewhere for Christian. Let's see what he is doing with it.
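
The optimization loop described in notes 8 to 11 (score with RMSLE, cross validate, then try every parameter combination) can be sketched in plain Python. This is only an illustration under stated assumptions and not the KNIME workflow or the H2O API: the learner is a hypothetical toy model (`fit_toy_model`, with made-up parameters `shrink` and `offset`) standing in for the H2O Random Forest Learner, and the fold assignment is a simple round-robin split.

```python
import itertools
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error of predictions against actual values."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs; every row is validated exactly once."""
    for fold in range(k):
        valid = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, valid


def fit_toy_model(targets, shrink, offset):
    """Hypothetical stand-in learner: predicts training mean * shrink + offset."""
    mean = sum(targets) / len(targets)
    return lambda n_rows: [mean * shrink + offset] * n_rows


def grid_search(targets, grid, k=3):
    """Score every parameter combination by its mean cross-validated RMSLE."""
    names = sorted(grid)
    results = []
    # Outer loop: one iteration per parameter combination.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        # Inner loop: train on k-1 partitions, score on the held-out one.
        for train, valid in k_fold_indices(len(targets), k):
            model = fit_toy_model([targets[i] for i in train], **params)
            predictions = model(len(valid))
            fold_scores.append(rmsle([targets[i] for i in valid], predictions))
        results.append((params, sum(fold_scores) / len(fold_scores)))
    # Best (lowest error) combination first.
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    visitors = [10, 12, 11, 9, 10, 12]  # made-up visitor counts
    ranking = grid_search(visitors, {"shrink": [0.5, 1.0], "offset": [0, 20]})
    for params, score in ranking:
        print(params, round(score, 3))
```

The nested structure mirrors the nested loop mentioned in note 11: the outer loop walks the parameter grid and the inner loop runs the cross validation, so the total number of trained models is the number of combinations times the number of folds.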