ML whitepaper v0.2
Use loops to automate recurring processes.
Stages: explore / read / process / split / model / assess / deploy
CSV Reader: browse to the file you wish to read and run the node.
Excel Reader (xls or xlsx): browse to the file you wish to read, select the desired sheet, and run the node.
Parquet files are great for fast loading of data: browse to the file you wish to read and run the node.
The traffic light below each node signals its state:
• Red: not configured; the node needs additional configuration before it can run
• Yellow: configured and ready for execution
• Green: executed
Use the 2D/3D Scatter Plot node to visually explore sparsity, correlation, feature interactions, and more.
Use the Line Plot node to explore trends and regression results.
Use the Data Explorer node to quickly examine various statistics about the data.
Use Math Formula nodes to manipulate data and create new features.
Use Pivoting, Unpivoting, and GroupBy nodes to aggregate the data per one of the features.
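The GroupBy idea can be sketched in plain Python: rows are grouped by one column and a single aggregation is applied to another. This is a minimal illustration of the concept, not how KNIME implements the node; the column names `region` and `amount`, the sample rows, and the helper `group_by` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_by(rows, key, value, agg=mean):
    """Aggregate the `value` column per distinct `key` (one aggregation)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: agg(vals) for k, vals in groups.items()}


# Hypothetical example rows.
sales = [
    {"region": "north", "amount": 10.0},
    {"region": "south", "amount": 4.0},
    {"region": "north", "amount": 20.0},
]
print(group_by(sales, "region", "amount"))       # {'north': 15.0, 'south': 4.0}
print(group_by(sales, "region", "amount", sum))  # {'north': 30.0, 'south': 4.0}
```

Passing a different `agg` (here `sum`) corresponds to picking another aggregation method for the same grouping column.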
Remove outliers to create a less skewed dataset (use this wisely, as you might also remove some of the legitimate variability in the data).
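The text does not say which outlier rule to use. One common choice is Tukey's fences based on the interquartile range (IQR). The sketch below assumes that rule with the conventional factor k = 1.5, and `remove_outliers_iqr` is a hypothetical helper. `statistics.quantiles` uses the "exclusive" interpolation method by default, so its quartiles may differ slightly from other tools.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, "exclusive" method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


data = [10, 12, 11, 13, 12, 11, 95]  # hypothetical sample with one extreme value
print(remove_outliers_iqr(data))      # [10, 12, 11, 13, 12, 11]
```

A larger `k` removes fewer points, which is one way to limit the loss of legitimate variability.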
Use the Missing Value node to deal with missing data: missing values can be completed with a statistical value, by interpolation, or by other methods.
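Two of the completion strategies mentioned above, a statistical value (the mean) and linear interpolation, can be sketched as follows. Missing values are represented as `None`. The helper names are hypothetical, and this interpolation leaves leading and trailing gaps unfilled because they have a neighbour on only one side.

```python
from statistics import mean


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def interpolate_linear(values):
    """Fill interior None gaps by linear interpolation between known neighbours.
    Leading/trailing gaps stay None (no neighbour on one side)."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out


series = [1.0, None, 3.0, None, None, 6.0]  # hypothetical series with gaps
print(impute_mean(series))         # gaps filled with the mean of 1.0, 3.0, 6.0
print(interpolate_linear(series))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```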
Type conversions: use an integer-to-string conversion (e.g., the Number To String node) to define a classification task with numeric targets.
Partition the data into train and test sets.
We can further partition into train / validation / test sets using two such nodes.
Partition the data multiple times using a loop.
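A three-way train / validation / test split amounts to shuffling once and cutting the rows into consecutive slices, which is what chaining two partitioning steps achieves. The 70/15/15 fractions, the fixed seed, and the `partition` helper are assumptions for illustration.

```python
import random


def partition(rows, fractions=(0.7, 0.15, 0.15), seed=42):
    """Shuffle once, then cut into consecutive train / validation / test slices."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


train, val, test = partition(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Changing `seed` on each pass gives a different partition, which is the basis for partitioning multiple times inside a loop.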
Use the XGBoost learner to model classification tasks (predicting categories from data) with boosted trees.
Parameters:
• Objective: binary for 2 classes, multiclass for more
• Eta: the learning rate; the lower it is, the more boosting rounds are needed. Use eta = 0.05 as a default with 500-1000 boosting rounds
• Subsampling rate = 0.8
• Column sampling rate by tree = 0.7
Increase regularization by:
• Reducing the maximum depth
• Increasing the minimum child weight
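The parameter advice above maps onto the native parameter names of the open-source xgboost library (`eta`, `subsample`, `colsample_bytree`, `max_depth`, `min_child_weight`, `objective`, `num_class`). The sketch only builds the parameter dictionary. The helper name, the `max_depth=6` and `min_child_weight=1` starting values (xgboost's defaults), and the `binary:logistic` / `multi:softprob` objectives are assumptions, not settings read from the KNIME nodes.

```python
NUM_BOOST_ROUND = 500  # the poster suggests 500-1000 rounds at eta = 0.05


def xgboost_params(n_classes, eta=0.05, max_depth=6, min_child_weight=1):
    """Starting-point classifier parameters using xgboost's native names.
    Lower max_depth / higher min_child_weight -> stronger regularization."""
    params = {
        "eta": eta,                  # learning rate; lower eta needs more rounds
        "subsample": 0.8,            # row subsampling rate per tree
        "colsample_bytree": 0.7,     # column sampling rate per tree
        "max_depth": max_depth,
        "min_child_weight": min_child_weight,
    }
    if n_classes == 2:
        params["objective"] = "binary:logistic"
    else:
        params["objective"] = "multi:softprob"
        params["num_class"] = n_classes  # required for multiclass objectives
    return params


print(xgboost_params(2))
print(xgboost_params(5, max_depth=4, min_child_weight=5))  # more regularized
```

With the xgboost Python package installed, such a dictionary could be passed as `xgb.train(params, dtrain, num_boost_round=NUM_BOOST_ROUND)`, where `dtrain` is an `xgb.DMatrix`.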
Use the XGBoost learner (regression) to model regression tasks (prediction of continuous, numeric targets).
The same parameter guidance applies to regression.
Ports are how each node receives and transfers information to and from other nodes in the workflow.
Use the Scorer node to evaluate prediction accuracy and the confusion matrix (classification models).
Use the ROC Curve node to inspect the ROC curve.
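What the Scorer node reports can be sketched by hand: accuracy is the share of matching labels, and the confusion matrix counts (actual, predicted) pairs. Here the ROC curve is summarized by its area (AUC), computed as the probability that a random positive example scores higher than a random negative one. The helpers and sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Counts of (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def roc_auc(labels, scores):
    """AUC = P(score of random positive > score of random negative); ties count half.
    labels: 1 for positive, 0 for negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


actual = ["cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "cat", "cat", "dog"]
print(accuracy(actual, predicted))                          # 0.8
print(confusion_matrix(actual, predicted)[("dog", "cat")])  # 1
print(roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.7]))          # 0.75
```

The pairwise AUC is quadratic in the number of rows, which is fine for illustration but not for large datasets.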
You can automatically send emails from a process using the Send Email node.
Use the CSV Writer node to write output data to a CSV file.
Metanodes
Use the Loop End node to collect the results of all iterations.
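The loop pattern (a loop start, a loop body, and a Loop End that collects one result row per iteration) can be sketched as follows. The loop body is a hypothetical stand-in: a random 80/20 split scored with a majority-class baseline, so only the collection mechanics are meant to carry over. All names and data are hypothetical.

```python
import random


def run_loop(rows, n_iterations, body):
    """Run `body` once per iteration and collect one result row per run,
    mirroring a loop start / Loop End pair."""
    collected = []
    for i in range(n_iterations):
        result = body(rows, seed=i)  # the iteration index doubles as the seed
        collected.append({"iteration": i, **result})
    return collected


def majority_baseline(rows, seed):
    """Hypothetical loop body: random 80/20 split, then the accuracy of
    always predicting the most frequent training label."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    correct = sum(label == majority for _, label in test)
    return {"accuracy": correct / len(test)}


data = [(x, "a" if x % 3 else "b") for x in range(30)]  # hypothetical rows
results = run_loop(data, 5, majority_baseline)
for row in results:
    print(row)
```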
Use the appropriate predictor node to generate predictions from the trained model.
Use String to Date&Time to convert datetimes saved as strings, so that datetime-related features can be extracted from them.
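Once a datetime string is parsed, features such as year, month, weekday, and hour follow directly. The input format `%Y-%m-%d %H:%M:%S`, the feature set, and the helper name are assumptions; adjust the format string to the actual data.

```python
from datetime import datetime


def datetime_features(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a datetime stored as a string and derive simple features."""
    ts = datetime.strptime(text, fmt)  # raises ValueError if text doesn't match fmt
    return {
        "year": ts.year,
        "month": ts.month,
        "day_of_week": ts.weekday(),   # Monday = 0 ... Sunday = 6
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,
    }


print(datetime_features("2018-03-17 14:30:00"))
# {'year': 2018, 'month': 3, 'day_of_week': 5, 'hour': 14, 'is_weekend': True}
```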
Use the Random Forest Learner to model classification tasks (predicting categories from data) by bagging many decision trees.
Use the matching predictor (Random Forest Predictor) to generate predictions from the trained model.
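Bagging trains each model on a bootstrap sample (rows drawn with replacement) and combines the models by majority vote. The sketch below is plain bagging of one-feature decision stumps, not a full random forest: a random forest additionally grows deeper trees and samples a random subset of features at each split. All names and data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label (first seen wins ties)."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(xs, ys):
    """Fit a one-feature decision stump: try each observed value as a
    threshold and keep the split with the fewest training errors."""
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        left_label = majority(left)
        right_label = majority(right) if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, thr, left_label, right_label)
    _, thr, left_label, right_label = best
    return lambda x: left_label if x <= thr else right_label


def bagging(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: majority(m(x) for m in models)


xs = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical single feature
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
predict = bagging(xs, ys)
print(predict(1), predict(8))  # a b
```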
Port types:
• Data table (black triangle ports): input / output containing a data table; string, numeric, and integer values are allowed
• Flow variables (red circle ports): input / output consisting of one or more flow variables, which may be used to replace parameters in downstream nodes
• Model (blue ports): a trained model
• Tree ensemble model (grey ports): a trained tree ensemble model
Flow variable ports are hidden for most nodes; to display them, right-click the node and select “Show Flow Variable Ports”.
Use the Excel Writer node to write output data to an xls or xlsx file.
Example of using a feature selection loop
Use the Numeric Scorer node to evaluate prediction error, such as MAE (regression models).
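The regression error metrics can be computed directly. The sketch covers MAE, plus RMSE and R² as commonly reported companions; the helper name and sample values are hypothetical.

```python
from math import sqrt


def regression_scores(actual, predicted):
    """Mean absolute error, root mean squared error, and R^2."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)           # sum of squared errors
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(sse / n),
        "r2": 1 - sse / ss_tot,                # undefined if actual is constant
    }


print(regression_scores([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```

For this input the errors are -1, 0, and 2, giving MAE = 1.0 and R² = 1 - 5/8 = 0.375.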
It's easy to encapsulate several nodes in a metanode: select all the nodes you want to encapsulate, right-click one of them, and select “Create Metanode”.
Time series
Create lagged columns to imitate prediction mode.
Remember to split the data by time, so that validation is consistent with the type of data you expect to receive in deployment.
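Lagged columns and a time-ordered split can be sketched as follows: each row's features are only values observed before its target, and the split keeps earlier rows for training and later rows for validation instead of shuffling. The lag set (1, 2), the 80/20 cut, the helper names, and the sample series are assumptions.

```python
def add_lags(series, lags=(1, 2)):
    """Build (features, target) rows where features are past values only,
    imitating what is available at prediction time."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = {f"lag_{k}": series[t - k] for k in lags}
        rows.append((features, series[t]))
    return rows


def time_split(rows, train_fraction=0.8):
    """Split by time order (no shuffling): earlier rows train, later rows validate."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


sales = [10, 12, 13, 15, 18, 21, 25]  # hypothetical ordered observations
rows = add_lags(sales)
train, valid = time_split(rows)
print(rows[0])                # ({'lag_1': 12, 'lag_2': 10}, 13)
print(len(train), len(valid))  # 4 1
```

The first `max(lags)` observations produce no row, because their lagged values do not exist yet.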
Deep learning LSTM training and prediction flow:
• Learner
• Read / write trained network
• Time series lagged feature creation
Machine learning and data science with KNIME – Nathaniel Shimoni
Hyper-parameter optimization loop
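A hyper-parameter optimization loop evaluates a model over a set of parameter values and keeps the best one. The brute-force grid-search version of that idea is sketched below. `toy_score` is a hypothetical stand-in for training and validating a model, higher scores are treated as better, and the parameter grid is illustrative only.

```python
from itertools import product


def grid_search(score, grid):
    """Evaluate score(**params) for every combination in `grid`;
    return (best_score, best_params)."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if best is None or s > best[0]:
            best = (s, params)
    return best


def toy_score(eta, max_depth):
    """Hypothetical objective standing in for 'train + validate a model'."""
    return -(eta - 0.05) ** 2 - (max_depth - 4) ** 2


best_score, best_params = grid_search(
    toy_score, {"eta": [0.01, 0.05, 0.1], "max_depth": [2, 4, 6]}
)
print(best_params)  # {'eta': 0.05, 'max_depth': 4}
```

Grid search grows multiplicatively with each added parameter, so coarse grids or random sampling of combinations are common alternatives.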
