October 4, 2017
Roger Dev
Machine Learning Innovations
ML Mini Intro
Machine Learning Basics
• Current implementations support “supervised” learning:
Given a set of data samples (the “Independent” Variables):
Record1: Field1, Field2, Field3, …, FieldM
Record2: Field1, Field2, Field3, …, FieldM
…
RecordN: Field1, Field2, Field3, …, FieldM
And a set of target values (the “Dependent” Variables):
Record1: TargetValue
Record2: TargetValue
…
RecordN: TargetValue
Learn how to predict target values for new samples.
Note: The fields in the independent data are also known as “features” of the data.
• The set of Independent and Dependent data is known as the “Training Set”
• The encapsulated learning is known as the “Model”
• Each model represents a “Hypothesis” regarding the relationship of the Independent to Dependent variables.
• The hallmark of machine learning is the ability to “Generalize”
Machine Learning Example
• Given the following data about trees in a forest:
• Learn a Model that approximates Age (i.e. the Dependent Variable) from Height, Diameter, Altitude, and
Rainfall (i.e. the Independent Variables).
• It is hard to see from the data how to construct such a model
• Machine Learning will automatically construct a model that (usually) minimizes “prediction error”.
• In the process, depending on the algorithm, it may also provide insight into the relationships between the
independent and dependent variables (i.e. inference).
• Note: We normally want to map easy-to-measure (i.e. “observed”) features to hard-to-measure
(“unobserved”) features.
Height  Diameter  Altitude  Rainfall  Age
50      8         5000      12        80
56      9         4400      10        75
72      12        6500      18        60
47      10        5200      14        53
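As a rough illustration of "learning a model that minimizes prediction error", the sketch below fits a one-variable least-squares line that predicts Age from Height alone, using the four rows of the table. It is plain Python, not the HPCC Systems API. The names `fit_line` and `predict` are made up for this example, and four samples are far too few to trust the resulting model.

```python
# Tree data from the table: (height, diameter, altitude, rainfall, age)
TREES = [
    (50, 8, 5000, 12, 80),
    (56, 9, 4400, 10, 75),
    (72, 12, 6500, 18, 60),
    (47, 10, 5200, 14, 53),
]


def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to one new value."""
    slope, intercept = model
    return slope * x + intercept


heights = [row[0] for row in TREES]  # independent variable (one feature only)
ages = [row[4] for row in TREES]     # dependent variable

model = fit_line(heights, ages)
errors = [predict(model, h) - a for h, a in zip(heights, ages)]
mean_squared_error = sum(e * e for e in errors) / len(errors)

print("model (slope, intercept):", model)
print("mean squared prediction error:", mean_squared_error)
print("predicted age for height 60:", predict(model, 60))
```

The fitted pair `(slope, intercept)` is the "Model"; calling `predict` on a height that was not in the table is the "Generalize" step.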
Quantitative and Qualitative Models
• Supervised Machine Learning supports two major types of model:
• Quantitative – e.g. determine the numerical age of a tree
• Qualitative – e.g. determine the species of a tree, or determine whether the tree is
healthy or not.
• Learning a Quantitative model is called “Regression” (a term with archaic origins
– don’t try to make sense of it).
• Learning a Qualitative model is called “Classification”.
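The difference between the two model types can be sketched with a tiny nearest-neighbour predictor: the same neighbours are averaged for Regression and put to a vote for Classification. This is an illustration only. Nearest-neighbour is not one of the methods named in these slides, the helper names are invented, and the species labels are hypothetical (only the heights, diameters, and ages come from the example table).

```python
from collections import Counter

# (height, diameter) and ages come from the example table; the species
# labels are hypothetical and exist only to illustrate classification.
SAMPLES = [(50, 8), (56, 9), (72, 12), (47, 10)]
AGES = [80, 75, 60, 53]                   # quantitative target
SPECIES = ["pine", "pine", "fir", "fir"]  # qualitative target (made up)


def nearest(sample, k):
    """Indices of the k training samples closest to `sample`."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(SAMPLES[i], sample))
    return sorted(range(len(SAMPLES)), key=dist)[:k]


def predict_regression(sample, k=2):
    """Quantitative model: average the neighbours' numeric targets."""
    idx = nearest(sample, k)
    return sum(AGES[i] for i in idx) / k


def predict_classification(sample, k=3):
    """Qualitative model: majority vote over the neighbours' labels."""
    idx = nearest(sample, k)
    return Counter(SPECIES[i] for i in idx).most_common(1)[0][0]


new_tree = (52, 9)
print(predict_regression(new_tree))      # a number
print(predict_classification(new_tree))  # a category
```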
Machine Learning Algorithms
• There are many different ML algorithms that all try to do the same things (Classification or
Regression)
• Each ML algorithm has limitations on the types of model it can produce. The space of all
possible models for an algorithm is called its “Hypothesis Set”.
• “Linear models” assume that the dependent variable is a function of the sum of the
independent variables, each multiplied by its own coefficient (e.g. f(field1 * coef1 + field2 *
coef2 + … + fieldM * coefM + constant))
• “Tree models” assume that the dependent variable can be determined by asking a
hierarchical set of questions about the independent data. (e.g. is height >= 50
feet? If so, is rainfall >= 11 inches?...then age is 65).
• Some other models (e.g. Neural Nets) are difficult to explain or visualize, and are
therefore not considered “Explanatory”.
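The two model families described above can be written down as small prediction functions. This is a sketch under stated assumptions: the coefficients and two of the three leaf values are hypothetical, and only the "height >= 50 feet, rainfall >= 11 inches, then age is 65" path comes from the slide.

```python
def linear_model(fields, coefs, constant, f=lambda z: z):
    """f(field1*coef1 + ... + fieldM*coefM + constant); f defaults to identity."""
    return f(sum(x * c for x, c in zip(fields, coefs)) + constant)


def tree_model(height, rainfall):
    """A hierarchy of questions; each path ends in a predicted age."""
    if height >= 50:        # feet
        if rainfall >= 11:  # inches
            return 65       # the one leaf value given in the slide
        return 70           # hypothetical leaf
    return 55               # hypothetical leaf


# Hypothetical coefficients for (height, diameter, altitude, rainfall).
coefs = [0.5, 2.0, 0.001, -1.0]
print(linear_model([50, 8, 5000, 12], coefs, constant=10.0))
print(tree_model(height=50, rainfall=12))
```

Every choice of `coefs` and `constant` is one hypothesis in the linear model's Hypothesis Set; every choice of questions and leaf values is one hypothesis in the tree model's. Training picks one member of that set.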
Using Machine Learning
• Machine Learning is fun and easy
• It is simple to invoke one of our ML algorithms, learn a model and
make some predictions.
• If you have some data, give it a try.
• It will likely provide good insights, and may identify some significant
possibilities for adding value.
• Machine Learning is also dangerous (not just because of robot death rays)
• There are many ways to produce inaccurate and misleading
predictions.
• Bad questions: yesterday’s temperature -> today’s stock close
• Bad data – Missing values, incorrect values -> bad predictions
• Wrong assumptions vis-à-vis the algorithm chosen
• Insufficient Training Data
• Overfitting
• Many others …
• Always consult a qualified Data Scientist before applying your models
to a ‘production’ activity.
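One of the dangers listed above, Overfitting, can be made visible by holding back part of the data and comparing errors. The sketch below uses synthetic data (an assumption, generated as y = 2x plus noise) and two stand-in models: one that memorizes the training set, and one simple rule that is assumed rather than learned. None of the names come from an HPCC Systems bundle.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic data (an assumption for illustration): y = 2x + noise.
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(40)]

random.shuffle(data)
train, holdout = data[:30], data[30:]  # keep 10 records the model never sees


def mse(model, rows):
    """Mean squared prediction error of `model` over `rows`."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# An overfit "model": memorizes the training set, guesses 0 otherwise.
memory = dict(train)


def memorizer(x):
    return memory.get(x, 0.0)


# A simpler hypothesis, here assumed rather than learned.
def simple(x):
    return 2 * x


for name, model in [("memorizer", memorizer), ("simple", simple)]:
    print(name, "train:", mse(model, train), "holdout:", mse(model, holdout))
```

The memorizer has zero error on the training set and a large error on the held-back records, which is the signature of a model that fails to generalize.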
The “true” story of Archimedes – A cautionary tale
And we may never have discovered PI
[Slide graphic not recoverable; surviving labels: D, A, D2, D3]
Boosted Trees
Current ML Bundles
HPCC Systems Platform Machine Learning Approaches
• Experimental Bundle – ecl-ml (https://github.com/hpcc-systems/ecl-ml)
• Rich set of machine learning methods at various levels of quality, performance,
and documentation.
• Recently added Gradient Boosted Trees
• Embedded Language support for Python and R allows the use of industry ML platforms:
• Various R packages
• Scikit-learn (Python)
• Google TensorFlow (Python)
• Production Bundles
• Fully supported, Validation tested, HPCC optimized, and Performance profiled.
• Bundles are generally independent of platform release, and easy to install
• The Bundle installer will alert you if there is a platform version dependency for a
particular bundle
ML Bundles
• ML_Core – Machine Learning Core (https://github.com/hpcc-systems/ML_Core)
• Provides the base data types and common data analysis methods.
• Also includes methods to convert between your record-oriented data and the matrix form used by the ML
algorithms.
• This is a dependency for all of the ML bundles.
• PBblas – Parallel Block Basic Linear Algebra Subsystem (https://github.com/hpcc-systems/PBblas)
• Provides parallelized large-scale matrix operations tailored to the HPCC Systems Platform.
• This is a dependency for many of the ML bundles.
• LinearRegression – Ordinary least squares linear regression (https://github.com/hpcc-systems/LinearRegression)
• Allows multiple dependent variables (Multi-variate)
• Provides a range of analytic methods for interpreting the results
• Can be useful for testing relationship hypotheses
• LogisticRegression – Logistic Regression Bundle (https://github.com/hpcc-systems/LogisticRegression)
• Despite “regression” in its name, provides a linear model based classification mechanism
• Binomial (yes/no) classification
• Multinomial (n-way) classification using Softmax.
• Allows multiple dependent variables (Multi-variate)
• Includes analytic methods.
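For readers unfamiliar with Softmax, the function itself is short. The sketch below shows only the general technique of turning one linear score per class into a probability. It is not taken from the LogisticRegression bundle, and the input scores are hypothetical.

```python
import math


def softmax(scores):
    """Turn one linear score per class into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The class with the highest score always receives the highest probability, so the n-way prediction is simply the index of the largest entry.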
ML Bundles (cont’d)
• SupportVectorMachines – SVM Bundle (https://github.com/hpcc-systems/SupportVectorMachines)
• Classification or Regression.
• Uses the popular LibSVM library under the hood
• Appropriate for moderately sized datasets or solving many moderately sized problems in parallel.
• Includes automatic parallelized grid-search to find the best regularization parameters.
• GLM – General Linear Model (https://github.com/hpcc-systems/GLM)
• Provides various linear methods that can be used when the assumptions of Linear or Logistic
Regression don’t apply to your data.
• Includes: binomial, quasibinomial, Poisson, quasipoisson, Gaussian, inverse Gaussian and
Gamma GLM families.
• Can be readily adapted to other error distribution assumptions.
• Accepts user-defined observation weights/frequencies/priors via a weights argument.
• Includes analytics.
ML Bundles (cont’d)
• LearningTrees – Random Forests (https://github.com/hpcc-systems/LearningTrees)
• Classification or Regression
• One of the best “out-of-the-box” ML methods, as it makes few assumptions about the nature of the
data, and is resistant to overfitting. Capable of dealing with large numbers of Independent
Variables.
• Creates a “forest” of diverse Decision Trees and averages the responses of the different Trees.
• Provides “Feature Importance” metrics.
• Later versions are planned to support single Decision Trees (C4.5) and Gradient Boosting Trees.
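The idea of building a forest of diverse trees and averaging their responses can be sketched in a few lines. This is a simplified illustration, not the LearningTrees implementation: each "tree" here is a one-question stump, diversity comes from a bootstrap sample plus one randomly chosen feature per stump, and all function names are invented. The rows are the four trees from the earlier example table.

```python
import random


def fit_stump(rows, feature):
    """One-question tree: split `feature` at the threshold with least squared error."""
    best = None
    values = sorted({x[feature] for x, _ in rows})
    for t in values[1:]:  # every threshold leaves both sides non-empty
        left = [y for x, y in rows if x[feature] < t]
        right = [y for x, y in rows if x[feature] >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(y for _, y in rows) / len(rows)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x[feature] < t else rm


def fit_forest(rows, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap sample of records
        feature = rng.randrange(n_features)        # random feature choice
        trees.append(fit_stump(sample, feature))
    return trees


def predict(trees, x):
    """Regression forest: average the responses of the individual trees."""
    return sum(tree(x) for tree in trees) / len(trees)


# (height, diameter, altitude, rainfall) -> age
rows = [((50, 8, 5000, 12), 80), ((56, 9, 4400, 10), 75),
        ((72, 12, 6500, 18), 60), ((47, 10, 5200, 14), 53)]
forest = fit_forest(rows)
print(predict(forest, (55, 9, 5000, 12)))
```

Because each stump sees different records and a different feature, individual stumps disagree, and averaging them smooths out the quirks of any single one.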
Machine Learning Innovations 13

More Related Content

What's hot

Standard template library
Standard template libraryStandard template library
Standard template library
ThamizhselviKrishnam
 
weka data mining
weka data mining weka data mining
weka data mining
kalthoom almaqbali
 
Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learn
Yoss Cohen
 
Data strucutre basic introduction
Data strucutre basic introductionData strucutre basic introduction
Data strucutre basic introduction
manirajan12
 
Collections Training
Collections TrainingCollections Training
Collections Training
Ramindu Deshapriya
 
Python Programming | JNTUK | UNIT 1 | Lecture 4
Python Programming | JNTUK | UNIT 1 | Lecture 4Python Programming | JNTUK | UNIT 1 | Lecture 4
Python Programming | JNTUK | UNIT 1 | Lecture 4
FabMinds
 

What's hot (6)

Standard template library
Standard template libraryStandard template library
Standard template library
 
weka data mining
weka data mining weka data mining
weka data mining
 
Intro to machine learning with scikit learn
Intro to machine learning with scikit learnIntro to machine learning with scikit learn
Intro to machine learning with scikit learn
 
Data strucutre basic introduction
Data strucutre basic introductionData strucutre basic introduction
Data strucutre basic introduction
 
Collections Training
Collections TrainingCollections Training
Collections Training
 
Python Programming | JNTUK | UNIT 1 | Lecture 4
Python Programming | JNTUK | UNIT 1 | Lecture 4Python Programming | JNTUK | UNIT 1 | Lecture 4
Python Programming | JNTUK | UNIT 1 | Lecture 4
 

Similar to Machine Learning Innovations

Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Sri Ambati
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
Akshay Kanchan
 
ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptx
NIKHILGR3
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
MLconf
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
MLconf
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Varad Meru
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Rahul Jain
 
H2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDellH2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDell
Sri Ambati
 
The Genopolis Microarray database
The Genopolis Microarray databaseThe Genopolis Microarray database
The Genopolis Microarray database
Novartis Institutes for BioMedical Research
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
BhagyasriPatel2
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
Ted Xiao
 
Primer on major data mining algorithms
Primer on major data mining algorithmsPrimer on major data mining algorithms
Primer on major data mining algorithms
Vikram Sankhala IIT, IIM, Ex IRS, FRM, Fin.Engr
 
The Object Oriented Database System Manifesto
The Object Oriented Database System ManifestoThe Object Oriented Database System Manifesto
The Object Oriented Database System Manifesto
Beat Signer
 
c23_ml1.ppt
c23_ml1.pptc23_ml1.ppt
c23_ml1.ppt
Faiz430036
 
CodeLess Machine Learning
CodeLess Machine LearningCodeLess Machine Learning
CodeLess Machine Learning
Sharjeel Imtiaz
 
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Maninda Edirisooriya
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
Sri Ambati
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
Sri Ambati
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morning
Jeff Heaton
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningLarge Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
jaumebp
 

Similar to Machine Learning Innovations (20)

Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptx
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
H2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDellH2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDell
 
The Genopolis Microarray database
The Genopolis Microarray databaseThe Genopolis Microarray database
The Genopolis Microarray database
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Primer on major data mining algorithms
Primer on major data mining algorithmsPrimer on major data mining algorithms
Primer on major data mining algorithms
 
The Object Oriented Database System Manifesto
The Object Oriented Database System ManifestoThe Object Oriented Database System Manifesto
The Object Oriented Database System Manifesto
 
c23_ml1.ppt
c23_ml1.pptc23_ml1.ppt
c23_ml1.ppt
 
CodeLess Machine Learning
CodeLess Machine LearningCodeLess Machine Learning
CodeLess Machine Learning
 
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morning
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningLarge Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
 

More from HPCC Systems

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
HPCC Systems
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
HPCC Systems
 
Welcome
WelcomeWelcome
Welcome
HPCC Systems
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
HPCC Systems
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
HPCC Systems
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
HPCC Systems
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
HPCC Systems
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
HPCC Systems
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
HPCC Systems
 
Docker Support
Docker Support Docker Support
Docker Support
HPCC Systems
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
HPCC Systems
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
HPCC Systems
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
HPCC Systems
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
HPCC Systems
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
HPCC Systems
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
HPCC Systems
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
HPCC Systems
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
HPCC Systems
 

More from HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Recently uploaded

一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
ArshadAyub49
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
Rebecca Bilbro
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
nhutnguyen355078
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
8 things to know before you start to code in 2024
8 things to know before you start to code in 20248 things to know before you start to code in 2024
8 things to know before you start to code in 2024
ArianaRamos54
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
9gr6pty
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
blueshagoo1
 
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
mbawufebxi
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
22ad0301
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 

Recently uploaded (20)

一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
8 things to know before you start to code in 2024
8 things to know before you start to code in 20248 things to know before you start to code in 2024
8 things to know before you start to code in 2024
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 

Machine Learning Innovations

  • 1. October 4, 2017 Roger Dev Machine Learning Innovations
  • 3. Machine Learning Basics • Current implementations support “supervised” learning: Given a set of data samples: And a set of target values, Learn how to predict target values for new samples. Machine Learning Innovations 3 Record1: Field1, Field2, Field3, … , FieldM Record2: Field1, Field2, Field3, … , FieldM … RecordN: Field1, Field2, Field3, …, FieldM Record1: TargetValue Record2: TargetValue … RecordN: TargetValue “Independent” Variables “Dependent” Variables Note: The fields in the independent data are also known as “features” of the data • The set of Independent and Dependent data is known as the “Training Set” • The encapsulated learning is known as the “Model” • Each model represents a “Hypothesis” regarding the relationship of the Independent to Dependent variables. • The hallmark of machine learning is the ability to “Generalize”
  • 4. Machine Learning Example
    • Given the following data about trees in a forest:
        Height  Diameter  Altitude  Rainfall  Age
        50      8         5000      12        80
        56      9         4400      10        75
        72      12        6500      18        60
        47      10        5200      14        53
    • Learn a Model that approximates Age (i.e. the Dependent Variable) from Height, Diameter, Altitude, and Rainfall (i.e. the Independent Variables).
    • It is hard to see from the data how to construct such a model.
    • Machine Learning will automatically construct a model that (usually) minimizes “prediction error”.
    • In the process, depending on the algorithm, it may also provide insight into the relationships between the independent and dependent variables (i.e. inference).
    • Note: We normally want to map easy-to-measure (i.e. “observed”) features to hard-to-measure (“unobserved”) features.
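To make the tree example concrete, here is a minimal Python sketch of "learn a model, then predict". It uses a 1-nearest-neighbor rule purely for illustration; the slides do not say which algorithm would be applied to this data, and the function names and the query tree are assumptions, not part of the original deck.

```python
# Training set from the slide: independent variables (features) and the
# dependent variable (Age). Units are not given on the slide.
FEATURES = ["Height", "Diameter", "Altitude", "Rainfall"]
X_TRAIN = [
    [50, 8, 5000, 12],
    [56, 9, 4400, 10],
    [72, 12, 6500, 18],
    [47, 10, 5200, 14],
]
Y_TRAIN = [80, 75, 60, 53]


def fit(samples, targets):
    """'Learn' a 1-nearest-neighbor model: keep the data plus per-feature ranges."""
    columns = list(zip(*samples))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]  # guard against zero range
    return {"samples": samples, "targets": targets, "lows": lows, "spans": spans}


def predict(model, sample):
    """Return the target of the closest training sample (features scaled to 0..1)."""
    def scale(row):
        return [(v - lo) / sp for v, lo, sp in zip(row, model["lows"], model["spans"])]

    query = scale(sample)

    def distance(row):
        return sum((a - b) ** 2 for a, b in zip(scale(row), query))

    nearest = min(range(len(model["samples"])),
                  key=lambda i: distance(model["samples"][i]))
    return model["targets"][nearest]


model = fit(X_TRAIN, Y_TRAIN)
new_tree = [55, 9, 4500, 10]  # hypothetical unseen tree, not from the slide
print(predict(model, new_tree))
```

The query tree is closest to the second row of the table, so the sketch predicts that row's age. With only four training records this says nothing about real forests; it only shows the shape of the workflow.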
  • 5. Quantitative and Qualitative Models
    • Supervised Machine Learning supports two major types of model:
      • Quantitative: e.g. determine the numerical age of a tree.
      • Qualitative: e.g. determine the species of the tree, or determine if the tree is healthy or not.
    • Learning a Quantitative model is called “Regression” (a term with archaic origins; don’t try to make sense of it).
    • Learning a Qualitative model is called “Classification”.
  • 6. Machine Learning Algorithms
    • There are many different ML algorithms that all try to do the same things (Classification or Regression).
    • Each ML algorithm has limitations on the types of model it can produce. The space of all possible models for an algorithm is called its “Hypothesis Set”.
    • “Linear models” assume that the dependent variable is a function of a weighted sum of the independent variables, each multiplied by its own coefficient (e.g. f(field1 * coef1 + field2 * coef2 + … + fieldM * coefM + constant)).
    • “Tree models” assume that the dependent variable can be determined by asking a hierarchical set of questions about the independent data (e.g. is height >= 50 feet? If so, is rainfall >= 11 inches? ... then age is 65).
    • Some other models (e.g. Neural Nets) are difficult to explain or visualize, and are therefore not considered “Explanatory”.
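The two model families described on this slide can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are assumptions, the sigmoid is just one common choice for f, and every leaf value in the tree other than 65 is hypothetical, since the slide gives only that one path.

```python
import math


def linear_model(fields, coefs, constant, f=lambda z: z):
    """f(field1*coef1 + ... + fieldM*coefM + constant); f defaults to identity."""
    if len(fields) != len(coefs):
        raise ValueError("need exactly one coefficient per field")
    return f(sum(x * c for x, c in zip(fields, coefs)) + constant)


def sigmoid(z):
    """One common choice of f when a linear model is used for classification."""
    return 1.0 / (1.0 + math.exp(-z))


def tree_model(height, rainfall):
    """A hierarchy of questions; only the 65 leaf comes from the slide."""
    if height >= 50:           # is height >= 50 feet?
        if rainfall >= 11:     # if so, is rainfall >= 11 inches?
            return 65
        return 40              # hypothetical leaf value
    return 30                  # hypothetical leaf value


print(linear_model([1, 2], [3, 4], 5))        # 1*3 + 2*4 + 5
print(linear_model([0], [1], 0, f=sigmoid))   # sigmoid(0)
print(tree_model(height=50, rainfall=12))
```

Learning, in both cases, means choosing the parameters (the coefficients and constant, or the questions and leaf values) from the training set; the sketch shows only the shape of the resulting models.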
  • 7. Using Machine Learning
    • Machine Learning is fun and easy.
      • It is simple to invoke one of our ML algorithms, learn a model and make some predictions.
      • If you have some data, give it a try.
      • It will likely provide good insights, and may identify some significant possibilities for adding value.
    • Machine Learning is also dangerous (not just because of robot death rays).
      • There are many ways to produce inaccurate and misleading predictions:
        • Bad questions: yesterday’s temperature -> today’s stock close
        • Bad data: missing values, incorrect values -> bad predictions
        • Wrong assumptions vis-à-vis the algorithm chosen
        • Insufficient Training Data
        • Overfitting
        • Many others …
    • Always consult a qualified Data Scientist before applying your models to a ‘production’ activity.
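Overfitting, one of the dangers listed on this slide, is easiest to see by scoring a model on data it was not trained on. The sketch below is an assumption-laden illustration, not material from the deck: the data is synthetic (y = 2x with alternating +1/-1 noise in the training set), and the two "algorithms" are a memorizing nearest-neighbor lookup and a one-variable least-squares line.

```python
def mse(model, xs, ys):
    """Mean squared prediction error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_memorizer(xs, ys):
    """Overfit on purpose: answer with the target of the nearest training x."""
    def model(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return model


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: slope * x + intercept


# Synthetic data: the true relationship is y = 2x; training targets are noisy.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
holdout_x = [0.9, 2.9, 4.9]
holdout_y = [1.8, 5.8, 9.8]

memorizer = fit_memorizer(train_x, train_y)
line = fit_line(train_x, train_y)

for name, m in [("memorizer", memorizer), ("line", line)]:
    print(name, "train:", mse(m, train_x, train_y),
          "holdout:", mse(m, holdout_x, holdout_y))
```

The memorizer reproduces the training targets perfectly, noise included, yet does worse than the line on the held-out points. A perfect training score is therefore not evidence of a good model.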
  • 8. The “true” story of Archimedes: A cautionary tale
    • [Diagram: D → A; D, D², D³ → A; Boosted Trees]
    • And we may never have discovered PI
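The slide itself is terse, so the following is one possible reading rather than the author's stated argument: taking D as a circle's diameter and A as its area, a regression on the engineered feature D² recovers the coefficient π/4 from measurements alone, while a black-box learner such as Boosted Trees could predict A well without ever exposing that constant. The sketch below assumes noise-free synthetic measurements and a single-feature least-squares fit through the origin.

```python
import math

# Synthetic "measurements": diameters and the corresponding circle areas.
diameters = [1.0, 2.0, 3.0, 4.0, 5.0]
areas = [math.pi * (d / 2) ** 2 for d in diameters]


def fit_through_origin(features, targets):
    """Least-squares coefficient c for the model target = c * feature."""
    return (sum(x * y for x, y in zip(features, targets))
            / sum(x * x for x in features))


# Regress A on the engineered feature D^2.
coef = fit_through_origin([d ** 2 for d in diameters], areas)
print(coef, math.pi / 4)
```

An explanatory model hands back a coefficient that a curious analyst can investigate; a purely predictive one would return good areas and nothing more, which seems to be the point of the cautionary tale.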
  • 10. HPCC Systems Platform Machine Learning Approaches
    • Experimental Bundle: ecl-ml (https://github.com/hpcc-systems/ecl-ml)
      • Rich set of machine learning methods at various levels of quality, performance, and documentation.
      • Recently added Gradient Boosted Trees.
    • Embedded Language support for Python and R allows the use of industry ML platforms:
      • Various R packages
      • Scikit-learn (Python)
      • Google TensorFlow (Python)
    • Production Bundles
      • Fully supported, validation tested, HPCC optimized, and performance profiled.
      • Bundles are generally independent of platform release, and easy to install.
      • The Bundle installer will alert if there is a platform version dependency for a particular bundle.
  • 11. ML Bundles
    • ML_Core: Machine Learning Core (https://github.com/hpcc-systems/ML_Core)
      • Provides the base data types and common data analysis methods.
      • Also includes methods to convert between your record-oriented data and the matrix form used by the ML algorithms.
      • This is a dependency for all of the ML bundles.
    • PBblas: Parallel Block Basic Linear Algebra Subsystem (https://github.com/hpcc-systems/PBblas)
      • Provides parallelized large-scale matrix operations tailored to the HPCC Systems Platform.
      • This is a dependency for many of the ML bundles.
    • LinearRegression: Ordinary least squares linear regression (https://github.com/hpcc-systems/LinearRegression)
      • Allows multiple dependent variables (Multi-variate).
      • Provides a range of analytic methods for interpreting the results.
      • Can be useful for testing relationship hypotheses.
    • LogisticRegression: Logistic Regression Bundle (https://github.com/hpcc-systems/LogisticRegression)
      • Despite “regression” in its name, provides a linear-model-based classification mechanism.
      • Binomial (yes/no) classification.
      • Multinomial (n-way) classification using Softmax.
      • Allows multiple dependent variables (Multi-variate).
      • Includes analytic methods.
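The Softmax step mentioned for multinomial classification turns one linear score per class into probabilities that sum to 1. The Python sketch below illustrates that idea only; it is not the LogisticRegression bundle's interface, and the function names and example coefficients are assumptions.

```python
import math


def softmax(scores):
    """Convert per-class scores into probabilities (shifted for stability)."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fields, coefs_per_class, constants):
    """Score each class with its own linear model, then apply softmax."""
    scores = [
        sum(x * c for x, c in zip(fields, coefs)) + b
        for coefs, b in zip(coefs_per_class, constants)
    ]
    probs = softmax(scores)
    return probs.index(max(probs)), probs


# Hypothetical 3-class model over a single field.
label, probs = classify([1.0], [[1.0], [-1.0], [0.0]], [0.0, 0.0, 0.0])
print(label, probs)
```

Binomial (yes/no) classification is the two-class special case, where softmax over two scores reduces to the familiar logistic function of their difference.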
  • 12. ML Bundles (cont’d)
    • SupportVectorMachines: SVM Bundle (https://github.com/hpcc-systems/SupportVectorMachines)
      • Classification or Regression.
      • Uses the popular LibSVM library under the hood.
      • Appropriate for moderate-sized datasets or solving many moderate-sized problems in parallel.
      • Includes automatic parallelized grid-search to find the best regularization parameters.
    • GLM: Generalized Linear Model (https://github.com/hpcc-systems/GLM)
      • Provides various linear methods that can be used when the assumptions of Linear or Logistic Regression don’t apply to your data.
      • Includes: binomial, quasibinomial, Poisson, quasipoisson, Gaussian, inverse Gaussian and Gamma GLM families.
      • Can be readily adapted to other error distribution assumptions.
      • Accepts user-defined observation weights/frequencies/priors via a weights argument.
      • Includes analytics.
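Grid search, as mentioned for the SVM bundle, simply tries every combination of candidate parameter values and keeps the best-scoring one. The sketch below is sequential and generic; the bundle's actual interface and parameter names are not given on the slide. `C` and `gamma` are borrowed from LibSVM's usual naming, and `toy_score` is a hypothetical stand-in for a cross-validated accuracy measure.

```python
from itertools import product


def grid_search(score, grid):
    """Return (best_params, best_score) over every combination in `grid`.

    `grid` maps a parameter name to its list of candidate values;
    `score` is called with one value per parameter, higher is better.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(C, gamma):
    """Hypothetical stand-in for cross-validated accuracy; peaks at C=10, gamma=0.1."""
    return -((C - 10) ** 2) - (gamma - 0.1) ** 2


best, best_value = grid_search(
    toy_score, {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
)
print(best, best_value)
```

Each combination is scored independently of the others, which is what makes the search easy to parallelize across a cluster.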
  • 13. ML Bundles (cont’d)
    • LearningTrees: Random Forests (https://github.com/hpcc-systems/LearningTrees)
      • Classification or Regression.
      • One of the best “out-of-box” ML methods, as it makes few assumptions about the nature of the data and is resistant to overfitting. Capable of dealing with large numbers of Independent Variables.
      • Creates a “forest” of diverse Decision Trees and averages the responses of the different Trees.
      • Provides “Feature Importance” metrics.
      • Later versions are planned to support single Decision Trees (C4.5) and Gradient Boosting Trees.
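The "forest of diverse trees, averaged" idea can be sketched in plain Python. This is a deliberately simplified illustration, not the LearningTrees implementation: each tree here is a one-question stump, diversity comes from a bootstrap sample plus one randomly chosen feature per stump, and all function names are assumptions. The training rows are the tree table from the earlier example slide.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """One-question tree on one feature: pick the threshold with least squared error."""
    best = None
    values = sorted(set(r[feature] for r in rows))
    for t in values[1:]:  # every threshold that leaves both sides non-empty
        left = [y for r, y in zip(rows, targets) if r[feature] < t]
        right = [y for r, y in zip(rows, targets) if r[feature] >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # the sample had a single feature value: constant prediction
        m = mean(targets)
        return (feature, float("inf"), m, m)
    _, t, lm, rm = best
    return (feature, t, lm, rm)


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] < threshold else right_mean


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Average the responses of the individual trees."""
    return mean(predict_stump(s, row) for s in forest)


X = [[50, 8, 5000, 12], [56, 9, 4400, 10], [72, 12, 6500, 18], [47, 10, 5200, 14]]
Y = [80, 75, 60, 53]
forest = fit_forest(X, Y)
print(predict_forest(forest, [55, 9, 4500, 10]))  # hypothetical unseen tree
```

Because every stump sees a different resample and feature, their individual errors tend to differ, and averaging smooths them out. Real random forests grow much deeper trees and usually pick a random feature subset at every split.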