MachineLearningSparkML.pptx

•Download as PPTX, PDF•

0 likes•2 views

snigdhaagrawal11

Machine learning

Engineering

Machine Learning with Spark MLlib
Manuel Martín Márquez
Antonio Romero Marin
Joeri Hermans
Hadoop Tutorials

Machine Learning (ML)
• ML is a branch of artificial intelligence:
• Uses computing based systems to make sense out
of data
• Extracting patterns, fitting data to functions, classifying
data, etc
• ML systems can learn and improve
• With historical data, time and experience
• Bridges theoretical computer science and real noise
data.
3

Supervised and Unsupervised Learning
• Unsupervised Learning
• There are not predefined and known set of outcomes
• Look for hidden patterns and relations in the data
• A typical example: Clustering
5
0.0
0.5
1.0
1.5
2.0
2.5
2 4 6
Petal.Length
Petal.Width
irisCluster$cluster
1
2
3

Supervised and Unsupervised Learning
• Supervised Learning
• For every example in the data there is always a predefined
outcome
• Models the relations between a set of descriptive features and
a target (Fits data to a function)
• 2 groups of problems:
• Classification
• Regression
6

Supervised Learning
• Classification
• Predicts which class a given sample of data (sample of
descriptive features) is part of (discrete value).
• Regression
• Predicts continuous values.
7
100.0
0.0
0.0
0.0
96.0
4.0
4.0
0.0
96.0
setosa
versicolor
virginica
setosa versicolor virginica
Actual
Predicted
0
25
50
75
100
Percent

Machine Learning as a Process
Define
Objectives
Data
Preparation
Model
Building
Model
Evaluation
Model
Deployment
8
- Define measurable and quantifiable goals
- Use this stage to learn about the problem
- Normalization
- Transformation
- Missing Values
- Outliers
- Data Splitting
- Features Engineering
- Estimating Performance
- Evaluation and Model
Selection
- Study models accuracy
- Work better than the naïve
approach or previous system
- Do the results make sense in
the context of the problem

ML as a Process: Data Preparation
9
• Needed for several reasons
• Some Models have strict data requirements
• Scale of the data, data point intervals, etc
• Some characteristics of the data may impact dramatically on the
model performance
• Time on data preparation should not be underestimated
• Missing
Values
• Error Values
• Different
Scales
• Dimensionality
• Types
Problems
• Many others
Raw
Data
• Scaling
• Centering
• Skewness
• Outliers
• Missing
Values
• Errors
Data
Transfor
mation
Modeling
phase
Data
Ready

ML as a Process: Feature engineering
10
• Determine the predictors (features) to be used is one of the most critical
questions
• Some times we need to add predictors
• Reduce Number:
• Fewer predictors more interpretable model and less costly
• Most of the models are affected by high dimensionality, specially for non-informative
predictors
• Binning predictors
Wrappers
Multiple models
adding and
removing parameter
Algorithms that use
models as input and
performance as
output
Genetics Algorithms
Filters
Evaluate the
relevance of the
predictor
Based normally on
correlations

ML as a Process: Model Building
11
• Data Splitting
• Allocate data to different tasks
• model training
• performance evaluation
• Define Training, Validation and Test sets
• Feature Selection (Review the decision made previously)
• Estimating Performance
• Visualization of results – discovery interesting areas of the problem space
• Statistics and performance measures
• Evaluation and Model selection
• The ‘no free lunch’ theorem no a priory assumptions can be made
• Avoid use of favorite models if NEEDED

Similar to MachineLearningSparkML.pptx

Guiding through a typical Machine Learning PipelineMichael Gerke

01 Introduction to Data MiningValerii Klymchuk

ML Ops.pptxAdam Doyle

Apache Spark MLlib Zahra Eskandari

chapter5-220725172250-dc425eb2.pdfMahmoudSOLIMAN380726

Chapter 5: Data Development Ahmed Alorage

Machine learning 101 dkom 2017fredverheul

Ideas spracklen-finalsupportlogic

Introduction to Data ScienceNiko Vuokko

Lucene/Solr Revolution 2015: Where Search Meets Machine LearningJoaquin Delgado PhD.

Lucene/Solr Revolution 2015: Where Search Meets Machine LearningS. Diana Hu

Citizen Data Science Training using KNIMEAli Raza Anjum

Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...Maninda Edirisooriya

Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Lucidworks

Introduction to Machine LearningKnowledge And Skill Forum

Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsGanesan Narayanasamy

The Machine Learning Workflow with AzureIvo Andreev

Towards a Comprehensive Machine Learning BenchmarkTuri, Inc.

Getting Started with BigQuery MLDan Sullivan, Ph.D.

Fms invited talk_2018 v5Nisha Talagala

Similar to MachineLearningSparkML.pptx (20)

Guiding through a typical Machine Learning Pipeline

01 Introduction to Data Mining

ML Ops.pptx

Apache Spark MLlib

chapter5-220725172250-dc425eb2.pdf

Chapter 5: Data Development

Machine learning 101 dkom 2017

Ideas spracklen-final

Introduction to Data Science

Lucene/Solr Revolution 2015: Where Search Meets Machine Learning

Citizen Data Science Training using KNIME

Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...

Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...

Introduction to Machine Learning

Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems

The Machine Learning Workflow with Azure

Towards a Comprehensive Machine Learning Benchmark

Getting Started with BigQuery ML

Fms invited talk_2018 v5

Recently uploaded

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis

Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR9953056974 Low Rate Call Girls In Saket, Delhi NCR

College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile

young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service9953056974 Low Rate Call Girls In Saket, Delhi NCR

Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665

Oxy acetylene welding presentation note.eptoze12

ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE

CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani

VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ

Artificial-Intelligence-in-Electronics (K).pptxbritheesh05

Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha

Heart Disease Prediction using machine learning.pptxPoojaBan

power system scada applications and usesDevarapalliHaritha

main PPT.pptx of girls hostel security using rfidNikhilNagaraju

Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1

Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ

9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Low Rate Call Girls In Saket, Delhi NCR

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat

Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff

Recently uploaded (20)

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...

Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR

College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik

young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service

Call Girls Delhi {Jodhpur} 9711199012 high profile service

Oxy acetylene welding presentation note.

ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...

CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf

VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...

Artificial-Intelligence-in-Electronics (K).pptx

Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx

Heart Disease Prediction using machine learning.pptx

power system scada applications and uses

main PPT.pptx of girls hostel security using rfid

Introduction to Microprocesso programming and interfacing.pptx

Software and Systems Engineering Standards: Verification and Validation of Sy...

9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...

Call Girls Narol 7397865700 Independent Call Girls

MachineLearningSparkML.pptx

2. Machine Learning with Spark MLlib Manuel Martín Márquez Antonio Romero Marin Joeri Hermans Hadoop Tutorials

3. Machine Learning (ML) • ML is a branch of artificial intelligence: • Uses computing based systems to make sense out of data • Extracting patterns, fitting data to functions, classifying data, etc • ML systems can learn and improve • With historical data, time and experience • Bridges theoretical computer science and real noise data. 3

4. ML in real-life 4

5. Supervised and Unsupervised Learning • Unsupervised Learning • There are not predefined and known set of outcomes • Look for hidden patterns and relations in the data • A typical example: Clustering 5 0.0 0.5 1.0 1.5 2.0 2.5 2 4 6 Petal.Length Petal.Width irisCluster$cluster 1 2 3

6. Supervised and Unsupervised Learning • Supervised Learning • For every example in the data there is always a predefined outcome • Models the relations between a set of descriptive features and a target (Fits data to a function) • 2 groups of problems: • Classification • Regression 6

7. Supervised Learning • Classification • Predicts which class a given sample of data (sample of descriptive features) is part of (discrete value). • Regression • Predicts continuous values. 7 100.0 0.0 0.0 0.0 96.0 4.0 4.0 0.0 96.0 setosa versicolor virginica setosa versicolor virginica Actual Predicted 0 25 50 75 100 Percent

8. Machine Learning as a Process Define Objectives Data Preparation Model Building Model Evaluation Model Deployment 8 - Define measurable and quantifiable goals - Use this stage to learn about the problem - Normalization - Transformation - Missing Values - Outliers - Data Splitting - Features Engineering - Estimating Performance - Evaluation and Model Selection - Study models accuracy - Work better than the naïve approach or previous system - Do the results make sense in the context of the problem

9. ML as a Process: Data Preparation 9 • Needed for several reasons • Some Models have strict data requirements • Scale of the data, data point intervals, etc • Some characteristics of the data may impact dramatically on the model performance • Time on data preparation should not be underestimated • Missing Values • Error Values • Different Scales • Dimensionality • Types Problems • Many others Raw Data • Scaling • Centering • Skewness • Outliers • Missing Values • Errors Data Transfor mation Modeling phase Data Ready

10. ML as a Process: Feature engineering 10 • Determine the predictors (features) to be used is one of the most critical questions • Some times we need to add predictors • Reduce Number: • Fewer predictors more interpretable model and less costly • Most of the models are affected by high dimensionality, specially for non-informative predictors • Binning predictors Wrappers Multiple models adding and removing parameter Algorithms that use models as input and performance as output Genetics Algorithms Filters Evaluate the relevance of the predictor Based normally on correlations

11. ML as a Process: Model Building 11 • Data Splitting • Allocate data to different tasks • model training • performance evaluation • Define Training, Validation and Test sets • Feature Selection (Review the decision made previously) • Estimating Performance • Visualization of results – discovery interesting areas of the problem space • Statistics and performance measures • Evaluation and Model selection • The ‘no free lunch’ theorem no a priory assumptions can be made • Avoid use of favorite models if NEEDED

Editor's Notes

ML methods fall into two learning types Unsupervised Suppose you want to segment your customers into general categories of people with similar buying patterns.
More formally fits data to a function or a function approximation
More formally fits data to a function or a function approximation
More formally fits data to a function or a function Adding Roles
Add Examples
Random Forest (tree based) MARS and LASSO internally perform predictor selection Add Examples
there is no one single model that will works better than any other a priory

MachineLearningSparkML.pptx

Recommended

Recommended

More Related Content

Similar to MachineLearningSparkML.pptx

Similar to MachineLearningSparkML.pptx (20)

Recently uploaded

Recently uploaded (20)

MachineLearningSparkML.pptx

Editor's Notes