Large-Scale
Machine Learning
Armin shoughi
Dec 2019
machine-learning framework
In this section we introduce the framework for machine-learning algorithms and give the basic
definitions.
Machine learning
Tom Mitchell: “A computer program is said to learn from experience
E with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves with
experience E.”
▪ Supervised learning >> 1) Regression 2) Classification
▪ Unsupervised learning >> 1) Clustering 2) Association rules
▪ Reinforcement learning
Training set
The data to which a machine-learning (often abbreviated ML)
algorithm is applied is called a training set. A training set consists of a
set of pairs (x,y), called training examples, where
• x is a vector of values, often called a feature vector,
• y is the class label, or simply output, the classification value for x.
     A    B    C    D
1  122   78   23    1
2   12   64   65    1
3   65  257   82    2
Validation and Test set
One general issue regarding the handling of data is that there is a
good reason to withhold some of the available data from the training
set. The withheld data is called the test set, and it is used only to
check how well the model performs on examples it has not seen. In some
cases, we withhold two sets of training examples: a validation set,
which is used to help design the model, as well as the test set.
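The withholding step can be sketched in a few lines of Python. This is only an illustration: the function name `split_examples`, the 20% fractions, and the fixed seed are assumptions chosen for the example, not values from the slides.

```python
import random

def split_examples(examples, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle (x, y) pairs and split them into training, validation, and test sets."""
    shuffled = list(examples)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]                   # withheld: final evaluation only
    validation = shuffled[n_test:n_test + n_val]   # withheld: used while designing the model
    train = shuffled[n_test + n_val:]          # everything else is used for training
    return train, validation, test

# Ten made-up training examples (feature vector, label).
examples = [([i, i % 3], 1 if i % 2 == 0 else -1) for i in range(10)]
train, validation, test = split_examples(examples)
print(len(train), len(validation), len(test))  # 6 2 2
```

The three sets are disjoint, so no example used for training is ever used to measure performance.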
Batch vs. On-line Learning
In batch learning, the entire training set is available at the beginning of the process, and it
is all used in whatever way the algorithm requires to produce a model once and for all. The
alternative is on-line learning, where the training set arrives in a stream and, like any stream,
cannot be revisited after it is processed. On-line learning has the advantages that it can:
1. Deal with very large training sets, because it does not access more than one training
example at a time.
2. Adapt to changes in the population of training examples as time goes on.
For instance, Google trains its spam-email classifier this way, adapting the classifier for spam
as new kinds of spam email are sent by spammers and indicated to be spam by the
recipients.
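A minimal sketch of the on-line setting, assuming a perceptron-style update as the learner: each example is read from a stream once, used to adjust the model, and then discarded. The generator `example_stream`, its four made-up examples, and the helper `online_update` are hypothetical names introduced for illustration.

```python
def example_stream():
    """Hypothetical stream of (x, y) examples; in practice this would be live data."""
    yield [1.0, 0.0], +1
    yield [0.0, 1.0], -1
    yield [1.0, 1.0], +1
    yield [0.0, 2.0], -1

def online_update(w, x, y, eta=0.5):
    """Adjust w using a single example; returns the (possibly unchanged) weights."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if score * y <= 0:  # misclassified, or exactly on the boundary
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

w = [0.0, 0.0]
for x, y in example_stream():   # each example is seen once and never stored
    w = online_update(w, x, y)
print(w)
```

Because only the current example and the weight vector are held in memory, the same loop works whether the stream has four examples or four billion.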
Feature Selection
Sometimes, the hardest part of designing a good model or classifier
is figuring out what features to use as input to the learning algorithm.
For example, spam is often generated by particular hosts, either
those belonging to the spammers, or hosts that have been co-opted
into a “botnet” for the purpose of generating spam. Thus, including
the originating host or originating email address in the feature
vector describing an email might enable us to design a better
classifier and lower the error rate.
Perceptron
A perceptron is a linear binary classifier. Its input is a vector
x = [x1,x2,...,xd]
with real-valued components. Associated with the perceptron is a vector of weights
w = [w1,w2,...,wd]
also with real-valued components.
Each perceptron has a threshold θ. The output of the perceptron is +1 if w.x > θ,
and the output is −1 if w.x < θ. The special case where w.x = θ will always be
regarded as “wrong.”
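The output rule can be written directly as a small function. This is a sketch: the name `perceptron_output` is hypothetical, and returning 0 for the boundary case w.x = θ is a convention assumed here so that the result matches neither label and therefore counts as wrong.

```python
def perceptron_output(w, x, theta=0.0):
    """Return +1 if w.x > theta, -1 if w.x < theta, and 0 on the boundary."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    if dot > theta:
        return +1
    if dot < theta:
        return -1
    return 0  # w.x == theta: matches neither +1 nor -1, so it is always "wrong"

w = [1.0, -2.0]
print(perceptron_output(w, [2.0, 0.0], theta=0.5))  # w.x = 2.0  -> +1
print(perceptron_output(w, [0.0, 1.0], theta=0.5))  # w.x = -2.0 -> -1
print(perceptron_output(w, [0.5, 0.0], theta=0.5))  # w.x = 0.5  -> 0
```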
Training a Perceptron
The following method will converge to some hyperplane that separates the positive and
negative examples, provided one exists. It assumes that the threshold θ is 0.
1. Initialize the weight vector w to all 0’s.
2. Pick a learning-rate parameter η, which is a small, positive real number. The choice of η
affects the convergence of the perceptron. If η is too small, then convergence is slow; if it
is too big, then the decision boundary will “dance around” and again will converge slowly,
if at all.
3. Consider each training example t = (x,y) in turn.
(a) Let y′ = w.x.
(b) If y′ and y have the same sign, then do nothing; t is properly classified.
(c) However, if y′ and y have different signs, or y′ = 0, replace w by w + ηyx. That is,
adjust w slightly in the direction of x.
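The steps above can be sketched as follows. Two details are assumptions rather than part of the slide: the examples are visited in repeated passes until a full pass makes no change, and `max_passes` caps the number of passes so the loop always ends. The function name and the two-example data set are made up for illustration.

```python
def train_perceptron(examples, eta=0.5, max_passes=100):
    """Train a perceptron with threshold 0 on a list of (x, y) pairs, y in {+1, -1}."""
    d = len(examples[0][0])
    w = [0.0] * d                                  # step 1: all-zero weights
    for _ in range(max_passes):                    # assumed: repeat passes until stable
        changed = False
        for x, y in examples:                      # step 3: each example in turn
            y_prime = sum(wi * xi for wi, xi in zip(w, x))        # (a) y' = w.x
            if y_prime * y <= 0:                   # (c) different signs, or y' = 0
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]   # w := w + eta*y*x
                changed = True
            # (b) same sign: nothing to do
        if not changed:
            break                                  # every example is properly classified
    return w

toy = [([1.0, 2.0], +1), ([2.0, 1.0], -1)]
w = train_perceptron(toy)
print(w)
```

The condition `y_prime * y <= 0` covers both cases of step (c) at once: the product is negative when the signs differ and zero when y′ = 0.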
Example
   and  Viagra  the  of  Nigeria   y
1    1       1    1   1        1  +1
2    0       1    0   0        1  -1
3    0       0    1   0        0  +1
4    1       0    0   1        0  -1
5    1       0    1   1        1  +1
η = 1/2
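To follow the example by machine, the sketch below applies the training rule (threshold 0, update when y′ and y disagree or y′ = 0) to the five training examples with η = 1/2 and prints the weight vector after each pass. Repeating passes until one makes no update, and the cap `MAX_PASSES`, are assumptions added so the script terminates; they are not stated on the slide.

```python
# Training set from the table above: columns are and, Viagra, the, of, Nigeria.
EXAMPLES = [
    ([1, 1, 1, 1, 1], +1),
    ([0, 1, 0, 0, 1], -1),
    ([0, 0, 1, 0, 0], +1),
    ([1, 0, 0, 1, 0], -1),
    ([1, 0, 1, 1, 1], +1),
]
ETA = 0.5
MAX_PASSES = 100  # assumed safety cap, not part of the slide

def dot(w, x):
    """Dot product w.x of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.0] * 5
for n_pass in range(1, MAX_PASSES + 1):
    updates = 0
    for x, y in EXAMPLES:
        if dot(w, x) * y <= 0:                       # misclassified or y' = 0
            w = [wi + ETA * y * xi for wi, xi in zip(w, x)]
            updates += 1
    print(f"pass {n_pass}: {updates} update(s), w = {w}")
    if updates == 0:                                 # a clean pass: training is done
        break
```

This data set is linearly separable through the origin, so the loop stops after a pass with no updates, well before the cap.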
Convergence of Perceptrons
As we mentioned at the beginning of this section, if the data points
are linearly separable, then the perceptron algorithm will converge to
a separator. However, if the data is not linearly separable, then the
algorithm will eventually repeat a weight vector and loop infinitely.
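One way to guard against the infinite loop is to remember the weight vector reached at the end of each pass and stop as soon as one repeats. The sketch below assumes threshold 0 and the usual update rule; the function name, the status strings, and the deliberately contradictory data set are all made up for illustration.

```python
def train_with_cycle_check(examples, eta=0.5, max_passes=1000):
    """Train a perceptron; report 'converged', 'cycle', or 'pass limit reached'."""
    w = tuple(0.0 for _ in examples[0][0])
    seen = {w}                          # weight vectors observed at pass boundaries
    for _ in range(max_passes):
        updated = False
        for x, y in examples:
            if sum(wi * xi for wi, xi in zip(w, x)) * y <= 0:
                w = tuple(wi + eta * y * xi for wi, xi in zip(w, x))
                updated = True
        if not updated:
            return w, "converged"       # a full pass with no mistakes
        if w in seen:
            return w, "cycle"           # same weights as an earlier pass: will loop forever
        seen.add(w)
    return w, "pass limit reached"

# The same point with both labels cannot be separated by any hyperplane.
conflicting = [([1.0, 0.0], +1), ([1.0, 0.0], -1)]
separable = [([1.0, 2.0], +1), ([2.0, 1.0], -1)]
print(train_with_cycle_check(conflicting)[1])
print(train_with_cycle_check(separable)[1])
```

The check compares floating-point weights exactly, which is safe here because every value is a multiple of 1/2; with arbitrary real-valued data a pass limit is the more dependable stopping test.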
Parallel Implementation of Perceptrons
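The training of a perceptron is an inherently sequential process. If the
number of dimensions of the vectors involved is huge, then we might
obtain some parallelism by computing dot products in parallel. To get more
parallelism, we can modify the algorithm so that many training examples are
processed with the same weight vector w, and run each round as a MapReduce job.
The Map Function: Each Map task is given a chunk of training examples, and each Map
task knows the current weight vector w. The Map task computes w.x for each feature vector
x = [x1,x2,...,xd]
in its chunk and compares that dot product with the label y, which is +1 or −1, associated with
x. If the signs agree, nothing is produced. If they disagree, then for each nonzero component
xi of x the Map task produces the key-value pair (i, ηyxi), the increment for the ith component of w.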
The Reduce Function: For each key i, the Reduce task that handles key i adds all the
associated increments and then adds that sum to the ith component of w.
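One round of this scheme can be simulated in a single process. The sketch assumes that a Map task emits the pair (i, ηyxi) for each nonzero component xi of every misclassified x, which is what gives the Reduce tasks their increments; the function names and the two one-example chunks are hypothetical, and no real MapReduce framework is involved.

```python
from collections import defaultdict

def map_task(chunk, w, eta):
    """Emit (i, eta*y*x_i) for each nonzero component of every misclassified x."""
    pairs = []
    for x, y in chunk:
        if sum(wi * xi for wi, xi in zip(w, x)) * y <= 0:   # signs disagree (or w.x = 0)
            pairs.extend((i, eta * y * xi) for i, xi in enumerate(x) if xi != 0)
    return pairs

def reduce_task(pairs, w):
    """Sum the increments for each key i and add the sum to component i of w."""
    sums = defaultdict(float)
    for i, increment in pairs:
        sums[i] += increment
    return [wi + sums[i] for i, wi in enumerate(w)]

w = [0.0, 0.0, 0.0]
chunks = [
    [([1, 0, 1], +1)],   # chunk handled by the first Map task
    [([0, 1, 1], -1)],   # chunk handled by the second Map task
]
emitted = [pair for chunk in chunks for pair in map_task(chunk, w, eta=0.5)]
new_w = reduce_task(emitted, w)
print(new_w)
```

Every Map task in a round uses the same w, so the result can differ from the strictly sequential algorithm, where each update is visible to the next example.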
SVM (Support Vector Machine)
How to Calculate this Distance?
Test
1. What are the advantages of on-line learning?
2. Explain common tests for Perceptron termination.
3. Let us consider training a perceptron to recognize spam email. The training set consists
of pairs (x,y) where x is a vector of 0’s and 1’s, with each component xi corresponding
to the presence (xi = 1) or absence (xi = 0) of a particular word in the email. The value
of y is +1 if the email is known to be spam and −1 if it is known not to be spam. While
the number of words found in the training set of emails is very large, we shall use a
simplified example where there are only five words: “and,” “Viagra,” “the,” “of,” and
“Nigeria.” The table below gives the training set of vectors and their corresponding classes.
   and  Viagra  the  of  Nigeria   y
1    1       0    0   1        0  +1
2    1       1    1   0        1  -1
3    0       0    1   1        0  +1
4    1       1    0   0        0  -1
