Large-Scale
Machine Learning
Armin Shoughi
Dec 2019
Machine-Learning Framework
In this section we introduce the framework for machine-learning algorithms and give the basic
definitions.
Machine learning
Tom Mitchell: “A computer program is said to learn from experience
E with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves with
experience E.”
▪ Supervised learning >> 1) Regression 2) Classification
▪ Unsupervised learning >> 1) Clustering 2) Association Rules
▪ Reinforcement learning
Training set
The data to which a machine-learning (often abbreviated ML)
algorithm is applied is called a training set. A training set consists of a
set of pairs (x,y), called training examples, where
• x is a vector of values, often called a feature vector,
• y is the class label, or simply the output: the classification value for x.
     A    B    C   D
1   122   78   23  1
2    12   64   65  1
3    65  257   82  2
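The table above can be written directly as a training set of (x, y) pairs; a minimal sketch, with columns A, B, C as the feature vector and column D as the label:

```python
# The table above as (feature_vector, label) training pairs.
# Columns A, B, C form the feature vector x; column D is the label y.
training_set = [
    ([122, 78, 23], 1),
    ([12, 64, 65], 1),
    ([65, 257, 82], 2),
]

for x, y in training_set:
    assert len(x) == 3      # each feature vector has three components
    assert y in (1, 2)      # the label takes one of two classification values
```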
Validation and Test set
One general issue regarding the handling of data is that there is a
good reason to withhold some of the available data from the training
set. The remaining data is called the test set. In some cases, we
withhold two sets of training examples, a validation set, as well as the
test set.
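Withholding a validation set and a test set can be sketched as below; the function name, fractions, and seed are illustrative choices, not part of the slides:

```python
import random

def split_data(examples, validation_frac=0.1, test_frac=0.1, seed=0):
    """Withhold a validation set and a test set from the available data;
    the rest becomes the training set."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_frac)
    n_test = int(len(shuffled) * test_frac)
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    training = shuffled[n_val + n_test:]
    return training, validation, test

train, val, test = split_data(list(range(100)))
assert len(train) == 80 and len(val) == 10 and len(test) == 10
# every example ends up in exactly one of the three sets
assert set(train) | set(val) | set(test) == set(range(100))
```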
Batch vs. On-line Learning
In batch learning, the entire training set is available at the beginning of the process, and it is all used in whatever way the algorithm requires to produce a model once and for all. The alternative is on-line learning, where the training set arrives in a stream and, like any stream, cannot be revisited after it is processed. An on-line algorithm can:
1. Deal with very large training sets, because it does not access more than one training example at a time.
2. Adapt to changes in the population of training examples as time goes on.
For instance, Google trains its spam-email classifier this way, adapting the classifier as new kinds of spam email are sent by spammers and reported as spam by the recipients.
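The on-line pattern above can be sketched as a loop that touches each example exactly once; the function names and the toy running-mean "model" are illustrative:

```python
def online_learn(stream, update, model):
    """Process each training example exactly once, in arrival order."""
    for x, y in stream:
        model = update(model, x, y)   # the example cannot be revisited later
    return model

# Toy usage: keep a running sum and count of labels as the "model".
def update_mean(state, x, y):
    total, count = state
    return (total + y, count + 1)

total, count = online_learn([([0], 1), ([1], -1), ([2], 1)], update_mean, (0, 0))
assert (total, count) == (1, 3)
```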
Feature Selection
Sometimes, the hardest part of designing a good model or classifier
is figuring out what features to use as input to the learning algorithm.
For example, spam is often generated by particular hosts, either
those belonging to the spammers, or hosts that have been co-opted
into a “botnet” for the purpose of generating spam. Thus, including
the originating host or originating email address into the feature
vector describing an email might enable us to design a better
classifier and lower the error rate.
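A common way to realize this for email is a binary feature vector over a fixed vocabulary (word present = 1, absent = 0); the vocabulary and sample text below are made up for illustration:

```python
# Illustrative vocabulary; in practice this would be built from the corpus.
vocabulary = ["and", "viagra", "the", "of", "nigeria"]

def email_features(text):
    """Map an email's text to a 0/1 feature vector over the vocabulary."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

assert email_features("The offer of Viagra") == [0, 1, 1, 1, 0]
```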
Perceptron
A perceptron is a linear binary classifier. Its input is a vector
x = [x1,x2,...,xd]
with real-valued components. Associated with the perceptron is a vector of weights
w = [w1,w2,...,wd]
also with real-valued components.
Each perceptron has a threshold θ. The output of the perceptron is +1 if w·x > θ,
and the output is −1 if w·x < θ. The special case where w·x = θ is always
regarded as “wrong”: the example counts as misclassified.
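The decision rule above can be sketched directly, with 0 standing in for the always-wrong boundary case:

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron_output(w, x, theta):
    """+1 if w.x > theta, -1 if w.x < theta; w.x == theta is
    always regarded as wrong, signalled here by returning 0."""
    s = dot(w, x)
    if s > theta:
        return +1
    if s < theta:
        return -1
    return 0  # boundary case: treated as misclassified

assert perceptron_output([1, -1], [2, 1], 0) == +1   # w.x = 1 > 0
assert perceptron_output([1, -1], [1, 2], 0) == -1   # w.x = -1 < 0
assert perceptron_output([1, -1], [1, 1], 0) == 0    # w.x = 0 = theta
```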
Training a Perceptron
The following method will converge to some hyperplane that separates the positive and
negative examples, provided one exists.
1. Initialize the weight vector w to all 0’s.
2. Pick a learning-rate parameter η, which is a small, positive real number. The choice of η
affects the convergence of the perceptron. If η is too small, then convergence is slow; if it
is too big, then the decision boundary will “dance around” and again will converge slowly,
if at all.
3. Consider each training example t = (x,y) in turn.
(a) Let y′ = w·x.
(b) If y′ and y have the same sign, then do nothing; t is properly classified.
(c) However, if y′ and y have different signs, or y′ = 0, replace w by w + ηyx. That is,
adjust w slightly in the direction of x.
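Steps 1–3 above can be sketched as follows, with repeated passes over the data until no example is misclassified (the `max_passes` cap is an added safeguard, not part of the slides):

```python
def train_perceptron(examples, eta, dims, max_passes=100):
    """Steps 1-3 above: start from the zero vector and repeat passes
    over the training examples until all are classified correctly."""
    w = [0.0] * dims                                         # step 1
    for _ in range(max_passes):
        mistakes = 0
        for x, y in examples:                                # step 3
            y_prime = sum(wi * xi for wi, xi in zip(w, x))   # (a)
            if y_prime * y <= 0:                             # (c): wrong sign, or y' = 0
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:                                    # a separator was found
            break
    return w

# Toy usage: two separable points; the learned w puts each on the correct side.
examples = [([1, 0], +1), ([0, 1], -1)]
w = train_perceptron(examples, eta=0.5, dims=2)
assert all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in examples)
```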
Example
    and  Viagra  the  of  Nigeria   y
1    1     1     1    1     1      +1
2    0     1     0    0     1      −1
3    0     0     1    0     0      +1
4    1     0     0    1     0      −1
5    1     0     1    1     1      +1
η = 1/2.
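Running the update rule on this table with η = 1/2 and threshold θ = 0 (a hyperplane through the origin, an assumption for the sketch) converges after a few passes, since this data happens to be linearly separable:

```python
# The five training examples from the table above; eta = 1/2, theta = 0.
eta = 0.5
examples = [
    ([1, 1, 1, 1, 1], +1),
    ([0, 1, 0, 0, 1], -1),
    ([0, 0, 1, 0, 0], +1),
    ([1, 0, 0, 1, 0], -1),
    ([1, 0, 1, 1, 1], +1),
]

w = [0.0] * 5
converged = False
while not converged:
    converged = True
    for x, y in examples:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:   # misclassified (or on boundary)
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            converged = False

# After convergence, every example lies strictly on its correct side.
assert all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in examples)
```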
Convergence of Perceptrons
As we mentioned at the beginning of this section, if the data points
are linearly separable, then the perceptron algorithm will converge to
a separator. However, if the data is not linearly separable, then the
algorithm will eventually repeat a weight vector and loop infinitely.
Parallel Implementation of Perceptrons
The training of a perceptron is an inherently sequential process. However, if the
number of dimensions of the vectors involved is huge, then we might obtain some
parallelism by computing dot products in parallel.
The Map Function: Each Map task is given a chunk of training examples, and each Map
task knows the current weight vector w. The Map task computes w·x for each feature vector
x = [x1,x2,...,xk]
in its chunk and compares that dot product with the label y, which is +1 or −1, associated with
x. If x is misclassified, the Map task emits, for each component xi, a key-value pair (i, ηyxi):
the increment to the ith component of w.
The Reduce Function: For each key i, the Reduce task that handles key i adds all the
associated increments and then adds that sum to the ith component of w.
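These two functions can be sketched with a simple in-memory stand-in for a MapReduce framework. Note that this variant batches all of a chunk's updates against the same w before applying them, which deviates slightly from strictly sequential training; the function names are illustrative:

```python
from collections import defaultdict

def map_task(chunk, w, eta):
    """For each misclassified example, emit (i, eta*y*x_i) increments."""
    for x, y in chunk:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:   # misclassified
            for i, xi in enumerate(x):
                yield i, eta * y * xi

def reduce_task(pairs, w):
    """Sum the increments for each key i and add the sum to w_i."""
    sums = defaultdict(float)
    for i, inc in pairs:
        sums[i] += inc
    return [wi + sums[i] for i, wi in enumerate(w)]

w = [0.0, 0.0]
chunk = [([1, 0], +1), ([0, 1], -1)]
w = reduce_task(map_task(chunk, w, eta=0.5), w)
assert w == [0.5, -0.5]
```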
SVM Support Vector Machine
As noted above, when the data points are linearly separable the perceptron algorithm
converges to some separator, but which separator it finds depends on the order of the
examples; when the data is not separable, it loops forever. A support-vector machine
improves on the perceptron by choosing, among all separating hyperplanes, one that
maximizes the margin: the distance from the hyperplane to the nearest training examples.
How to Calculate this Distance?
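A standard answer (stated here as background, since the slide's figure is not reproduced): the distance from a point x to the hyperplane w·x + b = 0 is |w·x + b| / ‖w‖. A small sketch of that formula:

```python
import math

def distance_to_hyperplane(w, b, x):
    """Distance |w.x + b| / ||w|| from point x to the hyperplane w.x + b = 0."""
    numerator = abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return numerator / norm

# Point (3, 4) and the hyperplane x1 - 1 = 0 (the line x1 = 1): distance 2.
assert distance_to_hyperplane([1, 0], -1, [3, 4]) == 2.0
# Origin and the line 3*x1 + 4*x2 - 5 = 0: distance |−5| / 5 = 1.
assert distance_to_hyperplane([3, 4], -5, [0, 0]) == 1.0
```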
Test
1. What advantages does on-line learning have?
2. Explain common tests for perceptron termination.
3. Let us consider training a perceptron to recognize spam email. The training set consists
of pairs (x,y) where x is a vector of 0’s and 1’s, with each component xi corresponding
to the presence (xi = 1) or absence (xi = 0) of a particular word in the email. The value
of y is +1 if the email is known to be spam and −1 if it is known not to be spam. While
the number of words found in the training set of emails is very large, we shall use a
simplified example where there are only five words: “and,” “Viagra,” “the,” “of,” and
“Nigeria.” The table gives the training set of vectors and their corresponding classes.
    and  Viagra  the  of  Nigeria   y
1    1     0     0    1     0      +1
2    1     1     1    0     1      −1
3    0     0     1    1     0      +1
4    1     1     0    0     0      −1