MATHEMATICS ONLINE
Data-Mining, Predictive Analytics, Clustering, A.I.,
Machine Learning… and where to learn all this.

                   Boole Prize 2012
                    Mark Moriarty
                University College Cork
3 SECTIONS:
• 1 - Overview of some applications of Maths online.
• 2 - Sample algorithms.
• 3 - Recommended online Maths courses.
SECTION 1 (MOTIVATION):
MATHEMATICS IN ACTION
• User Clustering.
• Recommender Systems. Movie recommendations.
• Shopper analytics – send relevant coupons.
• Voice recognition. Machine Learning.
• Spam detection.
• Fraud detection.
• Facebook Feed.
• Google’s PageRank.
• DNA sequencing.
• Health analytics.
• Intelligent ad displays.
• etc.
AWKS…

“My daughter got this in the mail!
She’s still in high school, and
you’re sending her coupons for
baby clothes and cribs? Are you
trying to encourage her to get
pregnant?! ”
HOW TARGET FIGURED OUT A TEEN GIRL WAS
   PREGNANT BEFORE HER FATHER DID
As Pole’s computers crawled through the data, he was
able to identify about 25 products that, when analyzed
together, allowed him to assign each shopper a
“pregnancy prediction” score. More important, he
could also estimate her due date to within a small
window, so Target could send coupons timed to very
specific stages of her pregnancy.

  Take a fictional Target shopper who is 23, and in March bought cocoa-
  butter lotion, a purse large enough to double as a diaper bag, zinc and
  magnesium supplements and a bright blue rug. There’s, say, an 87%
  chance that she’s pregnant and that her delivery date is sometime in late
  August.
HOW KHAN ACADEMY IS USING MACHINE
LEARNING TO ASSESS STUDENT MASTERY
Old method: To determine when a student has finished a certain
exercise, they awarded proficiency to a user who has answered at
least 10 problems in a row correctly — known as a streak.
New metric for accuracy…
What do I mean by accuracy? Now define it as

$$\text{accuracy} = P(\text{next problem correct} \mid \text{just gained proficiency})$$

which is just notation desperately trying to say “Given that we just
gained proficiency, what’s the probability of getting the next
problem correct?”
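A minimal sketch of how that conditional probability could be estimated from answer logs. It assumes proficiency is awarded by the old streak rule, and the function name and the log format (one list of True/False answers per student) are made up for illustration.

```python
def accuracy_after_proficiency(histories, streak=10):
    """Estimate P(next problem correct | just gained proficiency).

    histories: one list of booleans per student, in answer order.
    Proficiency is assumed to be gained on the answer completing
    the first run of `streak` correct answers.
    """
    hits = 0
    trials = 0
    for answers in histories:
        run = 0
        for i, correct in enumerate(answers):
            run = run + 1 if correct else 0
            if run == streak:
                # Proficiency gained at position i; look at the next answer if any.
                if i + 1 < len(answers):
                    trials += 1
                    hits += 1 if answers[i + 1] else 0
                break
    return hits / trials if trials else None
```

Students who stop right after gaining proficiency contribute nothing, since there is no "next problem" to score.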
NETFLIX PRIZE




The team BellKor’s Pragmatic Chaos won the $1 million
top prize for their verified
submission on July 26, 2009,
achieving the winning RMSE of
0.8567 on the test subset. This
represents a 10.06% improvement
over Cinematch’s score on the test
subset at the start of the contest.
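To make the numbers concrete, here is a small sketch of RMSE (root mean squared error) and of the percentage improvement. The Cinematch baseline of 0.9525 is not on the slide; it is assumed here as the commonly reported test-subset score.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_percent(baseline, new):
    """Relative reduction in error, as a percentage of the baseline."""
    return (baseline - new) / baseline * 100

CINEMATCH_TEST_RMSE = 0.9525  # assumption: not stated on the slide
WINNING_RMSE = 0.8567         # from the slide

print(round(improvement_percent(CINEMATCH_TEST_RMSE, WINNING_RMSE), 2))  # 10.06
```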
PANDORA & THE MUSIC GENOME PROJECT®

• On January 6, 2000 a group of musicians and music-loving
  technologists came together with the idea of creating the most
  comprehensive analysis of music ever.
• Together we set out to capture the essence of music at the most
  fundamental level. We ended up assembling literally hundreds
  of musical attributes or "genes" into a very large Music Genome.
FACEBOOK NEWS FEED & MACHINE LEARNING
FACEBOOK NEWS FEED

The default wall setting is "Top News".
EdgeRank is there to do the customizing for you, based on
how each item scores in the algorithm.
The three main criteria for an item's algorithm score are:
1. Affinity: How often you and your friends interact
   on the platform
2. Weight: Each type of content is weighted
   differently, based on the past interactions of that
   type of content
3. Time: How old the published item is
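The slide names the three criteria but not how they combine. A rough sketch, assuming they are simply multiplied and that "time" acts as an exponential decay; the half-life, the function name and every number below are hypothetical, not Facebook's actual values.

```python
def item_score(affinity, weight, age_hours, half_life_hours=24.0):
    """Score one feed item as affinity x weight x time decay.

    The decay halves the score every `half_life_hours` (an assumed value).
    """
    decay = 0.5 ** (age_hours / half_life_hours)
    return affinity * weight * decay

# Hypothetical feed: (description, affinity, content-type weight, age in hours)
feed = [
    ("old photo from a close friend", 0.9, 2.0, 48),
    ("fresh status from an acquaintance", 0.2, 1.0, 1),
    ("fresh photo from a close friend", 0.9, 2.0, 1),
]
ranked = sorted(feed, key=lambda item: item_score(*item[1:]), reverse=True)
```

With these made-up numbers the fresh photo from a close friend ranks first, and the two-day-old photo still beats the fresh status from an acquaintance.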
AD PLACEMENT
MACHINE LEARNING IS EVERYWHERE




Mario learns to survive: http://www.youtube.com/watch?v=m0tJLTXNT0A
SECTION 2:
SOME ALGORITHMS, BROKEN DOWN
• Recommender Systems
• Logistic Regression
• K nearest neighbours
• K-means clustering
• Naïve Bayes Classifiers
RECOMMENDER SYSTEMS
[CONTENT-BASED EXAMPLE HERE:]




CONTENT-BASED VS COLLABORATIVE
LOGISTIC REGRESSION
• At the most basic level, for one input variable, linear
  regression is simply “fitting a line to some data”.


• Let’s look at it in the sample case of the Khan
  Academy:
LOGISTIC REGRESSION ALGORITHM
• vector x = the values of input features
  (eg. % correct).
• vector w = how much each feature
  makes it more likely that the user is
  proficient.
• We can write compactly as a linear
  algebra dot product:
  $$z = \mathbf{w} \cdot \mathbf{x} = \sum_i w_i x_i$$
 Already, you can see that the higher z is, the more
 likely the user is to be proficient. To obtain our
 probability estimate, all we have to do is
 “shrink” z into the interval (0, 1). We can do this
 by plugging z into a sigmoid function:
  $$h(z) = \frac{1}{1 + e^{-z}}$$
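A short sketch of the two steps just described: take the dot product of w and x, then squash it with the sigmoid 1 / (1 + e^(-z)). The weights and feature values in the usage lines are invented for illustration; in practice w would be learned from data.

```python
import math

def sigmoid(z):
    """Map any real z into the interval (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proficiency(w, x):
    """Probability estimate: sigmoid of the dot product of weights and features."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

# Hypothetical weights and features (e.g. a bias term and % correct).
w = [0.5, 2.0]
x = [1.0, 0.9]
probability = predict_proficiency(w, x)
```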
LOGISTIC REGRESSION RESULTS




From http://david-hu.com/2011/11/02/how-khan-academy-is-using-machine-learning-to-assess-student-mastery.html
K-NEAREST NEIGHBOUR
Tarring you with the same brush as your k nearest peers.
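In code, that idea might look like the sketch below: find the k training points closest to the query and take a majority vote over their labels. Euclidean distance and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (point, label) pairs, where each point is a tuple of numbers.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two small groups of labelled points.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
```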
K-MEANS CLUSTERING
     A personal favourite
K-MEANS ALGORITHM SUBSECTION:
• Introduction

• K-means Algorithm

• Example

• K-means Demo

• Relevant Issues

• Conclusion
K-MEANS: INTRODUCTION
•   Partitioning Clustering Approach
    •   a typical clustering analysis approach via partitioning data set iteratively
    •   construct a partition of a data set to produce several non-empty clusters
        (usually, the number of clusters given in advance)
    •   in principle, partitions achieved via minimising the sum of squared distance in
        each cluster, where m_i is the centroid of cluster C_i:
        $$E = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - m_i \rVert^2$$
•   Given a K, find a partition of K clusters to optimise the chosen
    partitioning criterion
    •   K-means algorithm: each cluster is represented by the centroid of the cluster
        and the algorithm converges to stable centres of clusters.
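As a quick illustration of the partitioning criterion, the sketch below computes the sum of squared distances from each point to its own cluster's centroid. The function names and the toy points are illustrative only.

```python
def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sum_squared_error(clusters):
    """Sum over clusters of squared distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        m = centroid(points)
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, m))
    return total

sensible = [[(0, 0), (2, 0)], [(5, 5)]]
poor = [[(0, 0), (5, 5)], [(2, 0)]]
```

Grouping the two nearby points together gives a criterion value of 2.0, while the poor grouping gives 25.0, so the criterion prefers the sensible partition.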
K-MEANS ALGORITHM
•    Given the cluster number K, the K-means algorithm is carried out in three steps
     after initialisation:

    Initialisation: set seed points
    1) Assign each object to the
       cluster with the nearest seed
       point
    2) Compute seed points as the
       centroids of the clusters of the
       current partition (the centroid
       is the centre, i.e., mean point,
       of the cluster)
    3) Go back to Step 1), stop when
       no more new assignments
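The loop above can be sketched in plain Python as follows. This is a minimal version under a few assumptions: Euclidean distance, seed points drawn at random from the data unless supplied, and an iteration cap as a safety net. The `init`, `seed` and `max_iter` parameters are conveniences added for this sketch.

```python
import math
import random

def kmeans(points, k, init=None, seed=0, max_iter=100):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (centres, assignment), where assignment[i] is the index
    of the centre that owns points[i].
    """
    rng = random.Random(seed)
    # Initialisation: set seed points (random data points unless given).
    centres = list(init) if init is not None else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each object to the cluster with the nearest seed point.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        # Step 3: stop when there are no new assignments.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: recompute each seed point as the centroid of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, assignment

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, assignment = kmeans(points, k=2, init=[(0, 0), (10, 10)])
```

Passing `init` makes the run repeatable. Leaving it out picks random seed points, and a different `seed` value may then lead to a different final partition.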
K-MEANS DEMO

               1. The user sets the number of
                  clusters they’d like (e.g.
                  K=5).




Credit to Ke Chen for the example graphics used on this and next few slides.
K-MEANS DEMO
               1. The user sets the number of
                  clusters they’d like (e.g.
                  K=5).
               2. Randomly guess K cluster
                  centre locations
               3. Each data point finds out
                  which centre it’s closest to.
                  (Thus each centre “owns” a
                  set of data points)
               4. Each centre finds the
                  centroid of the points it owns
               5. …and jumps there
               6. …Repeat until terminated!
RELEVANT ISSUES
•   Efficient in computation
    •   O(tKn), where n is number of objects, K is number of clusters, and t is
        number of iterations. Normally, K, t << n.
•   Local optimum
    •   sensitive to initial seed points
    •   converges to a local optimum that may be an unwanted solution
•   Other problems
    •   Need to specify K, the number of clusters, in advance
    •   Unable to handle noisy data and outliers (addressed by the K-Medoids algorithm)
    •   Not suitable for discovering clusters with non-convex shapes
    •   Applicable only when the mean is defined; what about categorical data?
        (addressed by the K-Modes algorithm)
RELEVANT ISSUES
•   Cluster Validity
    •   With different initial conditions, the K-means algorithm may result in different partitions
        for a given data set.
    •   Which partition is the “best” one for the given data set?
    •   In theory, no answer to this question as there is no ground-truth available in
        unsupervised learning
    •   Nevertheless, there are several cluster validity criteria to assess the quality of
        clustering analysis from different perspectives
    •   A common cluster validity criterion is the ratio of the total between-cluster to the total
        within-cluster distances
        •    Between-cluster distance (BCD): the distance between means of two clusters
        •    Within-cluster distance (WCD): sum of all distances between data points and the
             mean in a specific cluster
        •    A large ratio of BCD:WCD suggests good compactness inside clusters and good
             separability among different clusters!
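A sketch of that ratio, taking "total BCD" as the sum of distances between every pair of cluster means and "total WCD" as the sum of distances from each point to its own cluster mean. Both readings, and the toy partitions, are assumptions made for this example.

```python
import math
from itertools import combinations

def mean_point(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def validity_ratio(clusters):
    """Total between-cluster distance divided by total within-cluster distance."""
    means = [mean_point(c) for c in clusters]
    bcd = sum(math.dist(a, b) for a, b in combinations(means, 2))
    wcd = sum(math.dist(p, m) for c, m in zip(clusters, means) for p in c)
    return bcd / wcd  # undefined if every cluster is a single point (wcd == 0)

good = [[(0, 0), (2, 0)], [(10, 0), (12, 0)]]
bad = [[(0, 0), (10, 0)], [(2, 0), (12, 0)]]
```

Here the tight, well-separated partition scores 2.5 and the mixed-up one scores 0.1, which matches the intuition that a larger ratio is better.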
CONCLUSION
•   K-means algorithm is a simple yet popular method for clustering
    analysis
•   Its performance is determined by initialisation and the choice of an
    appropriate distance measure
•   There are several variants of K-means to overcome its weaknesses
    • K-Medoids: resistance to noise and/or outliers
    • K-Modes: extension to categorical data clustering analysis
END OF K-MEANS SUBSECTION
• Nearly there now…
ALGORITHM:
NAÏVE BAYES
• What is a classifier?
NAÏVE BAYES ALGORITHM
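 • Want P(spam | words)

 • Use Bayes Rule:
   $$P(\text{spam} \mid \text{words}) = \frac{P(\text{words} \mid \text{spam})\, P(\text{spam})}{P(\text{words})}$$

 • In English:
   $$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$

   $$P(\text{words}) = P(\text{words} \mid \text{spam})\, P(\text{spam}) + P(\text{words} \mid \text{good})\, P(\text{good})$$
 • Assume independence: probability of each word
   independent of others
   $$P(\text{words} \mid \text{spam}) = P(\text{word}_1 \mid \text{spam})\, P(\text{word}_2 \mid \text{spam}) \cdots P(\text{word}_n \mid \text{spam})$$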
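Putting the pieces together, here is a toy spam filter: Bayes' rule, the "spam or good" expansion of P(words), and the independence assumption. Two things are added that the slide does not mention and should be read as assumptions: add-one smoothing so unseen words do not zero out the product, and log-probabilities to avoid underflow. The training messages are made up, and both classes are assumed to appear in the training data.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (list_of_words, label), label being 'spam' or 'good'."""
    word_counts = {"spam": Counter(), "good": Counter()}
    label_counts = Counter()
    for words, label in messages:
        label_counts[label] += 1
        word_counts[label].update(words)
    return word_counts, label_counts

def p_spam(words, word_counts, label_counts):
    """Return P(spam | words) under the naive independence assumption."""
    vocab = set(word_counts["spam"]) | set(word_counts["good"])
    log_joint = {}
    for label in ("spam", "good"):
        total = sum(word_counts[label].values())
        # log P(label)
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for w in words:
            # log P(word | label) with add-one smoothing (an assumption)
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        log_joint[label] = score
    # Normalise: P(words) = P(words|spam)P(spam) + P(words|good)P(good)
    top = max(log_joint.values())
    joint = {label: math.exp(v - top) for label, v in log_joint.items()}
    return joint["spam"] / (joint["spam"] + joint["good"])

messages = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "good"),
    (["lunch", "at", "noon"], "good"),
]
word_counts, label_counts = train(messages)
```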
SECTION 3:
TAKE FREE TOP-CLASS ONLINE MATH COURSES
• ml-class.org
• Udacity.com
• http://mitx.mit.edu/
FREE STANFORD CLASSES, SPRING 2012
SOME OFFER A STATEMENT OF ACCOMPLISHMENT
UDACITY.COM
ITUNES U


For philosophy lectures, I recommend Dreyfus or Searle. -Mark
REFERENCES
•   “One Learning Hypothesis” image from http://www.ml-class.org
•   Khan Academy discussion from
    http://david-hu.com/2011/11/02/how-khan-academy-is-using-machine-learning-to-assess-student-mastery.html
•   K-Means images from
    http://www.cs.manchester.ac.uk/ugt/COMP24111/materials/slides/K-means.ppt
•   Word equation for Naïve Bayes: http://www.wikipedia.org
•   K nearest neighbours image from http://mlpy.sourceforge.net/docs/3.0/_images/knn.png
•   Recommender Systems image from
    http://holehouse.org/mlclass/16_Recommender_Systems.html




QUESTIONS?



2012-22-02 UCC Boole Prize                                      M@rkMoriarty.com
