Machine Learning Techniques for the Semantic Web

Published in: Technology, Education
  • Nice to find someone looking at bridging machine learning with semweb :)

    I found my way here by searching for ruby + restricted boltzmann, ... hoping to find some nicely packaged RBM implementation that could be fed to the SemWeb community, so that structure implicit in eg dbpedia and social graph data can be explored. Any recommendations? Or maybe it'd be more productive teaching the machine learning folk where to go find RDF linked data themselves?
Machine Learning Techniques for the Semantic Web

  1. Machine Learning Techniques for the Semantic Web. Paul Dix, http://pauldix.net, paul@pauldix.net
  2. Machine Learning
  3. Semantic Web
  4. What is the Semantic Web?
  5. Ontology
  6. RDF
  7. Machine Learning is about Data
  8. actually...
  9. Making Predictions Based on Data
  10. FOAF: Simple Example
  11. Marco Neumann
      <http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://community.linkeddata.org/dataspace/person/kidehen2/about.rdf> .
      <http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://www.johnbreslin.com/foaf/foaf.rdf> .
      <http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://swordfish.rdfweb.org/people/libby/rdfweb/webwho.xrdf> .
      <http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://danbri.org/foaf.rdf> .
  12. Marco only knows 4 people?
  13. Two Degrees Out
      4 - <http://www.w3.org/People/Connolly/home-smart.rdf>
      4 - <http://jibbering.com/foaf.rdf>
      2 - <http://sw.deri.org/~haller/foaf.rdf>
      2 - <http://sw.deri.org/~knud/knudfoaf.rdf>
      2 - <http://www-cdr.stanford.edu/~petrie/foaf.rdf>
  14. Three Degrees
      9 - <http://sw.deri.org/~knud/knudfoaf.rdf>
      8 - <http://www.w3.org/People/Connolly/home-smart.rdf>
      7 - <http://jibbering.com/foaf.rdf>
      6 - <http://www.aaronsw.com/about.xrdf>
      5 - <http://sw.deri.org/~aharth/foaf.rdf>
  15. but that’s not really machine learning
  16. Short
  17. Machine Learning is • How you formulate the problem • How you represent the data
  18. • Graphical Models • Vector Space Models
  19. Back to FOAF: Convert RDF triples to vector space
  20. We Want to Find Groups of People
  21. To make predictions on their interests...
  22. (subject) (predicate) (object)
      Paul      knows       Jeff
      Paul      knows       Joe
      Paul      knows       Marco
      Jeff      knows       Joe
  23. Vector Space Representation
              Jeff   Joe   Marco   Paul
      Jeff            1              1
      Joe      1                     1
      Marco                          1
      Paul     1      1      1
  24. Latent Factors Analysis • Used in Latent Semantic Indexing (LSI) • Good for finding synonyms • Good for finding “genres”
  25. Latent Factors Methods • Principal Component Analysis (PCA) • Singular Value Decomposition (SVD) • Restricted Boltzmann Machines (RBM)
  26. Considerations for Semantic Web Data • Large Data Sets • Sparse Data Sets
  27. Netflix Prize Research • Movie Review Data Set has similar problems • Generalized Hebbian Algorithm for Dimensionality Reduction in NLP (Gorrell ’06)
  28. Reduce Dimensions • 1m x 1m matrix with 1m people • Reduce to 1m x 100
  29. 100 Latent Factors: Represent different groups of people based on who they know.
  30. What the Data Might Look Like
              Factor 1   Factor 2
      Paul    0.678      0.311
      Joe     0.455      0.432
      Jeff    0.476      0.398
      Marco   0.203      0.789
  31. Find Similar People: k Nearest Neighbors
  32. Pick a Similarity Metric • Euclidean Distance • Jaccard index • Cosine Similarity
  33. Joe’s Similarity to Paul: ((Paul(f1) - Joe(f1))^2 + (Paul(f2) - Joe(f2))^2)^(1/2)
  34. Once We’ve Calculated Similarities • Fill In Missing Interests • Target Ads, Content, Products • ??? • Profit!
  35. Generalizing RDF Triples to Vector Space
  36. • Subjects are Rows • Objects are Columns • Predicates are values
  37. Layout of the matrix
                  Object 1    Object 2
      Subject 1   Predicate
      Subject 2
  38. Predicates Should be Mutually Exclusive • Paul likes Ruby • Paul hates PHP • Paul loves PHP
  39. Assign Values to Predicates • 1 = Hates • 2 = Dislikes • 3 = Neutral • 4 = Likes • 5 = Loves
  40. More Applications
  41. Supervised Learning • Classifiers • Ontology Mapping • Assigning Instances to Concepts
  42. Ontology Mapping • Examples from Ontology A • Examples from Ontology B
  43. Train Classifiers • One Classifier for each Concept in A • One Classifier for each Concept in B
  44. Classify Instances • Use A Classifiers to predict which concepts B instances map to • Use B Classifiers to predict which concepts A instances map to
  45. Use Classified Instances • Predict Concept Mappings: which concepts in A match ones in B
  46. Limitations • One Classifier per Concept (Large Ontologies Could be a Problem) • Ontologies should be a little similar
  47. Unsupervised Learning • Clustering, including Hierarchical Clustering • Learning Ontologies from Text
  48. Machine Learning as Triage • Automatically tag or recommend Examples the algorithm is Certain About • Send uncertain examples to a human for review
  49. Thank You. Paul Dix, paul@pauldix.net, http://pauldix.net
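
Slides 22 and 23, and again slides 35 to 39, describe turning RDF triples into a matrix with subjects as rows, objects as columns, and a number standing in for each predicate. A minimal sketch of that step in plain Python might look like the following. The function name `triples_to_matrix`, the `symmetric` flag, the nested-dict storage, and the decision to raise an error on a conflicting predicate are all assumptions made for illustration; the talk does not specify an implementation. Treating `knows` as symmetric is inferred from the matrix on slide 23.

```python
def triples_to_matrix(triples, predicate_values, symmetric=False):
    """Build a sparse {row: {column: value}} matrix from (s, p, o) triples.

    predicate_values maps each predicate name to a number (assumed encoding).
    A second, different value for the same cell is treated as a conflict,
    following the "mutually exclusive predicates" advice on slide 38.
    """
    matrix = {}

    def set_cell(row, col, value):
        cells = matrix.setdefault(row, {})
        if col in cells and cells[col] != value:
            raise ValueError(f"conflicting predicates for ({row}, {col})")
        cells[col] = value

    for subject, predicate, obj in triples:
        value = predicate_values[predicate]
        set_cell(subject, obj, value)
        if symmetric:
            # Mirror the relation so that "A knows B" also fills row B.
            set_cell(obj, subject, value)
    return matrix


# The four triples from slide 22, with "knows" encoded as 1.
knows = [
    ("Paul", "knows", "Jeff"),
    ("Paul", "knows", "Joe"),
    ("Paul", "knows", "Marco"),
    ("Jeff", "knows", "Joe"),
]
social = triples_to_matrix(knows, {"knows": 1}, symmetric=True)

# The 1-to-5 scale from slide 39, applied to two of the slide 38 statements.
scale = {"hates": 1, "dislikes": 2, "neutral": 3, "likes": 4, "loves": 5}
interests = triples_to_matrix(
    [("Paul", "likes", "Ruby"), ("Paul", "hates", "PHP")], scale
)
```

Only non-empty cells are stored, which suits the large, sparse data sets mentioned on slide 26. Feeding in both "Paul hates PHP" and "Paul loves PHP" raises a `ValueError` in this sketch, since one cell cannot hold two values.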

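Slides 30 to 33 suggest finding similar people by running k nearest neighbors over the latent-factor vectors, with Euclidean distance as one candidate metric. The sketch below applies that idea to the example numbers from slide 30. The helper names `euclidean` and `nearest_neighbors` are hypothetical, and the factor values are the illustrative ones from the slide rather than the output of a real decomposition.

```python
import math

# Latent-factor values copied from slide 30 (illustrative numbers from the talk).
factors = {
    "Paul": (0.678, 0.311),
    "Joe": (0.455, 0.432),
    "Jeff": (0.476, 0.398),
    "Marco": (0.203, 0.789),
}


def euclidean(a, b):
    """Square root of the summed squared differences, as on slide 33."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(person, vectors, k):
    """Return the k other people closest to `person`, nearest first."""
    others = [
        (euclidean(vectors[person], vec), name)
        for name, vec in vectors.items()
        if name != person
    ]
    return [name for _, name in sorted(others)[:k]]


paul_to_joe = euclidean(factors["Paul"], factors["Joe"])
closest_to_paul = nearest_neighbors("Paul", factors, k=2)
```

Because this is a distance, a smaller value means two people are more alike. With these numbers Jeff and Joe come out closest to Paul, and Marco is the farthest.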