Machine Learning Techniques for the Semantic Web
Published in: Technology, Education

1 Comment
  • Nice to find someone looking at bridging machine learning with semweb :)

    I found my way here by searching for ruby + restricted boltzmann, ... hoping to find some nicely packaged RBM implementation that could be fed to the SemWeb community, so that structure implicit in eg dbpedia and social graph data can be explored. Any recommendations? Or maybe it'd be more productive teaching the machine learning folk where to go find RDF linked data themselves?

Transcript

  • 1. Machine Learning Techniques for the Semantic Web Paul Dix http://pauldix.net paul@pauldix.net
  • 2. Machine Learning
  • 3. Semantic Web
  • 4. What is Semantic Web?
  • 5. Ontology
  • 6. RDF
  • 7. Machine Learning is about Data
  • 8. actually...
  • 9. Making Predictions Based on Data
  • 10. FOAF Simple Example
  • 11. Marco Neumann
      <http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://community.linkeddata.org/dataspace/person/kidehen2/about.rdf> .
      <http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://www.johnbreslin.com/foaf/foaf.rdf> .
      <http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://swordfish.rdfweb.org/people/libby/rdfweb/webwho.xrdf> .
      <http://www.marconeumann.org/foaf.rdf> <http://xmlns.com/foaf/0.1/knows> <http://danbri.org/foaf.rdf> .
  • 12. Marco only knows 4 people?
  • 13. Two Degrees Out
      4 - <http://www.w3.org/People/Connolly/home-smart.rdf>
      4 - <http://jibbering.com/foaf.rdf>
      2 - <http://sw.deri.org/~haller/foaf.rdf>
      2 - <http://sw.deri.org/~knud/knudfoaf.rdf>
      2 - <http://www-cdr.stanford.edu/~petrie/foaf.rdf>
  • 14. Three Degrees
      9 - <http://sw.deri.org/~knud/knudfoaf.rdf>
      8 - <http://www.w3.org/People/Connolly/home-smart.rdf>
      7 - <http://jibbering.com/foaf.rdf>
      6 - <http://www.aaronsw.com/about.xrdf>
      5 - <http://sw.deri.org/~aharth/foaf.rdf>
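The counts on slides 13 and 14 look like tallies of how often each person turns up when following foaf:knows links two or three hops out from Marco. The sketch below shows one plausible way to compute such a tally. It assumes the FOAF files have already been parsed into a dictionary; the graph, the names, and the `count_at_degree` helper are all hypothetical and are not the data or code behind the slides.

```python
from collections import Counter

# Hypothetical foaf:knows edges (person -> people they list).
# This is made-up illustration data, not the real FOAF graph.
KNOWS = {
    "marco": ["alice", "bob", "carol", "dave"],
    "alice": ["dave", "erin"],
    "bob": ["erin", "frank"],
    "carol": ["dave", "erin"],
    "dave": ["erin", "carol"],
}

def count_at_degree(graph, start, degree):
    """Count walks of length `degree` from `start` that end at each person."""
    frontier = Counter({start: 1})
    for _ in range(degree):
        next_frontier = Counter()
        for person, walks in frontier.items():
            for friend in graph.get(person, []):
                next_frontier[friend] += walks
        frontier = next_frontier
    # Drop the start node and the people it already knows directly.
    known = set(graph.get(start, [])) | {start}
    return Counter({p: n for p, n in frontier.items() if p not in known})

two_out = count_at_degree(KNOWS, "marco", 2)
for person, count in two_out.most_common():
    print(count, "-", person)
```

People with high counts are reachable through many of Marco's contacts, which makes them candidates for "people Marco probably knows".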
  • 15. but that’s not really machine learning
  • 16. Short
  • 17. Machine Learning is • How you formulate the problem • How you represent the data
  • 18. • Graphical Models • Vector Space Models
  • 19. Back to FOAF Convert RDF triples to vector space
  • 20. We Want to Find Groups of People
  • 21. To make predictions on their interests...
  • 22. (subject)  (predicate)  (object)
      Paul       knows        Jeff
      Paul       knows        Joe
      Paul       knows        Marco
      Jeff       knows        Joe
  • 23. Vector Space Representation
             Jeff   Joe    Marco  Paul
      Jeff          1             1
      Joe    1                    1
      Marco                       1
      Paul   1      1      1
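A minimal sketch of the conversion from slide 22 to slide 23, assuming `knows` is treated as symmetric (which is what the counts of 1s per row on slide 23 suggest). The `knows_matrix` function name is an illustration, not something from the talk.

```python
# The four triples from slide 22.
TRIPLES = [
    ("Paul", "knows", "Jeff"),
    ("Paul", "knows", "Joe"),
    ("Paul", "knows", "Marco"),
    ("Jeff", "knows", "Joe"),
]

def knows_matrix(triples):
    """Build a dense 0/1 matrix with one row and one column per person."""
    people = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    index = {name: i for i, name in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for subject, predicate, obj in triples:
        if predicate != "knows":
            continue
        matrix[index[subject]][index[obj]] = 1
        # Assumption: if A knows B then B knows A.
        matrix[index[obj]][index[subject]] = 1
    return people, matrix

people, matrix = knows_matrix(TRIPLES)
for name, row in zip(people, matrix):
    print(f"{name:6}", row)
```

A dense list of lists is fine for four people. For the large, sparse data sets mentioned later in the talk, a sparse representation would be the more realistic choice.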
  • 24. Latent Factors Analysis • Used in Latent Semantic Indexing (LSI) • Good for finding synonyms • Good for finding “genres”
  • 25. Latent Factors Methods • Principal Component Analysis (PCA) • Singular Value Decomposition (SVD) • Restricted Boltzmann Machines (RBM)
  • 26. Considerations for Semantic Web Data • Large Data Sets • Sparse Data Sets
  • 27. Netflix Prize Research • Movie Review Data Set has similar problems • Generalized Hebbian Algorithm for Dimensionality Reduction in NLP (Gorrell ’06)
  • 28. Reduce Dimensions • 1m x 1m matrix with 1m people • Reduce to 1m x 100
  • 29. 100 Latent Factors Represent different groups of people based on who they know.
  • 30. What the Data Might Look Like
             Factor 1  Factor 2
      Paul   0.678     0.311
      Joe    0.455     0.432
      Jeff   0.476     0.398
      Marco  0.203     0.789
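To give a feel for how a latent factor is extracted, here is a small sketch that uses plain power iteration to find the single strongest factor of the four-person matrix from slide 23. This is a stand-in technique chosen because it fits in a few lines of standard-library Python. It is not the Generalized Hebbian Algorithm from slide 27 and not a full SVD, and the numbers it produces are unrelated to the illustrative values on slide 30.

```python
import math

PEOPLE = ["Jeff", "Joe", "Marco", "Paul"]
# "knows" matrix from slide 23 (treating "knows" as symmetric is an assumption).
MATRIX = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def top_factor(matrix, iterations=200):
    """Power iteration on A^T A: approximates the leading right singular vector."""
    a_t = transpose(matrix)
    vector = [1.0] * len(matrix[0])
    for _ in range(iterations):
        vector = mat_vec(a_t, mat_vec(matrix, vector))
        norm = math.sqrt(sum(v * v for v in vector))
        vector = [v / norm for v in vector]
    return vector

loadings = dict(zip(PEOPLE, top_factor(MATRIX)))
for name, weight in loadings.items():
    print(f"{name:6} {weight:.3f}")
```

Keeping the top 100 such factors instead of one is what turns a 1m x 1m matrix into a 1m x 100 matrix. At that scale an incremental or sparse method would be needed rather than this dense loop.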
  • 31. Find Similar People k Nearest Neighbors
  • 32. Pick a Similarity Metric • Euclidean Distance • Jaccard index • Cosine Similarity
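The three metrics on slide 32 can each be written in a few lines. This is a sketch under simple assumptions: Euclidean distance and cosine similarity take equal-length numeric vectors (cosine additionally assumes neither vector is all zeros), and the Jaccard index takes Python sets, for example the sets of people two users know.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard_index(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# Jeff knows {Joe, Paul}; Joe knows {Jeff, Paul}: one shared contact out of three.
print(jaccard_index({"Joe", "Paul"}, {"Jeff", "Paul"}))
```

Note that Euclidean distance is a distance (lower is closer) while the other two are similarities (higher is closer), so the sort order flips depending on which one is picked.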
  • 33. Joe’s Similarity to Paul: $d(\mathrm{Paul}, \mathrm{Joe}) = \sqrt{(\mathrm{Paul}_{f1} - \mathrm{Joe}_{f1})^2 + (\mathrm{Paul}_{f2} - \mathrm{Joe}_{f2})^2}$
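Applying that distance to the example factor values from slide 30 gives a tiny k-nearest-neighbors lookup. The sketch assumes Python 3.8 or later for `math.dist`; the `nearest_neighbors` helper is a hypothetical name used only for illustration.

```python
import math

# Example factor values copied from slide 30.
FACTORS = {
    "Paul": (0.678, 0.311),
    "Joe": (0.455, 0.432),
    "Jeff": (0.476, 0.398),
    "Marco": (0.203, 0.789),
}

def nearest_neighbors(name, factors, k):
    """Return the k people whose factor vectors are closest to `name`'s."""
    target = factors[name]
    distances = [
        (math.dist(target, vector), other)
        for other, vector in factors.items()
        if other != name
    ]
    return [other for _, other in sorted(distances)[:k]]

print(round(math.dist(FACTORS["Paul"], FACTORS["Joe"]), 4))
print(nearest_neighbors("Paul", FACTORS, 2))
```

This brute-force scan compares the target against everyone. With a million people, some form of indexing or approximate search would be needed.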
  • 34. Once We’ve Calculated Similarities • Fill In Missing Interests • Target Ads, Content, Products • ??? • Profit!
  • 35. Generalizing RDF Triples to Vector Space
  • 36. • Subjects are Rows • Objects are Columns • Predicates are values
  • 37.
                 Object 1   Object 2
      Subject 1  Predicate
      Subject 2
  • 38. Predicates Should be Mutually Exclusive • Paul likes Ruby • Paul hates PHP • Paul loves PHP
  • 39. Assign Values to Predicates • 1 = Hates • 2 = Dislikes • 3 = Neutral • 4 = Likes • 5 = Loves
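Slides 36 to 39 can be sketched as a sparse matrix keyed by (subject, object), with the predicate mapped to a number. The lowercase predicate names and the "first value wins, later contradictions are reported" policy are assumptions made for this example. The slides only say that predicates should be mutually exclusive, not how to resolve a clash.

```python
# Predicate scale from slide 39 (lowercase keys are an assumption).
PREDICATE_VALUES = {"hates": 1, "dislikes": 2, "neutral": 3, "likes": 4, "loves": 5}

def triples_to_sparse_matrix(triples):
    """Map (subject, predicate, object) triples to {(subject, object): value}."""
    matrix = {}
    conflicts = []
    for subject, predicate, obj in triples:
        value = PREDICATE_VALUES[predicate]
        cell = (subject, obj)
        if cell in matrix and matrix[cell] != value:
            # Two different predicates for the same cell: not mutually exclusive.
            conflicts.append((subject, obj, matrix[cell], value))
            continue
        matrix[cell] = value
    return matrix, conflicts

# The three example triples from slide 38.
matrix, conflicts = triples_to_sparse_matrix([
    ("Paul", "likes", "Ruby"),
    ("Paul", "hates", "PHP"),
    ("Paul", "loves", "PHP"),
])
print(matrix)
print(conflicts)
```

Storing only the filled cells keeps memory proportional to the number of triples, which matters for the large, sparse data sets mentioned on slide 26.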
  • 40. More Applications
  • 41. Supervised Learning • Classifiers • Ontology Mapping • Assigning Instances to Concepts
  • 42. Ontology Mapping • Examples from Ontology A • Examples from Ontology B
  • 43. Train Classifiers • One Classifier for each Concept in A • One Classifier for each Concept in B
  • 44. Classify Instances • Use A Classifiers to predict which concepts B instances map to • Use B Classifiers to predict which concepts A instances map to
  • 45. Use Classified Instances • Predict Concept Mappings • Which in A match ones in B
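The ontology-mapping procedure on slides 42 to 45 can be sketched end to end with toy data. Everything here is hypothetical: the two ontologies, their instances, and the "classifier", which is just a per-concept token count standing in for whatever real classifier would be trained. The point is the flow: train one model per concept, classify the other ontology's instances, then vote.

```python
from collections import Counter, defaultdict

# Hypothetical instances as (concept, text) pairs for two toy ontologies.
ONTOLOGY_A = [
    ("Person", "author born name email"),
    ("Person", "name email homepage"),
    ("Document", "title pages published abstract"),
    ("Document", "title abstract keywords"),
]
ONTOLOGY_B = [
    ("Agent", "name email phone"),
    ("Agent", "name homepage born"),
    ("Publication", "title abstract journal"),
    ("Publication", "published pages title"),
]

def train_classifiers(examples):
    """One stand-in 'classifier' per concept: token counts over its examples."""
    models = defaultdict(Counter)
    for concept, text in examples:
        models[concept].update(text.split())
    return models

def classify(models, text):
    """Pick the concept whose token counts best cover the tokens in `text`."""
    tokens = text.split()

    def score(concept):
        counts = models[concept]
        return sum(counts[t] for t in tokens) / sum(counts.values())

    return max(sorted(models), key=score)

def predict_mappings(source_examples, target_examples):
    """Classify source instances with target classifiers; majority vote per concept."""
    target_models = train_classifiers(target_examples)
    votes = defaultdict(Counter)
    for concept, text in source_examples:
        votes[concept][classify(target_models, text)] += 1
    return {concept: tally.most_common(1)[0][0] for concept, tally in votes.items()}

b_to_a = predict_mappings(ONTOLOGY_B, ONTOLOGY_A)  # A classifiers on B instances
a_to_b = predict_mappings(ONTOLOGY_A, ONTOLOGY_B)  # B classifiers on A instances
print(b_to_a)
print(a_to_b)
```

When both directions agree, as they do on this toy data, the mapping is a stronger candidate than a one-directional match.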
  • 46. Limitations • One Classifier per Concept • Large Ontologies Could be a Problem • Ontologies should be a little similar
  • 47. Unsupervised Learning • Clustering • Hierarchical Clustering • Learning Ontologies from Text
  • 48. Machine Learning as Triage • Automatically tag or recommend examples the algorithm is certain about • Send uncertain examples to a human for review
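The triage idea reduces to a confidence threshold. In this sketch the 0.9 cutoff, the item identifiers, and the `triage` function are assumptions for illustration; a real cutoff would be tuned against how costly a wrong automatic tag is.

```python
def triage(predictions, threshold=0.9):
    """Split (item, label, confidence) predictions into auto-tagged and review lists."""
    auto_tagged, needs_review = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            auto_tagged.append((item, label))
        else:
            # Keep the confidence so a reviewer can see how unsure the model was.
            needs_review.append((item, label, confidence))
    return auto_tagged, needs_review

# Hypothetical classifier output.
auto, review = triage([
    ("doc-1", "Person", 0.97),
    ("doc-2", "Document", 0.55),
    ("doc-3", "Person", 0.91),
])
print(auto)
print(review)
```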
  • 49. Thank You Paul Dix paul@pauldix.net http://pauldix.net