Text Analytic Summit 2010
With over 12 million entities and 350 million relationships, Freebase is an excellent resource for performing text analysis. One way to look at document "understanding" is to think about how the entities in the document are connected on a knowledge graph. This is similar to the "reconciliation" process that is used to grow Freebase itself.
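
The idea of choosing entities by how well they connect on a knowledge graph can be sketched as follows. This is a minimal illustration, not Freebase's reconciliation service: the graph, the entity identifiers, and the candidate lists are hypothetical toy data (the names echo the talk's Harrison Ford example), and the exhaustive search is only practical for a handful of mentions.

```python
from itertools import product

# Hypothetical toy knowledge graph: entity id -> neighbouring entity ids.
# Edges are undirected and therefore stored in both directions.
GRAPH = {
    "harrison_ford_silent": {"vanity_fair_1923", "maytime_1923"},
    "harrison_ford_modern": {"blade_runner", "the_fugitive"},
    "vanity_fair_1923": {"harrison_ford_silent"},
    "maytime_1923": {"harrison_ford_silent"},
    "blade_runner": {"harrison_ford_modern"},
    "the_fugitive": {"harrison_ford_modern"},
}

# Hypothetical candidate lists: surface string -> possible entity ids.
CANDIDATES = {
    "Harrison Ford": ["harrison_ford_silent", "harrison_ford_modern"],
    "Blade Runner": ["blade_runner"],
    "Fugitive": ["the_fugitive"],
    "Vanity Fair": ["vanity_fair_1923"],
    "Maytime": ["maytime_1923"],
}


def edge_count(entities):
    """Count graph edges whose two endpoints are both in the chosen set."""
    chosen = set(entities)
    # Every undirected edge is seen from both ends, so halve the total.
    return sum(len(GRAPH.get(e, set()) & chosen) for e in chosen) // 2


def reconcile(mentions):
    """Pick one candidate per mention so the chosen entities are as
    densely connected to each other as possible (brute-force search)."""
    names = list(mentions)
    options = [CANDIDATES[name] for name in names]
    best = max(product(*options), key=edge_count)
    return dict(zip(names, best))


if __name__ == "__main__":
    print(reconcile(["Harrison Ford", "Blade Runner", "Fugitive"]))
    print(reconcile(["Harrison Ford", "Vanity Fair", "Maytime"]))
```

The same ambiguous string resolves to a different entity depending on which other mentions appear alongside it, which is the sense in which connectivity stands in for "understanding" here.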

The web is currently full of semantic hints, whether explicit (like those promoted by the Semantic Web) or implicit (like the use of blog widgets). Using these hints, text analytic methods can get a toehold on the web corpus at large.
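
As one illustration of harvesting an explicit hint, the sketch below pulls `itemid` and `itemtype` values out of HTML elements marked with `itemscope`, using only Python's standard `html.parser`. The sample markup is abbreviated from the widget microdata slide in the transcript; the class and function names (`ItemScopeParser`, `extract_items`) are assumptions made for this example, and a real crawler would need to handle nested items and item properties as well.

```python
from html.parser import HTMLParser


class ItemScopeParser(HTMLParser):
    """Collect (itemid, itemtype) pairs from elements that carry itemscope."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        attr_map = dict(attrs)
        if "itemscope" in attr_map:
            self.items.append(
                (attr_map.get("itemid"), attr_map.get("itemtype"))
            )


def extract_items(html):
    """Return every (itemid, itemtype) pair found in the given HTML string."""
    parser = ItemScopeParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Abbreviated from the widget markup shown in the talk.
SAMPLE = (
    '<div class="fb-widget" itemscope="" '
    'itemid="http://www.freebase.com/id/en/taylor_swift" '
    'itemtype="http://www.freebase.com/id/music/artist">...</div>'
)

if __name__ == "__main__":
    for item_id, item_type in extract_items(SAMPLE):
        print(item_id, item_type)
```

Each pair ties a page to a strong identifier and a type, which is the kind of hint a text analytic pipeline can use as a labeled starting point.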

Published in: Technology
Transcript

  • 1. It's not what you said, it's how you said it. Jamie Taylor, Ph.D. Text Analytic Summit Boston 2010
  • 2. What do y'all mean, "Semantics"? The Web! Now with Better Flavor!
  • 3. Tim Berners-Lee, James Hendler and Ora Lassila    May 2001
  • 4. The Semantic Web? The Cake taken from http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/layerCake-4.png
  • 5. Linked Open Data
  • 6. The Real Web http://en.wikipedia.org/wiki/File:Internet_map_1024.jpg
  • 7. Wish it were real
  • 8. Might be real
  • 9. Is real, but don't believe it
  • 10. Is currently useful
  • 11. Entities
  • 12. Identifiers Side Step Polysemy Bono, A.K.A. Paul David Hewson http://rdf.freebase.com/ns/en.paul_david_hewson
  • 13. Vocabulary Manufactures http://rdf.freebase.com/ns/automotive.make.model_s
  • 14. A socially managed semantic database
  • 15. Freebase has Many Types of Things
  • 16. Many Strong Identifiers http://rdf.freebase.com/ns/en.berlin_wall http://www.ellerdale.com/topics/view/0080-6ba0 http://www.bbc.co.uk/music/artists/7f347782-eb14-40c3-98e2-17b6e1bfe56c http://musicbrainz.org/artist/7f347782-eb14-40c3-98e2-17b6e1bfe56c http://rdf.freebase.com/ns/authority.musicbrainz.7f347782-eb14-40c3-98e2-17b6e1bfe56c
  • 17. 12 Million Entities 350 Million Relations
  • 18. Users contribute data Users extend the data model
  • 19. schema = vocabulary
  • 20. 1500 types with 500+ instances!! A range of vocabularies....
  • 21. Growing Freebase
  • 22. Reconciliation +=
  • 23. Reconciliation Relational Learning Record Matching Collective Entity Resolution Equivalence Mining Record Linking Identity Matching
  • 24. Reconciliation "Excuse Me" "Excuse Me" "Harrison Ford" "Harrison Ford" "Vanity Fair" "Maytime"
  • 25. Reconciliation "Fugitive" "Excuse Me" "Harrison Ford" "Harrison Ford" "Vanity Fair" "Blade Runner"
  • 26. A Graph of Entities
  • 27. Vocabulary contains located performed-at released-by created plays-in plays-in nationality education education located
  • 28. Reconciliation as "understanding" contains located performed-at released-by created plays-in plays-in nationality education education located
  • 29. { "/type/object/name":"Blade Runner", "/type/object/type":"/film/film", "/film/film/starring/actor":["Harrison Ford", "Rutger Hauer"], "/film/film/director":"Ridley Scott", "/film/film/release_date_s":"1981" } [{ "id":"/guid/9202a8c04000641f8000000000009e89", "name":["Blade Runner", "Bladerunner"], "score":1.4320519, "match":true, "type":["/common/topic", "/film/film","/media_common/adapted_work", "/award/award_winning_work", ...... ]}, { "id":"/guid/9202a8c04000641f80000000002643d0", "name":["Blade"], "score":0.48852453, "match":false, "type":["/common/topic", "/film/film", "/award/award_winning_work", "/award/award_nominated_work", ....... ]}, { "id":"/guid/9202a8c04000641f800000000e5daaae", "name":["Blade"], "score":0.46398318, "match":false, ..... http://data.labs.freebase.com/recon/
  • 30. Data Everywhere
  • 31. Wikipedia Features
  • 32. Wikipedia Features X X Error Prone -- Usually <99%
  • 33. (Machine) Learning Semantics [pipeline diagram. Stages: get types; extract features; intersect the two sources; calculate feature counts per type; join feature counts with topics; generate type scores for topics. Data sizes: 5M articles (WEX), 2.8M Wikipedia topics, 5M type assertions, 1400 types, 37M features, 2.4M features, 1.6G scores]
  • 34. /people/person distribution untyped topics person topics other topics all topics Data courtesy Viral Shah
  • 35. RABJ: Humans in the loop
  • 36. Thresholding Results 99% threshold at 16.75
  • 37. /people/person assertions threshold 53K /people/person assertions
  • 38. Training Wheels? Semantics are Everywhere
  • 39. A Strong Tag for Food Inc. http://movi.es/BVl43
  • 40. Widgets: Content Tags
  • 41. Explicit Semantics
  • 42. Rich Snippets <div class="post-item restaurant-gen-info hreview-aggregate"> <div class="item vcard"> <h1 class="fn org">Taylor's Refresher</h1> <div class="address"> <div class="ratings"> <ul class="star-rating-2 rating" title="4.0 star rating across 3 ratings"> <li class="current-rating average" style="width:80%;">4.0 star rating</li> <li class="star">&nbsp;</li> <li class="star">&nbsp;</li><li class="star">&nbsp;</li> <li class="star">&nbsp;</li> <li class="star">&nbsp;</li> </ul> <div class="rating-stats"> <span class="rating"> <span class="average">4.0</span> </span> rating over <span class="count">1</span> review </div>
  • 43. RDFa microformats HTML5 MicroData Open Graph Protocol
  • 44. Explicit Semantics in Surprising Places
  • 45. Blog Tags::Entities
  • 46. Metaweb Topic Block
  • 47. Widget Microdata <div class="fb-widget" id="fbtb-9a1f44348ad145b5b7d7d7d2376b0420" style="border:0; outline:0; padding:0; margin:0; position:relative;" itemscope="" itemid="http://www.freebase.com/id/en/taylor_swift" itemtype="http://www.freebase.com/id/music/artist"> ..... </div>
  • 48. Thickening the Graph
  • 49. "Vocabulary" Pattern taw shooter marksman marble marksman http://wordnet.freebaseapps.com photo: http://sarabbit.openphoto.net
  • 50. Review (neighborhood) Pattern Eric Schlosser E. Coli Michael Pollan Robert Kenner
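
The thresholding step in the transcript (slides 35 to 37) can be sketched as below. The slides do not say how the 99% threshold was computed, so this assumes it means precision on a human-judged sample: find the lowest score cutoff at which at least 99% of the judged assertions scoring at or above it were marked correct. The function name `precision_threshold` and the sample data are hypothetical. The value 16.75 is reused only to echo the slide and is not taken from the real data.

```python
def precision_threshold(scored_labels, target=0.99):
    """Return the lowest score cutoff whose precision meets `target`.

    `scored_labels` is an iterable of (score, is_correct) pairs, where
    is_correct is a human judgment. Precision at a cutoff is the share of
    correct items among all items scoring at or above it. Returns None
    when no cutoff reaches the target.
    """
    ranked = sorted(scored_labels, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for seen, (score, is_correct) in enumerate(ranked, start=1):
        correct += bool(is_correct)
        # Only evaluate a cutoff once every item tied at this score is counted.
        last_of_tie = seen == len(ranked) or ranked[seen][0] != score
        if last_of_tie and correct / seen >= target:
            best = score
    return best


# Hypothetical judged sample: (classifier score, human says "correct").
JUDGED = [
    (20.0, True),
    (18.0, True),
    (16.75, True),
    (12.0, False),
    (10.0, True),
]

if __name__ == "__main__":
    print(precision_threshold(JUDGED, target=0.99))
```

Assertions scoring at or above the returned cutoff could then be accepted automatically, with the remainder left for human review.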
