It's not what you said,
             it's how you said it.
                         Jamie Taylor, Ph.D.




  Text Analytic Summit
      Boston 2010
What do y'all mean
  "Semantics"



                  The Web!
                  Now with
                 Better Flavor!
Tim Berners-Lee, James Hendler
           and Ora Lassila   




May 2001
The Semantic Web?




   The Cake
      taken from http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/layerCake-4.png
Linked Open Data
The Real Web




               http://en.wikipedia.org/wiki/File:Internet_map_1024.jpg
Wish it were real
Might be real
Is real, but don't believe it
Is currently useful
Entities
Identifiers        Side Step Polysemy




       Bono, A.K.A. Paul David Hewson
http://rdf.freebase.com/ns/en.paul_david_hewson
Vocabulary

                  Manufactures




http://rdf.freebase.com/ns/automotive.make.model_s
A socially managed semantic database
Freebase has Many Types of Things
Many Strong Identifiers
            http://rdf.freebase.com/ns/en.berlin_wall




            http://www.ellerdale.com/topics/view/0080-6ba0




            http://www.bbc.co.uk/music/artists/7f347782-eb14-40c3-98e2-17b6e1bfe56c

                   http://musicbrainz.org/artist/7f347782-eb14-40c3-98e2-17b6e1bfe56c

http://rdf.freebase.com/ns/authority.musicbrainz.7f347782-eb14-40c3-98e2-17b6e1bfe56c
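The BBC, MusicBrainz and Freebase URLs above all embed the same MusicBrainz GUID, which is what makes them "strong" identifiers: they can be grouped mechanically. A minimal sketch of that grouping, assuming a plain regex for the GUID; the helper names `shared_key` and `group_by_guid` are hypothetical and not part of any Freebase API.

```python
import re
from collections import defaultdict

# URLs copied from the slide above.
URLS = [
    "http://www.bbc.co.uk/music/artists/7f347782-eb14-40c3-98e2-17b6e1bfe56c",
    "http://musicbrainz.org/artist/7f347782-eb14-40c3-98e2-17b6e1bfe56c",
    "http://rdf.freebase.com/ns/authority.musicbrainz.7f347782-eb14-40c3-98e2-17b6e1bfe56c",
    "http://rdf.freebase.com/ns/en.berlin_wall",
]

# Standard 8-4-4-4-12 hexadecimal GUID layout.
GUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def shared_key(url):
    """Return the GUID embedded in a URL, or None if there is none (hypothetical helper)."""
    found = GUID.search(url)
    return found.group(0) if found else None


def group_by_guid(urls):
    """Group URLs that carry the same embedded GUID; URLs without a GUID are skipped."""
    groups = defaultdict(list)
    for url in urls:
        key = shared_key(url)
        if key is not None:
            groups[key].append(url)
    return dict(groups)


GROUPS = group_by_guid(URLS)
```

Human-readable keys such as `en.berlin_wall` carry no GUID, so they fall outside this grouping and need an explicit mapping instead.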
12 Million Entities
350 Million Relations
Users contribute data




Users extend the data model
schema = vocabulary
1500 types with 500+ instances!!




A range of vocabularies...
Growing Freebase
Reconciliation



   +=
Reconciliation

Relational Learning
            Record Matching
Collective Entity Resolution
                 Equivalence Mining
 Record Linking
                Identity Matching
Reconciliation
                              "Excuse Me"
"Excuse Me"
                                   "Harrison Ford"
          "Harrison Ford"




     "Vanity Fair"
                            "Maytime"
Reconciliation
                            "Fugitive"
"Excuse Me"
                                "Harrison Ford"
          "Harrison Ford"




     "Vanity Fair"
                                "Blade Runner"
A Graph of Entities
Vocabulary
   Edge labels: contains, located, performed-at, released-by, created, plays-in, nationality, education
Reconciliation as "understanding"
   Edge labels: contains, located, performed-at, released-by, created, plays-in, nationality, education
{
    "/type/object/name":"Blade Runner",
    "/type/object/type":"/film/film",
    "/film/film/starring/actor":["Harrison Ford", "Rutger Hauer"],
    "/film/film/director":"Ridley Scott",
    "/film/film/release_date_s":"1981"
}

[{
   "id":"/guid/9202a8c04000641f8000000000009e89",
   "name":["Blade Runner", "Bladerunner"],
   "score":1.4320519,
   "match":true,
   "type":["/common/topic", "/film/film", "/media_common/adapted_work", "/award/award_winning_work",
     ......
   ]},
 {
   "id":"/guid/9202a8c04000641f80000000002643d0",
   "name":["Blade"],
   "score":0.48852453,
   "match":false,
   "type":["/common/topic", "/film/film", "/award/award_winning_work", "/award/award_nominated_work",
     .......
   ]},
 {
   "id":"/guid/9202a8c04000641f800000000e5daaae",
   "name":["Blade"],
   "score":0.46398318,
   "match":false,
     .....


         http://data.labs.freebase.com/recon/
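A client consuming a response like the one above mostly needs to pick the candidate the service flagged. A minimal sketch, assuming `match` means "confident match" and that a higher `score` is better; the `type` lists are left out because they are elided on the slide, and `pick_match` is a hypothetical helper, not part of the recon service.

```python
import json

# Candidates copied from the slide; the elided "type" lists are omitted.
RESPONSE = json.loads("""
[
  {"id": "/guid/9202a8c04000641f8000000000009e89",
   "name": ["Blade Runner", "Bladerunner"],
   "score": 1.4320519, "match": true},
  {"id": "/guid/9202a8c04000641f80000000002643d0",
   "name": ["Blade"], "score": 0.48852453, "match": false},
  {"id": "/guid/9202a8c04000641f800000000e5daaae",
   "name": ["Blade"], "score": 0.46398318, "match": false}
]
""")


def pick_match(candidates):
    """Return the id of the highest-scoring candidate flagged as a match, or None."""
    matched = [c for c in candidates if c.get("match")]
    if not matched:
        return None
    return max(matched, key=lambda c: c["score"])["id"]


MATCH_ID = pick_match(RESPONSE)
```

Returning `None` when nothing is flagged leaves room for a human review step instead of forcing the top-scoring candidate.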
Data Everywhere
Wikipedia Features
Wikipedia Features



    X


X

    Error Prone -- Usually <99%
(Machine) Learning Semantics
   2.8M Wikipedia topics -> get types -> 5M type assertions
   5M articles (WEX) -> extract features -> 37M features
   intersect the two sources
     -> calculate feature counts per type (2.4M features, 1400 types)
     -> join feature counts with topics
     -> generate type scores for topics (1.6G scores)
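The pipeline can be sketched at toy scale: intersect typed topics with extracted features, count features per type, then score a topic against a type. The slide does not say how scores are computed, so the smoothed log-ratio below is an assumption, and all topic names, feature names and helper functions are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: topic -> known types, topic -> extracted features.
TYPES = {
    "harrison_ford": {"/people/person"},
    "ridley_scott": {"/people/person"},
    "blade_runner": {"/film/film"},
}
FEATURES = {
    "harrison_ford": {"infobox:actor", "category:births"},
    "ridley_scott": {"infobox:director", "category:births"},
    "blade_runner": {"infobox:film"},
    "rutger_hauer": {"infobox:actor", "category:births"},  # untyped topic
}


def feature_counts_per_type(types, features):
    """Count how often each feature occurs among topics of each type."""
    counts = defaultdict(Counter)  # type -> feature -> count
    totals = Counter()             # type -> number of typed topics
    for topic in types.keys() & features.keys():  # intersect the two sources
        for type_name in types[topic]:
            totals[type_name] += 1
            counts[type_name].update(features[topic])
    return counts, totals


def score(topic_features, type_name, counts, totals):
    """Sum of smoothed log-ratios P(feature | type) / P(feature) (assumed scoring).

    Topics with several types are counted once per type in the overall total,
    which is acceptable for a toy example.
    """
    n_typed = sum(totals.values())
    total = 0.0
    for feature in topic_features:
        in_type = (counts[type_name][feature] + 1) / (totals[type_name] + 2)
        overall = (sum(c[feature] for c in counts.values()) + 1) / (n_typed + 2)
        total += math.log(in_type / overall)
    return total


COUNTS, TOTALS = feature_counts_per_type(TYPES, FEATURES)
PERSON_SCORE = score(FEATURES["rutger_hauer"], "/people/person", COUNTS, TOTALS)
FILM_SCORE = score(FEATURES["rutger_hauer"], "/film/film", COUNTS, TOTALS)
```

With this toy data the untyped topic scores positive for `/people/person` and negative for `/film/film`, which is the kind of signal a threshold can then be applied to.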
/people/person distribution
                             untyped topics
                             person topics
                             other topics
                             all topics




                  Data courtesy Viral Shah
RABJ: Humans in the loop
Thresholding Results

          99% threshold at 16.75
/people/person assertions

                threshold




                        53K /people/person
                            assertions
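Choosing a threshold like this amounts to asking: how low can the score cutoff go while human-judged precision stays at the target? A minimal sketch, assuming a sample of scored assertions with human verdicts; the sample values are made up and `lowest_threshold` is a hypothetical helper.

```python
def lowest_threshold(judged, target=0.99):
    """Return the lowest score t such that assertions with score >= t are
    at least `target` correct, or None if no cutoff reaches the target.

    judged: list of (score, is_correct) pairs from human review.
    """
    ranked = sorted(judged, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for i, (score, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        # Only evaluate at the boundary between distinct score values.
        last_of_score = i == len(ranked) or ranked[i][0] != score
        if last_of_score and correct / i >= target:
            best = score
    return best


# Made-up judgments: (score, human says the type assertion is correct).
JUDGED = [(3.0, True), (2.5, True), (2.0, True), (1.5, False), (1.0, True)]
THRESHOLD = lowest_threshold(JUDGED)
```

Everything at or above the returned score can be asserted automatically; everything below it stays in the human review queue.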
Training Wheels?
Semantics are Everywhere
A Strong Tag for Food Inc.
   http://movi.es/BVl43
Widgets: Content Tags
Explicit Semantics
Rich Snippets
<div class="post-item restaurant-gen-info hreview-aggregate">
 <div class="item vcard">
  <h1 class="fn org">Taylor's Refresher</h1>
  <div class="address">
   <div class="ratings">
     <ul class="star-rating-2 rating" title="4.0 star rating across 3 ratings">
      <li class="current-rating average" style="width:80%;">4.0 star rating</li>
      <li class="star">&nbsp;</li>
      <li class="star">&nbsp;</li><li class="star">&nbsp;</li>
      <li class="star">&nbsp;</li>
      <li class="star">&nbsp;</li>
     </ul>
     <div class="rating-stats">
     <span class="rating">
       <span class="average">4.0</span>
     </span> rating over
     <span class="count">1</span> review
    </div>
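Markup like this is machine-readable because the class names carry the meaning. A minimal sketch that pulls the `average` and `count` values out of such a fragment using only the standard library; the `RatingParser` class is hypothetical and handles just the `span` elements shown above, not the full hReview-aggregate microformat.

```python
from html.parser import HTMLParser


class RatingParser(HTMLParser):
    """Collect the text of <span class="average"> and <span class="count">."""

    WANTED = {"average", "count"}

    def __init__(self):
        super().__init__()
        self.values = {}
        self._field = None  # class name whose text is expected next

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        hit = classes & self.WANTED
        if tag == "span" and hit:
            self._field = hit.pop()

    def handle_data(self, data):
        if self._field and data.strip():
            self.values[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None  # an empty span must not capture later text


# Fragment taken from the slide above.
SNIPPET = """
<div class="rating-stats">
 <span class="rating">
   <span class="average">4.0</span>
 </span> rating over
 <span class="count">1</span> review
</div>
"""

parser = RatingParser()
parser.feed(SNIPPET)
VALUES = parser.values
```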
RDFa

       microformats


  HTML5 MicroData


Open Graph Protocol
Explicit Semantics in
 Surprising Places
Blog Tags::Entities
Metaweb Topic Block
Widget Microdata


<div class="fb-widget"
     id="fbtb-9a1f44348ad145b5b7d7d7d2376b0420"
     style="border:0; outline:0; padding:0; margin:0; position:relative;"
     itemscope=""
     itemid="http://www.freebase.com/id/en/taylor_swift"
     itemtype="http://www.freebase.com/id/music/artist"> ..... </div>
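Because the widget carries `itemscope`, `itemid` and `itemtype`, any crawler can recover which entity the block is about. A minimal sketch using the standard library parser; `ItemScopeParser` is a hypothetical class that reads only these three attributes and ignores the rest of the microdata model.

```python
from html.parser import HTMLParser


class ItemScopeParser(HTMLParser):
    """Record the itemid and itemtype of every element that has itemscope."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "itemscope" in attr:
            self.items.append({"id": attr.get("itemid"), "type": attr.get("itemtype")})


# Widget markup from the slide above, with the style attribute and body omitted.
SNIPPET = (
    '<div class="fb-widget" itemscope="" '
    'itemid="http://www.freebase.com/id/en/taylor_swift" '
    'itemtype="http://www.freebase.com/id/music/artist"></div>'
)

parser = ItemScopeParser()
parser.feed(SNIPPET)
ITEMS = parser.items
```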
Thickening the Graph
"Vocabulary" Pattern
             taw    shooter      marksman




              marble   marksman

http://wordnet.freebaseapps.com
                          photo: http://sarabbit.openphoto.net
Review (neighborhood) Pattern
                           Eric Schlosser


                     E. Coli


                          Michael Pollan

                                   Robert Kenner