Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns.
In fact, some of these searches are so hard that the users don’t even try them anymore.
Results are good, but consider the ads: First ad says: Virgins. Looking for virgins? Find exactly what you want today. Ebay.com Second ad: Virgins. …Find cheap tickets for Virgins. Third ad: Adspam… these people buy Yahoo! traffic and sell it to Google.
SW: Representing and reasoning with structured data on the Web. Both a relational and graph view on information.
IR: Aggregating information at a document level based on ad-hoc information needs.
DB: Representing and querying information in a relational model.
NLP: From text to information.
One reference to Semantic Search.
Entity-independent measures:
M1: probability of fix given type
M2: probability of fix given type, normalized by probability of fix (the more uncommon the fix, the better)
M3: binary entropy function
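As a rough illustration of the three measures, here is a minimal Python sketch computed from raw counts. The function names, the count parameters, and the choice to apply the binary entropy function to P(fix | type) are all assumptions made for the example; the notes do not say what M3 is evaluated on or how the probabilities are estimated.

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def measures(n_fix_and_type, n_type, n_fix, n_total):
    """Return (M1, M2, M3) from hypothetical observation counts.

    n_fix_and_type: observations with both the fix and the type
    n_type:         observations with the type
    n_fix:          observations with the fix
    n_total:        all observations
    """
    p_fix_given_type = n_fix_and_type / n_type   # M1
    p_fix = n_fix / n_total
    m2 = p_fix_given_type / p_fix                # M2: rarer fixes score higher
    m3 = binary_entropy(p_fix_given_type)        # M3 (argument is an assumption)
    return p_fix_given_type, m2, m3

# Made-up counts, purely for illustration.
m1, m2, m3 = measures(n_fix_and_type=50, n_type=100, n_fix=200, n_total=10000)
```

With these made-up counts the fix appears in half of the observations of the type but in only 2 percent of all observations, so M2 is much larger than M1.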
Making the Web Searchable
Peter Mika, Researcher, Data Architect, Yahoo! Research
There are approximately 500 million users of Yahoo! branded services, meaning we reach 50 percent – or 1 out of every 2 users – online, the largest audience on the Internet (Yahoo! Internal Data).
Yahoo! is the most visited site online with nearly 4 billion visits and an average of 30 visits per user per month in the U.S. and leads all competitors in audience reach, frequency and engagement (comScore Media Metrix, US, Feb. 2007).
Yahoo! accounts for the largest share of time Americans spend on the Internet with 12 percent (comScore Media Metrix, US, Feb. 2007) and approximately 8 percent of the world’s online time (comScore WorldMetrix, Feb. 2007).
Yahoo! is the #1 home page with 85 million average daily visitors on Yahoo! homepages around the world, an increase of nearly 5 million visitors in a month (comScore WorldMetrix, Feb. 2007).
Yahoo!’s social media properties (Flickr, delicious, Answers, 360, Video, MyBlogLog, Jumpcut and Bix) have 115 million unique visitors worldwide (comScore WorldMetrix, Feb. 2007).
Yahoo! Answers is the largest collection of human knowledge on the Web with more than 90 million unique users and 250 million answers worldwide (Yahoo! Internal Data).
There are more than 450 million photos in Flickr in total and 1 million photos are uploaded daily. 80 percent of the photos are public (Yahoo! Internal Data).
Yahoo! Mail is the #1 Web mail provider in the world with 243 million users (comScore WorldMetrix, Feb. 2007) and nearly 80 million users in the U.S. (comScore Media Metrix, US, Feb. 2007).
Interoperability between Yahoo! Messenger and Windows Live Messenger has formed the largest IM community approaching 350 million user accounts (Yahoo! Internal Data).
Yahoo! Messenger is the most popular in time spent with an average of 50 minutes per user, per day (comScore WorldMetrix, Feb. 2007).
Nearly 1 in 10 Internet users is a member of a Yahoo! Group (Yahoo! Internal Data).
Yahoo! is one of only 26 companies to be on both the Fortune 500 list and Fortune’s “Best Place to Work” list (2006).
Examples: many, including SlideShare, YouTube, LinkedIn, Digg, Myspace, Facebook…
Peter Mika was born in Budapest.
#PeterM born #Bud
#PeterM label “Peter Mika”
#Bud label “Budapest”
#Bud capital-of #Hun
#Bud population “2,000,000”
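The graph on this slide can be read as a set of subject-predicate-object statements. The following Python sketch stores them as plain tuples and answers a simple pattern query. The edge directions (for example, that #Bud is the subject of capital-of and population) are inferred from the slide labels, and the helper names `match` and `birthplace_label` are hypothetical, not part of any RDF library.

```python
# Hypothetical in-memory triple list; identifiers mirror the slide's graph.
triples = [
    ("#PeterM", "born", "#Bud"),
    ("#PeterM", "label", "Peter Mika"),
    ("#Bud", "label", "Budapest"),
    ("#Bud", "capital-of", "#Hun"),
    ("#Bud", "population", "2,000,000"),
]

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def birthplace_label(person):
    """Follow born -> label: a two-step walk through the graph."""
    return [label
            for _, _, place in match(s=person, p="born")
            for _, _, label in match(s=place, p="label")]

labels = birthplace_label("#PeterM")
```

The second helper shows why the graph view is useful: the sentence only mentions a birthplace, but once the place is a node, its other properties (label, population, country) become reachable from the person.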
<HEAD>
<META HTTP-EQUIV="Instance-Key" CONTENT="http://www.cs.umd.edu/~george">
<USE-ONTOLOGY "our-ontology" VERSION="1.0" PREFIX="our" URL="http://ont.org/our-ont.html">
</HEAD>
<BODY>
<CATEGORY "our.Person">
<RELATION "our.marriedTo" TO="http://www.cs.umd.edu/~helena">
<RELATION "our.employee" FROM="http://www.cs.umd.edu">
My name is <ATTRIBUTE "our.firstName"> George </ATTRIBUTE> <ATTRIBUTE "our.lastName"> Cook </ATTRIBUTE> and I live at...
World Wide Web Consortium (W3C) recommendation for encoding RDF triples in HTML
Full RDF support
Recommendation specifies the algorithm for parsing the triples out of HTML
Requires XHTML in principle
In practice, no one cares
RDFa in a slide
<p xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
   typeof="foaf:Person" about="http://example.org/staff/jo">
  <span property="rdfs:label foaf:name">Jo Smith</span>.
  <span property="foaf:title">Web hacker</span> at
  <a rel="vcard:org" property="foaf:name" href="http://example.org">Acme Corp</a>.
  You can contact me <a rel="foaf:mbox" href="mailto:firstname.lastname@example.org">via email</a>.
</p>
...
Assign the prefixes rdfs and foaf to the RDFS and FOAF namespaces (as in XML, RDF/XML etc.)
Create a new resource of type foaf:Person
Assign a value to a property
Give it a URI
Link to another resource and assign a name to it
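To make the idea of parsing triples out of HTML concrete, here is a deliberately simplified Python sketch using only the standard library. It is not the algorithm from the W3C recommendation: it takes the subject from the nearest `about` attribute, treats each `property` value as a literal-valued predicate, and ignores prefix expansion, `typeof`, and `rel`/`href` links. The class name `PropertySketch` and the shortened sample markup are assumptions for the example.

```python
from html.parser import HTMLParser

class PropertySketch(HTMLParser):
    """Collect (subject, property, text) for elements with a `property` attribute.

    Simplified on purpose: one subject taken from `about`, no prefix
    expansion, no handling of `rel`/`href` object links.
    """

    def __init__(self):
        super().__init__()
        self.subject = None
        self.pending = None   # property names waiting for their text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            # A property attribute may list several names separated by spaces.
            self.pending = attrs["property"].split()

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            for prop in self.pending:
                self.triples.append((self.subject, prop, text))
            self.pending = None

# Shortened version of the slide's markup (only the two span elements).
sample = (
    '<p typeof="foaf:Person" about="http://example.org/staff/jo">'
    '<span property="rdfs:label foaf:name">Jo Smith</span>. '
    '<span property="foaf:title">Web hacker</span></p>'
)
parser = PropertySketch()
parser.feed(sample)
parser.close()
```

After the run, `parser.triples` holds one entry per property name, so the first span contributes two triples with the same literal. A conforming RDFa parser would additionally resolve the prefixes to full URIs and emit the type and link triples.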
Some predefined vocabularies with central registration
Some of the flexibility of RDFa
Introduce new terms using reverse domain names or full URIs
Semantic HTML elements such as <time>, <video>, <article>…
Microdata example
<div item="http://www.yahoo.com/resource/person">
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.
  I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
  <img itemprop="image" src="me.png" alt="me">
  </p>
</div>
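The example above mixes three kinds of property values: element text, the machine-readable `datetime` of a `<time>` element, and the `src` of an image. The Python sketch below shows one way such values could be collected with the standard library. It is an assumption-laden simplification rather than the extraction algorithm from the specification: it handles a single flat item, does not resolve relative URLs, and the class name `ItempropSketch` is hypothetical.

```python
from html.parser import HTMLParser

class ItempropSketch(HTMLParser):
    """Collect itemprop name/value pairs from a single, flat item."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self.pending = None   # property name waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        name = name.strip()
        if tag == "time" and "datetime" in attrs:
            # Prefer the machine-readable date over the visible text.
            self.props[name] = attrs["datetime"].strip()
        elif tag == "img" and "src" in attrs:
            # For images the value is the URL, left unresolved here.
            self.props[name] = attrs["src"].strip()
        else:
            self.pending = name

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.props[self.pending] = text
            self.pending = None

sample = (
    '<div item="http://www.yahoo.com/resource/person">'
    '<p>My name is <span itemprop="name">Neil</span>.</p>'
    '<p>I was born on <time itemprop="birthday" datetime="2009-05-10">'
    'May 10th 2009</time>. <img itemprop="image" src="me.png" alt="me"></p>'
    '</div>'
)
extractor = ItempropSketch()
extractor.feed(sample)
extractor.close()
```

Note that the birthday comes out in the `datetime` form rather than as the human-readable "May 10th 2009", which is the point of using a semantic element such as `<time>`.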
Def. matching the user’s query with the Web’s content at a conceptual level, often with the help of world knowledge
R. Guha, R. McCool: Semantic Search, WWW2003
Semantic Web, IR, Databases, NLP, IE
As a field
ISWC/ESWC/ASWC, WWW, SIGIR
Exploring Semantic Annotations in Information Retrieval (ECIR08, WSDM09)
Semantic Search Workshop (ESWC08, WWW09)
Future of Web Search: Semantic Search (FoWS09)
Semantics at every step of the IR process
[Diagram: the IR engine sits between the Web / the Semantic Web and the search interface. Stages shown: document processing, query processing (q = “bla” * 3), ranking θ(q,d).]
But one of those cases where a little semantics can go a long way…
SearchMonkey
1. Site owners/publishers share structured data with Yahoo!.
2. Site owners & third-party developers build SearchMonkey apps.
3. Consumers customize their search experience with Enhanced Results or Infobars.
[Diagram labels: Acme.com’s database, Acme.com’s Web pages, Index, RDF/Microformat markup, DataRSS feed, Web Services, Page Extraction]