Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns.
As Vish discussed, SearchMonkey is all about building richer, more useful search results. Here are a few examples of Enhanced Results.
And it lets the user add the movie directly to their online movie rental queue.
[will be animated]
Results are good, but consider the ads. The first ad says: “Virgins. Looking for virgins? Find exactly what you want today. Ebay.com.” The second ad: “Virgins. …Find cheap tickets for Virgins.” The third ad is adspam: these people buy Yahoo! traffic and sell it to Google.
Yahoo Making The Web Searchable - Presentation Transcript
Making the Web Searchable
Peter Mika
Researcher, Data Architect
Yahoo! Research
Yahoo! Research (research.yahoo.com)
Yahoo! Research Barcelona
Established January, 2006
Led by Ricardo Baeza-Yates
Research areas
Web Mining
content, structure, usage
Distributed Web retrieval
Multimedia retrieval
NLP and Semantics
Yahoo! by numbers (April, 2007)
There are approximately 500 million users of Yahoo!-branded services, meaning we reach 50 percent of users online (1 out of every 2), the largest audience on the Internet (Yahoo! Internal Data).
Yahoo! is the most visited site online with nearly 4 billion visits and an average of 30 visits per user per month in the U.S. and leads all competitors in audience reach, frequency and engagement (comScore Media Metrix, US, Feb. 2007).
Yahoo! accounts for the largest share of time Americans spend on the Internet with 12 percent (comScore Media Metrix, US, Feb. 2007) and approximately 8 percent of the world’s online time (comScore WorldMetrix, Feb. 2007).
Yahoo! is the #1 home page with 85 million average daily visitors on Yahoo! homepages around the world, an increase of nearly 5 million visitors in a month (comScore WorldMetrix, Feb. 2007).
Yahoo!’s social media properties (Flickr, delicious, Answers, 360, Video, MyBlogLog, Jumpcut and Bix) have 115 million unique visitors worldwide (comScore WorldMetrix, Feb. 2007).
Yahoo! Answers is the largest collection of human knowledge on the Web with more than 90 million unique users and 250 million answers worldwide (Yahoo! Internal Data).
There are more than 450 million photos in Flickr in total and 1 million photos are uploaded daily. 80 percent of the photos are public (Yahoo! Internal Data).
Yahoo! Mail is the #1 Web mail provider in the world with 243 million users (comScore WorldMetrix, Feb. 2007) and nearly 80 million users in the U.S. (comScore Media Metrix, US, Feb. 2007).
Interoperability between Yahoo! Messenger and Windows Live Messenger has formed the largest IM community approaching 350 million user accounts (Yahoo! Internal Data).
Yahoo! Messenger is the most popular IM service by time spent, with an average of 50 minutes per user per day (comScore WorldMetrix, Feb. 2007).
Nearly 1 in 10 Internet users is a member of a Yahoo! Group (Yahoo! Internal Data).
Yahoo! is one of only 26 companies on both the Fortune 500 list and Fortune’s “Best Place to Work” list (2006).
Agenda
The Annotated Web
SearchMonkey
Demo
Technology
DataRSS format
Query language
Lessons learned
Toward Semantic Search
BOSS
Build your Own Search Service
Y!OS 1.0
Yahoo! Open Strategy
The Annotated Web
Previously in search
Horizontal search
Yahoo…
Keyword-based indexing
Minimal natural language processing
Limited experiments with ontologies (query expansion)
Vertical search
e.g. shopping.com, Kelkoo
Faceted search, browsing
Fixed ontology
Combinations
Google Base, Google Co-op
Web-scale, but fixed ontologies
Proprietary technology
Can we do better with the Semantic Web?
Address the long tail of queries (88% of queries)
Use standard technology
Not a new question. But the answer may be new.
Which Semantic Web?
Two visions
Data Web
Bringing the content of databases to the Web (linkeddata.org)
Rich data, heavyweight semantics
Deep Web
Annotated Web
Annotating the content of Web resources (documents, multimedia)
<HEAD>
  <META HTTP-EQUIV="Instance-Key" CONTENT="http://www.cs.umd.edu/~george">
  <USE-ONTOLOGY "our-ontology" VERSION="1.0" PREFIX="our" URL="http://ont.org/our-ont.html">
</HEAD>
<BODY>
  <CATEGORY "our.Person">
  <RELATION "our.marriedTo" TO="http://www.cs.umd.edu/~helena">
  <RELATION "our.employee" FROM="http://www.cs.umd.edu">
  My name is <ATTRIBUTE "our.firstName">George</ATTRIBUTE>
  <ATTRIBUTE "our.lastName">Cook</ATTRIBUTE> and I live at...
This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>.
Use of the “rel” attribute for semantic annotation is the birth of the microformat…
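Because rel="license" is just a space-separated link type on an ordinary anchor, a consumer needs nothing more than an HTML parser to find it. A minimal sketch using Python's standard-library html.parser; RelLicenseParser and find_licenses are hypothetical names chosen for illustration, not part of any library.

```python
from html.parser import HTMLParser


class RelLicenseParser(HTMLParser):
    """Collects the href of every <a> or <link> whose rel list contains "license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of link types, so split before testing.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "license" in rel_values and attributes.get("href"):
            self.licenses.append(attributes["href"])


def find_licenses(html):
    """Hypothetical helper: return all license URLs declared with rel="license"."""
    parser = RelLicenseParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


snippet = ('This work is licensed under a <a rel="license" '
           'href="http://creativecommons.org/licenses/by/3.0/us/">'
           'Creative Commons Attribution 3.0 United States License</a>.')
print(find_licenses(snippet))
```

The same pattern (read an agreed-upon attribute value, ignore everything else) is what makes microformats cheap to consume, and also why each one needs its own small parser.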
Example: microformats
<cite class="vcard">
  <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a>
</cite>
wrote a post (<cite><a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>)
about an unintentionally humorous letter he received from the
<span class="vcard"><a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a></span>.
<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
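To make the hCard example concrete, here is a minimal sketch of how a consumer might pull those properties out of the markup. It is a simplified reader written for illustration (HCardParser and parse_hcards are hypothetical names): it assumes cards are not nested, handles only fn, org, tel, title, url, and email, and ignores hCard's type/value sub-properties.

```python
from html.parser import HTMLParser

# The small subset of hCard properties this sketch understands.
TEXT_PROPS = {"fn", "org", "tel", "title"}   # value = element text
HREF_PROPS = {"url", "email"}                # value = href attribute
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Simplified hCard reader: no nested cards, no type/value sub-properties."""

    def __init__(self):
        super().__init__()
        self.cards = []       # one dict per class="vcard" root, in document order
        self.open_cards = 0   # how many vcard roots are currently open
        self.stack = []       # per open element: (is_vcard_root, text buffers it opened)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
            self.open_cards += 1
        buffers = []
        if self.open_cards:
            card = self.cards[-1]
            href = attributes.get("href") or ""
            for prop in classes & HREF_PROPS:
                if href:
                    # Drop the mailto: scheme so "email" holds a bare address.
                    if prop == "email" and href.startswith("mailto:"):
                        card[prop] = href[len("mailto:"):]
                    else:
                        card[prop] = href
            for prop in sorted(classes & TEXT_PROPS):
                buffers.append((prop, []))
        self.stack.append((is_root, buffers))

    def handle_data(self, data):
        # Text counts toward every text property whose element is still open.
        for _, buffers in self.stack:
            for _, parts in buffers:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        is_root, buffers = self.stack.pop()
        for prop, parts in buffers:
            # Collapse whitespace runs the way a browser would render them.
            self.cards[-1][prop] = " ".join("".join(parts).split())
        if is_root:
            self.open_cards -= 1


def parse_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


sample = (
    '<cite class="vcard"><a class="fn url" rel="friend colleague met" '
    'href="http://meyerweb.com/">Eric Meyer</a></cite> wrote a post about a letter '
    'from the <span class="vcard"><a class="fn org url" href="http://irs.gov/">'
    'Internal Revenue Service</a></span>.'
    '<div class="vcard"><a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>'
    '<div class="tel">+1-919-555-7878</div>'
    '<div class="title">Area Administrator, Assistant</div></div>'
)
for card in parse_hcards(sample):
    print(card)
```

Note how much of the parser is specific to hCard's vocabulary (which classes are text, which come from href); that is the "no shared syntax" problem in practice.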
microformats
microformats.org
Originated by Tantek Çelik and others
Agreements on the way to encode certain kinds of metadata in HTML
Reuse of semantic-bearing HTML elements
Based on existing standards
Community process
Persons, events, listings, etc., but also syntactic metadata: licenses, tags
Microformats have no shared syntax
Each microformat has a separate syntax tailored to the vocabulary
Microformats are not ontologies
No formal description of schemas, only text
Limited reuse, extensibility of schemas
No datatypes
No namespaces or unique identifiers (URIs)
no interlinking
mapping between instances is required
Relationship to page context is unclear
Widely used in millions of documents
User-generated as well as automatically generated
Example: tags and machine tags
Example: Tags and machine tags
Tags
User-defined keywords
Minimal agreement
Is ‘rock’ on Flickr the same as ‘rock’ on MySpace?
Is ‘rock’ by me on Flickr the same as ‘rock’ by you on Flickr?
Is ‘rock’ by me on Flickr today the same as ‘rock’ by me on MySpace tomorrow?
Machine tags
User-defined values for user-defined properties
Namespaces can be defined (but are not enforced)
Limited use
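Machine tags pack a namespace, a property, and a value into a single tag string. A minimal parsing sketch, assuming the Flickr-style namespace:predicate=value syntax in which namespace and predicate start with a letter and contain only letters, digits, and underscores; parse_tag and MachineTag are hypothetical names for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed Flickr-style syntax: namespace:predicate=value, where namespace and
# predicate start with a letter and use only ASCII letters, digits, underscores.
MACHINE_TAG = re.compile(r"^([A-Za-z]\w*):([A-Za-z]\w*)=(.+)$", re.ASCII)


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_tag(tag: str) -> Optional[MachineTag]:
    """Return the parsed triple for a machine tag, or None for a plain keyword tag."""
    match = MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Strip optional surrounding double quotes from the value.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return MachineTag(namespace, predicate, value)


for tag in ["rock", "geo:lat=41.3879", 'dc:title="Sagrada Familia"']:
    print(tag, "->", parse_tag(tag))
```

Nothing checks what "geo" or "dc" mean: the parser recovers structure, but the namespace is only a convention among taggers, which is the limitation noted above.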
RDF-based annotation #1: eRDF
eRDF
Ian Davis (Talis)
Embedding RDF in HTML
Straightforward mapping to RDF triples (XSLT available)
HTML4 compatible
More complex than microformats
Use any RDF/OWL vocabulary
Reuse of semantic-bearing HTML elements is limited
More limited than RDF
No blank nodes
No data types
No statements about subjects other than the current document
Limited usage
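The "straightforward mapping to RDF triples" can be sketched in a few dozen lines. This is a simplified reader written for illustration, assuming the eRDF conventions of declaring a prefix with <link rel="schema.PREFIX" href="NAMESPACE">, marking a literal property with class="PREFIX-name", and marking a resource property with rel="PREFIX-name" plus href. It makes every statement about the document itself and skips id-based subjects, rdf:type, and the profile attribute the real format expects on head; MiniERDFParser is a hypothetical name.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


class MiniERDFParser(HTMLParser):
    """Simplified eRDF reader: every statement is about the document itself."""

    def __init__(self, document_uri):
        super().__init__()
        self.document_uri = document_uri
        self.prefixes = {}   # prefix -> namespace URI
        self.triples = []
        self.stack = []      # per open element: list of (predicate, text parts)

    def _predicate(self, token):
        # "dc-title" -> namespace("dc") + "title"; undeclared prefixes are ignored.
        prefix, sep, local = token.partition("-")
        if sep and local and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if tag == "link":
            for token in rel_tokens:
                if token.startswith("schema.") and href:
                    self.prefixes[token[len("schema."):]] = href
        elif href:
            for token in rel_tokens:
                predicate = self._predicate(token)
                if predicate:
                    self.triples.append((self.document_uri, predicate, href))
        if tag in VOID_TAGS:
            return
        buffers = []
        for token in (attributes.get("class") or "").split():
            predicate = self._predicate(token)
            if predicate:
                buffers.append((predicate, []))
        self.stack.append(buffers)

    def handle_data(self, data):
        for buffers in self.stack:
            for _, parts in buffers:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        for predicate, parts in self.stack.pop():
            text = " ".join("".join(parts).split())
            self.triples.append((self.document_uri, predicate, text))


page = (
    '<html><head><link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" /></head>'
    '<body><h1 class="dc-title">Making the Web Searchable</h1>'
    '<p>By <span class="dc-creator">Peter Mika</span>, see '
    '<a rel="dc-relation" href="http://research.yahoo.com/">Yahoo! Research</a>.</p>'
    '</body></html>'
)
parser = MiniERDFParser("http://example.org/talk.html")
parser.feed(page)
parser.close()
for triple in parser.triples:
    print(triple)
```

Unlike the hCard reader, nothing here is specific to one vocabulary: any declared prefix works, which is the main advantage eRDF has over microformats.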
RDF-based annotation #2: RDFa
RDFa
World Wide Web Consortium (W3C) last call document
Similar intent to eRDF, but with full RDF support
Requires XHTML
Big question: user complexity (data quality)
<p typeof="contact:Info" about="http://example.org/staff/jo">
  <span property="contact:fn">Jo Smith</span>.
  <span property="contact:title">Web hacker</span>
  at <a rel="contact:org" href="http://example.org">Example.org</a>.
  You can contact me <a rel="contact:email" href="mailto:jo@example.org">via email</a>.
</p>
...
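A sketch of how the RDFa example turns into triples, covering only about, typeof, property (as plain literals), and rel with href. Real RDFa has further subject-resolution rules (@resource, @src, chaining through rel/href, and more) that are skipped here. The slide elides the declaration of the contact: prefix, so the namespace URI below is a hypothetical placeholder, and MiniRDFaParser is a hypothetical name.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


class MiniRDFaParser(HTMLParser):
    """Tiny RDFa subset: about, typeof, property (plain literals), rel + href."""

    def __init__(self, base_uri, prefixes):
        super().__init__()
        self.prefixes = prefixes
        self.triples = []
        # Per open element: (subject in scope, literal buffers opened by it).
        self.stack = [(base_uri, [])]

    def expand(self, curie):
        prefix, _, local = curie.partition(":")
        if prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return curie  # unknown prefix: leave the CURIE unexpanded

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # @about sets a new subject; otherwise inherit the enclosing element's subject.
        subject = attributes.get("about") or self.stack[-1][0]
        for curie in (attributes.get("typeof") or "").split():
            self.triples.append((subject, RDF_TYPE, self.expand(curie)))
        href = attributes.get("href")
        if href:
            for curie in (attributes.get("rel") or "").split():
                self.triples.append((subject, self.expand(curie), href))
        buffers = [(subject, self.expand(curie), [])
                   for curie in (attributes.get("property") or "").split()]
        if tag not in VOID_TAGS:
            self.stack.append((subject, buffers))

    def handle_data(self, data):
        for _, buffers in self.stack:
            for _, _, parts in buffers:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or len(self.stack) == 1:
            return
        _, buffers = self.stack.pop()
        for subject, predicate, parts in buffers:
            # Whitespace is collapsed for readability; real RDFa preserves it.
            self.triples.append((subject, predicate, " ".join("".join(parts).split())))


# Hypothetical prefix mapping: the slide elides the real declaration ("...").
prefixes = {"contact": "http://example.org/contact#"}
snippet = (
    '<p typeof="contact:Info" about="http://example.org/staff/jo">'
    '<span property="contact:fn">Jo Smith</span>. '
    '<span property="contact:title">Web hacker</span> at '
    '<a rel="contact:org" href="http://example.org">Example.org</a>. '
    'You can contact me <a rel="contact:email" href="mailto:jo@example.org">via email</a>.'
    '</p>'
)
parser = MiniRDFaParser("http://example.org/staff/jo.html", prefixes)
parser.feed(snippet)
parser.close()
for triple in parser.triples:
    print(triple)
```

Even this small subset shows the "user complexity" concern: a publisher has to get prefixes, subjects, and the rel versus property distinction right for the data to come out as intended.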
SearchMonkey
Creating an ecosystem of publishers, developers and end-users
Motivating and helping publishers to implement semantic annotation
Providing tools for developers to create compelling applications
Focusing on end-user experience
Rich abstracts as a first application
Addressing the long tail of query and content production
Standard Semantic Web technology
DataRSS = Atom + RDFa
Industry standard vocabularies
http://developer.yahoo.com/searchmonkey/
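The idea behind "Atom + RDFa" is to describe each page as an Atom entry and attach RDFa-style property/value pairs to it. The sketch below illustrates only that idea: the namespace and the meta element name for the structured data are hypothetical placeholders, not the actual DataRSS schema (see the SearchMonkey developer documentation for that), the CURIE prefix declarations are omitted, and the output is not a complete, validated Atom document. build_feed is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Placeholder namespace: NOT the real DataRSS namespace or element names.
DATA = "http://example.org/hypothetical-datarss#"


def build_feed(page_url, title, updated, properties):
    """Describe one page as an Atom entry carrying RDFa-style property/value pairs."""
    ET.register_namespace("", ATOM)
    ET.register_namespace("d", DATA)
    feed = ET.Element(f"{{{ATOM}}}feed")
    # Atom requires id, title and updated on both the feed and each entry.
    ET.SubElement(feed, f"{{{ATOM}}}id").text = page_url + "#feed"
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "Structured data for " + page_url
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = updated
    entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = page_url
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM}}}link", href=page_url)
    for curie, value in properties:
        # RDFa-style: predicate as a CURIE in @property, literal in @content.
        # (Prefix declarations for the CURIEs are omitted in this sketch.)
        ET.SubElement(entry, f"{{{DATA}}}meta", property=curie, content=value)
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_feed("http://example.org/staff/jo", "Jo Smith", "2008-01-01T00:00:00Z",
                      [("vcard:fn", "Jo Smith"), ("vcard:title", "Web hacker")])
print(feed_xml)
```

Reusing Atom means publishers can produce such feeds with existing syndication tooling, while the RDFa-style attributes carry the vocabulary-specific data.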
SearchMonkey
What is SearchMonkey?
An open platform for using structured data to build more useful and relevant search results (before/after screenshots)
Enhanced Result: image, deep links, name/value pairs or abstract
Infobar
SearchMonkey (diagram)
1. Site owners/publishers share structured data with Yahoo! (sources: Acme.com’s database and Acme.com’s Web pages; channels: DataRSS feed, Web Services, RDF/microformat markup, page extraction; all feeding the Index).
2. Site owners and third-party developers build SearchMonkey apps.
3. Consumers customize their search experience with Enhanced Results or Infobars.
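Step 2 can be pictured as a small function that maps a page's structured data onto the pieces of an Enhanced Result (image, deep links, name/value pairs or abstract). This is a hypothetical sketch in Python: the field names, data keys, and display limits are assumptions for illustration, not the actual SearchMonkey app interface, which was built with Yahoo!'s own developer tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical display limits; the slides do not state SearchMonkey's real ones.
MAX_DEEP_LINKS = 4
MAX_PAIRS = 3


@dataclass
class EnhancedResult:
    title: str
    image: Optional[str] = None
    deep_links: List[Tuple[str, str]] = field(default_factory=list)  # (label, URL)
    pairs: List[Tuple[str, str]] = field(default_factory=list)       # (name, value)
    abstract: Optional[str] = None


def movie_app(url: str, data: Dict[str, str]) -> EnhancedResult:
    """Hypothetical app: turn structured data about a movie page into an Enhanced Result."""
    result = EnhancedResult(title=data.get("title", url), image=data.get("image"))
    links = [(key.capitalize(), data[key])
             for key in ("trailer", "reviews", "showtimes") if key in data]
    result.deep_links = links[:MAX_DEEP_LINKS]
    pairs = [(key.capitalize(), data[key])
             for key in ("director", "rating", "runtime") if key in data]
    result.pairs = pairs[:MAX_PAIRS]
    # The slide shows name/value pairs *or* an abstract: fall back to the abstract.
    if not result.pairs:
        result.abstract = data.get("description")
    return result


print(movie_app("http://example.org/movies/42",
                {"title": "Example Movie", "director": "A. Director",
                 "trailer": "http://example.org/movies/42/trailer"}))
```

Keeping the app a pure function of the extracted data separates the publisher's side (sharing data in step 1) from the developer's side (presentation in step 2).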
Developer tool
Gallery
Example apps
LinkedIn
hCard plus feed data
Creative Commons by Ben Adida
CC in RDFa
Example apps II
Other me by Dan Brickley
Google Social Graph API wrapped using a Web Service