Making the Web Searchable
                                   Peter Mika
          Senior Researcher and Data Architect
                                   Yahoo! Inc.
Agenda

  •  Web Directions
     –  Convergence of Search and Online Media
  •  Semantic technologies (th)at work
     –  Semantics for search
         •  RDFa, microdata
     –  Semantics for data integration
         •  RDF, OWL, SPARQL
  •  Take home: use what works!




                                  -2-
More than just ten blue links
It used to be pretty simple…




                               -4-
Yahoo! today is a global network of online media sites




                             -5-
... with search as an important entry point to content




         Screenshot callouts:
          •  Points of interest in Vienna, Austria
          •  Information from the Knowledge Graph
          •  Faceted search for Shopping results
          •  Information box with content from and links to Yahoo! Travel
          •  Since Aug, 2010, regular search results are powered by Bing




                              -6-
Conversely, online media as an entry point to search




                                    Hovering over an underlined
                                    phrase triggers a search for
                                    related news items.




                            -7-
Aggregation across space: hyperlocal pages




                          Hyperlocal: showing content
                          from across Yahoo that is
                          relevant to a particular
                          neighbourhood.




                            -8-
Aggregation across entity types: special events




                            -9-
Personalization

                  Yahoo's Content Optimization
                  Relevance Engine (CORE)
                  technology uses machine
                  learning to predict click behavior
                  based on the user profile
                                                       Display advertising is also
                                                       personalized by default.
                                                       Users can opt out of behavioral
                                                       targeting through AdChoices.




                           - 10 -
Show related content
Contextualization




                                Social discovery: connect
                                with friends watching the
                                same




                    - 11 -
Convergence of search and online media

   •  Complex answers in search
      –  Using structured data, not just text
      –  Search over owned content and the best of the Web
   •  Aggregation
      –  Content aggregation around events, persons, other entities
      –  From creating topic pages to creating entire new websites
   •  Personalization and contextualization
      –  Understand user interests at a fine grained level
      –  Build and carry user profiles across search and media


   •  Common to these is a need for a more advanced
      understanding of the Web and our content

                                     - 12 -
Semantic technologies for Search
Search is really fast, without necessarily being intelligent




                              - 14 -
State of Search

   •  Improvements in search are harder and harder to come by
      –  Machine learning using hundreds of signals
          •  From text to the web graph
      –  Heavy investment in computational power
          •  e.g. real-time indexing and instant search
   •  Remaining challenges are not computational, but in
      modeling human understanding
      –  A machine is intelligent if it reasons and acts the way we would
      –  But could Watson explain why the answer is Toronto?
   •  How do we teach the computer about our world?
      –  How do we give meaning to documents and data?



                                      - 15 -
Not just search…




                   - 16 -
What it's like to be a machine?




              Roi Blanco




                             - 17 -
What it's like to be a machine?
                      ë✜Θ♬♬ţğ√∞ñ§®ÇĤĪ✜★¤♬☐✓✓

                      ţğ★¤✜èééééñ


                      u✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫¤Γ
                      ≠=⅚©§★✓♪ΒΓΕññ¤℠

                      ¢✖Γ♫⅜ ⏎↵⏏v☐ģğğğμλκσςτn
                      nnnu⏎ñ⌥°¶§ΥΦΦΦ✗✕☐vuwwwww




              ë✜Θ♬♬ţğ




                              - 18 -
If machines are dumb, how to make their job easier?

   •  HTML is intended for human consumption
       –  A mix of text, data and styling
   •  Let's make it easier to process for machines
       –  Languages to publish data in HTML
   •  Agree between publishers and search engines on the meaning of
      certain symbols (ontologies)
           •  e.g. ⏎⅙¥ means that this page describes a Person
       –  Annotate HTML pages using these symbols
       –  (This is just an example… the actual markup is human readable)
   •  For data in particular, agree on what the types of objects are in the
      world, and what their attributes are
       –  e.g. between §℗ and §⌥⌘ is the age of the Person
   •  Leverage this understanding for more precise matching and
      ranking


                                     - 19 -
Semantic Web

  •  Publish information in a way that is easier to process for machines
  •  Web of Data instead of Web of Documents
  •  Two main architectural challenges
      –  A common format for sharing data
      –  Sharing the meaning of data
          •  Through social means (shared schemas)
          •  By using powerful schema languages
  •  Semantic Web standards from W3C
      –  Languages (RDF, OWL, RIF)
      –  Serializations (RDF/XML, RDFa)
      –  Protocols (SPARQL, HTTP)
  •  Semantic Web research into knowledge representation and
     reasoning, data integration, data quality and many other topics
  •  Community efforts to publish data and develop schemas
                                       - 20 -
Resource Description Framework (RDF)

  •  Each resource (thing, entity) is identified by a URI
     –  Globally unique identifiers
  •  RDF represents knowledge as a set of triples
     –  Each triple is a single fact about the entity (an attribute or a
        relationship)
  •  A set of triples forms an RDF graph

     RDF document
        example:roi    type    foaf:Person
        example:roi    name    "Roi Blanco"
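
A minimal Python sketch of the two triples on this slide. It models an RDF graph as a plain set of (subject, predicate, object) tuples instead of using an RDF library. The `example:` namespace URI is a made-up placeholder; the FOAF and RDF namespace URIs are the published ones.

```python
from collections import defaultdict

# Hypothetical namespace for the slide's "example:" prefix.
EXAMPLE = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# An RDF graph is a set of (subject, predicate, object) triples.
graph = {
    (EXAMPLE + "roi", RDF_TYPE, FOAF + "Person"),
    (EXAMPLE + "roi", FOAF + "name", "Roi Blanco"),
}

def describe(graph, subject):
    """Group all facts about one resource by predicate."""
    facts = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            facts[p].append(o)
    return dict(facts)

roi = describe(graph, EXAMPLE + "roi")
```

Because every subject and predicate is a URI, two graphs from different sites can be merged with a plain set union.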
                                      - 21 -
Linking across the Web
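 Diagram: three linked RDF graphs

   Roi's homepage
       example:roi   type       foaf:Person
       example:roi   name       "Roi Blanco"
       example:roi   sameAs     #roi2

   Friend-of-a-Friend ontology
       foaf:Person, knows

   Yahoo!'s website
       #roi2         worksWith  #peter
       #peter        type       foaf:Person
       #peter        email      pmika@yahoo-inc.com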
                                     - 22 -
History of metadata in HTML

   •  1995: HTML meta tags
   •  1998: RDF/XML
      –  RDF/XML in HTML
      –  RDF linked from HTML
   •  2003: Web 2.0
      –  Tagging, machine tags
      –  Microformats
   •  2005: eRDF
   •  2008: RDFa 1.0
   •  2011: RDFa 1.1,
            Microdata



                                 - 23 -
HTML meta tags

<HTML>
<HEAD profile="http://dublincore.org/documents/dcq-html/">
<META name="DC.author" content="Peter Mika">
<LINK rel="DC.rights copyright" href="http://www.example.org/rights.html" />
<LINK rel="meta" type="application/rdf+xml" title="FOAF"
     href="http://www.cs.vu.nl/~pmika/foaf.rdf">
</HEAD>
…
</HTML>




                            - 24 -
Microformats (µf)

   •  Agreements on how to encode certain objects
      in HTML (persons, events, recipes…)
      –  Reuse of semantics-bearing HTML attributes, e.g. class
      –  Based on existing standards, e.g. hCard is based on vCard
      –  Minimal: small number of types, most common attributes


   •  Community centered around microformats.org
      –  Centralized process, but not a formal standards body
      –  Wiki for specifications, mailing list




                                      - 25 -
Example: the hCard microformat

 <div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
 </div>


 <cite class="vcard">
 <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">
 Eric Meyer</a> </cite> wrote a post (<cite>
 <a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">
 Tax Relief</a></cite>) about an unintentionally humorous letter he received from
 the <span class="vcard"> <a class="fn org url" href="http://irs.gov/">
 Internal Revenue Service</a>
 </span>.
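
A rough Python sketch of how a consumer might pull the first hCard above out of a page, using only the standard library's `html.parser`. This is not a conforming microformats parser: the property list is an illustrative subset, nested vcards and void elements are not handled, and `mailto:` is left on the email value. It is meant to show that the extraction logic is specific to this one microformat.

```python
from html.parser import HTMLParser

# Illustrative subset of hCard property names; the full list is longer.
HCARD_PROPERTIES = {"fn", "email", "tel", "title", "org", "url"}

class HCardParser(HTMLParser):
    """Collects hCard properties from elements nested in class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard root element
        self._open = []   # per open element: (is_vcard_root, text properties)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
        in_card = is_root or any(root for root, _ in self._open)
        props = classes & HCARD_PROPERTIES if in_card else set()
        # For links, email and url take their value from href, not the text.
        href = attrs.get("href")
        if href:
            for prop in props & {"email", "url"}:
                self.cards[-1][prop] = href
            props = props - {"email", "url"}
        self._open.append((is_root, props))

    def handle_endtag(self, tag):
        # Assumes well-nested markup without void elements such as <br>.
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.cards:
            for _, props in self._open:
                for prop in props:
                    self.cards[-1][prop] = text

parser = HCardParser()
parser.feed("""
<div class="vcard">
 <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
 <div class="tel">+1-919-555-7878</div>
 <div class="title">Area Administrator, Assistant</div>
</div>
""")
parser.close()
cards = parser.cards
```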




                                       - 26 -
Microformats: limitations

   •  Syntax shared with HTML
       –  You need to implement extraction for each microformat separately
   •  Lack of formal schemas
       –  Limited reuse, extensibility of schemas
       –  Unclear which combinations are allowed
   •  Lack of a datatype system
   •  No unique identifiers (URIs)
       –  No linking, e.g. sameAs
   •  Always appears in the HTML <body>
       –  Not always clear how it relates to the main topic of the page
   •  Instability
           •  Everything is a draft…
           •  Varying degrees of support

                                           - 27 -
RDFa
  •  W3C recommendation for embedding RDF data in HTML
      –  A set of new HTML attributes to be used in head or body
      –  A specification of how to extract the data from these attributes
      –  RDFa is just a syntax, you have to choose (or create) a vocabulary
         separately
  •  Addresses the limitations of microformats
      –  Syntax different from HTML
      –  Semantic Web schema languages (reuse, extend schemas)
      –  Unique identifiers for objects (interlinking, sameAs)
      –  Markup in head or body
  •  Alternative to publishing data as RDF/XML (Linked Data)
      –  Search engine friendly
  •  See also
      –  http://rdfa.info/



                                     - 28 -
RDFa evolution

  •  RDFa 1.0 has been a W3C Recommendation since October, 2008
  •  RDFa 1.1 is a small update to RDFa that reduces complexity and makes it
     compatible with HTML5
       –  Recommendation (June 7, 2012)
       –  Updated version of the RDFa Primer (June 7, 2012)
       –  HTML+RDFa Working Draft (Sept 11, 2012)
  •  New in RDFa 1.1
       –  New vocab attribute to define the default namespace for the
          document or subtree
       –  The prefix attribute as a recommended replacement of xmlns
       –  You can use URIs even where only CURIEs were allowed before
  •  RDFa API for accessing RDFa data in a webpage in the browser
     from JavaScript
       –  Currently Working Draft (April 19, 2011)
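
A simplified Python sketch of how the `prefix` and `vocab` attributes could be used to turn CURIEs and bare terms into full URIs. The function names are hypothetical, and real RDFa 1.1 processing has more rules (predefined prefixes, term mappings, safe CURIEs) that are left out here.

```python
def parse_prefix_attribute(value):
    """Parse a prefix attribute value such as 'og: http://ogp.me/ns#'."""
    tokens = value.split()
    # Tokens alternate between "name:" and the namespace URI.
    return {name.rstrip(":"): uri
            for name, uri in zip(tokens[0::2], tokens[1::2])}

def expand(term, prefixes, vocab=None):
    """Expand a CURIE or bare term to a full URI; pass full URIs through."""
    if ":" in term:
        prefix, reference = term.split(":", 1)
        if prefix in prefixes:
            return prefixes[prefix] + reference
        return term  # unknown prefix: assume it is already a full URI
    if vocab is not None:
        return vocab + term  # bare term resolved against the default vocabulary
    return term

prefixes = parse_prefix_attribute(
    "og: http://ogp.me/ns# foaf: http://xmlns.com/foaf/0.1/"
)
og_title = expand("og:title", prefixes)
schema_name = expand("name", prefixes, vocab="http://schema.org/")
full_uri = expand("http://ogp.me/ns#title", prefixes)
```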


                                  - 29 -
RDFa intro: metadata in the header




<html prefix="og: http://ogp.me/ns#">
<head>
 <title>The Trouble with Bob</title>
 <meta property="og:title" content="The Trouble with Bob" />
 <meta property="og:type" content="text" />
 <meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" />
 ...
</head>
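
A small Python sketch of what a consumer of this header markup might do: collect every `property`/`content` pair from the `<meta>` elements. It uses the standard library's `html.parser` and ignores prefix resolution, so the keys stay as written in the page (for example `og:title`). The class name is hypothetical.

```python
from html.parser import HTMLParser

class MetaPropertyParser(HTMLParser):
    """Collects <meta property="..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) are also routed here by default.
        attrs = dict(attrs)
        if tag == "meta" and "property" in attrs and "content" in attrs:
            self.properties[attrs["property"]] = attrs["content"]

parser = MetaPropertyParser()
parser.feed("""
<html prefix="og: http://ogp.me/ns#">
<head>
 <title>The Trouble with Bob</title>
 <meta property="og:title" content="The Trouble with Bob" />
 <meta property="og:type" content="text" />
</head>
</html>
""")
parser.close()
properties = parser.properties
```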




                                      - 30 -
RDFa intro: links with a flavor




All content on this site is licensed under <a rel="license"
href="http://creativecommons.org/licenses/by/3.0/"> a Creative
Commons License </a>.




                               - 31 -
RDFa links: talking about subjects other than the page




The trouble with Bob is that he takes much better photos than me:
<div about="http://example.com/bob/photos/sunset.jpg">
<img src="http://example.com/bob/photos/sunset.jpg" />
<span property="og:title">Beautiful Sunset</span> by <span
property="dc:creator">Bob</span>.
</div>




                               - 32 -
RDFa links: talking about subjects other than the page



<div typeof="foaf:Person">
 <p property="foaf:name"> Alice Birpemswick </p>
 <p> Email:
 <a rel="foaf:mbox" href="mailto:alice@example.com">
  alice@example.com
 </a>
 </p>
 <p> Phone:
   <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
 </p>
</div>




                                    - 33 -
The process of annotating with RDFa

   •    Find a vocabulary that fits your needs and is supported by your consumers
         –  A vocabulary describes a set of types and attributes within a given domain
         –  If you don't find a good candidate, extend an existing one or create a new one
   •    Annotate your page
         –  Before you start, you might want to validate your page for (X)HTML
            conformance using the W3C's (X)HTML Validator to reduce the chance of
            errors. Choose Document Type XHTML + RDFa.
         –  Use an HTML or XML editor that supports DTDs, or an RDFa editor such as
            RDFaCE
         –  Use the RDFa Distiller to check which data can be extracted from your page.
         –  If you like, use the RDF Validator to visualize the resulting RDF graph.
   •    Put the annotated page online
         –  The data will be extracted by your favorite search engine the next time your
            page is crawled and indexed
         –  The data will be available to browser extensions, bookmarklets etc.
   •    See http://rdfa.info/rdfa-implementations for new tools and APIs
                                              - 34 -
Example: Yahoo! Enhanced Results (was: SearchMonkey)
  •  First major adopter of
     RDFa
     –  Launched in May, 2008
  •  Guide for publishers to
     mark-up their pages for
     common types of objects
     –  Product, Local, News,
        Video, Events, Documents,
        Discussion, Games
  •  Using popular microformats
     and RDF vocabularies
     –  Copy-paste code
     –  Validator
  •  Yahoo as a consumer
     –  Enhanced Results


                                    - 35 -
Example: Google's Rich Snippets

  •  Launched in May, 2009
  •  Google encourages publishers to use popular microformats
     and its own RDFa vocabulary
     –  data-vocabulary.org
  •  Validator to check if the markup is correct
  •  Google displays enhanced results based on this metadata
     –  Rich Snippets




                                 - 36 -
Example: Facebook's Like and the Open Graph Protocol

  •  Launched April, 2010
  •  The Like button provides publishers with a way to promote
     their content on Facebook and build communities
     –  Shows up in profiles and news feed
     –  Site owners can later reach users who have liked an object
     –  Facebook Graph API allows 3rd party developers to access the
        data
  •  Open Graph Protocol is an RDFa-based format for
     describing the object that the user Likes




                                  - 37 -
Example: Facebook's Open Graph Protocol

  •  RDF vocabulary to be used in conjunction with RDFa
      –  Simplify the work of developers by restricting the freedom in RDFa
  •  Activities, Businesses, Groups, Organizations, People, Places,
     Products and Entertainment
  •  Only HTML <head> accepted


 <html xmlns:og="http://opengraphprotocol.org/schema/">
 <head>
    <title>The Rock (1996)</title>
    <meta property="og:title" content="The Rock" />
    <meta property="og:type" content="movie" />
    <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
    <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" /> …
 </head> ...
                                      - 38 -
Example: rNews

  •  RDFa vocabulary for
     news articles
     –  Easier to implement than
        NewsML
     –  Easier to consume for
        news search and other
        readers, aggregators
  •  Under development at
     the IPTC
     –  Version 0.5




                                   - 39 -
Microdata

    •  Developed by the HTML5 working group at the W3C
        –  RDFa was perceived as too complex and thus error prone
    •  Currently a companion document to HTML5 (working draft)
    •  Incompatible with RDFa

<div itemscope itemid="http://www.yahoo.com/resource/person">
<p>My name is <span itemprop="name">Neil</span>.</p>
<p>My band is called
<span itemprop="band">Four Parts Water</span>.
I was born on
<time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
<img itemprop="image" src="me.png" alt="me">
</p>
</div>
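
A minimal Python sketch of reading the microdata item above with the standard library's `html.parser`. It assumes a single, non-nested item, stores the `itemid` under a made-up `"@id"` key, and does not resolve relative URLs, so it is only an approximation of the microdata extraction rules.

```python
from html.parser import HTMLParser

class MicrodataParser(HTMLParser):
    """Collects itemprop name/value pairs of one non-nested item."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._pending = None  # property still waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemid" in attrs:
            self.item["@id"] = attrs["itemid"]  # "@id" is our own convention
        prop = attrs.get("itemprop")
        if prop is None:
            return
        # Some elements carry the value in an attribute instead of the text.
        if tag == "time" and "datetime" in attrs:
            self.item[prop] = attrs["datetime"]
        elif tag in ("img", "audio", "video", "source") and "src" in attrs:
            self.item[prop] = attrs["src"]
        elif tag in ("a", "link") and "href" in attrs:
            self.item[prop] = attrs["href"]
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None:
            self.item[self._pending] = data.strip()
            self._pending = None

parser = MicrodataParser()
parser.feed("""
<div itemscope itemid="http://www.yahoo.com/resource/person">
<p>My name is <span itemprop="name">Neil</span>.</p>
<p>My band is called
<span itemprop="band">Four Parts Water</span>.
I was born on
<time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
<img itemprop="image" src="me.png" alt="me">
</p>
</div>
""")
parser.close()
item = parser.item
```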


                                   - 40 -
Competing formats, competing schemas

  •  Multiple incompatible formats: microformats, RDFa, microdata
     –  Varying degrees of adoption
     –  Not all formats are supported by all search engines
  •  Multiple competing schemas (ontologies)
     –  Different schemas for marking up the same information (RDFa
        and microdata)
         •  Major search engines support different existing alternatives or create
            their own (Google, Facebook)
     –  Not clear which schemas have adoption, who is responsible for
        maintaining them
     –  Slow convergence




                                      - 41 -
schema.org

  •  Agreement on a shared set of schemas for common types of
     web content
     –  Bing, Google, and Yahoo! as initial founders (June, 2011)
     –  Similar in intent to sitemaps.org
         •  Use a single format to communicate the same information to all
            three search engines
  •  schema.org covers areas of interest to all search engines
     –  Business listings (local), creative works (video), recipes,
        reviews




                                     - 42 -
schema.org evolution

   •    Yandex joins schema.org in Nov, 2011
         –  Yandex.Slovari, Yandex.Spravochnik, Yandex.Kartinki, Yandex.Video
   •    RDFa Lite 1.1
         –  Subset of the features of RDFa 1.1
         –  W3C Recommendation since June, 2012
   •    Two W3C task forces within the SW Interest Group (SWIG)
         –  Web schemas TF for ongoing collaborations on schema extensions, mappings, tooling etc.
              •    schema.org discussions are at public-vocabs@w3.org
         –  HTML Data TF finished in December, 2011
              •  HTML Data Guide
              •  Microdata RDF: Transformation from HTML+Microdata to RDF
   •    Growing number of 3rd party contributions
         –  rNews (news)
         –  GoodRelations (e-commerce)
         –  Health and Life Sciences
         –  Technical Publishing


                                                      - 43 -
Documentation and OWL ontology




                         - 44 -
Current state of semantic search

   •  Limited usage in commercial search engines
      –  Enhanced results
      –  Faceted search
          •  Google's Recipe Search
      –  Navigation to related entities
          •  Yahoo's Vertical Intent Search
   •  Positive SEO effects
      –  Enhanced results are clicked more
      –  Enhanced results help users find relevant results
   •  Increased adoption of data markup




                                          - 45 -
Semantic Search development

  •  Research
     –  RDF indexing and ranking
     –  Searching over annotated web pages
     –  Search result summarization
     –  Question answering
     –  Task completion
     –  Semantic log analysis
  •  Prototype pure RDF search engines
     –  Sindice and Sig.ma from DERI




                                   - 46 -
Current state of metadata on the Web

   •  31% of webpages, 5% of domains contain some
      metadata
       –  Analysis of the Bing Crawl (US crawl, January, 2012)
       –  RDFa is the most common format
           •  By URL: 25% RDFa, 7% microdata, 9% microformat
           •  By eTLD (PLD): 4% RDFa, 0.3% microdata, 5.4% microformat
       –  Adoption is stronger among large publishers
           •  Especially for RDFa and microdata
   •  See also
       –  P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus,
          LDOW 2012
       –  H. Mühleisen, C. Bizer.
          Web Data Commons - Extracting Structured Data from Two Large
          Web Corpora, LDOW 2012


                                          - 47 -
Exponential growth in RDFa data

                     Another five-fold increase
                     between October 2010 and
                     January, 2012



                  Five-fold increase between
                  March, 2009 and October,
                  2010




     Percentage of URLs with embedded metadata in various formats
                                    - 48 -
Semantic technologies for Data Integration
Today's world is a Web of Pages




                           - 50 -
All these pages come from structured knowledge about
people, places, and things




        Diagram (knowledge graph):
           Chicago Cubs      is a       MLB team
           10% off tickets   for        Chicago Cubs
           Carlos Zambrano   plays for  Chicago Cubs
           Chicago Cubs      plays in   Chicago
           Barack Obama      from       Chicago




                           - 51 -
This underlying world is WOO—the Web of Objects




        Diagram (knowledge graph):
           Chicago Cubs      is a       MLB team
           10% off tickets   for        Chicago Cubs
           Carlos Zambrano   plays for  Chicago Cubs
           Chicago Cubs      plays in   Chicago
           Barack Obama      from       Chicago




                          - 52 -
Today our knowledge of this world is siloed, incomplete,
inconsistent, inaccurate, and hard to reuse




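        Diagram: the knowledge graph split across silos
        (Sports, Entertainment, Finance, Local, Shopping, Upcoming):
           Chicago Cubs      isa        MLB team
           10% off tickets   for        Chicago Cubs
           Carlos Zambrano   plays for  Chicago Cubs
           Chicago Cubs      plays in   Chicago
           Scott Roy         from       Chicago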




                             - 53 -
Our vision is a single shared knowledge base—accurate,
scalable, and easy to reuse




        Diagram (knowledge graph):
           Chicago Cubs      isa        MLB team
           10% off tickets   for        Chicago Cubs
           Carlos Zambrano   plays for  Chicago Cubs
           Chicago Cubs      plays in   Chicago
           Barack Obama      from       Chicago




                           - 54 -
Knowledge comes from many sources
   Diagram with axes Entities and Attributes; labels:
      •  Show times and other information for US movies from source B
      •  Show times
      •  Show times for Harry Potter and the Deathly Hallows part II
      •  Harry Potter and the Deathly Hallows part II

                              - 55 -
Combining these requires working with complementary,
parallel, and overlapping sources




   Diagram with axes Entities and Attributes; labels:
      •  Cast information for US movies from source A
      •  Cast and show time information for global movies from licensed feeds
      •  Cast information for global movies from Wikipedia



                                        - 56 -
There is a tremendous opportunity to do this directly
from Web pages, reverse engineering the Web




   Diagram with axes Entities and Attributes; label:
      •  Information from structured data extraction on billions of Web pages



                          - 57 -
Semantic technologies for data integration

   •  Semantic Web provides the basic technologies for Linked
      Data
      –  URIs as unique identifiers
          •  Retrieve data from the (internal) web
          •  Follow links in the data that is returned
      –  RDF as a common data format
      –  OWL as a powerful schema language for validation and
         reasoning
      –  SPARQL for queries, reasoning and transformations
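
A toy Python sketch of the "retrieve data, then follow links" pattern. The `WEB` dictionary is a made-up in-memory stand-in for HTTP lookups, and all URIs and property names under `http://example.org/` are hypothetical; the entities are borrowed from the Chicago Cubs diagram earlier in the deck.

```python
from collections import deque

EX = "http://example.org/"  # hypothetical namespace

# Stand-in for the web: URI -> triples returned when that URI is dereferenced.
WEB = {
    EX + "zambrano": [(EX + "zambrano", EX + "playsFor", EX + "cubs")],
    EX + "cubs": [
        (EX + "cubs", EX + "isA", EX + "MLBTeam"),
        (EX + "cubs", EX + "playsIn", EX + "chicago"),
    ],
    EX + "chicago": [],
}

def dereference(uri):
    """Stand-in for an HTTP GET returning the triples published at `uri`."""
    return WEB.get(uri, [])

def follow_links(start, max_hops):
    """Breadth-first traversal: fetch a URI, then the URIs its data mentions."""
    seen, triples = {start}, []
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        for triple in dereference(uri):
            triples.append(triple)
            target = triple[2]
            if hops < max_hops and target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return triples

one_hop = follow_links(EX + "zambrano", max_hops=1)
```

With `max_hops=1` the traversal returns the player's own triple plus the two triples published for the team.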




                                        - 58 -
Components

  •    Data is ingested from web extraction, feeds, editorial content (billions of
       objects)
  •    Data integration using Hadoop clusters
        –  Schema matching to the WOO ontology
        –  Object reconciliation
        –  Blending
  •    Data quality assessment
  •    Information extraction
        –  Text, e.g. news content
        –  Webpages
  •    Enrichment
        –  Feature computation based on user behavior, social signals and web content
  •    Serving and ranking
        –  Selecting the right objects to show by query, user, geography etc.
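
A deliberately naive Python sketch of the object reconciliation and blending steps. The matching key, the "first source wins" rule, and the sample records are all assumptions made for illustration; the deck does not describe the actual matching or blending logic, which runs on Hadoop at a very different scale.

```python
def normalize(name):
    """Crude matching key: lowercase, alphanumerics only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def blend(sources):
    """Merge records that share a key; earlier sources win on conflicts."""
    merged = {}
    for records in sources:
        for record in records:
            entity = merged.setdefault(normalize(record["name"]), {})
            for attribute, value in record.items():
                entity.setdefault(attribute, value)
    return list(merged.values())

# Made-up records: one source has cast data, the other has show times.
source_a = [{"name": "Harry Potter and the Deathly Hallows part II",
             "cast": ["actor 1", "actor 2"]}]
source_b = [{"name": "Harry Potter and the Deathly Hallows, Part II",
             "show_times": ["18:00", "21:00"]}]

blended = blend([source_a, source_b])
```

The two records differ in punctuation and capitalization but normalize to the same key, so they end up as a single object carrying both the cast and the show times.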


                                             - 59 -
WOO ontology

  •  Primary use case is data validation
     –  During information extraction and throughout the WOO
        platform
     –  No reasoning
  •  OWL2 ontology
     –  Automatic documentation
     –  Change management
     –  Conversion to Yahoo internal schema language
     –  Protégé OWL as editorial tool
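
A plain-Python sketch of validation-only use of a schema, with no reasoning. The `SCHEMA` dictionary is a hypothetical stand-in for a tiny fragment of an ontology; it does not reproduce OWL2 semantics or any part of the actual WOO ontology, which is internal to Yahoo.

```python
# Hypothetical mini-schema: class -> {property: expected Python type}.
SCHEMA = {
    "Movie": {"name": str, "release_year": int},
    "SportsTeam": {"name": str, "league": str},
}

def validate(obj):
    """Return a list of problems; an empty list means the object is valid."""
    properties = SCHEMA.get(obj.get("type"))
    if properties is None:
        return [f"unknown class: {obj.get('type')!r}"]
    problems = []
    for key, value in obj.items():
        if key == "type":
            continue
        if key not in properties:
            problems.append(f"unknown property: {key}")
        elif not isinstance(value, properties[key]):
            problems.append(f"{key}: expected {properties[key].__name__}")
    return problems

ok = validate({"type": "SportsTeam", "name": "Chicago Cubs", "league": "MLB"})
bad = validate({"type": "Movie", "name": "Example Movie", "release_year": "2011"})
```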




                                  - 60 -
WOO ontology (cont'd)

  •  Covers Yahoo's
     domains of interest
     –  Movies, Music, TV,
        Business listings, Events,
        Finance, Sports, Autos, …
     –  250 classes and 800
        properties (Sept, 2011)
     –  Available only internally
  •  Developed over 1.5
     years by Yahoo's
     editorial team
  •  Aligned with schema.org
     –  schema.org covers only a
        subset of the WOO
        ontology


                                     - 61 -
Value #1 — Breadth, depth, and accuracy at scale
    Diagram: real entities vs. dups, errors, and outdated entities
      •  Up-to-date correct entities
      •  We show many entities we shouldn't

    Screenshot callouts:
      •  Incorrect store URL
      •  No photo
      •  No business hours

    WOO improves our breadth, depth, and accuracy by combining
    knowledge from alternative sources, and by modernizing how we do
    matching, blending, and de-duping


                                                - 62 -
Value #2 — Agility launching new experiences

    •  Answers instead of links
        –  WOO lets us quickly create entity-centric DD modules using the
           existing knowledge in the KB
    •  Related knowledge in context
        –  The integrated KB lets us show relevant knowledge from one Yahoo
           property on other properties and off network
    •  Emerging markets and tail pages
        –  The KB gets us deep into the tail by combining and blending
           knowledge from many sources



                                                - 63 -
Other potential benefits

   •  Dynamic interlinking of content
      –  E.g. direct links from Yahoo! News to background information
         in Yahoo! Music about an artist
   •  Dynamic composition of web pages
      –  Topic-entity pages
   •  Better understanding of user intent
      –  Semantic analysis of query logs
      –  Semantic analysis of navigation paths
   •  Exposure of Yahoo! content using standard technologies
      –  Linking to external sources to make it part of the Linked Data
         cloud



                                    - 64 -
Innovative media companies are moving in this direction




Courtesy of Silver Oliver (BBC)   - 65 -
Innovative media companies are moving in this direction




Courtesy of Evan Sandhaus (NYT).   - 66 -
Take home: use what works!


  •  The W3C's semantic technology stack is daunting
     –  The basics are simple:
         •  URIs for entity identifiers, RDF for data exchange
  •  Standards for embedding data in HTML
     –  Useful in search and at other points of content consumption
  •  Standards for expressing the meaning of data
     –  Useful in data integration
  •  Do your bit!
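
To make "URIs for entity identifiers, RDF for data exchange" concrete, here is a minimal sketch that writes a few statements in N-Triples syntax using plain Python. The `example.org` and `example.com` identifiers and the `to_ntriples` helper are assumptions made for illustration; the `rdfs:label` and `owl:sameAs` property URIs are the standard W3C ones. A production system would more likely use an RDF library than hand-rolled serialization.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines.

    Subjects and predicates are URIs. An object is treated as a URI if it
    starts with "http://" or "https://", otherwise as a plain string literal.
    """
    lines = []
    for s, p, o in triples:
        if o.startswith(("http://", "https://")):
            obj = f"<{o}>"
        else:
            # Escape backslashes, quotes, and newlines inside the literal.
            escaped = (
                o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


ARTIST = "http://example.org/entity/artist/1"  # hypothetical entity identifier

triples = [
    # A human-readable name for the entity.
    (ARTIST, "http://www.w3.org/2000/01/rdf-schema#label", "Example Artist"),
    # A link stating that another source describes the same entity.
    (ARTIST, "http://www.w3.org/2002/07/owl#sameAs",
     "http://example.com/other/artist/1"),
]

doc = to_ntriples(triples)
print(doc)
```

Each line is one statement, so data from different sources can be merged by concatenation once they agree on the identifiers.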




                                      - 67 -
The End

  •  Credits to many people from Yahoo! around the world
  •  Contact me at
     –  pmika@yahoo-inc.com
     –  @pmika




                               - 68 -
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Peter Mika, "Making the web searchable"

  • 1. Making the Web Searchable. Peter Mika, Senior Researcher and Data Architect, Yahoo! Inc.
  • 2. Agenda • Web Directions – Convergence of Search and Online Media • Semantic technologies (th)at work – Semantics for search • RDFa, microdata – Semantics for data integration • RDF, OWL, SPARQL • Take home: use what works!
  • 3. More than just ten blue links
  • 4. It used to be pretty simple…
  • 5. Yahoo! today is a global network of online media sites
  • 6. ... with search as an important entry point to content. Screenshot callouts: points of interest in Vienna, Austria; information from the Knowledge Graph; faceted search for Shopping results; information box with content from and links to Yahoo! Travel; since August 2010, regular search results are powered by Bing.
  • 7. Conversely, online media as an entry point to search. Hovering over an underlined phrase triggers a search for related news items.
  • 8. Aggregation across space: hyperlocal pages. Hyperlocal: showing content from across Yahoo that is relevant to a particular neighbourhood.
  • 9. Aggregation across entity types: special events
  • 10. Personalization. Yahoo's Content Optimization Relevance Engine (CORE) technology uses machine learning to predict click behavior based on the user profile. Display advertising is also personalized by default. Users can opt out of behavioral targeting through AdChoices.
  • 11. Contextualization. Show related content. Social discovery: connect with friends watching the same
  • 12. Convergence of search and online media • Complex answers in search – Using structured data, not just text – Search over owned content and the best of the Web • Aggregation – Content aggregation around events, persons, other entities – From creating topic pages to creating entire new websites • Personalization and contextualization – Understand user interests at a fine-grained level – Build and carry user profiles across search and media • Common to these is a need for a more advanced understanding of the Web and our content
  • 14. Search is really fast, without necessarily being intelligent
  • 15. State of Search • Improvements in search are harder and harder to come by – Machine learning using hundreds of signals • From text to the web graph – Heavy investment in computational power • e.g. real-time indexing and instant search • Remaining challenges are not computational, but in modeling human understanding – A machine is intelligent if it reasons and acts the way we would – But could Watson explain why the answer is Toronto? • How do we teach the computer about our world? – How do we give meaning to documents and data?
  • 17. What it's like to be a machine? Roi Blanco
  • 18. What it's like to be a machine? ë✜Θ♬♬ţğ√∞ñ§®ÇĤĪ✜★¤♬☐✓✓ ţğ★¤✜èééééñ u✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫¤Γ ≠=⅚©§★✓♪ΒΓΕññ¤℠ ¢✖Γ♫⅜ ⏎↵⏏v☐ģğğğμλκσςτn nnnu⏎ñ⌥°¶§ΥΦΦΦ✗✕☐vuwwwww ë✜Θ♬♬ţğ
  • 19. If machines are dumb, how to make their job easier? • HTML is intended for human consumption – A mix of text, data and styling • Let's make it easier to process for machines – Languages to publish data in HTML • Agree between publishers and search engines on the meaning of certain symbols (ontologies) • e.g. ⏎⅙¥ means that this page describes a Person – Annotate HTML pages using these symbols – (This is just an example… the actual markup is human readable) • For data in particular, agree on what the types of objects are in the world, and what their attributes are – e.g. between §℗ and §⌥⌘ is the age of the Person • Leverage this understanding for more precise matching and ranking
  • 20. Semantic Web • Publish information in a way that is easier to process for machines • Web of Data instead of Web of Documents • Two main architectural challenges – A common format for sharing data – Sharing the meaning of data • Through social means (shared schemas) • By using powerful schema languages • Semantic Web standards from W3C – Languages (RDF, OWL, RIF) – Serializations (RDF/XML, RDFa) – Protocols (SPARQL, HTTP) • Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics • Community efforts to publish data and develop schemas
  • 21. Resource Description Framework (RDF) • Each resource (thing, entity) is identified by a URI – Globally unique identifiers • RDF represents knowledge as a set of triples – Each triple is a single fact about the entity (an attribute or a relationship) • A set of triples forms an RDF graph • Diagram (RDF document): example:roi, type, foaf:Person; example:roi, name, Roi Blanco
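The triple model on slide 21 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefixed names `rdf:type` and `foaf:name` stand in for the slide's `type` and `name` edges, and the `match` helper is hypothetical, not part of any RDF library.

```python
from typing import List, Optional, Set, Tuple

# A triple is (subject, predicate, object); an RDF graph is a set of triples.
Triple = Tuple[str, str, str]

# The two facts from the slide's diagram. The "rdf:" and "foaf:" prefixes on
# the predicates are assumptions; the slide only labels the edges "type" and "name".
graph: Set[Triple] = {
    ("example:roi", "rdf:type", "foaf:Person"),
    ("example:roi", "foaf:name", "Roi Blanco"),
}


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Every fact known about example:roi
    for triple in match(graph, s="example:roi"):
        print(triple)
```

Because every fact has the same three-part shape, graphs from different sources can be combined by a plain set union, which is what makes the format convenient for data exchange.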
  • 22. Linking across the Web. Diagram labels: Roi's homepage (example:roi, type foaf:Person, name Roi Blanco); Friend-of-a-Friend ontology; Yahoo!'s website (#roi2, #peter, email pmika@yahoo-inc.com); edges: type, name, knows, sameAs, worksWith, email
  • 23. History of metadata in HTML • 1995: HTML meta tags • 1998: RDF/XML – RDF/XML in HTML – RDF linked from HTML • 2003: Web 2.0 – Tagging, machine tags – Microformats • 2005: eRDF • 2008: RDFa 1.0 • 2011: RDFa 1.1, Microdata
  • 24. HTML meta tags <HTML> <HEAD profile="http://dublincore.org/documents/dcq-html/"> <META name="DC.author" content="Peter Mika"> <LINK rel="DC.rights copyright" href="http://www.example.org/rights.html" /> <LINK rel="meta" type="application/rdf+xml" title="FOAF" href="http://www.cs.vu.nl/~pmika/foaf.rdf"> </HEAD> … </HTML>
  • 25. Microformats (µf) • Agreements on how to encode certain objects in HTML (persons, events, recipes…) – Reuse of semantic-bearing HTML elements, e.g. class – Based on existing standards, e.g. hCard – Minimal: small number of types, most common attributes • Community centered around microformats.org – Centralized process, but not a formal standards body – Wiki for specifications, mailing list
  • 26. Example: the hCard microformat <div class="vcard"> <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a> <div class="tel">+1-919-555-7878</div> <div class="title">Area Administrator, Assistant</div> </div> <cite class="vcard"> <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/"> Eric Meyer</a> </cite> wrote a post (<cite> <a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/"> Tax Relief</a></cite>) about an unintentionally humorous letter he received from the <span class="vcard"> <a class="fn org url" href="http://irs.gov/"> Internal Revenue Service</a> </span>.
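To make the "class attribute as data carrier" idea from slides 25 and 26 concrete, here is a rough extraction sketch using only Python's standard `html.parser`. It is not a conforming hCard parser: the property subset in `HCARD_PROPS`, the `extract_hcards` helper, and the rule of reading `email` and `url` from `href` are simplifying assumptions, and nested cards or unbalanced markup are not handled.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small subset of hCard property names.
HCARD_PROPS = {"fn", "email", "tel", "title", "org", "url"}
# Elements without end tags are skipped so the open-element stack stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cards = []    # finished cards, one dict per class="vcard" element
        self._stack = []   # one (is_card_root, text_props) entry per open element
        self._card = None  # card currently being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs_d = dict(attrs)
        classes = set((attrs_d.get("class") or "").split())
        is_root = "vcard" in classes and self._card is None
        if is_root:
            self._card = {}
        text_props = []
        if self._card is not None:
            href = attrs_d.get("href")
            for prop in sorted(classes & HCARD_PROPS):
                if prop in {"email", "url"} and href:
                    # Link-valued properties take their value from href.
                    if href.startswith("mailto:"):
                        href_value = href[len("mailto:"):]
                    else:
                        href_value = href
                    self._card[prop] = href_value
                else:
                    text_props.append(prop)
        self._stack.append((is_root, text_props))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        is_root, _ = self._stack.pop()
        if is_root:
            self.cards.append(self._card)
            self._card = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._card is None:
            return
        # Text belongs to every property whose element is still open.
        for _, text_props in self._stack:
            for prop in text_props:
                self._card[prop] = (self._card.get(prop, "") + " " + text).strip()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.cards


if __name__ == "__main__":
    sample = (
        '<div class="vcard">'
        '<a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>'
        '<div class="tel">+1-919-555-7878</div>'
        '<div class="title">Area Administrator, Assistant</div>'
        '</div>'
    )
    print(extract_hcards(sample))
```

Note how the property names and the `href` rule are hard-coded for this one format; that is the per-microformat extraction cost the limitations slide refers to.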
  • 27. Microformats: limitations • Syntax shared with HTML – You need to implement extraction for each microformat separately • Lack of formal schemas – Limited reuse, extensibility of schemas – Unclear which combinations are allowed • Lack of a datatype system • No unique identifiers (URIs) – No linking, e.g. sameAs • Always appears in the HTML <body> – Not always clear how it relates to the main topic of the page • Instability • Everything is a draft… • Varying degrees of support
  • 28. RDFa • W3C recommendation for embedding RDF data in HTML – A set of new HTML attributes to be used in head or body – A specification of how to extract the data from these attributes – RDFa is just a syntax, you have to choose (or create) a vocabulary separately • Addresses the limitations of microformats – Syntax different from HTML – Semantic Web schema languages (reuse, extend schemas) – Unique identifiers for objects (interlinking, sameAs) – Markup in head or body • Alternative to publishing data as RDF/XML (Linked Data) – Search engine friendly • See also – http://rdfa.info/
  • 29. RDFa evolution • RDFa 1.0 has been a W3C Recommendation since October 2008 • RDFa 1.1 is a small update to RDFa that reduces complexity and makes it compatible with HTML5 – Recommendation (June 7, 2012) – Updated version of the RDFa Primer (June 7, 2012) – HTML+RDFa Working Draft (Sept 11, 2012) • New in RDFa 1.1 – New vocab attribute to define the default namespace for the document or subtree – The prefix attribute as a recommended replacement of xmlns – You can use URIs even where only CURIEs were allowed before • RDFa API for accessing RDFa data in a webpage in the browser from JavaScript – Currently Working Draft (April 19, 2011)
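The `prefix` and `vocab` attributes mentioned on slide 29 boil down to string expansion, which the following sketch illustrates. The helper names `parse_prefixes` and `expand` are hypothetical, and the full RDFa 1.1 processing rules (safe CURIEs, predefined terms, inheritance along the document tree) are deliberately left out.

```python
def parse_prefixes(prefix_attr):
    """Parse a prefix attribute value such as 'og: http://ogp.me/ns#'.

    The value is a whitespace-separated list that alternates between
    'name:' tokens and the IRI each name maps to.
    """
    tokens = prefix_attr.split()
    return {name.rstrip(":"): iri for name, iri in zip(tokens[0::2], tokens[1::2])}


def expand(term, prefixes, vocab=""):
    """Expand a CURIE ('og:title') or a bare term ('name') to a full IRI."""
    if ":" in term:
        prefix, reference = term.split(":", 1)
        if prefix in prefixes:
            return prefixes[prefix] + reference
        # Unknown prefix: assume the value is already a full URI,
        # which RDFa 1.1 allows in place of a CURIE.
        return term
    # Bare terms are resolved against the default vocabulary, if one is set.
    return vocab + term if vocab else term


if __name__ == "__main__":
    prefixes = parse_prefixes("og: http://ogp.me/ns# foaf: http://xmlns.com/foaf/0.1/")
    print(expand("og:title", prefixes))
    print(expand("foaf:name", prefixes))
    print(expand("name", prefixes, vocab="http://schema.org/"))
```

The fallthrough for unknown prefixes is what lets a full URI appear where only a CURIE was allowed before.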
  • 30. RDFa intro: metadata in the header <html prefix="og: http://ogp.me/ns#"> <head> <title>The Trouble with Bob</title> <meta property="og:title" content="The Trouble with Bob" /> <meta property="og:type" content="text" /> <meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" /> ... </head> • More info in the
  • 31. RDFa intro: links with a flavor. All content on this site is licensed under <a rel="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License </a>.
  • 32. RDFa links: talking about subjects other than the page. The trouble with Bob is that he takes much better photos than me: <div about="http://example.com/bob/photos/sunset.jpg"> <img src="http://example.com/bob/photos/sunset.jpg" /> <span property="og:title">Beautiful Sunset</span> by <span property="dc:creator">Bob</span>. </div>
  • 33. RDFa links: talking about subjects other than the page <div typeof="foaf:Person"> <p property="foaf:name"> Alice Birpemswick </p> <p> Email: <a rel="foaf:mbox" href="mailto:alice@example.com"> alice@example.com </a> </p> <p> Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p> </div>
  • 34. The process of annotating with RDFa • Find a vocabulary that fits your needs and is supported by your consumers – A vocabulary describes a set of types and attributes within a given domain – If you don't find a good candidate, extend an existing one or create a new one • Annotate your page – Before you start, you might want to validate your page for (X)HTML conformance using the W3C's (X)HTML Validator to reduce the chance of errors. Choose Document Type XHTML + RDFa. – Use an HTML or XML editor that supports DTDs, or an RDFa editor such as RDFaCE – Use the RDFa Distiller to check which data can be extracted from your page. – If you fancy, use the RDF Validator to visualize the resulting RDF graph. • Put the annotated page online – The data will be extracted by your favorite search engine the next time your page is crawled and indexed – The data will be available to browser extensions, bookmarklets etc. • See http://rdfa.info/rdfa-implementations for new tools and APIs
  • 35. Example: Yahoo! Enhanced Results (was: SearchMonkey) • First major adopter of RDFa – Launched in May, 2008 • Guide for publishers to mark up their pages for common types of objects – Product, Local, News, Video, Events, Documents, Discussion, Games • Using popular microformats and RDF vocabularies – Copy-paste code – Validator • Yahoo as a consumer – Enhanced Results
  • 36. Example: Google's Rich Snippets • Launched in May, 2009 • Google encourages publishers to use popular microformats and its own RDFa vocabulary – data-vocabulary.org • Validator to check if the markup is correct • Google displays enhanced results based on this metadata – Rich Snippets
  • 37. Example: Facebook's Like and the Open Graph Protocol • Launched April, 2010 • The Like button provides publishers with a way to promote their content on Facebook and build communities – Shows up in profiles and news feed – Site owners can later reach users who have liked an object – Facebook Graph API allows 3rd party developers to access the data • Open Graph Protocol is an RDFa-based format for describing the object that the user Likes
  • 38. Example: Facebook's Open Graph Protocol • RDF vocabulary to be used in conjunction with RDFa – Simplify the work of developers by restricting the freedom in RDFa • Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment • Only HTML <head> accepted <html xmlns:og="http://opengraphprotocol.org/schema/"> <head> <title>The Rock (1996)</title> <meta property="og:title" content="The Rock" /> <meta property="og:type" content="movie" /> <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" /> <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" /> … </head> ...
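Because the Open Graph Protocol restricts itself to `<meta>` elements in the `<head>`, a consumer can be very small. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, and this is not how Facebook's own crawler is implemented.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta> elements inside <head> only."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attrs_d = dict(attrs)
            prop = attrs_d.get("property")
            content = attrs_d.get("content")
            if prop and prop.startswith("og:") and content is not None:
                self.properties[prop] = content

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_open_graph(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


if __name__ == "__main__":
    page = (
        '<html xmlns:og="http://opengraphprotocol.org/schema/"><head>'
        '<title>The Rock (1996)</title>'
        '<meta property="og:title" content="The Rock" />'
        '<meta property="og:type" content="movie" />'
        '</head><body></body></html>'
    )
    print(extract_open_graph(page))
```

The simplicity comes at the price the slide names: a flat set of properties about the page itself, with none of RDFa's ability to describe other subjects in the body.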
  • 39. Example: rNews • RDFa vocabulary for news articles – Easier to implement than NewsML – Easier to consume for news search and other readers, aggregators • Under development at the IPTC – Version 0.5
  • 40. Microdata • Developed by the HTML5 working group at the W3C – RDFa was perceived as too complex and thus error prone • Currently a companion document to HTML5 (working draft) • Incompatible with RDFa <div itemscope itemid="http://www.yahoo.com/resource/person"> <p>My name is <span itemprop="name">Neil</span>.</p> <p>My band is called <span itemprop="band">Four Parts Water</span>. I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>. <img itemprop="image" src="me.png" alt="me"> </p> </div>
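A minimal reader for the microdata example on slide 40 could look like the following. It is a sketch under several assumptions: only one flat item is supported (no nested `itemscope`), the `"@id"` key for the `itemid` value is my own naming, and only the `img` and `time` special cases are covered; the actual microdata specification defines more.

```python
from html.parser import HTMLParser

# Elements without end tags; they must not be pushed on the open-element stack.
VOID_TAGS = {"img", "meta", "link", "br", "hr", "input"}


class MicrodataParser(HTMLParser):
    """Collects itemprop values from a single, non-nested item."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._open = []  # per open element: the property collecting its text, or None

    def handle_starttag(self, tag, attrs):
        attrs_d = dict(attrs)
        prop = attrs_d.get("itemprop")
        if "itemid" in attrs_d:
            self.item["@id"] = attrs_d["itemid"]  # "@id" is an assumed key name
        # Some elements carry their value in an attribute rather than in text.
        if prop and tag == "img":
            self.item[prop] = attrs_d.get("src", "")
            prop = None
        elif prop and tag == "time" and "datetime" in attrs_d:
            self.item[prop] = attrs_d["datetime"]
            prop = None
        elif prop:
            self.item[prop] = ""
        if tag not in VOID_TAGS:
            self._open.append(prop)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        for prop in self._open:
            if prop:
                self.item[prop] += data


def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    return parser.item


if __name__ == "__main__":
    sample = (
        '<div itemscope itemid="http://www.yahoo.com/resource/person">'
        '<p>My name is <span itemprop="name">Neil</span>.</p>'
        '<p>My band is called <span itemprop="band">Four Parts Water</span>. '
        'I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>. '
        '<img itemprop="image" src="me.png" alt="me"></p></div>'
    )
    print(extract_microdata(sample))
```

Unlike the per-format logic microformats require, the same loop works for any vocabulary, since property names come from `itemprop` rather than from a fixed list of class names.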
  • 41. Competing formats, competing schemas • Multiple incompatible formats: microformats, RDFa, microdata – Varying degrees of adoption – Not all formats are supported by all search engines • Multiple competing schemas (ontologies) – Different schemas for marking up the same information (RDFa and microdata) • Major search engines support different existing alternatives or create their own (Google, Facebook) – Not clear which schemas have adoption, who is responsible for maintaining them – Slow convergence
  • 42. schema.org • Agreement on a shared set of schemas for common types of web content – Bing, Google, and Yahoo! as initial founders (June, 2011) – Similar in intent to sitemaps.org • Use a single format to communicate the same information to all three search engines • schema.org covers areas of interest to all search engines – Business listings (local), creative works (video), recipes, reviews
  • 43. schema.org evolution • Yandex joins schema.org in Nov, 2011 – Yandex.Slovari, Yandex.Spravochnik, Yandex.Kartinki, Yandex.Video • RDFa Lite 1.1 – Subset of the features of RDFa 1.1 – W3C Recommendation since June, 2012 • Two W3C task forces within the SW Interest Group (SWIG) – Web schemas TF for ongoing collaborations on schema extensions, mappings, tooling etc. • schema.org discussions are at public-vocabs@w3.org – HTML Data TF finished in December, 2011 • HTML Data Guide • Microdata RDF: Transformation from HTML+Microdata to RDF • Growing number of 3rd party contributions – rNews (news) – GoodRelations (e-commerce) – Health and Life Sciences – Technical Publishing
  • 44. Documentation and OWL ontology
  • 45. Current state of semantic search • Limited usage in commercial search engines – Enhanced results – Faceted search • Google's Recipe Search – Navigation to related entities • Yahoo's Vertical Intent Search • Positive SEO effects – Enhanced results are clicked more – Enhanced results help users find relevant results • Increased adoption of data markup
  • 46. Semantic Search development • Research – RDF indexing and ranking – Searching over annotated web pages – Search result summarization – Question answering – Task completion – Semantic log analysis • Prototype pure RDF search engines – Sindice and Sig.ma from DERI
  • 47. Current state of metadata on the Web • 31% of webpages, 5% of domains contain some metadata – Analysis of the Bing Crawl (US crawl, January, 2012) – RDFa is most common format • By URL: 25% RDFa, 7% microdata, 9% microformat • By eTLD (PLD): 4% RDFa, 0.3% microdata, 5.4% microformat – Adoption is stronger among large publishers • Especially for RDFa and microdata • See also – P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012 – H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012
  • 48. Exponential growth in RDFa data. Chart: percentage of URLs with embedded metadata in various formats. Five-fold increase between March 2009 and October 2010; another five-fold increase between October 2010 and January 2012.
  • 49. Semantic technologies for Data Integration
  • 50. Today's world is a Web of Pages
  • 51. All these pages come from structured knowledge about people, places, and things. Diagram: Carlos Zambrano plays for the Chicago Cubs, an MLB team that plays in Chicago; 10% off tickets for the Chicago Cubs; Barack Obama is from Chicago.
  • 52. This underlying world is WOO: the Web of Objects. Diagram: Carlos Zambrano plays for the Chicago Cubs, an MLB team that plays in Chicago; 10% off tickets for the Chicago Cubs; Barack Obama is from Chicago.
  • 53. Today our knowledge of this world is siloed, incomplete, inconsistent, inaccurate, and hard to reuse. Diagram labels: MLB team, 10% off tickets, Chicago Cubs, Chicago, Carlos Zambrano, Scott Roy; edges: isa, for, plays for, plays in, from; silos: Entertainment, Upcoming, Shopping, Finance, Sports, Local.
  • 54. Our vision is a single shared knowledge base: accurate, scalable, and easy to reuse. Diagram: Carlos Zambrano plays for the Chicago Cubs, an MLB team that plays in Chicago; 10% off tickets for the Chicago Cubs; Barack Obama is from Chicago.
  • 55. Knowledge comes from many sources. Chart (axes: Entities, Attributes): show times and other information for US movies from source B; show times for Harry Potter and the Deathly Hallows part II.
  • 56. Combining these requires working with complementary, parallel, and overlapping sources. Chart (axes: Entities, Attributes): cast information for US movies from source A; cast and show time information for global movies from licensed feeds; cast information for global movies from Wikipedia.
  • 57. There is a tremendous opportunity to do this directly from Web pages, reverse engineering the Web. Chart (axes: Entities, Attributes): information from structured data extraction on billions of Web pages.
  • 58. Semantic technologies for data integration • Semantic Web provides the basic technologies for Linked Data – URIs as unique identifiers • Retrieve data from the (internal) web • Follow links in the data that is returned – RDF as a common data format – OWL as a powerful schema language for validation and reasoning – SPARQL for queries, reasoning and transformations
  • 59. Components • Data is ingested from web extraction, feeds, editorial content (billions of objects) • Data integration using Hadoop clusters – Schema matching to the WOO ontology – Object reconciliation – Blending • Data quality assessment • Information extraction – Text, e.g. news content – Webpages • Enrichment – Feature computation based on user behavior, social signals and web content • Serving and ranking – Selecting the right objects to show by query, user, geography etc.
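The "object reconciliation" and "blending" steps on slide 59 can be illustrated with a toy, in-memory sketch. The slides do not describe the actual algorithms, so everything here is an assumption made for illustration: the reconciliation key (normalized name plus city), the fixed source priority, and the record layout are all hypothetical, and the real system is said to run on Hadoop clusters over billions of objects.

```python
from collections import defaultdict


def reconcile_key(record):
    """Hypothetical reconciliation key: normalized name plus city."""
    return (record["name"].strip().lower(), record.get("city", "").strip().lower())


def blend(records, source_priority):
    """Group records describing the same object and merge their attributes.

    When sources disagree on an attribute, the value from the source listed
    earlier in source_priority wins.
    """
    rank = {source: i for i, source in enumerate(source_priority)}
    groups = defaultdict(list)
    for record in records:
        groups[reconcile_key(record)].append(record)

    blended = []
    for key in sorted(groups):
        merged = {}
        # Apply lowest-priority records first so better sources overwrite them.
        ordered = sorted(groups[key], key=lambda r: rank[r["source"]], reverse=True)
        for record in ordered:
            for attr, value in record.items():
                if attr != "source" and value not in (None, ""):
                    merged[attr] = value
        blended.append(merged)
    return blended


if __name__ == "__main__":
    records = [
        {"source": "feed", "name": "Chicago Cubs", "city": "Chicago",
         "url": "http://old.example.com"},
        {"source": "editorial", "name": "Chicago Cubs", "city": "Chicago",
         "url": "http://cubs.example.com", "league": "MLB"},
        {"source": "web", "name": " chicago cubs", "city": "chicago",
         "phone": "555-0100"},
    ]
    for obj in blend(records, ["editorial", "feed", "web"]):
        print(obj)
```

Even this toy version shows why the two steps are separate: reconciliation decides which records are the same object, while blending decides which source to trust per attribute.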
  • 60. WOO ontology • Primary use case is data validation – During information extraction and throughout the WOO platform – No reasoning • OWL2 ontology – Automatic documentation – Change management – Conversion to Yahoo internal schema language – Protégé OWL as editorial tool
  • 61. WOO ontology cont'd • Covers Yahoo's domains of interest – Movies, Music, TV, Business listings, Events, Finance, Sports, Autos, … – 250 classes and 800 properties (Sept, 2011) – Available only internally • Developed over 1.5 years by Yahoo's editorial team • Aligned with schema.org – schema.org covers only a subset of the WOO ontology
  • 62. Value #1: Breadth, depth, and accuracy at scale. We show many entities we shouldn't. Diagram labels: up-to-date correct entities; real entities; dups, errors, and outdated entities. Screenshot callouts: no photo; incorrect store URL; no business hours. WOO improves our breadth, depth, and accuracy by combining knowledge from alternative sources, and by modernizing how we do matching, blending, and de-duping.
  • 63. Value #2: Agility launching new experiences • Answers instead of links: WOO lets us quickly create entity-centric DD modules using the existing knowledge in the KB • Related knowledge in context: the integrated KB lets us show relevant knowledge from one Yahoo property on other properties and off network • Emerging markets and tail pages: the KB gets us deep into the tail by combining and blending knowledge from many sources
  • 64. Other potential benefits • Dynamic interlinking of content – E.g. direct links from Yahoo! News to background information in Yahoo! Music about an artist • Dynamic composition of web pages – Topic-entity pages • Better understanding of user intent – Semantic analysis of query logs – Semantic analysis of navigation paths • Exposure of Yahoo! content using standard technologies – Linking to external sources to make it part of the Linked Data cloud
  • 65. Innovative media companies are moving in this direction. Courtesy of Silver Oliver (BBC).
  • 66. Innovative media companies are moving in this direction. Courtesy of Evan Sandhaus (NYT).
  • 67. Take home: use what works! • The W3C's semantic technology stack is daunting – The basics are simple: • URIs for entity identifiers, RDF for data exchange • Standards for embedding data in HTML – Useful in search and at other points of content consumption • Standards for expressing the meaning of data – Useful in data integration • Do your bit!
  • 68. The End • Credits to many people from Yahoo! around the world • Contact me at – pmika@yahoo-inc.com – @pmika