Ravish Bhagdev



The Invisible Web of Linked Data

Outline: WWW, Web of Linked Data, Our Role
World Wide Web



                 HTTP, URL, HTML
WWW as we know it

A global repository of interconnected documents

Documents linked by hyperlinks

   – Implicit meaning assigned to links with a few words or pictures

   – Anyone can link a new page to any other available page

   – Links can also be made to specific sections of a page (anchor tags)

[Diagram: Web Page X contains the link text "Link Text Describing Page Y", which points to Web Page Y]
WWW as we know it
Search Engines and the power of links

•   Use links between web pages to rank results

•   Each search generates a new page with more hyperlinks

•   Click-through relevance information

•   Redundancy of information

•   Focus on the most useful and relevant page

•   Freshness

•   Links are the key
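
As a rough illustration of using links between pages to rank results, here is a minimal power-iteration sketch in the style of PageRank. The slides do not name a specific algorithm, so the function name, the damping factor of 0.85, the iteration count and the three-page link graph are all assumptions made for this example.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by incoming links; `links` maps a page to the pages it links to.

    Illustrative sketch only: the damping factor and iteration count are assumed.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score, independent of links.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Hypothetical link graph: X and Z both link to Y, and Y links back to X.
links = {"X": ["Y"], "Z": ["Y"], "Y": ["X"]}
ranks = rank_pages(links)
best = max(ranks, key=ranks.get)
print(best, ranks)
```

In this toy graph Y collects links from two pages and ends up ranked highest, while Z, which nobody links to, keeps only the baseline score.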
Web of Linked Data



            Linking more than just pages
Structured vs Unstructured Data

Structured (XML, DB, CSV, RDF)
   Simply defined as data that can be parsed and processed by machines
   to automate operations like matching, classification, querying etc.

Unstructured (HTML, Doc, PDF, PPT)
   Data that cannot be parsed in this manner without loss of information
   or requirement of input from external agents
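
A small sketch of what "parsed and processed by machines" can look like in practice. The CSV content is made up for illustration (the John Smith row echoes the entity example later in the deck, and the second row is hypothetical). The same facts written as a free-text sentence cannot be queried this way without extra interpretation.

```python
import csv
import io

# Structured: each value sits in a named column, so a machine can query it.
structured = io.StringIO(
    "name,country,age\n"
    "John Smith,UK,24\n"
    "Jane Doe,FR,31\n"  # hypothetical second record
)
rows = list(csv.DictReader(structured))

# Matching and querying become one-liners once the data is parsed.
uk_residents = [row["name"] for row in rows if row["country"] == "UK"]
ages = {row["name"]: int(row["age"]) for row in rows}

# Unstructured: the same facts as prose. Answering "who lives in the UK?"
# now needs language understanding or input from a person.
unstructured = "John Smith, who is 24, lives somewhere in the UK."

print(uk_residents, ages)
```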
Semi-Structured Data

Most data, however, is semi-structured
   – HTML (HEAD, TITLE, BODY etc)
   – Email (Header vs Body)
   – DBs with free text fields
   – Forms with descriptive fields
   – CSVs, Spreadsheets
   – Some more structured than others
   – Not possible to express everything in structured form

[Diagram: XML, DB, CSV, RDF, HTML, Doc, PDF, PPT]
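
Email is a handy example of semi-structured data: the headers parse cleanly into fields, while the body stays free text. A minimal sketch using Python's standard `email` module follows; the message itself, including the addresses, is invented for illustration.

```python
import email

# Hypothetical message: structured headers, then a blank line, then free text.
raw = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Holiday plans\n"
    "\n"
    "Hi Bob, I was thinking about somewhere warm in May. Any ideas?\n"
)
message = email.message_from_string(raw)

# The header part is structured: fields can be looked up by name.
sender = message["From"]
subject = message["Subject"]

# The body part is unstructured: it comes back as one opaque string.
body = message.get_payload().strip()

print(sender, subject, body, sep="\n")
```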
Web: Giant Data Shredder




[Diagram: DBs, Web Page (HTML)]
Creating Links at Level of Entities

Requires Semantic Markup
   – Uniquely Identify Entities in a Document (URIs)
   – Make Relations Between Entities Explicit
   – Shared Vocabulary (schema.org, FOAF, SKOS, GoodRelations etc.)
   – Reuse existing vocabularies instead of inventing new ones
   – Both within the same document and with other Entities in other Documents

[Diagram: Person <Person> Person3456, linked to Name <String> John Smith, Residence <Country> UK, Age <Integer> 24]
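
The Person3456 entity from the diagram can be sketched as subject-predicate-object triples, printed in an N-Triples-like form. This is only an illustration: `foaf:name`, `foaf:age` and `foaf:Person` come from the FOAF vocabulary, while the `http://example.org/` namespace, the `residence` predicate and the country URI are hypothetical placeholders.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a literal, optionally typed, as an N-Triples term."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


person = uri(EX + "Person3456")  # the entity gets its own URI
triples = [
    (person, uri(RDF_TYPE), uri(FOAF + "Person")),
    (person, uri(FOAF + "name"), literal("John Smith")),
    (person, uri(FOAF + "age"), literal(24, XSD_INTEGER)),
    # Hypothetical predicate and country URI: the link points at another
    # entity rather than at the plain string "UK".
    (person, uri(EX + "residence"), uri(EX + "country/UK")),
]


def to_ntriples(statements):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


print(to_ntriples(triples))
```

Each relation is explicit and machine-readable, and the reused FOAF terms mean other applications that know FOAF can interpret the name and age without any custom agreement.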
Integrated Data Across Applications

Questions:
   – Where should I go on vacation?
   – What is it like there?
   – How do I get the best fare?
   – Where should I stay?
   – What do I need to know about it?

Linked sources:
   – what: Travel Interests (FB)
   – who: People who have been there (foursquare)
   – where: Places to go (lonely planet)
   – how: Travel Services
   – what: Photos, blogs, news stories
How Search Engines Use Linked Data
Our Role



 Why I think we should engage in Linked Data Initiatives
We Build Web Apps
» That publish content

» We want the content to be

    » Visible to next generation apps

    » Ranked High by Search Engines

    » Presented clearly and unambiguously
      (think price comparison websites)

    » All the big players are doing it

    » Including most governments
The Visible Web

Is Dominated by Marketing

   – The web is seen through the lens of Search Engines and Social
     Networks

   – Not every relevant page can appear on the first page of search results

   – Paid, Targeted advertising

   – Affiliate Programs and Contracts

   – Popularity overrides quality of matches

   – What is popular is decided by sites that are already popular

   – Visible web is more and more biased
Emerging marketing tactics, circa 2010
Creating Linked Data (How?)

» Implicitly

» Integrating Linked Data Standards with
  Publishing tools (CMS!), apps, gadgets,
  social networks etc.

» Minimize the effort while maximizing
  the return

» Every time you add a new friend on FB
  or follow someone on twitter, you create
  linked data

[Diagram: linked groups of entities: A, B, C; P, Q, R; X, Y, Z]
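
A minimal sketch of how everyday social actions implicitly produce linked data. The user only clicks "add friend" or "follow"; the application records the links. `foaf:knows` is a real FOAF property, while the `follows` predicate, the helper names and the user URIs are hypothetical.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOLLOWS = "http://example.org/follows"  # hypothetical predicate

graph = set()  # a set of (subject, predicate, object) triples


def add_friend(a, b):
    """Friendship is mutual, so record the link in both directions."""
    graph.add((a, FOAF_KNOWS, b))
    graph.add((b, FOAF_KNOWS, a))


def follow(follower, followed):
    """Following is one-directional."""
    graph.add((follower, FOLLOWS, followed))


def linked_to(entity):
    """Return every entity that `entity` points to, whatever the relation."""
    return sorted({obj for subj, _, obj in graph if subj == entity})


# Hypothetical users identified by URIs.
alice = "http://example.org/alice"
bob = "http://example.org/bob"
carol = "http://example.org/carol"

add_friend(alice, bob)
follow(alice, carol)

print(linked_to(alice))
```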
Linking of information creates Insights

Raw Data: symbols and chars
Information: Data in usable form
Knowledge: Information Enriched with Semantics
Wisdom: Understanding, Hindsight, Experience

Enterprise Search & KM

Linked Data is even more relevant in the context of enterprise search
because these data don’t even have simple hyperlinks

Popularity of a document is not always the most important factor

Large companies struggle to make use of knowledge and information
across departments and silos

Timely linking of data across these silos can have a big impact
Questions?
@RavBhagdev

There's more: Information Extraction, Social Search, Knowledge Capture etc.
What’s Your Message?
Thanks!

Linked Data: How it is changing the way data is published and accessed on web
