Moving From Noise to Signal: Semantic Web
Agenda
  Introduction
  Semantic Web
    What is Semantic Web?
    Why it matters?
    How to Semantify the Web?
  Web 3.0
  Linked Data
Introduction
  1.3+ billion people connected to the web
  2006: 161 EB of information created/replicated (1 EB = 1 billion GB)
    Technical information doubled every 2 years
  By 2010: six times as much, 988 EB (approx. 1 ZB)
    Technical information will double every 72 hours
  Computers, mobile phones, intelligent devices
  Internet is broken – not one web – unable to communicate
Information Overload
  Is that really how the Web experience is supposed to feel?
  Key problem – how to share meaning?
  Filtering, not aggregating.
  Not more, just smarter.
Semantics?
  Related to syntax
  Syntax – how you say something (letters, punctuation, grammar), e.g. HTML
  Semantics – the meaning behind what you say
  Example: "I Love Technology" and "I ♥ Technology" (different syntax, same meaning)
What’s the big deal?
  Internet
    Standard way to communicate
    Parrot – mimics without understanding
  The Web
    Store and retrieve docs on the internet
    Syntax to display the doc (HTML)
  Search Engines
    Find any website that we want
  Life is good!!! Can we make it any better?? How??
The Answer – Semantic Web
  Understand the meaning behind webpages
  Web of Things vs Web of Documents
    Things can be ANYTHING – people, places, pets, events, music, movies, organizations…
  Not only identify these things but also their relationships (human-like!!!)
  Embed semantics in HTML docs – microformats, RDF
  It’s not about the future… it’s about today!!!
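
The "things plus relationships" idea on this slide can be sketched as a small set of subject-predicate-object statements. This is only an illustration in plain Python, not an RDF library; every name in it (alice, rex, owns, and so on) is made up for the example.

```python
# Hypothetical facts: every name here is illustrative, not from the slides.
triples = {
    ("alice", "is_a", "Person"),
    ("rex", "is_a", "Pet"),
    ("alice", "owns", "rex"),
    ("alice", "lives_in", "Boston"),
    ("Boston", "is_a", "Place"),
}


def match(pattern, facts):
    """Return facts matching a (subject, predicate, object) pattern.

    None in the pattern acts as a wildcard.
    """
    return sorted(
        fact for fact in facts
        if all(p is None or p == f for p, f in zip(pattern, fact))
    )


# "What do we know about alice?"
about_alice = match(("alice", None, None), triples)

# "Which things are pets?"
pets = [subject for subject, _, _ in match((None, "is_a", "Pet"), triples)]
```

A web of documents could only return pages that mention "alice"; with statements like these, a program can answer questions about the thing itself and what it is related to.
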
The Possibilities
 
Why Semantic Web?
  Spend less time searching
  Spend less time looking at things that do not matter
  Spend less time explaining what we want to computers
  Bottom line – improve the online experience!!!
Cartoon by  Geek and Poke
It’s all about the noise…
  Web 1.0: Get (hear & see) ‘Noise’
  Web 2.0: Make Noise
  Web 3.0: Filter the Noise
  Web 4.0: Going deaf… or SmartNoise
Semantifying the Web - Approaches
  Bottom Up
    Annotating information in web pages with machine-readable tags
    Technical Challenges
      Representational complexity
      How to create – manual/automatic?
      How much can be transformed?
      Standard issue
    Business Challenges
      It’s primitive
      Consumer value?
      How to market?
    Recent Wins
      Yahoo search engine to support RDF and microformats (MF)
      Dapper – automated annotation tool
Annotation Technologies
  Trade-off between simplicity and completeness
  RDF
    Graph based – things, attributes, relationships
    Precise but complex
    Triple (subject, predicate, object)
  Microformats
    Use specific CSS class names
    Compact
    Embedded in HTML
    Gaining popularity because of their simplicity
    Popular microformats:
      hCard: describes personal and company contact information
      hReview: adds meta information to review pages
      hCalendar: used to describe events
    Limitations
      No way to describe type hierarchies
      Somewhat cryptic, because the focus is to keep the annotations to a minimum
    Flickr, Eventful, and LinkedIn
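
To make the microformat idea concrete, here is a rough sketch of how a program might read an hCard out of a page. It assumes a hand-written HTML fragment (the person and company are invented) and handles only a few hCard properties with Python's standard `html.parser`; a real microformat parser covers far more cases.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with hCard class names.
HCARD_HTML = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Corp</span>
  <a class="url" href="http://example.com/">home page</a>
</div>
"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is a known hCard property.

    Simplified: only text-valued properties, no nesting, one card per page.
    """

    PROPERTIES = {"fn", "org", "tel", "adr"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property name of the element being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


parser = HCardParser()
parser.feed(HCARD_HTML)
card = parser.card
```

The annotations are just class names on ordinary HTML, which is why microformats are compact and easy to adopt. It is also why they are limited: nothing in the markup says how `fn` or `org` relate to any wider type hierarchy.
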
Semantifying the Web - Approaches
  Top Down
    Focused on leveraging information in existing web pages, as-is
    NLP tools (entity extraction)
      Calais & TextWise – APIs that recognize people, companies, places in docs
    Vertical search engines – ZoomInfo, Spock & Retrevo
    Dapper, BlueOrganizer, ClearForest – recognize objects in web pages & annotate them
    Yahoo! Shortcuts, Snap, Smartlinks – recognize objects in text and links
    Challenges
      Not 100% perfect, has ambiguities
      May not scale well
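
The top-down approach can be illustrated with a toy entity extractor. This sketch uses a hand-made lookup table and regular expressions; it is not how Calais or TextWise work internally (those are NLP services), and the table entries and sample sentence are assumptions made up for the example.

```python
import re

# Hypothetical gazetteer: a name may map to several candidate entity types.
GAZETTEER = {
    "Yahoo": ["Company"],
    "Paris": ["Place", "Person"],
    "Tim Berners-Lee": ["Person"],
}


def extract_entities(text):
    """Return (name, candidate_types) for each gazetteer name found in text,
    in order of appearance."""
    found = []
    for name, types in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), name, types))
    return [(name, types) for _, name, types in sorted(found)]


text = "Tim Berners-Lee spoke in Paris about search at Yahoo."
entities = extract_entities(text)

# Names with more than one candidate type cannot be resolved by lookup alone.
ambiguous = [name for name, types in entities if len(types) > 1]
```

The page is used as-is, with no annotations from its author. The cost shows up in `ambiguous`: without more context the program cannot tell whether "Paris" is a place or a person, which is the "not 100% perfect" challenge listed on the slide.
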
Map+ add-on for Firefox; vertical search engine Spock
More Annotations → Structured Web → More Precise Top-Down
Web 3.0 = Semantic Web = Linked Data: Are They Equal??
 
Structured Data
  RDBMS
    Powerful and flexible
    Pre-defined relationships and usage of data
    Too constraining and too structured
    Schema changes are expensive
    Virtually impossible to make different DBs talk to each other
  Linked Data
    Establish linkages at the data level (RDF)
    Bridges the gap between unstructured and structured data
    Does not add any semantic meaning to the information
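
A small sketch of what "linkages at the data level" could look like. Two sources with different shapes are exported as triples that share identifiers, so combining them needs no schema change. The URIs, predicates, and values are hypothetical, and plain Python sets stand in for an RDF store.

```python
# Hypothetical URIs and predicates; nothing here comes from a real dataset.
ALICE = "http://example.org/people/alice"
ACME = "http://example.org/companies/acme"

# Source A: an HR system exported as triples.
hr_triples = {
    (ALICE, "name", "Alice"),
    (ALICE, "worksFor", ACME),
}

# Source B: a company directory with a different shape, exported the same way.
directory_triples = {
    (ACME, "name", "Acme Inc."),
    (ACME, "locatedIn", "Boston"),
}

# Both sources use the same identifier for Acme, so linking is a set union.
graph = hr_triples | directory_triples


def objects(subject, predicate):
    """All objects for a given subject and predicate in the merged graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Follow a link across the two sources: where is Alice's employer located?
employer = objects(ALICE, "worksFor")[0]
city = objects(employer, "locatedIn")
```

The merge works because the identifiers line up, not because the program understands anything: `locatedIn` is just a string here. That matches the last bullet above, since linking connects the data without adding semantic meaning to it.
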
Linked Data
  Medium for the semantic web
  It does not create smart data, only enables it
  Relies on clean, granular, structured data
    Pre-structured
    Pre-connected
Further Reading
  RDF, OWL, Microformats, FOAF
  Linked Data
  Semantic APIs
