Web 3 Mark Greaves
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Web 3 Mark Greaves

on

  • 3,425 views

 

Statistics

Views

Total Views
3,425
Views on SlideShare
3,424
Embed Views
1

Actions

Likes
4
Downloads
64
Comments
1

1 Embed 1

http://www.slideshare.net 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Web 3 Mark Greaves Presentation Transcript

  • 1. Dr. Mark Greaves Vulcan Inc. [email_address] slides © 2010 Vulcan Inc. The Evolving Semantic Web: From Military Technology to Venture Capital
  • 2. Defining Terms
    • Web 2.0 – “The Read/Write Web”
      • Web as platform
      • Social networks, collective intelligence, and crowdsourcing
      • Long Tail Markets
      • 1 st gen web data exploitation: mashups, tag clouds, RSS
  • 3. Defining Terms
    • Web 2.0 – “The Read/Write Web”
      • Web as platform
      • Social networks, collective intelligence, and crowdsourcing
      • Long Tail Markets
      • 1 st gen web data exploitation: mashups, tag clouds, RSS
    • Web 3.0 – “The World Wide Database”
      • Overall: The use of Semantic technology in Web applications
      • My talk: The use of Semantic Web technology in Web applications
  • 4. Defining Terms
    • Web 2.0 – “The Read/Write Web”
      • Web as platform
      • Social networks, collective intelligence, and crowdsourcing
      • Long Tail Markets
      • 1 st gen web data exploitation: mashups, tag clouds, RSS
    • Web 3.0 – “The World Wide Database”
      • Overall: The use of Semantic technology in Web applications
      • My talk: The use of Semantic Web technology in Web applications
    • Imagine the biggest database in the world…
      • Is it the Social Security database on steroids?
      • Or is it like the web?
        • Universal publication, web-scale pivot, multiple points of view
        • Always changing, always evolving, no central control
        • The Semantic Web is both geeky and transformative
  • 5. What is the Semantic Web? 4 Answers
    • The set of W3C standards for simple logic languages on the web
      • RDF, RDF/S, OWL, OWL2, SPARQL
      • Plus all the pages and web data sources that use these standards
  • 6. What is the Semantic Web? 4 Answers
    • The set of W3C standards for simple logic languages on the web
      • RDF, RDF/S, OWL, OWL2, SPARQL
      • Plus all the pages and web data sources that use these standards
    • The Web of Data
      • A fully distributed web-based system to publish logical assertions
      • A way to link to someone else’s data and query it across the web
      • Democratic, crowd-based, scalable knowledge engineering
      • The hottest area of web architecture right now
  • 7. What is the Semantic Web? 4 Answers
    • The set of W3C standards for simple logic languages on the web
      • RDF, RDF/S, OWL, OWL2, SPARQL
      • Plus all the pages and web data sources that use these standards
    • The Web of Data
      • A fully distributed web-based system to publish logical assertions
      • A way to link to someone else’s data and query it across the web
      • Democratic, crowd-based, scalable knowledge engineering
      • The hottest area of web architecture right now
    • The largest formal knowledge base on Earth
      • Also the messiest formal knowledge base on Earth
  • 8. What is the Semantic Web? 4 Answers
    • The set of W3C standards for simple logic languages on the web
      • RDF, RDF/S, OWL, OWL2, SPARQL
      • Plus all the pages and web data sources that use these standards
    • The Web of Data
      • A fully distributed web-based system to publish logical assertions
      • A way to link to someone else’s data and query it across the web
      • Democratic, crowd-based, scalable knowledge engineering
      • The hottest area of web architecture right now
    • The largest formal knowledge base on Earth
      • Also the messiest formal knowledge base on Earth
    • A revolution in the way we think of data, crowds, and schemas
      • Web techniques (social networks, collaboration, mashup, search) applied to data
      • Massive, partial, participatory, logically weak, dynamic, schema-last
      • A way to democratize and socially curate databases
  • 9. Talk Outline: The Evolving Semantic Web
    • The Origins of the Semantic Web
      • DARPA’s DAML Program
      • RDF and OWL
    • Semantic Web Evolution to 2010
    • Semantic Web Transformation in Three Application Areas
      • Markets and Companies
    • The Fourth Generation
      • Linked Data and the Scaling Revolution
  • 10. Talk Outline: The Evolving Semantic Web
    • The Origins of the Semantic Web
      • DARPA’s DAML Program
      • RDF and OWL
    • Semantic Web Evolution to 2010
    • Semantic Web Transformation in Three Application Areas
      • Markets and Companies
    • The Fourth Generation
      • Linked Data and the Scaling Revolution
  • 11. The Roots of the Semantic Web
    • Semantic technology has been a distinct research field for decades
      • Symbolic Logic (from Russell and Frege)
      • Knowledge Representation Systems in AI
        • Semantic Networks (Bill Woods, 1975)
        • US and EU R&D programs in information integration (80s and 90s)
        • Development of simple tractable “description logics” for classification
        • Conceptual Graphs and Formal Concept Analysis
      • Relational Algebras and Schemas in Database Systems
    • Library Science (classifications, thesauri, taxonomies)
    So, What Sparked the Semantic Web?
    • What’s new was the Web!
      • The material needed to answer almost any question is somewhere on the web
      • A massive infrastructure of data servers, protocols, authentication systems, presentation languages, and thin clients that can be leveraged
      • A way around needing the “big data warehouse”
  • 12. The Beginnings of the US Semantic Web: DARPA’s DAML Program
    • Solution:
    • Augment the web to link machine-readable knowledge to web pages
      • Extend RDF with Description Logic
      • Extensibility via frame-based language design
      • Create the first fully distributed web-scale knowledge base out of networks of hyperlinked facts and data
    • Approach:
    • Design a family of new web languages
      • Basic knowledge representation (OWL)
      • Reasoning (SWRL, OWL/P, OWL/T)
      • Process representation (OWL/S)
    • Build definition and markup tools
    • Link new knowledge to existing web page elements
    • Test design approach with operational pilots in US Government
    • Partner with parallel EU efforts to standardize the new web languages
    People use implicit knowledge to reason with web pages Computers require explicit knowledge to reason with web pages Existing Web (HTML/XML over HTTP) Semantic Web (OWL over HTTP) Links via URLs Problem: Computers cannot process most of the information stored on web pages
  • 13. What is RDF?
    • RDF is a dialect of XML, designed for dynamic web data
      • RDFa is a standard way to encode RDF in ordinary XHTML web pages
    • RDF and RDF Schema defines data entities and basic relations
      • Assertions are triples of (resource, property, resource) or (resource, property, value)
      • All elements in triples are either primitive values or URIs
      • Relations are represented directly, unlike most databases
      • Assertions are precise enough to be interpreted by machines
    typeOf isa VIN1234 hasManufacturer isa isa isa isa isa Vehicle Truck 1999 1999 Corolla Toyota Corolla Car Vehicle Manufacturer Year Mark’s Corolla manufactures manufacturedIn manufacturedIn hasVIN hasManufacturer Graphic from Mary Pulvermacher, MITRE Resource Property Property Value
  • 14. What Does OWL Add to RDF?
    • OWL extends RDF with more expressive power
      • Simple relations between classes: Equivalent class (e.g., US_President and PrincipalResidentofWhiteHouse), Disjoint class (e.g., Male and Female), subClassOf
      • Create new classes: intersectionOf, unionOf, complementOf)
      • Property characteristics (inverseOf, transitive, symmetric, subPropertyOf, etc.)
      • Range and Cardinality constraints (e.g., birthMother is exactly one person)
    • OWL allows its users to make new inferences
      • The extended expressive power allows OWL assertions to be combined according to the rules of OWL’s logic ( SHIQ )
    • OWL enables its users to surround raw data statements with a context (an ontology)
    Graphic from Mary Pulvermacher, MITRE If... And... spouseOf Can conclude... SymmetricProperty Jim Mary spouseOf rdf:type spouseOf Mary Jim
  • 15. Talk Outline: The Evolving Semantic Web
    • The Origins of the Semantic Web
      • DARPA’s DAML Program
      • RDF and OWL
    • Semantic Web Evolution to 2010
    • Semantic Web Transformation in Three Application Areas
      • Markets and Companies
    • The Fourth Generation
      • Linked Data and the Scaling Revolution
  • 16. The Semantic Web in 2010 Cutting Edge Mature Still Research “ The Famous Semantic Web Technology Stack”
  • 17. The Semantic Web in 2010 Commercial Cutting Edge Mature Active Research and Standards Activity “ The Famous Semantic Web Technology Stack”
  • 18. The Semantic Web in 2010 Commercial Cutting Edge Mature Active Research and Standards Activity “ The Famous Semantic Web Technology Stack”
  • 19. The Semantic Web in 2010 Mature Active Research and Standards Activity Commercial Cutting Edge “ The Famous Semantic Web Technology Stack”
  • 20. Completing the Semantic Web Picture Active Research and Standards Activity Mature Other Technologies Impact the Semantic Web More Ontologies Tag Systems Microformats Social Authorship Combined RDF/OWL and Database systems Scalable Reasoning Systems A Huge Base of RDF data Commercial Cutting Edge Description LP, SWRL, SILK...
  • 21. Beyond RDF and OWL: 2010 Semantic Web Infrastructure
    • Data Markup Languages
      • HTML-friendly semantic markup dialects: Microformats and RDFa
      • OWL 2 improves the original OWL spec
    • Triplestores and SPARQL Servers
      • Stores for 1B+ triples now available,
      • Commercial: AllegroGraph, Virtuoso, BigOWLIM, Oracle 11g Semantic Technologies, Talis Platform…
      • Open Source: 4Store, Sesame, Redland...
      • Next step is parallel web delivery architectures
    • Entity Name Service (Okkam, DBpedia)
    • Semantic Web Reasoners
      • Commercial: Oracle 11g RDFS/OWL engine, Ontobroker, Ontotext, RacerPro
      • Open Source: Pellet, FaCT++...
      • BRMS integration
    • Vocabularies and Design Tools
      • Ontologies: Dublin Core, FOAF, SIOC...
      • OpenSource: Protégé, SWOOP...
      • Commercial: TopBraid Composer, Knoodl…
    • Semweb Data Generation
      • RDF / RDBMS front-ends
      • Entity extractors and taggers (OpenCalais)
      • Zemanta-type author’s assistants
      • Semantic wikis
    • Semweb Data Exploitation
      • Semweb search engines (Sindice, Watson, Falcon...)
      • Yahoo SearchMonkey / Google Rich Snippets
      • Browser extensions and facet systems
    • Visualization Tools
      • Simile Project ( http://simile.mit.edu/ )
      • Several Commercial Companies
    User-layer Tools Server Infrastructure
  • 22. Evolving Conceptions for the Semantic Web
    • Semantic markup would be tightly associated with individual web pages
      • “ Translate the Web for machines”
      • Markup lives in associated RDF/XML
    • Core problem is labeling free-text web pages with a (pre-defined) markup vocabulary
      • Entity extraction and ontology-aware NLP
      • Machine learning technologies
      • Manual annotation tools
    • Need an all-encompassing ontology or set of logically compatible ontologies
      • Agreement was always elusive
    • Need to achieve scale and consistency in semantic page markup
      • Manual annotation has incentive, quality/ consistency, and cost/benefit issues
    Initial Semantic Web Conception* * By most people but not Tim Berners-Lee Hypertext Web (HTML/XML over HTTP) Semantic Web (RDF/OWL over HTTP) Links via URLs
  • 23. Evolving Conceptions for the Semantic Web
    • The Web is a publishing platform for formal knowledge as well as pages
      • RDFa allows mixed HTML and RDF
      • RDF/OWL documents can be published by web-connected databases
      • Huge numbers of knowledge publishers
      • Linkage via owl:sameAs, owl:equivalentClass, owl:equivalentProperty, skos:exactMatch, etc.
    • Core problem is maintaining a set of evolving and partial agreements on semantic models and labels
      • Semantic Web rule languages (SWRL, SILK, etc.) can be used to specify mappings and symbolic relations
      • Overall semantic cohesion is increased as more data is mapped together
      • Hard problem is cost-effectively maintaining semantic models
    • Supplemental semantics is carried in the free-text web
    The Semantic Web in 2010 Hypertext Web (HTML/XML over HTTP) Semantic Web (RDF/OWL over HTTP) Links via URLs ... Semantic Data Publishers (publication of RDF and SPARQL endpoints) RDFa RDFa
  • 24. Recapitulating the Semantic Web
    • Semantic Web = “The World Wide Database”
      • Anyone can read, anyone can write, with very loose version control
      • “ Schema-last” data engineering
        • The semweb is a database with a constantly-changing Entity/Relation diagram
        • High data flexibility at the cost of non-optimized queries
      • Most current Semantic Web data originates in traditional databases
    • Semantic Web encourages the collective publication of data ontologies and data links
      • Data contexts/ontologies can be created, published, and linked to others
      • Data items can be extended, linked, grouped, and analyzed within contexts
      • Easy to add value by linking data with relationships
      • Trust relations work in the same way as the text web
      • Network effects are beginning
    The Semantic Web is developing into a web-scale social platform for data integration and fusion
  • 25. 2 Upfront Challenges in the Semantic Web
    • The Semantic Web is not consistent
      • Semantic Web data is published by many people and there is no central control
        • The Semantic Web is a mix of formal and informal usages
        • Consistent usage is only found in islands
      • How do you select a data context (ontology/vocabulary)?
        • Sindice catalogs >36000 uses of the term “publication” in RDF, tags, RDFa, microformats); how do you choose?
        • How do you harmonize with other data publishers so that your data can be found and used?
        • An opportunity to dust off all the old classification standards!
    • RDF/OWL is too simple for many kinds of data
      • Numeric relations can be awkward to express
        • Example: “Jane has a 50% chance of having the H1N1 flu.”
      • RDF/OWL core semantics have no good support for units, probabilities, uncertainty, etc.
      • Statements about statements are complex
        • Example: “Amazon says Sony Bravia-series LCD TVs are Energy Star compliant.”
  • 26. Talk Outline: The Evolving Semantic Web
    • The Origins of the Semantic Web
      • DARPA’s DAML Program
      • RDF and OWL
    • Semantic Web Evolution to 2010
    • Semantic Web Transformation in Three Application Areas
      • Markets and Companies
    • The Fourth Generation
      • Linked Data and the Scaling Revolution
  • 27. First Generation Semantic Web Applications Semantically-Boosted Search and Presentation
  • 28. Semantic Web in Search and Presentation
    • Yahoo! SearchMonkey (see also Google’s Rich Snippets, Bing Captions) use semantic data to enhance snippet presentation
      • Yahoo’s crawler indexes and interprets user RDFa and microformats
      • GreaseMonkey-style presentation reformatting for search snippets
      • Display URL as an enhanced result, with standard or custom presentations
    • Nick Cox (Yahoo!) reported up to 15% greater CTR for rich presentation
      • SearchMonkey code seems not to be working currently
      • Incentives: “Structured data is the new SEO” (Dries Buytaert, Drupal)
    Alex Moskalyuk | Facebook Alex Moskalyuk is on Facebook. Join Facebook to connect with Alex Moskalyuk and others you may know. Facebook gives people the power to share and makes the ... www.facebook.com/.../Alex-Moskalyuk/100000220291330?... - Cached - Similar
  • 29. Why Should a Business Use Semantic Web Markup?
  • 30. Second Generation Semantic Web Applications
    • An only slightly newer problem type
      • Business exploitation of structured enterprise data (RDBMS, Spreadsheet, ERP data)
        • Backwards to Data Management to reduce cost of managing, migrating, integrating
        • Forwards to Business Process Management
      • Support for unified query, analytics, and application access
      • An original government use case for the semantic web
    • Markets Segments and Players
      • Gartner estimates that EII software and services alone is $14B/year, with 40% growth over 5 years (pre-recession numbers, though)
      • Very complex market space includes EAI, Entity Analytics, MDM, BI, BPM, CPM...
      • Huge entrenched players (IBM, SAP, Oracle...) and major consulting shops
    • Some lessons with applying semantic web technology in this space
      • Fundamental problem is understanding the semantics from legacy systems, not in KR
      • Pure Semantic technology companies tend to be unsophisticated about the customer
      • RDF/OWL is typically too weak and must be augmented by rules, quantities, etc.
      • Raw performance is typically inferior to a well-designed RDBMS
      • Tends to be an IT sale (not Line-of-Business sale), with attendant cost pressures
    More Agile Business Intelligence
  • 31. Layers of Semantic Data in Strategic Enterprise IT Semantic data modeling Automatic data management EII - Information integration leadership BI -- Better business intelligence Financial modeling & intelligence Semantic process modeling BAM and CPM w/ predictive analytics MDM – Master Data Management Increasing Value of Semantic Data Models
    • Sales Pitch for Semantic Data Models
      • Promote flexibility and improvisation in the face of dynamism
      • Expose business processes as rules, for governance and compliance
      • Can be driven all the way through the architecture, from SOA to CPM dashboards
    • This vision has not (to my knowledge) been proven in a large enterprise
  • 32. Semantic Web in Strategic Enterprise IT
    • A few companies are using semantic web technology in Enterprise IT contexts
      • Majority of revenue is usually from consulting engagements, not software products or services
      • Carefully-curated external ontologies with logic engines are used for flexible high-end data integration
      • RDF/OWL has been useful as a data language
      • OWL inferences complement SQL joins
    • Data.gov and data.gov.uk
      • Provide the data for agile “citizen analytics”
    • Lower-cost Data Pooling
      • Enterprises can use the Semantic Web as a tool to share the costs of administering foundational information
      • Reference information can be voluminous and expensive to maintain
      • Example: Linking Open Drug Data project
        • Every drug company has to maintain databases of proteins, pathways, genes, diseases, etc.
        • Several major pharma companies have experimented with allowing some of this data to be shared and maintained in the semantic web
        • Currently includes 8M RDF triples and 370K links
  • 33. Third Generation Semantic Web Applications
    • A new problem type
      • “ Semantic Web should allow people to have a better online experience” – Alex Iskold, AdaptiveBlue
      • Enhance the human activities of content creation, publishing, linking my data to other data, socializing, forming community, purchasing satisfying things, browsing, etc.
      • Improve the effectiveness of advertising
    • Market Segments and Players
      • Data outsourcing model (Evri, Freebase, ...)
      • Semantic enhancements to blogs and wikis (Zemanta, Faviki, Ontoprise, Radar, ...)
      • Semantics for Social Networking (MySpace RDF service and microformats, Facebook RDF models, etc.)
    • Some lessons with applying semantic web technology in this space
      • If we don’t have semantic convergence, then semantics isn’t a differentiator
      • No one really knows the design principles that allow some Web 2.0/3.0 sites to be successful and others to never get traction
    Web 2.0 and the Socio-Semantic Web
  • 34. Data Outsourcing on the Semantic Web
    • Do for general data what Bloomberg and Capital-IQ do for financial data
      • Collect data, clean it, fuse it with quality control
      • Sell subscriptions, custom datasets and feeds
      • Knowledge required for targeted advertising
    • Example: Evri
      • Use semantic technology to filter and classify the stream
      • Create semantic contexts and personalized points of view
      • Follow topic spaces, high-level abstractions, entity clusters
      • Used by: Washington Post, LetMeKnow, Yahoo
    • Example: Freebase
      • A huge collection of structured data harvested from many sources
      • Contributions from individuals in a wiki-style format
      • Building a global resource of “almanac”-style data, with widgets to supply autocomplete suggestions, better tags, related items, topic hubs, etc.
      • Used by Bing, Wall Street Journal, Google, etc.
  • 35. Large Web Knowledge Bases: Freebase
    • ~1GB of socially-curated RDF almanac data plus data from partners (Creative Commons licensed)
  • 36. Semantic Blogger Support: Zemanta
    • Automatic link, image, keyword, tag suggestions for blogs and email
      • Average semi-professional blogger spends ~20 mins adding “decorative” content
    • Key technology is fast language processing
    • Markup accuracy is guaranteed because users explicitly add the suggestions (feedback)
      • Includes user specified friends/feeds/photos/etc as well as standard ones
  • 37. Semantic Wikis
  • 38. Example: HL7 Healthcare Vocabulary Management
    • Large-scale terminology management at www.biomedgt.org .
  • 39. Semantic Data in HL7 (www.BioMedGT.org)
    • Subject-matter experts give simple, authorable statements
    • Semantic data provides pivotable dimensions
  • 40. Talk Outline: The Evolving Semantic Web
    • The Origins of the Semantic Web
      • DARPA’s DAML Program
      • RDF and OWL
    • Semantic Web Evolution to 2010
    • Semantic Web Transformation in Three Application Areas
      • Markets and Companies
    • The Fourth Generation
      • Linked Data and the Scaling Revolution
  • 41. The Background of Knowledge: DBpedia
    • Extract Wikipedia assertions
      • ~23M Factbox assertions
      • Category assertions, link structures, tables
    • DBpedia 3.4 dataset (Nov 09 Wikipedia)
      • ~2.9M things, ~479M triples
        • 282K persons, 339K places, 88K music albums, 44K films, 870K links to images, 3.8M links to relevant external web pages, 4.9M links into RDF datasets
      • Classifications via Wikipedia and YAGO categories
      • One of the largest broad knowledge bases in the world
    • Simple queries over extracted data
      • “ Things near the Eiffel Tower”
      • “ The official websites of companies with more than 50000 employees”
      • “ Soccer players from team with stadium with >40000 seats, who were born in a country with more than 10M inhabitants”
  • 42. DBpedia for Users
    • Query Wikipedia like a database
    DBpedia Mobile
  • 43. Putting it all Together: Linked Open Data
    • Goals
      • Create a single, simple set of rules for publishing and linking RDF data
      • Build a data commons by making open data sources available as RDF
      • Set RDF links between data items from different data sources
    • May 2009 LOD dataset
      • ~4.7B triples, and ~100M RDF interlinks, growing faster than I can track
      • Estimate 13.5B triples now with ~140M interlinks
      • Database linkage means that LOD will soon be impossible to count except via order of magnitude
  • 44. May 2007 September 2007 March 2009 September 2008 The Growing Web of Linked Data
  • 45. Topic Distribution in the March 2009 Linked Datasets Life Sciences Publications Online Activities Music Geography Cross-Domain 4,500,000,000 triples 180,000,000 data links
  • 46. Summing up: Building the Largest Database in the World
    • In mid-2004...
      • RDF and OWL had just been standardized
      • Advances were made via traditional corporate/public R&D programs
      • The first wave of semantic web startups (many of which have since failed)
      • Implementations were technically very sophisticated, but fully custom and had no web involvement
      • A few early conferences (ISWC, SemTech)
  • 47. Summing up: Building the Largest Database in the World
    • In mid-2004...
      • RDF and OWL had just been standardized
      • Advances were made via traditional corporate/public R&D programs
      • The first wave of semantic web startups (many of which have since failed)
      • Implementations were technically very sophisticated, but fully custom and had no web involvement
      • A few early conferences (ISWC, SemTech)
    • Now in 2010...
      • The Semantic Web is the most exciting thing happening on the web
      • RDF assertions scaling into the billions, with little to no programmatic control
      • Search engines are starting to develop snippet products
      • Early adopters are publishing RDFa and seeing benefits
      • Network effects are starting
  • 48. Summing up: The Largest Knowledge Base in the World
    • In mid-2004...
      • RDF and OWL had just been standardized
      • Advances were made via traditional corporate/public R&D programs
      • The first wave of semantic web startups (many of which have since failed)
      • Implementations were technically very sophisticated, but fully custom and had no web involvement
      • A few early conferences (ISWC, SemTech) and session tracks
    • Now in 2009...
      • The Semantic Web is the most exciting thing happening on the web
      • RDF assertions scaling into the billions, with little to no programmatic control
      • Search engines are starting to develop products
      • Bestbuy is publishing products, store descriptions, hours in RDFa
    “ Linked Data Now!!” -- Sir Tim Berners-Lee Put your data on the Semantic Web Link it to Other Data Be part of the Revolution!
  • 49. Thank You Disclaimer: The preceding slides represent the views of the author only. All brands, logos and products are trademarks or registered trademarks of their respective companies.