Semantic Web: introduction & overview
A lecture/conversation focusing on the first 12 years of Semantic Web - delivered on February 21, 2012.

See for more details. More detailed course material is at

Published in Education
  • Suddenly Semantic Search is hot. While some well-known companies are now incorporating semantics in their search, semantic search (and semantic browsing, advertisement/targeting, etc.) is not new at the conceptual, research, or commercial level. Here are just some links [apologies for the self-citations, as these are the ones available to me; you may want to search for other efforts]:

    A 2000 talk on commercial semantic search, browsing, etc:

    A 2000 interview on Semantic Search:

    A 2000/2001 patent (System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising) :

    A 2002 paper:

    A 2003 keynote : 'Semantic Web in Action: Ontology-driven information search, integration and analysis' :

    A 2005 'exchange' on thoughts on one way of adding semantics to search by a Google director and my response (and this is broadly how GKG is exploited now, incorporating entities /things, relationships):
  • This slide deck was used for a short course taught online in early 2012 for international students at universities in India. More recently, we have published a low-cost book that carries this subject matter further: 'Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based Data and Services for Advanced Applications'. Furthermore, all materials (slides, video, assignments) used in the class that uses this book are available free: For instructors wishing to teach this course, contact me at for assistance.
  • RDF: Triple structure
  • Review types of heterogeneity. Why we need to reconcile data heterogeneity. Uniform Resource Locator: a network location that is also used as an identifier for resources on the Web. A URL is a specific type of URI. A URI can be used to refer to anything. IRI: in addition to the ASCII character set, it allows characters from the Universal Character Set (RFC 3987).
  • RDF uses XML Schema datatypes
  • Allows creation of an abstract representation of domain
  • Review types of heterogeneity. Why we need to reconcile data heterogeneity
  • Taalee (subsequently Voquette and Semagix) was founded in 1999 as an Audio/Video Web Search Company (focus on A/V mainly for scalability and market focus reasons; service name: MediaAnywhere). Domain models/ontologies were created in major areas (many more than what you can find on Bing in 2011) and automatically populated to build knowledge bases (populated ontologies, or WorldModel) from a variety of structured and semistructured sources, and periodically kept up to date. This was then used for semantic annotation/metadata extraction to drive semantic search, browsing, and other applications over data crawled from Web sites.
  • The important thing is that the system knew that Robert Duvall is a movie actor and a different person from David Duval, who is a golfer and a sportsperson, and that it had an understanding of a variety of relationships Robert Duvall participates in – such as
  • Obtained from Ivan’s slide
  • Let me give a technological introduction to what our center is about: we all face a fire hose of data. PubMed adds 2000 to 4000 citations per day, it is usual to add about 5 GB from a single run of a scientific experiment, and just imagine how much data is created by all the cameras and 40 billion mobile sensors in the world! But even with all the search and browsing tools we have, we face a huge information glut. How do we make sense of the data? Just as humans apply their knowledge and experience to understand what they see, we apply a domain model or knowledge to attach meaningful labels to these data. Then we can apply computational techniques to visualize, provide situational awareness, and discover nuggets of knowledge, information and insight. For example, from all that biomedical data, what a scientist may be looking for is: how can we treat Migraine? What has Magnesium to do with Migraine? Why does Magnesium deficiency cause Migraine? What is the process by which Magnesium affects Migraine?
  • Kno.e.sis has 15 faculty in Computer Science, life sciences and health care, cognitive science and business. It has about 50 PhD students and postdocs, about 2/3 of them in Computer Science. Its faculty members have 40 labs, and it occupies a majority of the 50K sq ft Joshi Research Center. Its students are highly successful, e.g. tenure-track faculty at Case Western Reserve Univ or Researcher at IBM Almaden. It has received recent funding from Microsoft Research, IBM Research, HP Labs, Google, and small companies (Janya, EZdi,…) and collaborates with many more (Yahoo! Labs, NLM, …).


  • 1. Semantic Web: intro & overview. A conversation with students – Feb 21, 2012. Amit Sheth – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, OH, USA
  • 2. What are two of the most important software success stories of 2012?
  • 3. Apple’s Siri, IBM’s Watson. What are the common technologies?
  • 4. Just stepping back a bit
  • 5. Semantic technologies in the mainstream • Microsoft purchased Powerset in 2008 • Apple purchased Siri [Apr 2010] – “Once Again The Back Story Is About Semantic Web” • Google buys Metaweb [June 2010] – “Google Snaps Up Metaweb in Semantic Web Play” – Now see: “Google Knowledge Graph Could Change Search Forever” • Facebook Open Graph, Twitter annotation … “another example of semantic web going mainstream”, “Google, Twitter and Facebook build the semantic web”
  • 6. • RDFa adoption … Search engines (esp. Bing) are about to introduce domain models and (all) use of background knowledge/structured databases with large entity bases • Bing, Yahoo! and Google announced
  • 7. A bit of history • Semantics with metadata and ontologies for heterogeneous documents and multiple repositories of data, including the Web, was discussed in the 1990s (semantic information brokering, faceted search, InfoHarness, SIMS, Ariadne, OBSERVER, SHOE, MREF, InfoQuilt, …). Also DAML and OIL. • Tim Berners-Lee used “Semantic Web” in his 1999 book • I founded a company, Taalee, in 1999, gave a keynote on Semantic Web & commercialization in 2000 and filed for a patent in 2000 (awarded 2001). • The well-known TBL, Hendler, Lassila paper in Scientific American took an AI-ish approach (agents,…) to the Semantic Web • The first 5 years saw too much of AI/DL, but more practical/applied work has dominated recently
  • 8. Different foci • TBL – focus on data: Data Web (“In a way, the Semantic Web is a bit like having all the databases out there as one big database.”) • Others focus on reasoning and intelligent processing
  • 9. 1 2 3 of Semantic Web
  • 10. 1 • Ontology: Agreement with a common vocabulary/nomenclature, conceptual models and domain knowledge • Schema + Knowledge base • Agreement is what enables interoperability • Formal description – machine processability is what leads to automation
  • 11. 2 • Semantic Annotation (Metadata Extraction): Associating meaning with data, or labeling data so it is more meaningful to the system and people. • Can be manual, semi-automatic (automatic with human verification), or automatic.
  • 12. From Syntax to Semantics: from shallow semantics to deep semantics. Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics
  • 13. 3 • Reasoning/Computation: semantics-enabled search, integration, answering complex queries, connections and analyses (paths, subgraphs), pattern finding, mining, hypothesis validation, discovery, visualization
  • 14. Semantic Web Stack • Web of Linked Data • Introduced by Berners-Lee et al. as the next step for the Web of Documents • Allow “machine understanding” of data • Create “common” models of domains using formal language – ontologies. Semantic Web Layer Cake. Layer cake image source:; see W3C SW publications
  • 15. Characteristics of Semantic Web. The Semantic Web: XML, RDF & Ontology • Self-Describing • Easy to Understand • Machine & Human Readable • Issued by a Trusted Authority • Can be Secured • Convertible. Adapted from William Ruh (CISCO)
  • 16. Resource Description Framework [Figure: Company IBM linked to Locations Armonk, New York, United States and Zurich, Switzerland] • Resource Description Framework – Recommended by W3C for metadata modeling [RDF] • A standard common modeling framework – usable by humans and machine understandable. RDF/OWL slides from: Semantic Web in Health Informatics (thanks: Satya)
  • 17. RDF: Triple Structure, IRI, Namespace [Figure: IBM -- Headquarters located in --> Armonk, New York, United States] • RDF Triple o Subject: The resource that the triple is about o Predicate: The property of the subject that is described by the triple o Object: The value of the property • Web Addressable Resource: Uniform Resource Locator (URL), Uniform Resource Identifier (URI), Internationalized Resource Identifier (IRI) • Qualified Namespace: as xsd: o xsd:string instead of
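
As a minimal sketch of the triple structure and qualified names described on this slide, the snippet below models a triple as a plain Python tuple and expands prefixed names into full IRIs. The `ex` namespace and the property name `headquartersLocatedIn` are placeholders invented for illustration (the slide's own IRIs did not survive extraction); only the `xsd` namespace is the standard XML Schema one.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource the triple is about
    predicate: str  # the property being described
    object: str     # the value of that property


# "ex" is a hypothetical namespace used only for this sketch.
PREFIXES = {
    "ex": "http://example.org/company#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'xsd:string' into a full IRI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local


ibm_hq = Triple(
    expand("ex:IBM"),
    expand("ex:headquartersLocatedIn"),
    expand("ex:Armonk"),
)
print(ibm_hq)
```

The prefix table is what lets a document write `xsd:string` instead of repeating the full namespace IRI in every triple.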
  • 18. RDF Representation • Two types of property values in a triple o Web resource: IBM -- Headquarters located in --> Armonk, New York, United States o Typed literal: IBM -- Has total employees --> “430000”^^xsd:integer • The graph model of RDF: node-arc-node is the primary representation model • Secondary notations: Triple notation o companyExample:IBM companyExample:has-Total-Employee “430000”^^xsd:integer .
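
The two kinds of property value (a Web resource versus a typed literal) and the triple notation can be sketched as follows. This is an illustrative serializer written for this note, not an RDF library; the `companyExample` namespace IRI is a placeholder. The employee count is written as `430000` because the lexical form of `xsd:integer` does not allow a thousands separator.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/companyExample#"  # hypothetical namespace


@dataclass(frozen=True)
class IRI:
    value: str

    def nt(self) -> str:
        # IRIs are written between angle brackets in triple notation.
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str

    def nt(self) -> str:
        # Typed literal: quoted lexical form, then ^^ and the datatype IRI.
        return f'"{self.lexical}"^^<{self.datatype}>'


def to_triple_notation(subject, predicate, obj) -> str:
    """Serialize one triple as a single line ending in ' .'"""
    return f"{subject.nt()} {predicate.nt()} {obj.nt()} ."


line = to_triple_notation(
    IRI(EX + "IBM"),
    IRI(EX + "has-Total-Employee"),
    Literal("430000", XSD + "integer"),
)
print(line)
```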
  • 19. RDF Schema [Figure: IBM -- Headquarters located in --> Armonk, New York, United States; Oracle -- Headquarters located in --> Redwood Shores, California, United States; at the schema level, Company -- Headquarters located in --> Geographical Location] • RDF Schema: Vocabulary for describing groups of resources [RDFS]
  • 20. RDF Schema • Property domain (rdfs:domain) and range (rdfs:range): Company (domain) -- Headquarters located in --> Geographical Location (range) • Class Hierarchy/Taxonomy: rdfs:subClassOf o SubClass rdfs:subClassOf (Parent) Class o Computer Technology Company, Banking Company and Insurance Company are each rdfs:subClassOf Company
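
To make the effect of `rdfs:domain`, `rdfs:range` and `rdfs:subClassOf` concrete, here is a small hand-written sketch of the type inferences they license. It is a simplification under stated assumptions: one domain and one range per property, string identifiers with an invented `ex:` prefix, and no other RDFS rules. A real RDFS reasoner covers more.

```python
def infer_types(triples):
    """Return {resource: set of classes} using subClassOf, domain and range."""
    superclasses, domain, range_ = {}, {}, {}
    facts = []
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            superclasses.setdefault(s, set()).add(o)
        elif p == "rdfs:domain":
            domain[s] = o
        elif p == "rdfs:range":
            range_[s] = o
        else:
            facts.append((s, p, o))

    types = {}

    def add_type(resource, cls):
        # Add the class and, transitively, every superclass.
        stack = [cls]
        while stack:
            c = stack.pop()
            known = types.setdefault(resource, set())
            if c not in known:
                known.add(c)
                stack.extend(superclasses.get(c, ()))

    for s, p, o in facts:
        if p == "rdf:type":
            add_type(s, o)
        else:
            if p in domain:
                add_type(s, domain[p])   # subject belongs to the domain class
            if p in range_:
                add_type(o, range_[p])   # object belongs to the range class
    return types


schema_and_data = [
    ("ex:ComputerTechnologyCompany", "rdfs:subClassOf", "ex:Company"),
    ("ex:headquartersLocatedIn", "rdfs:domain", "ex:Company"),
    ("ex:headquartersLocatedIn", "rdfs:range", "ex:GeographicalLocation"),
    ("ex:IBM", "rdf:type", "ex:ComputerTechnologyCompany"),
    ("ex:Oracle", "ex:headquartersLocatedIn", "ex:RedwoodShores"),
]
types = infer_types(schema_and_data)
print(types)
```

Note that Oracle is never declared to be a company; the type follows from the domain of the property it is used with.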
  • 21. Ontology: A Working Definition • Ontologies are shared conceptualizations of a domain represented in a formal language* • Ontologies: o Common representation model – facilitate interoperability, integration across different projects, and enforce consistent use of terminology o Closely reflect domain-specific details (domain semantics) essential to answer end user o Support reasoning to discover implicit knowledge. * Paraphrased from Gruber, 1993
  • 22. Expressiveness Range: Knowledge Representation and Ontologies [Figure: ontology dimensions on a spectrum from informal to formal, from simple taxonomies to expressive ontologies. Spectrum points: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restriction; Disjointness, Inverse, part of…; General Logical constraints. Systems and languages placed along it: DB Schema, UMLS, RDF, RDFS, DAML, CYC, Wordnet, OO, OWL, IEEE SUO, KEGG, TAMBIS, BioPAX, GO, SWETO, GlycO, EcoCyc, Pharma] After McGuinness and Finin
  • 23. OWL2 Web Ontology Language • A language for modeling ontologies [OWL] • OWL2 is declarative • An OWL2 ontology (schema) consists of: o Entities: Company, Person o Axioms: Company employs Person o Expressions: A Person Employed by a Company = CompanyEmployee • Reasoning: Draw a conclusion given certain constraints are satisfied o RDF(S) Entailment o OWL2 Entailment
  • 24. OWL2 Constructs • Class Disjointness: An instance of class A cannot be an instance of class B • Complex Classes: Combining multiple classes with set theory operators: o Union: Parent = ObjectUnionOf(:Mother :Father) o Logical negation: UnemployedPerson = ObjectComplementOf(:EmployedPerson) o Intersection: Mother = ObjectIntersectionOf(:Parent :Woman)
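
The set-theoretic reading of these class constructors can be illustrated with ordinary Python sets. This is only an analogy over a small, fully known population invented for the example: OWL reasons under the open-world assumption, so a real reasoner would not conclude that someone is unemployed merely because no employment fact is recorded.

```python
# Hypothetical individuals; the complement is taken relative to this universe.
people = {"alice", "bob", "carol", "dave"}
mothers = {"alice"}
fathers = {"bob"}
women = {"alice", "carol"}
employed = {"alice", "dave"}

# Union: Parent = ObjectUnionOf(:Mother :Father)
parents = mothers | fathers

# Logical negation: UnemployedPerson = ObjectComplementOf(:EmployedPerson)
unemployed = people - employed

# Intersection: Mother = ObjectIntersectionOf(:Parent :Woman)
mothers_by_definition = parents & women

# Disjointness: no individual may belong to both classes.
mothers_and_fathers_disjoint = mothers.isdisjoint(fathers)

print(parents, unemployed, mothers_by_definition, mothers_and_fathers_disjoint)
```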
  • 25. OWL2 Constructs • Property restrictions: defined over a property • Existential Quantification: o Parent = ObjectSomeValuesFrom(:hasChild :Person) o To capture incomplete knowledge • Universal Quantification: o USPresident = ObjectAllValuesFrom(:hasBirthPlace :UnitedStates) • Cardinality Restriction
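
The difference between existential and universal restrictions is easy to miss, so here is a closed-world sketch with invented data. The function names mirror the OWL 2 constructors but are plain Python written for this note, and the closed-world evaluation is an assumption of the sketch, not how an OWL reasoner works.

```python
def some_values_from(individuals, prop, cls):
    """Individuals with at least one value of `prop` that is in `cls`."""
    return {i for i in individuals if any(v in cls for v in prop.get(i, ()))}


def all_values_from(individuals, prop, cls):
    """Individuals whose every value of `prop` is in `cls`.

    An individual with no values at all satisfies this vacuously.
    """
    return {i for i in individuals if all(v in cls for v in prop.get(i, ()))}


individuals = {"alice", "bob", "carol"}
persons = {"alice", "bob", "carol"}
# Hypothetical data: bob's only recorded "child" is not a Person.
has_child = {"alice": {"carol"}, "bob": {"rex"}}

existential = some_values_from(individuals, has_child, persons)
universal = all_values_from(individuals, has_child, persons)
print(existential, universal)
```

Carol has no recorded children, so she fails the existential restriction but satisfies the universal one; this is why the two are usually combined when both "has some" and "has only" are intended.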
  • 26. SPARQL: Querying Semantic Web Data • A SPARQL query pattern is composed of triples • Triples correspond to the RDF triple structure, but have a variable at: o Subject: ?company ex:hasHeadquaterLocation ex:NewYork . o Predicate: ex:IBM ?whatislocatedin ex:NewYork . o Object: ex:IBM ex:hasHeadquaterLocation ?location . • The result of a SPARQL query is a list of values – values can replace the variables in the query pattern
  • 27. SPARQL: Query Patterns • An example query pattern: PREFIX ex:<> SELECT ?company ?location WHERE { ?company ex:hasHeadquaterLocation ?location . } • Query Result (multiple matches), as company / location: IBM / NewYork; Oracle / RedwoodCity; MicrosoftCorporation / Bellevue
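
How a single triple pattern yields a table of bindings can be sketched without a SPARQL engine. The matcher below is a toy written for this note: it handles one pattern only, represents terms as strings, and treats anything starting with `?` as a variable. The property name is spelled as on the slide, and the data reproduces the slide's result table.

```python
def match(pattern, triples):
    """Return one {variable: value} binding per triple matching the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


data = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
    ("ex:MicrosoftCorporation", "ex:hasHeadquaterLocation", "ex:Bellevue"),
]

all_rows = match(("?company", "ex:hasHeadquaterLocation", "?location"), data)
in_new_york = match(("?company", "ex:hasHeadquaterLocation", "ex:NewYork"), data)
for row in all_rows:
    print(row["?company"], row["?location"])
```

Fixing the object to `ex:NewYork` turns the same pattern into the "variable at subject" case from the previous slide.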
  • 28. SPARQL: Query Forms • SELECT: Returns the values bound to the variables • CONSTRUCT: Returns an RDF graph • DESCRIBE: Returns a description (RDF graph) of a resource (e.g. IBM) o The content of the RDF graph is determined by the SPARQL query processor • ASK: Returns a Boolean o True o False
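
The query forms differ only in what they do with the bindings, which the following sketch tries to show for SELECT, ASK and CONSTRUCT over a single triple pattern. All function names are hypothetical helpers for this note, and the `ex:isHeadquartersOf` property is invented for the CONSTRUCT template. DESCRIBE is left out because, as the slide says, its result is up to the query processor.

```python
def bindings(pattern, triples):
    """Yield one variable binding (a dict) per triple that matches the pattern."""
    for triple in triples:
        b = {}
        if all(
            b.setdefault(term, value) == value if term.startswith("?") else term == value
            for term, value in zip(pattern, triple)
        ):
            yield b


def select(variables, pattern, triples):
    """SELECT: a table with one row of bound values per match."""
    return [tuple(b[v] for v in variables) for b in bindings(pattern, triples)]


def ask(pattern, triples):
    """ASK: True if the pattern has at least one match."""
    return any(True for _ in bindings(pattern, triples))


def construct(template, pattern, triples):
    """CONSTRUCT: a new graph built by filling the template from each match."""
    return {tuple(b.get(t, t) for t in template) for b in bindings(pattern, triples)}


data = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
]
pattern = ("?company", "ex:hasHeadquaterLocation", "?location")

rows = select(["?company", "?location"], pattern, data)
exists = ask(("ex:IBM", "ex:hasHeadquaterLocation", "ex:Bellevue"), data)
graph = construct(("?location", "ex:isHeadquartersOf", "?company"), pattern, data)
print(rows, exists, graph)
```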
  • 29. a little bit about ontologies
  • 30. Many Ontologies Available Today Open Biomedical Ontologies ,
  • 31. From simple ontologies
  • 32. Drug Ontology Hierarchy (showing is-a relationships) [Figure: class hierarchy rooted at owl:thing, with classes covering formulary, indication, interaction, monograph interaction class, non-drug reactant, prescription drug (brand name and generic, individual and composite), and their associated property classes]
  • 33. to complex ontologies
  • 34. N-Glycosylation metabolic pathway [Figure: GNT-I attaches GlcNAc at position 2 (N-glycan_beta_GlcNAc_9); N-acetyl-glucosaminyl_transferase_V (GNT-V) attaches GlcNAc at position 6 (N-glycan_alpha_man_4)] UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2; UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
  • 35. A little bit about semantic metadata extractions and annotations
  • 36. Extraction for Metadata Creation [Figure: sources (WWW, enterprise repositories, Nexis, UPI, AP, feeds/documents, data stores, digital videos, digital audios, digital images, digital maps, ...) feed EXTRACTORS that produce METADATA] Create/extract as much (semantics) metadata automatically as possible; use ontologies to improve and enhance extraction
  • 37. Automatic Semantic Metadata Extraction/Annotation
  • 38. Semantics & Semantic Web in 1999-2002
  • 39. Sample applications • Early Semantic Search, use baby steps of today’s engines • Enterprise applications – healthcare & life sciences, financial, security • Driving the innovation with new types of data: sensor (Semantic Sensor Web), social (Semantic Social Web), semantic IoT/WoT
  • 40. Taalee Semantic/Faceted Search & Browsing (1999-2001) • Blended browsing & querying interface • Attribute & keyword querying • Semantic browsing • Targeted e-shopping/e-commerce assets access • Uniform view of worldwide distributed assets of similar type
  • 41. Semantic Search/Browsing/Directory (2001-….) • Search for company ‘Commerce One’ • Links to news on companies that compete against Commerce One • Links to news on companies Commerce One competes against • Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically (to view news on Ariba, click on the link for Ariba)
  • 42. Semantic Search/Browsing/Directory (2001-….) System recognizes ENTITY & CATEGORY Relevant portion of the Directory is automatically presented.
  • 43. Semantic Search/Browsing/Directory (2001-….) Users can explore Semantically related Information.
  • 44. Equity Research Dashboard with Blended Semantic Querying and Browsing • Automatic 3rd party content integration • Focused relevant content organized by topic (semantic categorization) • Related relevant content not explicitly asked for (semantic associations) • Automatic content aggregation from multiple content providers and feeds • Competitive research inferred automatically
  • 45. Extracting Semantic Metadata from Semistructured and Structured Sources (1999 – 2002) • Semagix Freedom for building ontology-driven information system • Managing Semantic Content on the Web
  • 46. Ontology Creation and Maintenance Steps: 1. Ontology Model Creation (Description) 2. Knowledge Agent Creation 3. Automatic aggregation of Knowledge 4. Querying the Ontology [Figure: Ontology, Semantic Query Server] © Semagix, Inc.
  • 47. Semantic Associations – Connecting the Dots. Ahmed Yaseer: • Appears on Watchlist ‘FBI’ • Works for Company ‘WorldCom’ • Member of organization ‘Hamas’ [Figure: graph in which Ahmed Yaseer appears on Watchlist FBI Watchlist, works for Company WorldCom, and is a member of a banned organization] 2004 SEMAGIX
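
One simple way to read "connecting the dots" is as path finding over the triple graph. The sketch below uses a breadth-first search that may follow relationships in either direction; this is a stand-in chosen for illustration, not the ranking or discovery method used in the Semagix product, and the node and edge labels are copied from the slide's toy example.

```python
from collections import deque


def find_association(triples, start, goal):
    """Return the shortest alternating node/label path from start to goal, or None."""
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        # Allow traversal against the direction of the relationship too.
        neighbours.setdefault(o, []).append((f"inverse of {p}", s))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label, nxt]))
    return None


graph = [
    ("Ahmed Yaseer", "appears on", "FBI Watchlist"),
    ("Ahmed Yaseer", "works for", "WorldCom"),
    ("Ahmed Yaseer", "member of", "Hamas"),
]
path = find_association(graph, "WorldCom", "FBI Watchlist")
print(path)
```

The returned path links the company to the watch list through the shared person, which is the kind of indirect connection the slide highlights.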
  • 48. Global Investment Bank [Figure: sources feeding the application: Watch Lists, Law Enforcement, Regulators, Public Records, World Wide Web content, BLOGS, RSS; semi-structured government data, unstructured text, semi-structured data] Establishing New Account • User will be able to navigate the ontology using a number of different interfaces • Scores the entity based on the content and entity relationships • Fraud Prevention application used in financial services – Related KYC application is deployed at a majority of global banks
  • 49. Fast forward to 2005-2006
  • 50. Semantic Web+Clinical Practice Informatics =Active Semantic Electronic Medical Record (ASEMR) Operationally deployed in January 2006, in use (as of 2012)
  • 51. ASEMR: SW application in use. In daily use at Athens Heart Center – 28 person staff • Interventional Cardiologists • Electrophysiology Cardiologists – Deployed since January 2006 – 40-60 patients seen daily – 3000+ active patients – Serves a population of 250,000 people
  • 52. Information Overload in Clinical Practice • New drugs added to market – Adds interactions with current drugs – Changes possible procedures to treat an illness • Insurance coverages change – Insurance may pay for drug X but not drug Y even though drugs X and Y are equivalent – Patient may need a certain diagnosis before some expensive tests are run • Physicians need a system to keep track of the ever-changing landscape
  • 53. Active Semantic Document (ASD). A document (typically in XML) with the following features: • Semantic annotations – Linking entities found in a document to ontology – Linking terms to a specialized lexicon [TR] • Actionable information – Rules over semantic annotations – Violated rules can modify the appearance of the document (show an alert)
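
The two ingredients of an active semantic document, annotations and rules over them, can be sketched as follows. Everything here is hypothetical: the lexicon, the `drug:` identifiers, the interaction fact and the naive substring matching are invented for illustration and do not reflect the actual ASEMR ontologies or rule language.

```python
# Hypothetical lexicon mapping surface terms to ontology concepts.
LEXICON = {"drug x": "drug:X", "drug y": "drug:Y", "drug z": "drug:Z"}

# Hypothetical ontology fact: X and Y interact (stored as an unordered pair).
INTERACTS_WITH = {frozenset({"drug:X", "drug:Y"})}


def annotate(text, lexicon):
    """Semantic annotation: link terms found in the text to ontology concepts."""
    lowered = text.lower()
    return {concept for term, concept in lexicon.items() if term in lowered}


def interaction_alerts(annotations):
    """Rule over annotations: flag every pair of annotated drugs that interact."""
    drugs = sorted(annotations)
    return [
        (a, b)
        for i, a in enumerate(drugs)
        for b in drugs[i + 1:]
        if frozenset({a, b}) in INTERACTS_WITH
    ]


note = "Patient takes Drug X daily; Drug Y added today."
annotations = annotate(note, LEXICON)
alerts = interaction_alerts(annotations)
print(annotations, alerts)
```

In a deployed system the alert would change how the document is rendered, for example by highlighting the two drug names, rather than being printed.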
  • 54. Active Semantic Patient Record • An application of ASD • Three Ontologies – Practice: information about the practice, such as patient/physician data – Drug: information about drugs, interactions, formularies, etc. – ICD/CPT: describes the relationships between CPT and ICD codes • Medical Records in XML created from database
  • 55. Active Semantic Electronic Medical Record App. In use today at Athens Heart Center for Clinical Decision Support since January 2006. Amit P. Sheth, S. Agrawal, Jonathan Lathem, Nicole Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Proc. of the 5th International Semantic Web Conference, 2006
  • 56. Demo of ASEMR and other applications
  • 57. Benefits of ASEMR • Error prevention (drug interactions, allergy) – Patient care – Insurance • Decision Support (formulary, billing) – Patient satisfaction – Reimbursement • Efficiency/time – Real-time chart completion – “semantic” and automated linking with billing
  • 58. Using large data sets for Structured Data on the web: Linked Open Data – samples from 2005 to 2010
  • 59. Linked Open Data: Publish Open Data Sets in RDF. By 2010: 203 data sets, 25 billion triples. Image:
  • 60. You publish the raw data… Semantic Web Adoption and Application
  • 61. … and others can use it Semantic Web Adoption and Application
  • 62. Using the LOD to build Web site: BBC Semantic Web Adoption and Application
  • 63. Using the LOD to build Web site: BBC Semantic Web Adoption and Application
  • 64. GoodRelations Ontology - RDFa Semantic Web Adoption and Application
  • 65. GoodRelations Ontology - RDFa Semantic Web Adoption and Application
  • 66. GoodRelations Ontology - RDFa Semantic Web Adoption and Application
  • 67. Fast forward to 2010-2011
  • 68. Schema.org: Shared Vocabulary. Amazing things can happen. Will give some on-line examples
  • 69. Twitris: Semantic Social Web Mash-up [Figure: TWITRIS interface with panels: Select date; Select topic; N-gram summaries; Topic tree; Sentiment Analysis; Spatial Marker; Tweet traffic; Images & Videos; Related tweets; Reference news; Wikipedia articles]
  • 70. Computing for Human Experience [Figure: evolution of the Web (and associated computing) from 1997 through 2007 and beyond: Web of pages (text, manually created links, extensive navigation); Web of databases (dynamically generated pages, web query interfaces); Web of resources (data, services, mashups; 4 billion mobile computing); Web of people and Sensor Web (social networks, user-created casual content; 40 billion sensors, 500M+ FB users, 1B tweets/wk); Web as an oracle/assistant/partner (“ask the Web”, using semantics to leverage text + data + services; Powerset); Enhanced Experience, tech assimilated in life. The focus evolves from Keywords to Patterns to Objects to Events and Situations]
  • 71. [Figure: semantics and meaning processing. Big data sources (scientific publications/white papers such as PubMed, experimental results, clinical trial data, public domain knowledge, structured text) pass through metadata extraction/semantic annotations, supporting search and browsing; then semantic search/browsing/personalization, patterns/inference/reasoning, analysis, knowledge discovery, semantic visualization and situational awareness; then 2D-3D & immersive visualization and human-computer interfaces, impacting the bottom line. Ontologies/domain models/knowledge relate concepts such as Migraine, Magnesium, Stress, Patient and Calcium Channel Blockers through relations such as affects, inhibit and isa]
  • 72. Semantics as core enabler, enhancer @ Kno.e.sis
  • 73. Take Home Message (Cont.) Semantics play a key role in referring to the “meaning” behind the data. This requires progress from keywords -> entities -> relationships -> events, from raw data to human-centric abstractions.
  • 74. Take Home Message (Cont.) A wide variety of semantic models and KBs (vocabularies, social dictionaries, community-created semi-structured knowledge, domain-specific datasets, ontologies) empower semantic solutions. This can lead to Semantic Scalability – scalability that is meaningful to human activities and decision making.
  • 75. Interested in more? Kno.e.sis Wiki for the following and more: • Computing for Human Experience • Continuous Semantics to Analyze Real-Time Data • Semantic Modeling for Cloud Computing • Citizen Sensing, Social Signals, and Enriching Human Experience • Semantics-Empowered Social Computing • Semantic Sensor Web • Traveling the Semantic Web through Space, Theme and Time • Relationship Web: Blazing Semantic Trails between Web Resources • SA-REST: Semantically Interoperable and Easier-to-Use Services and Mashups • Semantically Annotating a Web Service. Tutorials: Semantic Web: Technologies and Applications for the Real-World (WWW2007); Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications (WWW2011). Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and DAGSI (Semantic Sensor Web), Microsoft Research (Semantic Search), IBM Research (Analysis of Social Media Content), and HP Research (Knowledge Extraction from Community-Generated Content).
  • 76. Future: Computing for Human Experience. Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA. Vision Paper: Computing for Human Experience: