Ontology and semantic web (2016)

•Download as PPT, PDF•

4 likes•1,358 views

This document provides an outline and overview of ontologies and the semantic web. It discusses triples, ontologies, SPARQL, inferencing, and methodology for creating a semantic network. Triples, reification, confidence levels, ontology design, architecture, and components are covered. Methodology includes building a semantic network from natural language processing.

July 2011
cmtrim@us.ibm.com

Ontologies and the Semantic Web

© 2012 IBM Corporation

Outline

 Triples
– Reification
– Confidence Levels

 Ontology
– Design
– Architecture (big picture)
– SPARQL
– Inferencing

 Methodology
– Creating a Semantic Network

© 2012 IBM Corporation

Triples


Subject Predicate Object

“The author of Hamlet is Shakespeare”

Shakespeare authorOf Hamlet

Hamlet hasAuthor Shakespeare

© 2012 IBM Corporation

Triples


“Shakespeare wrote Hamlet in 1876”

Shakepeare authorOf Hamlet

Hamlet writtenIn 1876

© 2012 IBM Corporation

Triples (Reification)

Wikipedia states “Shakespeare wrote Hamlet in 1876”
 Wikipedia states Shakepeare
 Shakepeare authorOf Hamlet
 Hamlet writtenIn 1876

© 2012 IBM Corporation

Triples (Reification)

Wikipedia states “Shakespeare wrote Hamlet in 1876”

Wikipedia states (Hamlet writtenIn 1876)

Shakespeare authorOf Hamlet

© 2012 IBM Corporation

Triples (Confidence Levels)

 ShakespeareOnline states (Hamlet writtenIn 1599)
 Wikipedia states (Hamlet writtenIn 1876)

 When was Hamlet written?
– 1599
– 1876

© 2012 IBM Corporation

Triples (Confidence Levels)

 Go from this:
– ShakepeareOnline states (Hamlet writtenIn 1599)
 To this:
– (ShakepeareOnline states (Hamlet writtenIn 1599)) hasConfidenceLevel 90

© 2012 IBM Corporation

Triples (Confidence Levels)

© 2012 IBM Corporation

What is an Ontology?

 Description of the kinds of entities there are and how they are related (Chris Welty)

© 2012 IBM Corporation

Ontology

 “Shakespeare wrote Hamlet in 1876”
 How many “types” of things are there in this statement?
– Authors
– Books
– Plays
– Years
– Sources
– Characters
 What relationships could exist between these types?

© 2012 IBM Corporation

Ontology

 Author
– Playwright {Shakespeare, Marlowe}
 Book
– Play {Hamlet, Macbeth, Faustus}

 RDF:
– Shakepeare a Playwright
– Shakepeare a Author
– Hamlet a Play
– Hamlet a Book

© 2012 IBM Corporation

 William Shakespeareen2:Playwright was an English poet and
playwright, widely regarded as the greatest writer in the
English language and the world's pre-eminent dramatist.

© 2012 IBM Corporation

Semantic Chains


AIX hasCommand topas monitors
(process uses (CPU hasComponent
resources))

© 2012 IBM Corporation

SPARQL

SELECT ?command
WHERE {
AIX hasCommand ?command .
?command monitors/uses CPU
}

© 2012 IBM Corporation

Inference

Ontology Model (Classes):
 Product
– SupportedProduct (x hasMaker IBM)
 Company
– IBM
– NonIBM (disjoint to IBM)
• { Microsoft, Oracle, Teradata)

Ontology Model (Predicates):
 <Product> hasMaker <Company>

Triple Store data:
 Rational Software Architect hasMaker IBM
 Rational Software Architect a SupportedProduct

© 2012 IBM Corporation

Tivoli Monitoring hasSynonym ITM

© 2012 IBM Corporation

Tivoli Monitoring hasSynonym ITM
ITM hasComponent ITM Agent

© 2012 IBM Corporation

Tivoli Monitoring hasSynonym ITM
ITM hasComponent ITM Agent
Tivoli Monitoring hasComponent Tivoli Monitoring Agent
Tivoli Monitoring Agent hasSynonym ITM Agent

© 2012 IBM Corporation

“Agent” analysis

itm agent 54
db2 agent 32
os agent 32
ul agent 31
monitoring agent 29
oracle agent 22
agent needs 21
itm ul agent 16
windows os agent 15
agent left 14
agent system 14
citrix agent 14
mysap agent 14
unix os agent 13 © 2012 IBM Corporation

Proximal Verbs (normalized)

monitor
support
configure
run
start
show
build
appear
© 2012 IBM Corporation

Events

Situation Event
Omnibus Event
ITM Event
Minor Event
Triggering Event
Console Event
System Event
TBSM Event
JMX Event
TEC Event
© 2012 IBM Corporation

Blank Nodes

 Explict Characterization vs Implicit (Predicate-driven) Identification

© 2012 IBM Corporation

Blank Nodes

 What are blank nodes?
– A way of profiling entities
– A way of identifying entities without explicit identification
– Implicit identification
– Predicate driven identification of data (rather than explict characterization)

 Examples:
– “That person has a child”
– “That person has a child and a husband”

© 2012 IBM Corporation

Anonymous (Anon) Nodes

 What is the difference between an Anon Node and a Blank Node?
 An “anonymous node” is an existentially quantitifed variable ∃
 A typical RDF node has an identifier to which it is useful to refer

© 2012 IBM Corporation

Appendix A - Resources

 Glossary
 Books
 Common OWL Editors
 Triple Stores

© 2012 IBM Corporation

Glossary


OWL – Web Ontology Language


RDF – Resource Description Framework


SPARQL – Simple Protocol and RDF Query Language

© 2012 IBM Corporation

Books

 Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL
– Author(s): Dean Allemang and Jim Hendler
– Second Edition

© 2012 IBM Corporation

Common OWL Editors


TopBraid Composer (TBC)

Free Edition (also Standard + Maestro Editions)

http://www.topquadrant.com/products/TB_Composer.html

Protege

Free, open source ontology editor and knowledge-base framework

http://protege.stanford.edu/

© 2012 IBM Corporation

Triple Stores


Comparison and links here:

http://www.w3.org/wiki/LargeTripleStores

Sesame - scalable and transactional

May be more suited to web environments

Setup slightly more complex than Jena TDB

Jena TDB - scalable and very simple set up

Code Samples and API introduction here:
− http://cattail.boulder.ibm.com/cattail/#view=cmtrim@us.ibm.com/files/53A1E4007F0F3DDB
8C12752E093F23B6

The latest version of Jena TDB (0.90) is transactional. Past versions of TDB
were not transactional, and may not be suited for web environments.

DB2-RDF – builds on top of the Jena Graph SPI.

https://www.ibm.com/developerworks/mydeveloperworks/blogs/nlp/entry/db2_rdf
_nosql_graph_support13

© 2012 IBM Corporation

Appendix B - OWL

 OWL (Web Ontology Language)
– Built on top of RDF (same syntax RDF)
 Open World vs Closed World assumption
 Parts of an Ontology:
– Header
– Classes and Individuals
– Properties
– Annotations
– Datatypes
 Instance vs Subclass

© 2012 IBM Corporation

OWL – Subclasses and Types

 alpha rdfs:subClassOf of Thing
– a rdf:type alpha
– b rdf:type alpha

 beta rdfs:subClassOf alpha
– c rdf:type beta
– d rdf:type beta
– c rdf:type alpha
– d rdf:type alpha

© 2012 IBM Corporation

OWL – Subclasses and Types

 President rdfs:subClassOf Dignitary
 Dignitary rdfs:subClassOf Person
 This model states:
– All dignitaries are people
– All presidents are dignitaries (and thus,
people)
 John Smith rdf:type Person
 Queen Elizabeth rdf:type Dignitary
– Queen Elizabeth rdf:type Person
 GW Bush rdf:type President
– GW Bush rdf:type Dignitary
– GW Bush rdf:type Person
 Barack Obama rdf:type President
– Barack Obama rdf:type Dignitary
– Barack Obama rdf:type Person
 How do we expand this model to classify actively-
serving American presidents?

© 2012 IBM Corporation

Appendix C – OWL Properties

 Transitive Property
 Functional Property
 Inverse Functional Property
 Symmetric Property
 Asymmetric Property
 Reflexive Property
 Irreflexive Property
 Property Chains
 Putting it all together
 Others

© 2012 IBM Corporation

Transitive Property


hasVersion rdf:type owl:TransitiveProperty

Windows hasVersion Windows XP

Windows XP hasVersion Windows XP SP2

Windows hasVersion Windows XP SP2

© 2012 IBM Corporation

Functional Property


ssn-name rdf:type owl:FunctionalProperty

123-45-6789 ssn-ame Bob Smith

123-45-6789 ssn-ame Robert Smythe

Bob Smith owl:sameAs Robert Smythe

© 2012 IBM Corporation

Inverse Functional Property


hasSpeKey rdf:type owl:InverseFunctionalProperty

File Net Web Services hasSpeKey 5724S03

FN WS hasSpeKey 5724S03

File Net Web Services owl:sameAs FN WS

© 2012 IBM Corporation

Symmetric Property

 siblingOf rdf:type owl:SymmetricProperty
 Tim siblingOf Jim
 Jim siblingOf Tim

© 2012 IBM Corporation

Asymmetric Property

 hasParent rdf:type owl:AsymmetricProperty
 Stewie hasParent Peter
 Peter does not have parent Stewie

© 2012 IBM Corporation

Reflexive Property

© 2012 IBM Corporation

Irreflexive Property

© 2012 IBM Corporation

Property Chain

 [] rdfs:subPropertyOf hasGrandfather;
owl:propertyChain (
hasFather
hasFather
).
 John III hasFather John JR
 John JR hasFather John SR
 John III hasGrandfather John SR

© 2012 IBM Corporation

Putting it all together …

 hasSynonym
– Transitive, Symmetric

© 2012 IBM Corporation

Appendix D - Classic Mereology

 Transitive Axiom
 Reflexive Axiom
 Antisymmetric Axiom

© 2012 IBM Corporation

Transitive Axiom

 parts of parts are parts of the whole
 If A is part of B and B is part of C, then A is part of C

© 2012 IBM Corporation

Reflexive Axiom

 everything is part of itself
– A is part of A

© 2012 IBM Corporation

Antisymmetric Axiom

 nothing is a part of its parts
– if A is part of B and A != B then B is not part of A

© 2012 IBM Corporation

Appendix E - Partonomy

 Can you distinguish parts from kinds?
 Why is this important?
 This is often the difference between a taxonomy and an ontology
– A taxonomy doesn’t need to distinguish between parts and kinds
– An ontology must make this distinction

 Vehicle
-Car
--Engine
---Crankcase
----Aluminum Crankcase

© 2012 IBM Corporation

Appendix F – Common Predicates

 hasPart
– hasPart owl:inverseOf partOf
– hasPart rdf:type owl:TransitiveProperty
– partOf rdf:type owl:TransitiveProperty
 hasLocus

© 2012 IBM Corporation

Appendix G

 Blank nodes
 Anonymous (Anon) nodes
 Quads

© 2012 IBM Corporation

Quads

 (Reference Jena Tutorial with TDB.ppt)

© 2012 IBM Corporation

Maintenance*

 The relational model has relations between entities established through explict keys
(primary, foreign) and associative entities.
– Changing relationships in this case is cumbersome, as it requires changes to the base
model structure itself.
– Changes in an RDBMS can be difficult for a populated database.
 Hierarchcal models have similar limitations
 The graph model (RDF) makes it much easier to maintain the model once it is deployed.
– A critical point is that relations are part of the data, not part of the database structure
– If a new relationship needs to be added that was not anticipated, a new triple is simply
added to the datastore.
– A graph model can be traversed from any perspective. In constrast, other types of
database designs might require structural changes to answer new questions that arise
after initial implementation.

© 2012 IBM Corporation

Design Styles

 Avoid proliferating owl:inverseOf [1]

© 2012 IBM Corporation

Webinar in which Mike Bennett describes the unique approach Hypercube applies to modeling business semantics (the method used in creating the EDM Council's FIBO Business Conceptual Ontology). The end result of creating this kind of business conceptual ontology is that a firm will have a single, canonical source of meaning across all its data resources, like a golden copy but in the semantics space - so we sometimes refer to this a "Golden Ontology". Mike explains the principles for creating an enterprise conceptual ontology. From this webinar you will learn: 3 things you need to know about ontologies - Words are not Concepts - Meaning is not Truth - Syntax is not Semantics 3 things you need to do to build a Golden reference ontology: - Classification - Abstraction - Partitioning 3 ways to use a Golden Ontology - Querying across legacy data sources - Mapping and data integration - Reasoning with Semantic Web applications

Ontology development in protégé-آنتولوژی در پروتوغه

sadegh salehi

OOPS!: on-line ontology diagnosis by Maria Poveda

semanticsconference

Protege tutorial

Comércio de Portugal

SAS University Edition - Getting Started

Craig Trim

Semantic Web For Distributed Social Networks

David Peterson

The Semantic Web: 2010 Update

James Hendler

Building OBO Foundry ontology using semantic web tools

Melanie Courtot

Practical Semantic Web and Why You Should Care - DrupalCon DC 2009

Boris Mann

Ontology, Semantic Web and DBpedia

Richard Kuo

Ontology Web services for Semantic Applications

Trish Whetzel

The Semantic Web (and what it can deliver for your business)

Knud Möller

SMWCon Fall 2015 FForms

Open University in the Netherlands

We present Fresnel Forms, a plugin we developed for Protégé, an editor for Semantic Web ontologies. The Fresnel Forms plugin processes the currently active ontology in a Protégé session to export a semantic wiki for that ontology. This export uses Semantic MediaWiki’s XML-based export format for import into an existing wiki. Fresnel Forms also provides a GUI editor to let the user fine-tune the generated interface before exporting it to a wiki. Fresnel Forms exports use features from Semantic MediaWiki and Semantic Forms to provide an annotate-and-browse data system interface. Each wiki Fresnel Forms generates provides forms for entering data for classes and fields that conform to the original ontology. Templates provide displays of pages created with these forms. Finally, the wiki’s ExportRDF feature creates Semantic Web triples for the data entered that use URI’s from the original ontology. Fresnel Forms provides thus an efficient way to create a wiki for populating a given Semantic Web ontology. Fresnel Forms can be downloaded and installed on Protégé from http://is.cs.ou.nl/OWF/index.php5/Fresnel_Forms

Introduction To The Semantic Web

guest262aaa

Real-time Semantic Web with Twitter Annotations

Joshua Shinavier

Introduction to the Semantic Web

Nuxeo

The semantic web

Dotkumo

The Standardization of Semantic Web Ontology

Myungjin Lee

Semantic Web

Adarsh Kumar Yadav

Introduction to the Semantic Web

Oscar Corcho

Ontologies and the Semantic Web

Craig Trim

Why we chose mongodb for guardian.co.uk

Graham Tackley

Viewers also liked

How to Create a Golden Ontology

Mike Bennett

Ontology development in protégé-آنتولوژی در پروتوغه

sadegh salehi

OOPS!: on-line ontology diagnosis by Maria Poveda

semanticsconference

Protege tutorial

Comércio de Portugal

SAS University Edition - Getting Started

Craig Trim

Semantic Web For Distributed Social Networks

David Peterson

The Semantic Web: 2010 Update

James Hendler

Building OBO Foundry ontology using semantic web tools

Melanie Courtot

Practical Semantic Web and Why You Should Care - DrupalCon DC 2009

Boris Mann

Ontology, Semantic Web and DBpedia

Richard Kuo

Ontology Web services for Semantic Applications

Trish Whetzel

The Semantic Web (and what it can deliver for your business)

Knud Möller

SMWCon Fall 2015 FForms

Open University in the Netherlands

Introduction To The Semantic Web

guest262aaa

Real-time Semantic Web with Twitter Annotations

Joshua Shinavier

Introduction to the Semantic Web

Nuxeo

The semantic web

Dotkumo

The Standardization of Semantic Web Ontology

Myungjin Lee

Semantic Web

Adarsh Kumar Yadav

Introduction to the Semantic Web

Oscar Corcho

Viewers also liked (20)

How to Create a Golden Ontology

Ontology development in protégé-آنتولوژی در پروتوغه

OOPS!: on-line ontology diagnosis by Maria Poveda

Protege tutorial

SAS University Edition - Getting Started

Semantic Web For Distributed Social Networks

The Semantic Web: 2010 Update

Building OBO Foundry ontology using semantic web tools

Practical Semantic Web and Why You Should Care - DrupalCon DC 2009

Ontology, Semantic Web and DBpedia

Ontology Web services for Semantic Applications

The Semantic Web (and what it can deliver for your business)

SMWCon Fall 2015 FForms

Introduction To The Semantic Web

Real-time Semantic Web with Twitter Annotations

Introduction to the Semantic Web

The semantic web

The Standardization of Semantic Web Ontology

Semantic Web

Introduction to the Semantic Web

Similar to Ontology and semantic web (2016)

Ontologies and the Semantic Web

Craig Trim

Why we chose mongodb for guardian.co.uk

Graham Tackley

Ruby on Rails - An overview

Thomas Asikis

Moving from Relational to Document Store

Graham Tackley

guardian.co.uk is a leading UK-based news website. We've spent ten years fighting relational database representations of our domain model, until the implementation of our API made us realise that if only we could store documents everything got simpler. I'll talk about the history that led us to choosing MongoDB as a key part of our infrastructure going forward and how we're progressively migrating from our relational database. With huge credit to Mat Wall @matwall for creating the original version of this talk.

BP203 limitless languagesMark Myers

Big SQL: Powerful SQL Optimization - Re-Imagined for open source

DataWorks Summit

Let's be honest - there are some pretty amazing capabilities locked in proprietary SQL engines which have had decades of R&D baked into them. At this session, learn how IBM, working with the Apache community, has unlocked the value of their SQL optimizer for Hive, HBase, ObjectStore, and Spark - helping customers avoid lock-in while providing best performance, concurrency and scalability for complex, analytical SQL workloads. You'll also learn how the SQL engine was extended and integrated with Ambari, Ranger, YARN/Slider and HBase. We share the results of this project which has enabled running all 99 TPC-DS queries at world record breaking 100TB scale factor.

Beyond JavaScript Frameworks: Writing Reliable Web Apps With Elm - Erik Wende...

Codemotion

In times where a jungle of JavaScript frameworks wants to solve every conceivable problem in web app development, Elm offers a different approach. Elm is a functional language that compiles to JavaScript. It has a user-friendly compiler, a sound type system, built-in immutability and lots of other features that come in handy when developing large, hopefully bug-free, single-page apps. While having fun in the process! In this talk you'll see how Elm works and learn how to use it to build a web app. More importantly, you'll learn the pros and cons of using it over a JavaScript-based solution.

Q con london2011-matthewwall-whyichosemongodbforguardiancouk

Roger Xia

Web servicesoverview

thisismusthafa

Building a right sized, do-anything runtime using OSGi technologies: a case s...

mfrancis

The WebSphere Application Server Liberty profile uses several OSGi technologies in addition to the Equinox OSGi framework: Configuration Admin, Metatype, and Declarative Services being first and foremost among them. In this talk, I'll go over how Liberty uses these technologies to create a dynamic flexible runtime that can be right-sized based on the server's configuration. I'll share the lessons we've learned, and what we consider to be best practice for interacting with these three services. Bio: Erin Schnabel is the Development lead for the WebSphere Application Server Liberty profile. She has over 12 years of experience in the WebSphere Application Server development organization in various technical roles. Erin has over 15 years of experience working with Java and application middleware across various hardware platforms, including IBM z/OS®. She specializes in composable runtimes, including the application of OSGi, object-oriented and service-oriented technologies and design patterns to decompose existing software systems into flexible, composable units.

Hadoop, SQL & NoSQL: No Longer an Either-or Question

Tony Baer

It used to be black and white. If you needed MapReduce processing, you chose Hadoop; if you needed standard query and reporting, you chose a SQL data warehouse. The decision is no longer clear cut. With YARN clearing the way for Hadoop to accept multiple workloads, Hadoop is no longer your father’s MapReduce machine – as frameworks are rapidly emerging for interactive SQL, search, streaming and other workloads. We are on the path toward a federated world of analytic and operational decision stores, but as the boundaries between platform types grow fuzzier, deciding what platforms to use and where to run which workloads grow trickier.

Hadoop, SQL and NoSQL, No longer an either/or questionDataWorks Summit

Information Retrieval, Applied Statistics and Mathematics onBigData - German ...Romeo Kienzler

3978 Why is Java so different... A Session for Cobol/PLI/Assembler Developers

nick_garrod

InterConnect 2015 Session 3978 Why is Java so different... A Session for Cobol/PLI/Assembler Developers. After giving sessions about Java in the past few years that tell System Programmers they should do Java on System z and that Java is just like every other language, this session tries to explain why Java is a bit different in Operating and Handling. Therefore this session tries to compare COBOL/PLI/Assembler with Java and how the Java technology works on System z: > Why can't you phase in a Java Program > How is a JIT working > Understand the Development Process of Java Applications > Debugging and Logging of Java Applications > and a lot more...

Making everything better with OSGi - a happy case study in building a really ...

mfrancis

OSGi Community Event 2014 Abstract: The WebSphere Application Server Liberty Profile makes extensive use of OSGi technologies to achieve a dynamic, compact, flexible and powerful application server. Using a foundation of Equinox, Subsytems, Configuration Admin, Metatype, and Declarative Services, we built a right-sized elastically-capable runtimes which allows users to get going with (almost) zero-setup, (almost) zero-hardware, and (really) zero-migration. This talk will discuss how Liberty uses OSGi, what OSGi gives us, why OSGi services are the best thing since sliced bread, what we've learned, and our development best practices. Speaker Bio: Holly Cummins is a senior software engineer developing enterprise middleware with the IBM WebSphere, and a committer on the Apache Aries project. She is a co-author of Enterprise OSGi in Action and has spoken at Devoxx, JavaZone, The ServerSide Java Symposium, JAX London, GeeCon, and the Great Indian Developer Summit, as well as a number of user groups.

Leverage Enterprise Integration Patterns with Apache Camel and TwitterBruno Borges

The Wearable Application Server - Holly Cummins

JAX London

Mobile technology has so far mostly been confined to the client side, for fairly obvious reasons - traditionally, clients are mobile, and servers are not. However, not only is hardware getting smaller, servers are too. When your application server can run on pocket-sized £25 hardware it opens up some pretty cool possibilities - your server is literally lightweight. Not only can you have location-based services, you can have locatable servers. Servers can run on phones, they can run on the Raspberry Pi, and so they can go almost anywhere you can think of. Modularity gives software the flexibility it needs to cram into these tight spaces without sacrificing power. This talk will demonstrate developing and deploying a web application to an instance of WebSphere Application Server embedded in a comedy hat.

JVM Multitenancy (JavaOne 2012)

Graeme_IBM

Web servicesoverview

Prabhat gangwar

We Can Do Better - IBM's Vision for the Next Generation of Java Runtimes - Jo...

mfrancis

Similar to Ontology and semantic web (2016) (20)

Ontologies and the Semantic Web

Why we chose mongodb for guardian.co.uk

Ruby on Rails - An overview

Moving from Relational to Document Store

BP203 limitless languages

Big SQL: Powerful SQL Optimization - Re-Imagined for open source

Beyond JavaScript Frameworks: Writing Reliable Web Apps With Elm - Erik Wende...

Q con london2011-matthewwall-whyichosemongodbforguardiancouk

Web servicesoverview

Building a right sized, do-anything runtime using OSGi technologies: a case s...

Hadoop, SQL & NoSQL: No Longer an Either-or Question

Hadoop, SQL and NoSQL, No longer an either/or question

Information Retrieval, Applied Statistics and Mathematics onBigData - German ...

3978 Why is Java so different... A Session for Cobol/PLI/Assembler Developers

Making everything better with OSGi - a happy case study in building a really ...

Leverage Enterprise Integration Patterns with Apache Camel and Twitter

The Wearable Application Server - Holly Cummins

JVM Multitenancy (JavaOne 2012)

Web servicesoverview

We Can Do Better - IBM's Vision for the Next Generation of Java Runtimes - Jo...

More from Craig Trim

Publishing Python to PyPI using Github Actions.pptx

Craig Trim

This presentation provides a straightforward guide to publishing Python projects on PyPI using GitHub Actions. It's a practical walkthrough for developers on automating the release process of their Python packages. You'll learn how to set up a PyPI token, configure GitHub workflows, and push updates that trigger automatic package deployment. This resource is for anyone looking to eliminate manual uploads to PyPI with a straightforward approach to using GitHub's tools for continuous integration and deployment.

SAS Visual Process Flows

Craig Trim

Bluemix NL Classifier Tutorial

Craig Trim

Bluemix - Deploying a Java Web Application

Craig Trim

IBM Bluemix - Building a Project with Maven

Craig Trim

Question Types in Natural Language Processing

Craig Trim

Jenkins on Docker

Craig Trim

IBM Bluemix: Creating a Git Project

Craig Trim

Octave - Prototyping Machine Learning Algorithms

Craig Trim

Octave is a high-level language suitable for prototyping learning algorithms. Octave is primarily intended for numerical computations and provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The syntax is matrix-based and provides various functions for matrix operations. This tool has been in active development for over 20 years.

The Onomyicon

Craig Trim

There are many words in english that end with the suffix "-nym" or "-nymy". This comes from the ancient Greek ὄνυμα, meaning "name" or "word", and could even be loosely translated as "state of being". A categorization of onomastic terminology is a helpful step in understanding data. In the automated creation of a semantic model, it is neccessary to develop patterns. Semantic models are primarily composed of space (static information) and time (process / event oriented). Patterns built around onoma help is deriving the former. This is not a complete list of all Onoma. In many respects, the class of words ending with "-nym" could be considered open. Neologism (the type of words belonging to the class "neonym") can be easily created to describe any category for any entity type.

Dependency parsing (2013)

Craig Trim

Inference using owl 2.0 semantics

Craig Trim

An Introduction to the Jena API

Craig Trim

The art of tokenizationCraig Trim

Deep Parsing (2012)

Craig Trim

More from Craig Trim (15)

Publishing Python to PyPI using Github Actions.pptx

SAS Visual Process Flows

Bluemix NL Classifier Tutorial

Bluemix - Deploying a Java Web Application

IBM Bluemix - Building a Project with Maven

Question Types in Natural Language Processing

Jenkins on Docker

IBM Bluemix: Creating a Git Project

Octave - Prototyping Machine Learning Algorithms

The Onomyicon

Dependency parsing (2013)

Inference using owl 2.0 semantics

An Introduction to the Jena API

The art of tokenization

Deep Parsing (2012)

Recently uploaded

Monitoring Java Application Security with JDK Tools and JFR Events

Ana-Maria Mihalceanu

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf

FIDO Alliance

Securing your Kubernetes cluster_ a step-by-step guide to success !

KatiaHIMEUR1

Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster. However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks. In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.

Mission to Decommission: Importance of Decommissioning Products to Increase E...

Product School

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...

DanBrown980551

Do you want to learn how to model and simulate an electrical network from scratch in under an hour? Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)! During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook. PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides: - A fully editable and extendable library for grid component modelling; - Visualization tools to display your network; - Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses; The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well. What you will learn during the webinar: - For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills; - For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.

Assuring Contact Center Experiences for Your Customers With ThousandEyes

ThousandEyes

Generating a custom Ruby SDK for your web service or Rails API using Smithy

g2nightmarescribd

Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.

How world-class product teams are winning in the AI era by CEO and Founder, P...

Product School

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf

FIDO Alliance

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf

91mobiles

UiPath Test Automation using UiPath Test Suite series, part 4

DianaGray10

Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap. The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies. Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques What will you get from this session? 1. Insights into SAP testing best practices 2. Heatmap utilization for testing 3. Optimization of testing processes 4. Demo Topics covered: Execution from the test manager Orchestrator execution result Defect reporting SAP heatmap example with demo Speaker: Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP

Transcript: Selling digital books in 2024: Insights from industry leaders - T...

BookNet Canada

The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more. Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/ Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...

Product School

UiPath Test Automation using UiPath Test Suite series, part 3

DianaGray10

Neuro-symbolic is not enough, we need neuro-*semantic*

Frank van Harmelen

Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”. All of this illustrated with link prediction over knowledge graphs, but the argument is general.

When stars align: studies in data quality, knowledge graphs, and machine lear...

Elena Simperl

Connector Corner: Automate dynamic content and events by pushing a button

DianaGray10

Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to: Create a campaign using Mailchimp with merge tags/fields Send an interactive Slack channel message (using buttons) Have the message received by managers and peers along with a test email for review But there’s more: In a second workflow supporting the same use case, you’ll see: Your campaign sent to target colleagues for approval If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team But—if the “Reject” button is pushed, colleagues will be alerted via Slack message Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors. And... Speakers: Akshay Agnihotri, Product Manager Charlie Greenberg, Host

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...

Thierry Lestable

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf

FIDO Alliance

Bits & Pixels using AI for Good.........

Alison B. Lowndes

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf

Securing your Kubernetes cluster_ a step-by-step guide to success !

Mission to Decommission: Importance of Decommissioning Products to Increase E...

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...

Assuring Contact Center Experiences for Your Customers With ThousandEyes

Generating a custom Ruby SDK for your web service or Rails API using Smithy

How world-class product teams are winning in the AI era by CEO and Founder, P...

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf

UiPath Test Automation using UiPath Test Suite series, part 4

Transcript: Selling digital books in 2024: Insights from industry leaders - T...

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...

UiPath Test Automation using UiPath Test Suite series, part 3

Neuro-symbolic is not enough, we need neuro-*semantic*

When stars align: studies in data quality, knowledge graphs, and machine lear...

Connector Corner: Automate dynamic content and events by pushing a button

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf

Bits & Pixels using AI for Good.........

Ontology and semantic web (2016)

9. Triples (Confidence Levels)  Go from this: – ShakepeareOnline states (Hamlet writtenIn 1599)  To this: – (ShakepeareOnline states (Hamlet writtenIn 1599)) hasConfidenceLevel 90 © 2012 IBM Corporation

12. Ontology  “Shakespeare wrote Hamlet in 1876”  How many “types” of things are there in this statement? – Authors – Books – Plays – Years – Sources – Characters  What relationships could exist between these types? © 2012 IBM Corporation

13. Ontology  Author – Playwright {Shakespeare, Marlowe}  Book – Play {Hamlet, Macbeth, Faustus}  RDF: – Shakepeare a Playwright – Shakepeare a Author – Hamlet a Play – Hamlet a Book © 2012 IBM Corporation

23. Inference Ontology Model (Classes):  Product – SupportedProduct (x hasMaker IBM)  Company – IBM – NonIBM (disjoint to IBM) • { Microsoft, Oracle, Teradata) Ontology Model (Predicates):  <Product> hasMaker <Company> Triple Store data:  Rational Software Architect hasMaker IBM  Rational Software Architect a SupportedProduct © 2012 IBM Corporation

34. “Agent” analysis itm agent 54 db2 agent 32 os agent 32 ul agent 31 monitoring agent 29 oracle agent 22 agent needs 21 itm ul agent 16 windows os agent 15 agent left 14 agent system 14 citrix agent 14 mysap agent 14 unix os agent 13 © 2012 IBM Corporation

38. Blank Nodes  What are blank nodes? – A way of profiling entities – A way of identifying entities without explicit identification – Implicit identification – Predicate driven identification of data (rather than explict characterization)  Examples: – “That person has a child” – “That person has a child and a husband” © 2012 IBM Corporation

39. Anonymous (Anon) Nodes  What is the difference between an Anon Node and a Blank Node?  An “anonymous node” is an existentially quantitifed variable ∃  A typical RDF node has an identifier to which it is useful to refer © 2012 IBM Corporation

43. Common OWL Editors  TopBraid Composer (TBC)  Free Edition (also Standard + Maestro Editions)  http://www.topquadrant.com/products/TB_Composer.html  Protege  Free, open source ontology editor and knowledge-base framework  http://protege.stanford.edu/ © 2012 IBM Corporation

44. Triple Stores  Comparison and links here:  http://www.w3.org/wiki/LargeTripleStores  Sesame - scalable and transactional  May be more suited to web environments  Setup slightly more complex than Jena TDB  Jena TDB - scalable and very simple set up  Code Samples and API introduction here: − http://cattail.boulder.ibm.com/cattail/#view=cmtrim@us.ibm.com/files/53A1E4007F0F3DDB 8C12752E093F23B6  The latest version of Jena TDB (0.90) is transactional. Past versions of TDB were not transactional, and may not be suited for web environments.  DB2-RDF – builds on top of the Jena Graph SPI.  https://www.ibm.com/developerworks/mydeveloperworks/blogs/nlp/entry/db2_rdf _nosql_graph_support13 © 2012 IBM Corporation

45. Appendix B - OWL  OWL (Web Ontology Language) – Built on top of RDF (same syntax RDF)  Open World vs Closed World assumption  Parts of an Ontology: – Header – Classes and Individuals – Properties – Annotations – Datatypes  Instance vs Subclass © 2012 IBM Corporation

46. OWL – Subclasses and Types  alpha rdfs:subClassOf of Thing – a rdf:type alpha – b rdf:type alpha  beta rdfs:subClassOf alpha – c rdf:type beta – d rdf:type beta – c rdf:type alpha – d rdf:type alpha © 2012 IBM Corporation

47. OWL – Subclasses and Types  President rdfs:subClassOf Dignitary  Dignitary rdfs:subClassOf Person  This model states: – All dignitaries are people – All presidents are dignitaries (and thus, people)  John Smith rdf:type Person  Queen Elizabeth rdf:type Dignitary – Queen Elizabeth rdf:type Person  GW Bush rdf:type President – GW Bush rdf:type Dignitary – GW Bush rdf:type Person  Barack Obama rdf:type President – Barack Obama rdf:type Dignitary – Barack Obama rdf:type Person  How do we expand this model to classify actively- serving American presidents? © 2012 IBM Corporation

48. OWL – Subclasses and Types  President rdfs:subClassOf Dignitary  Dignitary rdfs:subClassOf Person  This model states: – All dignitaries are people – All presidents are dignitaries (and thus, people)  John Smith rdf:type Person  Queen Elizabeth rdf:type Dignitary – Queen Elizabeth rdf:type Person  GW Bush rdf:type President – GW Bush rdf:type Dignitary – GW Bush rdf:type Person  Barack Obama rdf:type President – Barack Obama rdf:type Dignitary – Barack Obama rdf:type Person  How do we expand this model to classify actively- serving American presidents? © 2012 IBM Corporation

49. Appendix C – OWL Properties  Transitive Property  Functional Property  Inverse Functional Property  Symmetric Property  Asymmetric Property  Reflexive Property  Irreflexive Property  Property Chains  Putting it all together  Others © 2012 IBM Corporation

52. Inverse Functional Property  hasSpeKey rdf:type owl:InverseFunctionalProperty  File Net Web Services hasSpeKey 5724S03  FN WS hasSpeKey 5724S03  File Net Web Services owl:sameAs FN WS © 2012 IBM Corporation

57. Property Chain  [] rdfs:subPropertyOf hasGrandfather; owl:propertyChain ( hasFather hasFather ).  John III hasFather John JR  John JR hasFather John SR  John III hasGrandfather John SR © 2012 IBM Corporation

63. Appendix E - Partonomy  Can you distinguish parts from kinds?  Why is this important?  This is often the difference between a taxonomy and an ontology – A taxonomy doesn’t need to distinguish between parts and kinds – An ontology must make this distinction  Vehicle -Car --Engine ---Crankcase ----Aluminum Crankcase © 2012 IBM Corporation

69. Maintenance*  The relational model has relations between entities established through explict keys (primary, foreign) and associative entities. – Changing relationships in this case is cumbersome, as it requires changes to the base model structure itself. – Changes in an RDBMS can be difficult for a populated database.  Hierarchcal models have similar limitations  The graph model (RDF) makes it much easier to maintain the model once it is deployed. – A critical point is that relations are part of the data, not part of the database structure – If a new relationship needs to be added that was not anticipated, a new triple is simply added to the datastore. – A graph model can be traversed from any perspective. In constrast, other types of database designs might require structural changes to answer new questions that arise after initial implementation. © 2012 IBM Corporation

Editor's Notes

So in the scope of an hour we can hardly hope to exhaust this subject – it’s sort of like having an hour to talk about Relational Databases. Where do you begin? With data normalization? 3 rd normal form? ERD design and tools? RDBMS implementations like DB2? Index optimization? Loading and retrieving data? SQL? JDBC connectivity? JPA and entity managed beans? It’s a wide topic. Same with this. So we’ll touch on a few underlying points. Our team has bi-weekly calls in this space so if you’re interested in further information, just let me know and I’ll sign you up.
The first part is “Triples”: Talk about what a triple is, and how multiple triples form a semantic chain. A semantic network is a collection of semantic chains. Triple > Semantic Chain > Semantic Network Talk about reification and why that’s important (using a triple in place of a subject or object; being able to make a statement about a triple) Talk about confidence levels; how to implement them and when to use them. The second part is Ontology Design: How does the Ontology fit into this?
Let’s start by looking at the data. This is a triple store. Or a semantic network. Or a knowledge base. The terms are often used interchangeably. Basically – it’s a bunch of connected nodes. There is no underlying schema in the sense of an RDBMS. Nodes are related to each other by means of edges, or relationships – in triple store parlance these relationships are referred to as “predicates”. A predicate connects one node to another. A connection from one node to another node by means of a predicate, is referred to as a “triple” On the next several slides, we’re going to go through an example of decomposing a natural english sentence (unstructured text) into triples. So this takes a very basic understanding of english grammar – recognizing verbs and nouns and nouns that function as objects of a verb, and those that function as subjects of a verb. We’ll start with a basic sentence that resolves to a single triple, and work our way up a more complex semantic chain – a collection of related triples. And we’ll also begin to make assertions about various triples in our network – some of the data we trust, some of it we might not.
So here’s our first triple. “The author of Hamlet is Shakespeare” (or, Shakespeare wrote Hamlet, Shakespeare is the fellow what done wrote the play named Hamlet, etc) We abbreviate this sentence into a triple: Shakespeare authorOf Hamlet So we’ve decomposed our data into a triple. We can reverse the first triple by saying that: If Shakespeare is the author of Hamlet Then The author of Hamlet is Shakespeare This may seem trivial. And perhaps it is. But the important thing to note here is that the intelligence for the data is maintained at the level of the data; not in the application. We don’t have to maintain a business rule within our application layer that states if “A” has a given relation to “B”, then “B” must have a given relation to “A”. We can simply assert within the data that the predicate “authorOf” has an inverse predicate named “hasAuthor”. So it was not necessary for us to explicitly assert anywhere that Hamlet was written by Shakespeare; we simply “know” this because We know that Shakespeare was the author of Hamlet And “authorOf’ has a inverse relationship to “hasAuthor” At the point of the first slide I want you to understand what a triple is: Subject and a Object connected by a predicate “ Shakespeare” and “Hamlet” may be interesting in and of themselves, but the connection between the two is valuable. It tells us something important about these two items. And if we encounter either Shakespeare or Hamlet, in the course of parsing unstructured data, we now have a semantic reference point for both of them. So now you might be asking - why would we want do this? Why not decompose this data into a relational database or some other data mechanism? We could have a database table named “Authors” and another table called “Books” and perhaps create a third lookup table that associates authors to books. That’s another option. So far we haven’t made much of a case for decomposing our data into triples. But a couple of things – we’re only at the first slide, and I’m not trying to talk anyone into using triple stores over an RDBMS. Some data is a match for semantic networks, some data isn’t. I am hoping this presentation will give you a better sense of when a triple store might be a good fit. Each triple represents a statement of a relationship between the things denoted by the nodes that it links. Each triple has three parts: a subject, an object, and predicate (also called a property) that denotes a relationship. The direction of the arc is significant: it always points toward the object.[1] References: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-data-model
So let’s say we log onto Wikipedia or some other trusted internet source, and we want to understand this sentence: “ Shakepeare wrote Hamlet in 1876” Now we have two triples. And note that “Hamlet” functions both as the object of one triple and as the subject of another triple. This is not only perfectly valid, but is fundamental to the power of triple stores. These two triples form a “semantic chain”. A semantic chain is defined as two or more triples, that taken together, form a statement. In isolation, “Shakepeare authorOf Hamlet” and “Hamlet writtenIn 1876” are useful, but taken together, this semantic chain can help answer the question: “What did Shakespeare write in 1876?” The answer is not only obvious, but more importantly, it is computationally simple.
Useless: “ Wikipedia states Shakespeare” True: Shakespeare authorOf Hamlet False: Hamlet writtenIn 1876 What we want to do in a semantic network (triple store) is not only add data, but add our sources for the data. Then we can begin to associate confidence levels with those sources. When building a triple store you could have multiple sources for data – 100s or 1000s of different sources – whatever. Data can come from structured respositiories, from unstructured internet-based sources (like forums or user communities), or semi-structured locations like dbpedia. Or you can open up your triple store to a community of users and allow them to add data. So again, some sources are trust worthy, and some aren’t. It’s up to you to make that distinction. But here’s how you enable it in your data. So let’s examine this semantic chain again – it’s actually not correct. Let’s look closely at what it’s saying. We have 3 connected triples - … What we really want to say is “Wikipedia states (Hamlet writtenIn 1876)” So we actually want to make a statement about a triple. Now up until now, we only looked at predicates that were related to single nodes. But predicates can also be related to triples.
(Wikipedia states Shakespeare), (Shakespeare authorOf Hamlet), (Hamlet writtenIn 1876) Without reification what do we have? one useless statement, one true statement, one false statement With reification, what do we have? Wikipedia states (Hamlet writtenIn 1876)
Question: When was Hamlet written? Answers: 1599 1876 It is not uncommon for a Knowledge Base (triple store) to have multiple answers to a question. How can we assign confidence levels to triples in order to rank answers by most probable to least probable? The simplest way (in this case) would be to say that anything from Wikipedia has a low confidence and anything from ShakepeareOnline has a high confidence. Why do we need confidence levels? Your source data for the knowledge base (triple store) will come from multiple sources. Some of these sources will be trustworthy, others will be questionable. For example, some data might come from structured sources, such as product catalogs. This type of data is typically very trustworthy. Other data might come from SMEs. Typically this data has high confidence as well. Data with a lower confidence might come from the crawling user forums on the internet. Content for the KB might be crowdsourced too; confidence in data obtained by this means might vary by user.
So how can I express a confidence level around each of these assertions? I have asserted that: ShakespeareOnline states (Hamlet writtenIn 1599) And now I want to assert that (ShakespeareOnline states (Hamlet writtenIn 1599)) hasConfidenceLevel 90 (the confidence level is arbitrary; I use a scale of 1-100, but you can use whatever you want)
So now if the question is asked: When was Hamlet written? We can either give both answers with their respective probabilities and let the user decide, or, given that one answer has a much higher confidence than another, simply return our most confident answer. Now here’s a question for the audience: How many triples do you have in the each diagram on this slide? The answer is 3. Reading from inner most to outermost, Triple 1: “Hamlet writtenIn 1876” Triple 2: Wikipedia states Triple1 Triple 3: Triple2 hasConfidenceLevel 90 Take aways: Reification is a powerful feature of triple stores. Reification can be taken to any level All right, this has been pretty fast and pretty advanced. But we covered in just a few slides: What a triple is (Subject Predicate Object) What a semantic chain is (2+ triples) How to make statements about triples (reification) How to make statements about statements about triples (reification)
Going to keep this section short for sake of time. We will pick up on Ontology Design and Implementation on a future presentation. To make an analogy, it could be said that what an ERD is to a relational database, an Ontology is to a triple store. Don’t want to take this analogy to far; but it’s a good introduction. In our previous example we had several “types” of things we were looking at: Authors, Books, Plays, Years, Sources, Characters
An Ontology contains “Classes” and “Predicates”. A Class is a set of things that can be either the subject or the object of a triple. If I create a class called “Author”, a member of that class (or set) can be “William Shakespeare” or “Christopher Marlowe”. A class can have 0..* members. A class can also have 0..* sub-classes. A sub-class of “Author” might be “Playwright” Note that we say “what relationships could exist between these types”. We’re looking somewhat beyond our source data at the moment, and beginning to consider reality from a more objective standpoint. Let’s forget about what our source data asserts; what relationships could exist between Authors, Playwrights, Books and Plays? And how are those relationships related to each other? (that last question is beyond the scope of the current set of slides, but still an important one when designing an Ontology) Note: It’s important not to apply Object-Oriented (OO) thinking to Ontology design. The two are not related, even though the terminology frequently overlaps (classes, inheritance, etc).
I’ve created a simple Ontology with 2 classes: Author and Book Each class has a sub-class. Author has sub class Playwright Book has sub class Play The “Play” class has 3 members (Hamlet, Macbeth, Faustus) The “Playwright” class has 2 members (Shakepeare, Marlowe) Because we’ve asserted that Shakepeare is a Playwright, and Playwright is a subclass of Author, then we can infer that Shakepeare is an Author. This brings up an important point of inference: There are explicitly stated facts (triples) And implicit triples that can be inferred (or derived) from those facts Inference is a powerful feature of Ontologies and Triple Stores. Back to slide 11, so with Ontologies we look at the things that are and determine how they are related to each other. That way when we encounter types of things in our unstructured data, we now know what they are, and how they are related to other things in our domain. Let’s look at a real world example now. We’ll segue from this into how triple stores and Ontologies fit into an overall NLP architecture.
So we have an NLP parser (LanguageWare). LW produces annotated text; that is, text annotated with not only syntactic information (like what part of speech a word is; a verb, noun, adverb, adjective, etc), but the text is also annotated semantic information. A word might be recognized as a product “Rational Software Architect” or as a dignitary “President Barack Obama” or as a location “Haifa Research Lab”. Text analytics can only go so far with simple POS (part-of-speech / syntactic) tagging. The semantic annotations are likely to add the most value. So where does this semantic information come from? From the dictionaries you say. Yes, that is true. Every “tag” or “annotation” in an NLP parser has an associated dictionary. So we can create an annotation called “Author” and if we have a dictionary of authors (Shakepeare, Marlowe, Dickens, etc), we can be reasonably confident of recognizing an author in unstructured data when we encounter one (based on the sufficiency and size of our dictionary). But where does the dictionary come from? If you want a dictionary of authors or companies or products or stock symbols, likely you can search online and find CSV or flat text files with this information. And that’s always a good option. But let’s consider this carefully for a moment. Where are these dictionaries coming from? What is their purpose? And what do they relate to? A dictionary
An author annotation is based on an Author dictionary. The annotation of “William Shakespeare” as an Author is an implict triple: William Shakespeare a Author So some unstructured data was pulled off of Wikipedia and annotated using LanguageWare. We now recognize that Shakespeare is an author. But let’s call something out there: This is unstructured data. How can we transition from unstructured  structured data?
How do we move from unstructured data to structured data? Remember in our previous slide (12) when we talked about all the different types of “things” that existed in the statement: (Shakespeare authorOf (Hamlet written in 1599)) states Wikipedia We have at least this many “things”: Authors Books Plays (type of Book) Playwrights (type of Author) Sources (e.g. as in of Information) Characters Dates (Years) If we have the right dictionaries, and create the right annotations, we can recognize all these “things” in unstructured text using an NLP parser. But how do these “things” relate to each other? What’s the relationship between an author and a book?
Now, I realized I’ve really marked up this sentence and there’s arrows and red text going all over the place. So let’s examine this closely. We’ve only recognized (e.g. annotated) two words in this entire sentence: William Shakespeare as a Playwright and Hamlet as a Play. But look at the depth of the understanding that we have. There’s a model depicted on this image, and we want to examine this more carefully. You’ll notice first of all that there are a total of 6 annotations represented on the diagram with arrows flowing between them. These annotations are produced by the NLP parser, and modeled (here’s the key point), they are modeled in the Ontology. It’s in the Ontology that we specify how a Book is related to a Date, or to a Language, and a Language to a Country to an Author, to a work produced by that Author, and so on. Each annotation is backed by a dictionary. The data for that dictionary is generated out of the triple store that conforms to the Ontology. The Ontology shows the relationship of all the annotations to each other. The annotation of “William Shakespeare” as an Author is an implict triple: William Shakespeare a Author We are now beginning to transition from unstructured data into the realm of structured data; if we know that William Shakespare is an Author, we also know that Authors live in Countries; that Authors write books that are published on certain dates and written in certain languages, etc. There’s an entire semantic chain of information that can be derived from this sentence – and that’s the point! Further, the Ontology helps us to understand what data we’re missing. If the NLP parser has recognized the author and the title, what hasn’t it recognized? It appears that all books are published on a date. So let’s look for the date – it’s in there. Further, it appears that a language is involved too – we can find that as well. To summarize, the Ontology gives us the relations that exist between annotations. It helps us to understand each annotated token in a larger context (the context of a semantic chain and semantic network). It also helps us to understand what information we are missing, and what else we need to look for. Are you faced with a large corpus of unstructured data? Where do you begin? How do you even know where to start looking and what you should start looking for? A model can help clarify this. The Ontology is your link into the real world. Without an Ontology, the annotations used by the NLP parser can become somewhat random. Who decides what an annotation should be named? Are they making this decision in coordination with what already exists? What modeling discipline exists? In past projects without an Ontology model, the NLP annotations over time had no link to the real world. Some one joining the project wouldn’t know what a “RemainingUsefulWord” or a “PowerActionWord” was – there’s no just way. If these had been designed in the discipline of an Ontology model, this discipline would have enforced a better standard in terms of naming, and likewise provided a link to the real world. Consider the diagram above. We may never annotate the source text for Language, Date or Country. Then again, maybe we would – but we don’t need to. The point is, these concepts still provide value, because they give us the context and domain understanding of the concepts that we do use as annotations in our NLP parser (like Book, Play and Author). This is an important point: Not every Ontology class needs to be associated with an annotation/dictionary in your NLP parser. In an extreme example, you might have an Ontology model with 15 classes and only one of them is used in the NLP parser. Also note: There is no constraint toward a single Ontology model. Multiple ontology models can be used. It is likewise not a necessity that Ontology models must be related, either integrated peer-to-peer or via an “Upper Ontology”. The need may exist, but it depends on circumstance. Maintaining multiple models, each as a context around a particular annotation, or annotation set, is a valid solution. It may even make collaborative team efforts simpler.
So now we move onto this slide. This component model illustrates a point that was made in the previous slide (17). Rather than the diagram on page 14, where the NLP parser is operating in isolation from a larger semantic network, now we have added the context to data that a semantic network provides. We are beginning to add structure to our unstructured data. And this is largely what the big picture looks like. Note another interesting aspect to this diagram. What comes first – the dictionaries or the triple store? This is somewhat of a chicken-and-egg syndrome. Typically, a project that is just starting up will bootstrap the process by using out-of-the-box dictionaries with perhaps some other structured data that has been provided. The key point to notice here is that the output of the NLP parser is annotated text that has two purposes: The first purpose of annotated text is input to the text analytics portion of the project. After all, this is the main purpose of this technology; provide some insight into the unstructured data However, there is a second benefit. The annotated text can also be used to enhance the triple store , which will in turn result in enhanced dictionary generation, which will in turn result in enhanced NLP parser annotations. Annotated text from the NLP parser can be examined – the most obvious application is to find the “unmatched” tokens – that is, the tokens that the NLP parser did not recognize. These are the result of “gaps” in the understanding of this semantic architecture. Unmatched tokens can be classified according to the Ontology model. For example, if the tokens “Mark” and “Twain” were not not recognized by the NLP parser, the compound token “Mark Twain” can be added to the triple store (Mark Twain a Author). The next time the dictionaries are generated, the author dictionary will contain the “Mark Twain” token and any further encounter of this name in text will result in a positive match. It is beyond the scope of this current slide deck to discuss how the triple store is loaded and how dictionaries are generated.
So we are using LanguageWare and we annotate this unstructured text. The yellow-highlighted text are the tokens of interest to us. Now how did we know to annotate these particular tokens? The area of machine learning, or building up a contextual understanding from the domain, is not a purely automated one. The process can be acclerated through the use of the proper tools (LW and ICA come to mind), as well as by using token recognition techniques via the underlying grammar, or through other methods. Some of these methods are discussed in later slides. For now, let it be sufficient to say that these tokens have been recognized, and from them we are able to derive the semantic chain shown on the next slide (20).
From the unstructured text on the previous slide, we were able to construct this semantic chain. So let’s say we have an interactive application that attempts to understand user input and react accordingly. If a user types in “topas” we can now place this within the context of the semantic chain shown on this slide. We are able to infer that the user is talking about AIX, and that the user is likely attempting to monitor CPU usage. Note that we can’t infer very much if the user types in AIX. If the user inputs AIX, we can’t necessarily infer that the user is talking about the “topas” command. The user is just as likely to be referring to something else in connection with AIX. AIX is a common token that likely occurs within multiple semantic chains (and would in fact be a key node in the entire semantic network). Some tokens (like “topas”) fulfill the role of “triggering token”. How these tokens are recognized is beyond the scope of this slide deck, but the recognition can involve either a manual designation or an algorithm applied against the triple store to find tokens that potentially fulfill this role (refer to Phil Tetlow’s work in this area).
This SPARQL query will retrieve the specific AIX command that monitors CPU usage. Note that none of the specified variables (such as AIX or CPU) are required, but are used to narrow down the query results. If AIX was left as a variable, then this query would return all commands, regardless of platform, that fulfilled the given critieria. SPARQL is triple-based query language, and is familiar to anyone who is familiar triple-based syntax.
Ok, so this is just a cool slide that was thrown in. Inference was briefly mentioned on an earlier slide (13). Inference is the ability for implict (or inherent) triples (or knowledge) to be constructed from existing data. Inference is something we perform all the time in our minds without realizing it. If you see someone entering the office shaking off a wet umbrella, you may reasonably infer that it was raining outside. You don’t need to have been outside, nor do you need to look out the window to perform this inference. When you create an Ontology it conforms to an OWL profile. OWL stands for “Web Ontology Model”. There are many different OWL profiles, and I believe all of them are vendor-netural and open standards. An OWL profile specifies all the different types of inference that may be performed on the model. Again, it’s beyond the scope of this slide deck to talk about all the inferences that can be both modeled and peformed. Suffice for now to state that this is a key feature of an Ontology that sets it apart from say a relational database. If you create an RDBMS, you know everything that there is to know in advance about your domain. We’ve all been on projects where the ERD changed half-way through and it throws everything off. Much of the application layer logic is built on top of assumptions made within the ERD and any change can have a wide impact. When you build out an ERD you specify all your relationships (PKs, FKs, AKs, etc) in advance. You don’t wake up one day and say – hey! There’s a link between those two tables, and have I no idea how that came about! But in the context of an Ontology and Triple Store not only is that what can happen, but that’s what should happen! The relationships can and will surprise you. Here’s three companies that operate in a similar space – CIA, MI6 and Facebook. Each has knowledge used to make certain connections. If 5 of your friends like a certain book or movie, chances are you might too. And so on. There’s guilt by association. But the point is, you don’t have to build out your entire Ontology in advance prior to populating the triple store. You can populate a triple store with a very lightweight Ontology, and as you begin to encounter certain items and certain patterns in the text, you can refine and build our Ontology model. For example, the Support 123 Ontology model has these classes: Product Company A few months down the road, we were told that we could only help users with supported products. What’s a supported product? Anything made by IBM. So we created a subclass of Product called SupportProduct: Product SupportedProduct (madeBy IBM) Company IBM NonIBM
The only data that was explictly asserted (e.g. added to the triple store) was this triple (fact): Rational Software Architect hasMaker IBM I didn’t even have to say that “Rational Software Architect” was a product. Based on the Ontology model, anything that hasMaker <company> is a product. And not only that, I’ve just defined a sub-class called SupportedProduct that a sub-class (or subset of Product). Anything product that hasMaker IBM, is a supported Product. So from one simple triple, I’ve inferred several more. I did not have to make any changes in the application layer. The logic in the application layer simply needs to issue a SPARQL query that states: SELECT ?x WHERE { ?x a SupportedProduct } and “Rational Software Architect” will be returned. If I want to withdraw this rule, or otherwise refine it – perhaps state that Oracle/Sun products are supported (just as an example), the SupportedProduct class can be refined. I know we’re glossing over quite a bit here. We’re not even looking at an Ontology Editor (like Protégé or TopBraidComposer) that can make this happen. But that’s just for the sake of time. So if you flip back to the previous slide (22), you’ll notice in this 3d visualization of a triple store, all the green nodes represent asserted triples (if you look closely you can see green lines between them). The grey lines represent first and second order inferences that were made between the nodes. There is obviously a lot more knowledge available when you begin to leverage the power of inferencing. First order inference: if A  B  C then A  C if A  B  C and A  B  D then C = D (note: these are not intended to be taken as mathematical propositions that hold true in all cases. I’m simply attempting to illustrate that a “first-order inference” examines a single proposition before making an inference, and a “second-order inference” examines the result of two propositions before making an inference. An inference may examine as many propositions as necessary to come up with an inference; but the visualization on the previous slide only went as far as second-order inferences.
Simply by introducing a class in the Ontology called “SupportedProduct” we can now infer a classification for “WebSphere” and “RSA” that did not previously exist.
“ Tivoli Monitoring” comes from the IBM Product Catalog (PTI). “ ITM” as a synonym of “Tivoli Monitoring” comes from an official source within IBM. “ ITM agent” is located through searching the corpus for high-frequency compound nominals (in this case, phrases that start with “ITM”). NOTE: Reference slide 30 for detailed information on how the corpus was searched.
“ Tivoli Monitoring” comes from the IBM Product Catalog (PTI). “ ITM” as a synonym of “Tivoli Monitoring” comes from an official source within IBM. “ ITM agent” is located through searching the corpus for high-frequency compound nominals (in this case, phrases that start with “ITM”). NOTE: Reference slide 30 for detailed information on how the corpus was searched.
“ Tivoli Monitoring” comes from the IBM Product Catalog (PTI). “ ITM” as a synonym of “Tivoli Monitoring” comes from an official source within IBM. “ ITM agent” is located through searching the corpus for high-frequency compound nominals (in this case, phrases that start with “ITM”). The triples in red were inferred. Now, given that “ITM agent” was located, locate all the other phrases that have the word “agent”, and occur within the context of “Tivoli Monitoring” (refer to slide 30, next)
Note that “Unix O/S Agent” and “ITM Agent” both share common predicates to the same object. It can be posited that if two subjects that the predicates are shared to a single object, the greater the similarity is between the two subjects. “ ITM agent” and “Unix O/S agent” possess a given degree of similarity, in that the relationships they share (predicate-object paths) are identical. Keep in mind that this entire network (pictured above) has been constructed in an automated fashion. It may seem apparent to us (in application of human reasoning) that “ITM agent” and “Unix O/S agent” are similar, but this semantic network shows that in the context of “sending”, “scheduling” and “receiving” events, the two agents are identical. How far we want to take that concept of “identity” is up to the consumer of the semantic network. It has been well said that within the context of the English language, no two words are exactly synonymous; there are always subtle shades of meaning. And between “ITM agent” and “Unix O/S agent” there is certainly more than a subtle change in meaning. Nevertheless, the relationship here is clear; the two nodes posess a high degree of similarity.
The red line indicates the incorporation of a bayesian belief model into the direct graph. There is a given degree of plausbility that itm agent and unix o/s agent are functionality synonymous; this belief could be updated in the light of future information.
Staying within the corpus of “Tivoli Monitoring Agent”, this diagram shows a refinement of event. So there is a high probability that when an event is being talked about, it’s a Network Event, TEC Event, AIX Event, JMX Event or Omnibus Event (note: this is not the entire list; contents were constrained for diagram readability). So now we can say that: Tivoli Monitoring hasPart Tivoli Monitoring Agent schedules/sends/receives events of type Network, AIX, JMX, etc.
Some of this is really just hierarchal classification. The link between “TEC Event” and “TEC Adapter” is uncertain. “TEC adapter” was simply found by searching for “TEC” in the proximity of “TEC event” hits. In the cases, semantic interpolation performed by an SME may be necessary. Also, within the context of “Tivoli Monitoring” it may be useful to understand how “events” and “adapters” work together; what verbs tend to connect these two tokens? The following hierarchy is trivial to construct automatically: TEC Adapter Tivoli TEC Adapter Netview TEC Adapter Omegamon TEC Adapter In each case, the “child” token has a prefix.
This is in the context of “Tivoli Monitoring Agent” Lexeme (lemma highlighted): Tivoli Monitoring Agent ITM Agent itmagent The top 100 documents containing a match from this lexeme were returned from the Lucene index. The documents were then searched for high-frequency compound nominals containing “agent”. It is presumed that these tokens are related to “Tivoli Monitoring Agent”, likely as a sub-type.
These are the verbs that occur in proximity to “agent” in the context of “Tivoli Monitoring Agent” These are all things that an “agent” can “do”.
Partial list
When Johnny came home, Pasco waddled to the door, stamped his webbed feet, fluffed his wings, and sang "Quack ... Quack ... quaaack!" Question: What is Pasco? Answer: Pasco is a Thing (Entity) Well, that is accurate, but not very precise. Can we be more precise? What do we know about Pasco? "waddled" - type of walk "webbed feet" - body part "fluffed" and "wings" - body part (fluffed = feathers) "Quack" - sound What do we know about Pets in our Ontology? Duck walksLike Waddle, Shuffle, SillyWalk soundsLike Quack looksLike glossy green head, white neck ring, white tail, wings, yellow bill, orange webbed feet What we are saying is that anything that has a silly walk *might* be a duck. Even these (http://www.youtube.com/watch?v=IqhlQfXUk7w) might be ducks. In this case, in our sample sentence, we can see that Pasco walksLike Waddle soundsLike Quack looksLike Webbed Feet and that lines up to our Ontology definition of what a duck is. We can infer that Paso is a duck within a degree of confidence expressed as 0 <= x <= 100. In this case, we might say we are 100% certain that Pasco is a duck, assuming that each property of a duck gives us an independent certainty of 33 1/3%. So let's say we only had this text to go on: "Hi Pasco!", said Johnny. "Quack Quack!", said Pasco. In this case, we only have Pasco soundsLike Quack and again, assuming a 33 1/3% confidence per relationship, we are now 33 1/3% certain that Pasco is a duck. But he might be a Parrot that sounds like a duck. Or Pasco could be Johnny's little brother imitating a duck. Or Pasco could be ... anyThing. And if we have this: Pasco ruffled his feathers and reiterated, rather dryly, "quack, quack, quack". Now we have Pasco soundsLike Quack looksLike feathers and maybe that's enough to give us 66 2/3% certainty that Pasco is a duck. Note that we don't need to evenly divide probability among relationships in an entity. Perhaps sounds is more important to us. Or less important to us. Maybe sound is only worth 10%, and looks are everything. So anything we find with appearence has a higher probability. The rating belongs in the model and can be as simple or as complicated as it needs to be. The point is - blank nodes are a powerful mode of expression in RDF. We define what we do know (objective reality) in our Ontology, then parse our unstructured text to if anything we know is in the text. And if parts of things we know are in the text, then we know at least something.
Blank Node Recognition of entities by virtue of the properties the entity has, rather than by an explict identifier This is a paradigm shift from RDBMS thought. In a relational database, we would identify a customer using an explict identifer (the PK). This identifier has an important meaning throughout the database and in and of itself is representative of the customer. Our social security numbers are also a good example of this. A blank node is a means of representing an entity without an explict identifier. The entity is identified by virtue of the properties that are associated with it. Here’s a real world example: “ That person has a child” That person <Entity 1> Has-a <Verb> Child <Entity 2> By virtue of the relationship (has-a child) we know that the subject of the sentence is a parent. We’ve identified the entity by virtue of the property associated with it. That person <Entity 1> Has-a <Verb> Child <Entity 2a> and a Husband <Entity 2a> Now we can infer that Entity 1 is { Woman, Wife, Mother } on the basis of these three associated properties. Blank nodes are heavily used in entity profiling. For example, terrorism. Does someone have the attributes of a terrorist? Attributes in this case would refer to properties. If a “blank node” (unidentified entity) has many candidate properties that fit the profile, it may fall into that sub-class. Blank nodes are a very powerful technique, because it allows the Predicates within an Ontology to define the class that the node (RDF individual) belongs to. The idea of blank nodes has a basis in meronymy – “the semantic relation that holds between a part and the whole” Refined example: “ JD was seen yesterday” What do we know about JD? Not much, if anything. Not even enough to assume JD is a person. So we have this: BNODE<JD> “ Jd picked up her child from school” Now we know JD has a child. This allows us to type the B-Node <JD> like this: BNODE<JD> a Person; a Parent; hasChild BNODE<C>. BNODE<C> a Person; hasParent BNODE<JD>. We don’t know the gender of BNODE<JD> or very much about BNODE<C>. “ Jd called her husband at 3:15 PM” Now we can refine our information model this way: BNODE<JD> a Person; a Woman; a Parent; a Mother; hasChild BNODE<C>; hasHusband BNODE. BNODE a Person; a Man; a Parent; a Husband. hasWife BNODE<JD>; hasChild BNODE<C>. BNODE<C> hasParent BNODE<JD>; hasMother BNODE<JD>; hasParent BNODE; hasFather BNODE. Note that we don’t remove any of our types though some are clearly superceded by others. “a Person; a Man” is surely redundant, in the sense that our Ontology model would almost certaintly classify Man rdfs:subClassOf Person and Man owl:inverseOf Woman, but there’s no need to try to normalize properties and remove redundancy. Redundancy will not affect the model integrity and is not a bad thing. Side-note: How would we model the phone call between husband and wife? Might be many ways, but try reification: ANON<COMMUNICATION> Subject: BNODE<JD> Predicate: calls Object: BNODE rdfs:timestamp /timestamp/ References: http://www.w3.org/TR/rdf-primer/
I also felt it was useful to distinguish between anonymous nodes (exsistentialy quantified variables) and nodes with identifiers to which it is useful to refer . the "Dan lives in some thing that is in Texas" can indeed be written "Dan lives in some thing X, which is in Texas", if one says nothing else about X. When switching between syntaxes, or simply in reformatting an RDF document, it may become impossible to let an anonymous node remain anonymous - so generated Ids (genids) have become the norm. But in an engineering system one might be tempted to. It is also in practice much more readable to use new regenerated local identifeirs for anonymous nodes in the output of a system which has merged data from several sources. So I found I was tracking the bit which repreentedt that an Id was arbitrarily generated.[4] So basically, the difference exists only in theory. In practice (all the implementations I am familiar with), every anonymous node has an ID. (cmtrim) References: http://www.w3.org/DesignIssues/Anonymous.html
Contact sje@us.ibm.com for Jena API questions/support
Because of the inherently distributed knowledge model of the Semantic Web, OWL makes an open world assumption. This assumption has some significant impacts on how information is modeled and interpreted. The open world assumption states that the truth of a statement is independent of whether it is known. In other words, not knowing whether a statement is explicitly true does not imply that the statement is false. The closed world assumption, as you might expect, is the opposite. It states that any statement that is not known to be true can be assumed to be false. Under the open world assumption, new information must always be additive. It can be contradictory, but it cannot remove previously asserted information.[1] Most systems operate with a closed world assumption. They assume that information is complete and known . For many practical applications this is a safe, and often necessary, assumption to make. However, a closed world assumption can limit the expressivity of a system in some cases because it is more difficult to differentiate between incomplete information and information that is known to be untrue. Returning to the example of Figure 4-6, there is no straightforward way to model the fact that Mike Smith may or may not be an employee. In a system that makes a closed world assumption, there are only two things in the world: employees and not employees.[1] No Unique Names Assumption The no unique names assumption states that unless explicitly stated otherwise, you cannot assume that resources that are identified by different URIs are different. Once again, this assumption is quite different from those of many traditional systems. In most database systems, for instance, all information is known, and assigning a unique identifier, such as a primary key that is consistently used throughout the system, is possible. Like the open world assumption, the no unique names assumption impacts inference capabilities related to the uniqueness of resources. Redundant and ambiguous data is a common issue in information management systems, and the no unique names assumption makes these issues easier to handle because resources can be made the same without destroying any information or dropping and updating database records.[1] Design note: Q: Should individuals be placed within an OWL model? A: typically this is fine for demos with small datasets, but beyond that recommend contraining OWL model to classes/subclasses only, and using a triple store as a place for storing instance data (individuals). Instance vs Subclass Need to understand the difference between (a rdf:type b) and (a rdfs:subClassOf b). (set theory) In the first case, we are stating that “a is a member of the set b”. In the second case we are stating that “a is a subset of the set b” To restate using OWL terms: “ a is a type of b” “ a is a subclass of b” References: [1] Chapter 4 - Incorporating Semantics Semantic Web Programming by John Hebeler, Matthew Fisher, Ryan Blace and Andrew Perez-Lopez
Triples in red are inferrred. Because beta is a sub class of alpha, then the members of beta are likewise members of alpha
Inferred tripels are italicized. There is a lot that this model doesn’t say. We could further subclass Dignitary to have royalty or various heads-of-state, or even down to the level of King, Queen, etc for Queen Elizabeth. Also, President doesn’t take into account which country is being spoken of or if the president is actively serving or has retired.
Inferred tripels are italicized. There is a lot that this model doesn’t say. We could further subclass Dignitary to have royalty or various heads-of-state, or even down to the level of King, Queen, etc for Queen Elizabeth. Also, President doesn’t take into account which country is being spoken of or if the president is actively serving or has retired. Classes Country America Citizen AmericanCitizen (bornIn some America) President ActivePresident (rdfs:subClassOf ActiveEmployee) InactivePresident (rdfs:subClassOf InactiveEmployee) AmericanPresident (employedIn some America) Employee ActiveEmployee ActivePresident RetiredEmployee InactivePresident Barack Obama rdf:type President; employedIn America . Predicate <Citizen> bornIn <Country> <Employee> employedIn <Country>
The specfication of domain and range for any property doesn't act as constraint. It just acts as an axiom in OWL.
When could something like this ever be useful? In unstructured text analytics! If I do “is-a” pattern extraction from NYT, and find out that Slovakia is-a place Slovakia is-a country Slovakia is-a land Then I might start to infer that (place == country == land) with some degree of confidence (DoC). The DoC depends on how often this inverse functional property holds true. The inverse functional property is a great way to extract synonyms or variations from text!
Synonyms are both Transitive and Symmetric
The OWL 2 construct AsymmetricObjectProperty allows it to be asserted that an object property expression is asymmetric - that is if the property expression OPE holds between the individuals x and y, then it cannot hold between y and x. Note that asymmetric is stronger than simply not symmetric.[1] For example, in mereology, the partOf relation is defined to be transitive, reflexive, and antisymmetric References: http://www.w3.org/2007/OWL/wiki/New_Features_and_Rationale#F6:__Reflexive.2C_Irreflexive.2C_and_Asymmetric_Object_Properties p ∈ ICEXT( I (owl:AsymmetricProperty)) iif p ∈ IP , ∀ x , y : ( x , y ) ∈ IEXT( p ) implies ( y , x ) ∉ IEXT( p ) http://owl.semanticweb.org/page/New-Feature-AsymmetricProperty-001-RDFXML
In a social network, Peter knows JimBob. Use of the reflexive property allows us to cover the obvious case – Peter knows Peter and JimBob knows JimBob. Or in partonomy, “car is a part of a car” http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/simple-part-whole-relations-v1.3.html A property P is said to be reflexive when the property must relate individual a to itself. In Figure 4.25 we can see an example of this: using the property knows, an individual George must have a relationship to itself using the property knows. In other words, George must know herself. However, in addition, it is possible for George to know other people; therefore the individual George can have a relationship with individual Simon along the property knows.
Irreflexive If a property P is irreflexive, it can be described as a property that relates an individual a to individual b, where individual a and individual b are not the same. An example of this would be the property motherOf: an individual Alice can be related to individual Bob along the property motherOf, but Alice cannot be motherOf herself (Figure 4.26)
Property chains are used to relate various categories the father of your father is your grandfather the wife of your brother is your sister-in-law the son of your sister is your nephew This works great for genealogies and I suspect that’s what it was created for. I’m certain there are other uses to – it seems like a convenient property. How does this differ from a Transitive property? Similar, but just involves a semantic renaming of a property. Also of note this is not limited to just two triple as shown above. A property chain can be enacted over 2..* triples. Can a property chain be enacted over an existing property chain? eg. hasGrandfather o hasFather = hasGreatGrandfather I’m not sure … For those with a maths background or bent, a property chain is similar to a functor: http://en.wikipedia.org/wiki/Functor Please don’t take away the wrong idea from this slide – there is no need to understand functors in order to understand quite simply what’s happening here: hasFather o hasFather = hasGrandfather
“ Turn on” is set as a synonym for “Power on”, and “switch on” for “turn on”. Given the predicate properties that are checked here, all of these words are now synonyms of each other. Power on and Switch on have no direct relationship in the explict world, but are related symmetrically via turn on. Note that while a synonym is both transitive and symmetric, an acronym is neither. Digital Video Disc hasAcronym DVD Acronyms are typically not transitive (this would imply there was an acronym that represented an acronym). If the acronym was symmetric, this would the same as saying DVD hasAcronym Digital Video Disc Which would likewise be incorrect. It has been said that there are no exact synonyms in the english language; every variation has a subtle difference in meaning (perhaps given the origins of either Germanic-Saxon, Anglo-Norman or Latin). However, the predicate does not need to reflect this nuance (though it could if the modeler chose).
Relation to classic Mereology The classic study of parts and wholes, mereology, has three axioms: the part-of relation is Transitive - "parts of parts are parts of the whole" - If A is part of B and B is part of C, then A is part of C Reflexive - "Everything is part of itself" - A is part of A Antisymmetric - "Nothing is a part of its parts" - if A is part of B and A != B then B is not part of A. OWL does not have built-in primitives for antisymmetric or reflexive properties, nor is there any work-around for them. In most cases this causes no problems, but it does mean that if you create a cycle in the part-of hierarchy (usually by accident) it will go unnoticed by the classifier (although it may cause the classifier to run forever.) Furthermore, in mereology, since everything is a part of itself, we have to define "proper parts" as "parts not equal to the whole". Whereas in OWL we have to do the reverse: i.e. define "parts" (analogous to "proper parts") and then define "reflexive parts" in terms of "parts". A number of other relations follow the same pattern as faults, e.g. "Repairs on a part are kinds of repairs on the whole". However, not all relations follow this pattern, e.g. "Purchase of a part is not purchase of the whole" (you can buy the wheels off a car without buying the car). mechanic repair wheels mechanic repair car buyer purchase wheels buyer purchase car (NO) References: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
Distinguishing parts from kinds Although both part-whole relations and subclassOf generate hierarchies, it is important not to confuse the part-whole hierarchy with the subclassOf hierarchy. This is easily done because in many library and related applications, part-whole and subclass relations are deliberately conflated into a single "broader than / narrower than" axis. For example consider the following:Vehicle Car Engine Crankcase Aluminum Crankcase "Car" is a kind of "Vehicle", but "Engine" is a part of a "Car", "Crankcase" is a part of an "Engine", but "Aluminum Crankcase" is a kind of "Crankcase". Such hierarchies serve well for navigation, however they they are conflating the two relations (partOf and subClassOf). Statements about "all vehicles" do not necessarily, or even probably, hold for "all engines". Such hierarchies do need to be recreated in situations that obey the rule "A fault of the part is a kind of fault of the whole". References: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
What’s wrong with this model? Nothing if it’s a taxonomy for hierarchal navigation for example But if this is an Ontology, there are some problems Classes must be read in an “is-a” relationship – “Engine is-a Car” is wrong (etc) Example from reference[1] References: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
Here’s a better way of creating an Ontology (from prior example)
What are common predicates used in the industry? Best would be to have a large number of OWL files and run a frequency analysis on these. It’s helpful to have suggestions to see what other people are doing and as a way of following best practices. This slide is not prescriptive – just making suggestions to help in modeling. Also, a knowledge of certain common predicates (hasPart/partOf) can even help avoid common pitfalls (in this case partonomy)[1] hasLocus[1] - the scene of any event or action In certain domains, most notably medicine, we generally understand that while body parts (e.g. a heart) can exist outside of a body, they do not normally do so. Thus it makes sense to say, in general, "A fault in the heart is a fault in the body," without having a particular heart or body in mind, and it makes sense to reason over classes defined that way. For other domains, most notably manufacturing, it is more common for parts to exist outside of some whole, and so it may not generally be true that a fault in an engine is a fault in a car (if the engine is not in a car), just as it may not be generally true that an engine is a car part. In these cases, the capability to reason over classes may not be that useful, and again the existential restriction on the direct properties may not make sense.[1] References: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
*The Role of Semantic Models in Smarter Industrial Operations. 30 Mar. 2012. developerWorks. <http://www.ibm.com/developerworks/industry/library/ind-semanticmodels/ >.
Although every property could have an inverse, we choose one preferred direction to keep the model small and understandable. Providing all inverses could be done in a supplemental profile. One exception to this rule is prov:wasGeneratedBy's inverse: prov:generated, which is included because of goal 1. When an asserter is describing an Activity (a principal Element), they should be able to describe it as a subject. prov:generated is needed to do this. [1] References: http://www.w3.org/2011/prov/wiki/ProvRDF#ProvenanceOntology.owl

Ontology and semantic web (2016)

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (20)

Similar to Ontology and semantic web (2016)

Similar to Ontology and semantic web (2016) (20)

More from Craig Trim

More from Craig Trim (15)

Recently uploaded

Recently uploaded (20)

Ontology and semantic web (2016)

Editor's Notes