Ontology and semantic web (2011)


An Ontology is a description of things that exist and how they relate to each other. Ontologies and Natural Language Processing (NLP) can often be seen as two sides of the same coin.

Published in: Technology
1 Comment
  • This refers to Slide 5. In the relation "WrittenIn", 1876 is shown as the object. However, 1876 is a parameter of time, not strictly an object. So, to keep the structure compact (instead of too many independent triples), would it not be better to define the relation "Authored" with parameters (such as "year")? The relation is then still "Authored", and the year of authoring can be given as a suitable parameter. PVN 23JUL15

  • So in the scope of an hour we can hardly hope to exhaust this subject – it's sort of like having an hour to talk about Relational Databases. Where do you begin? With data normalization? 3rd normal form? ERD design and tools? RDBMS implementations like DB2? Index optimization? Loading and retrieving data? SQL? JDBC connectivity? JPA and entity-managed beans? It's a wide topic. Same with this one. So we'll touch on a few underlying points. Our team has bi-weekly calls in this space, so if you're interested in further information, just let me know and I'll sign you up.
  • The first part is "Triples": talk about what a triple is, and how multiple triples form a semantic chain. A semantic network is a collection of semantic chains: Triple > Semantic Chain > Semantic Network. Talk about reification and why that's important (using a triple in place of a subject or object; being able to make a statement about a triple). Talk about confidence levels: how to implement them and when to use them. The second part is Ontology Design: how does the Ontology fit into this?
  • Let's start by looking at the data. This is a triple store. Or a semantic network. Or a knowledge base. The terms are often used interchangeably. Basically, it's a bunch of connected nodes. There is no underlying schema in the sense of an RDBMS. Nodes are related to each other by means of edges, or relationships – in triple store parlance these relationships are referred to as "predicates". A predicate connects one node to another. A connection from one node to another node by means of a predicate is referred to as a "triple". On the next several slides, we're going to go through an example of decomposing a natural English sentence (unstructured text) into triples. So this takes a very basic understanding of English grammar – recognizing verbs and nouns, and nouns that function as objects of a verb versus those that function as subjects of a verb. We'll start with a basic sentence that resolves to a single triple, and work our way up to a more complex semantic chain – a collection of related triples. And we'll also begin to make assertions about various triples in our network – some of the data we trust, some of it we might not.
  • So here's our first triple: "The author of Hamlet is Shakespeare" (or, Shakespeare wrote Hamlet, Shakespeare is the fellow what done wrote the play named Hamlet, etc.). We abbreviate this sentence into a triple: Shakespeare authorOf Hamlet. So we've decomposed our data into a triple. We can reverse the first triple by saying that: if Shakespeare is the author of Hamlet, then the author of Hamlet is Shakespeare. This may seem trivial. And perhaps it is. But the important thing to note here is that the intelligence for the data is maintained at the level of the data, not in the application. We don't have to maintain a business rule within our application layer that states if "A" has a given relation to "B", then "B" must have a given relation to "A". We can simply assert within the data that the predicate "authorOf" has an inverse predicate named "hasAuthor". So it was not necessary for us to explicitly assert anywhere that Hamlet was written by Shakespeare; we simply "know" this because we know that Shakespeare was the author of Hamlet, and "authorOf" has an inverse relationship to "hasAuthor". At the point of this first slide I want you to understand what a triple is: a Subject and an Object connected by a Predicate. "Shakespeare" and "Hamlet" may be interesting in and of themselves, but the connection between the two is valuable. It tells us something important about these two items. And if we encounter either Shakespeare or Hamlet in the course of parsing unstructured data, we now have a semantic reference point for both of them. So now you might be asking: why would we want to do this? Why not decompose this data into a relational database or some other data mechanism? We could have a database table named "Authors" and another table called "Books" and perhaps create a third lookup table that associates authors to books. That's another option. So far we haven't made much of a case for decomposing our data into triples.
But a couple of things – we're only at the first slide, and I'm not trying to talk anyone into using triple stores over an RDBMS. Some data is a match for semantic networks, some data isn't. I am hoping this presentation will give you a better sense of when a triple store might be a good fit. Each triple represents a statement of a relationship between the things denoted by the nodes that it links. Each triple has three parts: a subject, an object, and a predicate (also called a property) that denotes a relationship. The direction of the arc is significant: it always points toward the object.[1] References: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-data-model
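The inverse-predicate idea above can be sketched in a few lines of plain Python. This is a minimal illustration (not any particular RDF library): triples are tuples, and inverse predicates are declared as data rather than coded as application rules.

```python
# Minimal sketch: a triple store as a set of (subject, predicate, object)
# tuples, with inverse predicates declared as data.
class TripleStore:
    def __init__(self):
        self.triples = set()
        self.inverses = {}  # predicate -> inverse predicate

    def declare_inverse(self, p, q):
        self.inverses[p] = q
        self.inverses[q] = p

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def holds(self, s, p, o):
        # A triple "holds" if it was asserted, or if its inverse was asserted.
        if (s, p, o) in self.triples:
            return True
        q = self.inverses.get(p)
        return q is not None and (o, q, s) in self.triples

store = TripleStore()
store.declare_inverse("authorOf", "hasAuthor")
store.add("Shakespeare", "authorOf", "Hamlet")

# We never asserted this triple explicitly, yet the data "knows" it:
print(store.holds("Hamlet", "hasAuthor", "Shakespeare"))  # True
```

The point of the sketch is that the rule lives in the data layer (the `inverses` table), not in application logic.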
  • So let's say we log onto Wikipedia or some other trusted internet source, and we want to understand this sentence: "Shakespeare wrote Hamlet in 1876". Now we have two triples. And note that "Hamlet" functions both as the object of one triple and as the subject of another triple. This is not only perfectly valid, but is fundamental to the power of triple stores. These two triples form a "semantic chain". A semantic chain is defined as two or more triples that, taken together, form a statement. In isolation, "Shakespeare authorOf Hamlet" and "Hamlet writtenIn 1876" are useful, but taken together, this semantic chain can help answer the question: "What did Shakespeare write in 1876?" The answer is not only obvious, but more importantly, it is computationally simple.
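That "computationally simple" claim can be made concrete: answering the question is just a join across the two triples in the chain. A minimal sketch:

```python
# Sketch: two triples form a semantic chain; answering "What did
# Shakespeare write in 1876?" is a simple join over that chain.
triples = {
    ("Shakespeare", "authorOf", "Hamlet"),
    ("Hamlet", "writtenIn", "1876"),
}

def works_written_by_in(author, year):
    # Join: author --authorOf--> work --writtenIn--> year
    works = {o for (s, p, o) in triples if s == author and p == "authorOf"}
    return sorted(w for w in works if (w, "writtenIn", year) in triples)

print(works_written_by_in("Shakespeare", "1876"))  # ['Hamlet']
```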
  • Useless: "Wikipedia states Shakespeare". True: Shakespeare authorOf Hamlet. False: Hamlet writtenIn 1876. What we want to do in a semantic network (triple store) is not only add data, but add our sources for the data. Then we can begin to associate confidence levels with those sources. When building a triple store you could have multiple sources for data – hundreds or thousands of different sources. Data can come from structured repositories, from unstructured internet-based sources (like forums or user communities), or from semi-structured locations like dbpedia. Or you can open up your triple store to a community of users and allow them to add data. So again, some sources are trustworthy, and some aren't. It's up to you to make that distinction. But here's how you enable it in your data. So let's examine this semantic chain again – it's actually not correct. Let's look closely at what it's saying. We have 3 connected triples - … What we really want to say is "Wikipedia states (Hamlet writtenIn 1876)". So we actually want to make a statement about a triple. Up until now, we only looked at predicates that related single nodes. But predicates can also be related to triples.
  • (Wikipedia states Shakespeare), (Shakespeare authorOf Hamlet), (Hamlet writtenIn 1876). Without reification, what do we have? One useless statement, one true statement, one false statement. With reification, what do we have? Wikipedia states (Hamlet writtenIn 1876).
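Reification falls out naturally if a triple is itself an object that can occupy the subject or object position of another triple. A minimal sketch using nested tuples:

```python
# Sketch of reification: a triple (as a tuple) can stand in the subject
# or object position of another triple.
fact = ("Hamlet", "writtenIn", "1876")        # the inner triple
statement = ("Wikipedia", "states", fact)     # a statement ABOUT that triple

triples = {fact, statement}

# What does Wikipedia state? (The answer is a whole triple, not a node.)
stated = [o for (s, p, o) in triples if s == "Wikipedia" and p == "states"]
print(stated)  # [('Hamlet', 'writtenIn', '1876')]
```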
  • Question: When was Hamlet written? Answers: 1599, 1876. It is not uncommon for a Knowledge Base (triple store) to have multiple answers to a question. How can we assign confidence levels to triples in order to rank answers from most probable to least probable? The simplest way (in this case) would be to say that anything from Wikipedia has a low confidence and anything from ShakespeareOnline has a high confidence. Why do we need confidence levels? Your source data for the knowledge base (triple store) will come from multiple sources. Some of these sources will be trustworthy, others will be questionable. For example, some data might come from structured sources, such as product catalogs. This type of data is typically very trustworthy. Other data might come from SMEs. Typically this data has high confidence as well. Data with a lower confidence might come from crawling user forums on the internet. Content for the KB might be crowdsourced too; confidence in data obtained by this means might vary by user.
  • So how can I express a confidence level around each of these assertions? I have asserted that: ShakespeareOnline states (Hamlet writtenIn 1599) And now I want to assert that (ShakespeareOnline states (Hamlet writtenIn 1599)) hasConfidenceLevel 90 (the confidence level is arbitrary; I use a scale of 1-100, but you can use whatever you want)
  • So now if the question is asked: When was Hamlet written? We can either give both answers with their respective probabilities and let the user decide, or, given that one answer has a much higher confidence than the other, simply return our most confident answer. Now here's a question for the audience: How many triples do you have in each diagram on this slide? The answer is 3. Reading from innermost to outermost: Triple 1: "Hamlet writtenIn 1876". Triple 2: Wikipedia states Triple 1. Triple 3: Triple 2 hasConfidenceLevel 90. Takeaways: Reification is a powerful feature of triple stores. Reification can be taken to any level. All right, this has been pretty fast and pretty advanced. But we covered in just a few slides: What a triple is (Subject Predicate Object). What a semantic chain is (2+ triples). How to make statements about triples (reification). How to make statements about statements about triples (reification).
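The two levels of reification plus ranking can be sketched together. The confidence values below are the arbitrary 1-100 scale from the slides; the helper name is illustrative:

```python
# Sketch: confidence attached via a second level of reification, then used
# to rank competing answers to "When was Hamlet written?"
t1 = ("Hamlet", "writtenIn", "1876")
t2 = ("Hamlet", "writtenIn", "1599")
s1 = ("Wikipedia", "states", t1)
s2 = ("ShakespeareOnline", "states", t2)
triples = {
    t1, t2, s1, s2,
    (s1, "hasConfidenceLevel", 10),   # arbitrary 1-100 scale
    (s2, "hasConfidenceLevel", 90),
}

def answers(subject, predicate):
    # Collect (answer, confidence) pairs, highest confidence first.
    out = []
    for (s, p, o) in triples:
        if p == "states" and o[0] == subject and o[1] == predicate:
            conf = next(c for (x, q, c) in triples
                        if x == (s, p, o) and q == "hasConfidenceLevel")
            out.append((o[2], conf))
    return sorted(out, key=lambda pair: -pair[1])

print(answers("Hamlet", "writtenIn"))  # [('1599', 90), ('1876', 10)]
```

Returning only the top-ranked answer, or the whole ranked list, is then an application-level choice.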
  • Going to keep this section short for the sake of time. We will pick up on Ontology Design and Implementation in a future presentation. To make an analogy, it could be said that what an ERD is to a relational database, an Ontology is to a triple store. We don't want to take this analogy too far, but it's a good introduction. In our previous example we had several "types" of things we were looking at: Authors, Books, Plays, Years, Sources, Characters.
  • An Ontology contains “Classes” and “Predicates”. A Class is a set of things that can be either the subject or the object of a triple. If I create a class called “Author”, a member of that class (or set) can be “William Shakespeare” or “Christopher Marlowe”. A class can have 0..* members. A class can also have 0..* sub-classes. A sub-class of “Author” might be “Playwright” Note that we say “what relationships could exist between these types”. We’re looking somewhat beyond our source data at the moment, and beginning to consider reality from a more objective standpoint. Let’s forget about what our source data asserts; what relationships could exist between Authors, Playwrights, Books and Plays? And how are those relationships related to each other? (that last question is beyond the scope of the current set of slides, but still an important one when designing an Ontology) Note: It’s important not to apply Object-Oriented (OO) thinking to Ontology design. The two are not related, even though the terminology frequently overlaps (classes, inheritance, etc).
  • I've created a simple Ontology with 2 classes: Author and Book. Each class has a sub-class: Author has sub-class Playwright; Book has sub-class Play. The "Play" class has 3 members (Hamlet, Macbeth, Faustus). The "Playwright" class has 2 members (Shakespeare, Marlowe). Because we've asserted that Shakespeare is a Playwright, and Playwright is a sub-class of Author, we can infer that Shakespeare is an Author. This brings up an important point about inference: there are explicitly stated facts (triples), and implicit triples that can be inferred (or derived) from those facts. Inference is a powerful feature of Ontologies and Triple Stores. Back to slide 11: with Ontologies we look at the things that are and determine how they are related to each other. That way, when we encounter types of things in our unstructured data, we know what they are and how they are related to other things in our domain. Let's look at a real-world example now. We'll segue from this into how triple stores and Ontologies fit into an overall NLP architecture.
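The sub-class inference above is a simple transitive walk up the hierarchy. A minimal sketch (the "a" predicate mirrors RDF's type shorthand):

```python
# Sketch: inferring implicit class membership from explicit triples plus
# a sub-class hierarchy (Playwright is a sub-class of Author, etc.).
subclass_of = {"Playwright": "Author", "Play": "Book"}
asserted = {("Shakespeare", "a", "Playwright"), ("Hamlet", "a", "Play")}

def classes_of(thing):
    # Start from explicitly asserted classes, then walk up the chain.
    found = {c for (s, p, c) in asserted if s == thing and p == "a"}
    frontier = list(found)
    while frontier:
        c = frontier.pop()
        parent = subclass_of.get(c)
        if parent and parent not in found:
            found.add(parent)
            frontier.append(parent)
    return found

print(classes_of("Shakespeare"))  # {'Playwright', 'Author'}
```

"Shakespeare a Author" was never asserted; it is derived, which is exactly the explicit-vs-implicit distinction made above.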
  • So we have an NLP parser (LanguageWare). LW produces annotated text; that is, text annotated not only with syntactic information (like what part of speech a word is: a verb, noun, adverb, adjective, etc.), but also with semantic information. A word might be recognized as a product ("Rational Software Architect"), or as a dignitary ("President Barack Obama"), or as a location ("Haifa Research Lab"). Text analytics can only go so far with simple POS (part-of-speech / syntactic) tagging. The semantic annotations are likely to add the most value. So where does this semantic information come from? From the dictionaries, you say. Yes, that is true. Every "tag" or "annotation" in an NLP parser has an associated dictionary. So we can create an annotation called "Author", and if we have a dictionary of authors (Shakespeare, Marlowe, Dickens, etc.), we can be reasonably confident of recognizing an author in unstructured data when we encounter one (based on the sufficiency and size of our dictionary). But where does the dictionary come from? If you want a dictionary of authors or companies or products or stock symbols, likely you can search online and find CSV or flat text files with this information. And that's always a good option. But let's consider this carefully for a moment. Where are these dictionaries coming from? What is their purpose? And what do they relate to?
  • An author annotation is based on an Author dictionary. The annotation of "William Shakespeare" as an Author is an implicit triple: William Shakespeare a Author. So some unstructured data was pulled off of Wikipedia and annotated using LanguageWare. We now recognize that Shakespeare is an author. But let's call something out there: this is unstructured data. How can we transition from unstructured → structured data?
  • How do we move from unstructured data to structured data? Remember in our previous slide (12) when we talked about all the different types of "things" that existed in the statement: (Shakespeare authorOf (Hamlet writtenIn 1599)) states Wikipedia. We have at least this many "things": Authors, Books, Plays (a type of Book), Playwrights (a type of Author), Sources (as in, of information), Characters, Dates (Years). If we have the right dictionaries, and create the right annotations, we can recognize all these "things" in unstructured text using an NLP parser. But how do these "things" relate to each other? What's the relationship between an author and a book?
  • Now, I realize I've really marked up this sentence, and there are arrows and red text going all over the place. So let's examine this closely. We've only recognized (i.e. annotated) two words in this entire sentence: William Shakespeare as a Playwright and Hamlet as a Play. But look at the depth of the understanding that we have. There's a model depicted on this image, and we want to examine it more carefully. You'll notice first of all that there are a total of 6 annotations represented on the diagram with arrows flowing between them. These annotations are produced by the NLP parser, and (here's the key point) they are modeled in the Ontology. It's in the Ontology that we specify how a Book is related to a Date, or to a Language; and a Language to a Country, to an Author, to a work produced by that Author, and so on. Each annotation is backed by a dictionary. The data for that dictionary is generated out of the triple store that conforms to the Ontology. The Ontology shows the relationship of all the annotations to each other. The annotation of "William Shakespeare" as an Author is an implicit triple: William Shakespeare a Author. We are now beginning to transition from unstructured data into the realm of structured data; if we know that William Shakespeare is an Author, we also know that Authors live in Countries; that Authors write books that are published on certain dates and written in certain languages, etc. There's an entire semantic chain of information that can be derived from this sentence – and that's the point! Further, the Ontology helps us to understand what data we're missing. If the NLP parser has recognized the author and the title, what hasn't it recognized? It appears that all books are published on a date. So let's look for the date – it's in there. Further, it appears that a language is involved too – we can find that as well. To summarize, the Ontology gives us the relations that exist between annotations.
It helps us to understand each annotated token in a larger context (the context of a semantic chain and semantic network). It also helps us to understand what information we are missing, and what else we need to look for. Are you faced with a large corpus of unstructured data? Where do you begin? How do you even know where to start looking and what you should start looking for? A model can help clarify this. The Ontology is your link into the real world. Without an Ontology, the annotations used by the NLP parser can become somewhat random. Who decides what an annotation should be named? Are they making this decision in coordination with what already exists? What modeling discipline exists? In past projects without an Ontology model, the NLP annotations over time had no link to the real world. Someone joining the project wouldn't know what a "RemainingUsefulWord" or a "PowerActionWord" was – there's just no way. If these had been designed within the discipline of an Ontology model, that discipline would have enforced a better standard in terms of naming, and likewise provided a link to the real world. Consider the diagram above. We may never annotate the source text for Language, Date or Country. Then again, maybe we would – but we don't need to. The point is, these concepts still provide value, because they give us the context and domain understanding of the concepts that we do use as annotations in our NLP parser (like Book, Play and Author). This is an important point: not every Ontology class needs to be associated with an annotation/dictionary in your NLP parser. In an extreme example, you might have an Ontology model with 15 classes and only one of them is used in the NLP parser. Also note: there is no constraint toward a single Ontology model. Multiple ontology models can be used. It is likewise not a necessity that Ontology models be related, whether integrated peer-to-peer or via an "Upper Ontology". The need may exist, but it depends on circumstance.
Maintaining multiple models, each as a context around a particular annotation, or annotation set, is a valid solution. It may even make collaborative team efforts simpler.
  • So now we move on to this slide. This component model illustrates a point that was made in the previous slide (17). Rather than the diagram on page 14, where the NLP parser is operating in isolation from a larger semantic network, now we have added the context to data that a semantic network provides. We are beginning to add structure to our unstructured data. And this is largely what the big picture looks like. Note another interesting aspect to this diagram. What comes first – the dictionaries or the triple store? This is somewhat of a chicken-and-egg syndrome. Typically, a project that is just starting up will bootstrap the process by using out-of-the-box dictionaries, with perhaps some other structured data that has been provided. The key point to notice here is that the output of the NLP parser is annotated text that has two purposes. The first purpose of annotated text is as input to the text analytics portion of the project. After all, this is the main purpose of this technology: provide some insight into the unstructured data. However, there is a second benefit. The annotated text can also be used to enhance the triple store, which will in turn result in enhanced dictionary generation, which will in turn result in enhanced NLP parser annotations. Annotated text from the NLP parser can be examined – the most obvious application is to find the "unmatched" tokens – that is, the tokens that the NLP parser did not recognize. These are the result of "gaps" in the understanding of this semantic architecture. Unmatched tokens can be classified according to the Ontology model. For example, if the tokens "Mark" and "Twain" were not recognized by the NLP parser, the compound token "Mark Twain" can be added to the triple store (Mark Twain a Author). The next time the dictionaries are generated, the author dictionary will contain the "Mark Twain" token, and any further encounter of this name in text will result in a positive match.
It is beyond the scope of this current slide deck to discuss how the triple store is loaded and how dictionaries are generated.
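Even without the loading details, the shape of that feedback loop can be sketched: classify an unmatched token into the triple store, then regenerate the annotation dictionary from the store. The function name is illustrative, not LanguageWare's API:

```python
# Sketch of the feedback loop: unmatched token -> triple store ->
# regenerated dictionary -> better NLP annotations next pass.
triples = {("Shakespeare", "a", "Author"), ("Marlowe", "a", "Author")}

def generate_dictionary(cls):
    # The dictionary for an annotation is just the members of its class.
    return sorted(s for (s, p, o) in triples if p == "a" and o == cls)

# The NLP pass missed "Mark Twain"; classify it into the store:
triples.add(("Mark Twain", "a", "Author"))

print(generate_dictionary("Author"))  # ['Mark Twain', 'Marlowe', 'Shakespeare']
```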
  • So we are using LanguageWare and we annotate this unstructured text. The yellow-highlighted text shows the tokens of interest to us. Now how did we know to annotate these particular tokens? The area of machine learning, or building up a contextual understanding from the domain, is not a purely automated one. The process can be accelerated through the use of the proper tools (LW and ICA come to mind), as well as by using token recognition techniques via the underlying grammar, or through other methods. Some of these methods are discussed in later slides. For now, let it be sufficient to say that these tokens have been recognized, and from them we are able to derive the semantic chain shown on the next slide (20).
  • From the unstructured text on the previous slide, we were able to construct this semantic chain. So let’s say we have an interactive application that attempts to understand user input and react accordingly. If a user types in “topas” we can now place this within the context of the semantic chain shown on this slide. We are able to infer that the user is talking about AIX, and that the user is likely attempting to monitor CPU usage. Note that we can’t infer very much if the user types in AIX. If the user inputs AIX, we can’t necessarily infer that the user is talking about the “topas” command. The user is just as likely to be referring to something else in connection with AIX. AIX is a common token that likely occurs within multiple semantic chains (and would in fact be a key node in the entire semantic network). Some tokens (like “topas”) fulfill the role of “triggering token”. How these tokens are recognized is beyond the scope of this slide deck, but the recognition can involve either a manual designation or an algorithm applied against the triple store to find tokens that potentially fulfill this role (refer to Phil Tetlow’s work in this area).
  • This SPARQL query will retrieve the specific AIX command that monitors CPU usage. Note that none of the specified values (such as AIX or CPU) are required; they are used to narrow down the query results. If AIX were left as a variable, this query would return all commands, regardless of platform, that fulfilled the given criteria. SPARQL is a triple-based query language, and will be familiar to anyone who knows triple-based syntax.
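The mechanics of that kind of query can be sketched as a tiny SPARQL-style pattern matcher in plain Python. The predicates (`runsOn`, `monitors`) and the second command are illustrative assumptions, not the deck's actual schema; variables start with `?` and constants narrow the results, just as described above:

```python
# Sketch: a minimal triple-pattern matcher in the spirit of SPARQL.
triples = {
    ("topas", "runsOn", "AIX"),
    ("topas", "monitors", "CPU"),
    ("vmstat", "runsOn", "Linux"),   # illustrative extra data
    ("vmstat", "monitors", "CPU"),
}

def query(patterns):
    # Match each pattern against the store, binding "?"-prefixed variables.
    solutions = [{}]
    for pat in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                b = dict(binding)
                ok = True
                for term, value in zip(pat, triple):
                    if term.startswith("?"):
                        if b.setdefault(term, value) != value:
                            ok = False
                    elif term != value:
                        ok = False
                if ok:
                    next_solutions.append(b)
        solutions = next_solutions
    return solutions

# Analogue of: SELECT ?cmd WHERE { ?cmd runsOn AIX . ?cmd monitors CPU }
print(query([("?cmd", "runsOn", "AIX"), ("?cmd", "monitors", "CPU")]))
# [{'?cmd': 'topas'}]
```

Replacing the constant "AIX" with another variable would return CPU-monitoring commands for every platform, which is the widening behavior the note describes.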
  • Ok, so this is just a cool slide that was thrown in. Inference was briefly mentioned on an earlier slide (13). Inference is the ability for implicit (or inherent) triples – knowledge – to be constructed from existing data. Inference is something we perform all the time in our minds without realizing it. If you see someone entering the office shaking off a wet umbrella, you may reasonably infer that it was raining outside. You don't need to have been outside, nor do you need to look out the window, to perform this inference. When you create an Ontology it conforms to an OWL profile. OWL stands for "Web Ontology Language". There are many different OWL profiles, and I believe all of them are vendor-neutral and open standards. An OWL profile specifies all the different types of inference that may be performed on the model. Again, it's beyond the scope of this slide deck to talk about all the inferences that can be both modeled and performed. Suffice it for now to state that this is a key feature of an Ontology that sets it apart from, say, a relational database. If you create an RDBMS, you know everything that there is to know in advance about your domain. We've all been on projects where the ERD changed halfway through and it throws everything off. Much of the application layer logic is built on top of assumptions made within the ERD, and any change can have a wide impact. When you build out an ERD you specify all your relationships (PKs, FKs, AKs, etc.) in advance. You don't wake up one day and say – hey! There's a link between those two tables, and I have no idea how that came about! But in the context of an Ontology and Triple Store, not only is that what can happen, that's what should happen! The relationships can and will surprise you. Here are three organizations that operate in a similar space – CIA, MI6 and Facebook. Each has knowledge used to make certain connections. If 5 of your friends like a certain book or movie, chances are you might too. And so on.
There's guilt by association. But the point is, you don't have to build out your entire Ontology in advance prior to populating the triple store. You can populate a triple store with a very lightweight Ontology, and as you begin to encounter certain items and certain patterns in the text, you can refine and build out your Ontology model. For example, the Support 123 Ontology model has these classes: Product, Company. A few months down the road, we were told that we could only help users with supported products. What's a supported product? Anything made by IBM. So we created a sub-class of Product called SupportedProduct:
Product
  SupportedProduct (madeBy IBM)
Company
  IBM
  NonIBM
  • The only data that was explicitly asserted (i.e. added to the triple store) was this triple (fact): Rational Software Architect hasMaker IBM. I didn't even have to say that "Rational Software Architect" was a product. Based on the Ontology model, anything that hasMaker <company> is a product. And not only that, I've just defined a sub-class called SupportedProduct that is a sub-class (or subset) of Product. Any product that hasMaker IBM is a SupportedProduct. So from one simple triple, I've inferred several more. I did not have to make any changes in the application layer. The logic in the application layer simply needs to issue a SPARQL query that states: SELECT ?x WHERE { ?x a SupportedProduct } and "Rational Software Architect" will be returned. If I want to withdraw this rule, or otherwise refine it – perhaps state that Oracle/Sun products are supported (just as an example) – the SupportedProduct class can be refined. I know we're glossing over quite a bit here. We're not even looking at an Ontology Editor (like Protégé or TopBraidComposer) that can make this happen. But that's just for the sake of time. So if you flip back to the previous slide (22), you'll notice in this 3D visualization of a triple store, all the green nodes represent asserted triples (if you look closely you can see green lines between them). The grey lines represent first- and second-order inferences that were made between the nodes. There is obviously a lot more knowledge available when you begin to leverage the power of inferencing. First-order inference: if A → B → C then A → C. Second-order inference: if A → B → C and A → B → D then C = D. (Note: these are not intended to be taken as mathematical propositions that hold true in all cases. I'm simply attempting to illustrate that a "first-order inference" examines a single proposition before making an inference, and a "second-order inference" examines the result of two propositions before making an inference. An inference may examine as many propositions as necessary to arrive at a conclusion; but the visualization on the previous slide only went as far as second-order inferences.)
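The property-restriction idea (SupportedProduct defined as "anything with hasMaker IBM") can be sketched as a class membership that is computed at query time rather than asserted. The second product below is illustrative data, not from the deck:

```python
# Sketch: a class defined by a property restriction, evaluated on demand.
# Only the hasMaker facts are asserted; SupportedProduct is inferred.
triples = {
    ("Rational Software Architect", "hasMaker", "IBM"),
    ("Solaris", "hasMaker", "Oracle"),   # illustrative non-IBM product
}

def supported_products():
    # Analogue of: SELECT ?x WHERE { ?x a SupportedProduct }
    # with SupportedProduct defined as (hasMaker IBM).
    return sorted(s for (s, p, o) in triples
                  if p == "hasMaker" and o == "IBM")

print(supported_products())  # ['Rational Software Architect']
```

Refining the rule (say, also supporting Oracle products) means changing the class definition, not the asserted data and not the application layer.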
  • Simply by introducing a class in the Ontology called “SupportedProduct” we can now infer a classification for “WebSphere” and “RSA” that did not previously exist.
  • "Tivoli Monitoring" comes from the IBM Product Catalog (PTI). "ITM" as a synonym of "Tivoli Monitoring" comes from an official source within IBM. "ITM agent" is located through searching the corpus for high-frequency compound nominals (in this case, phrases that start with "ITM"). NOTE: Reference slide 30 for detailed information on how the corpus was searched.
  • The triples in red were inferred. Now, given that "ITM agent" was located, locate all the other phrases that contain the word "agent" and occur within the context of "Tivoli Monitoring" (refer to slide 30, next).
  • Note that "Unix O/S Agent" and "ITM Agent" both share common predicates to the same object. It can be posited that the more predicates two subjects share to a single object, the greater the similarity between the two subjects. "ITM agent" and "Unix O/S agent" possess a given degree of similarity, in that the relationships they share (predicate-object paths) are identical. Keep in mind that this entire network (pictured above) has been constructed in an automated fashion. It may seem apparent to us (applying human reasoning) that "ITM agent" and "Unix O/S agent" are similar, but this semantic network shows that in the context of "sending", "scheduling" and "receiving" events, the two agents are identical. How far we want to take that concept of "identity" is up to the consumer of the semantic network. It has been well said that within the context of the English language, no two words are exactly synonymous; there are always subtle shades of meaning. And between "ITM agent" and "Unix O/S agent" there is certainly more than a subtle change in meaning. Nevertheless, the relationship here is clear: the two nodes possess a high degree of similarity.
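That similarity measure can be sketched as the overlap of each subject's predicate-object paths. The specific predicate names below are illustrative stand-ins for the diagram's edges:

```python
# Sketch: similarity between two subjects as overlap of their
# (predicate, object) paths, per the note above.
triples = {
    ("ITM Agent", "sends", "Event"),
    ("ITM Agent", "receives", "Event"),
    ("ITM Agent", "schedules", "Event"),
    ("Unix O/S Agent", "sends", "Event"),
    ("Unix O/S Agent", "receives", "Event"),
    ("Unix O/S Agent", "schedules", "Event"),
}

def po_paths(subject):
    return {(p, o) for (s, p, o) in triples if s == subject}

def similarity(a, b):
    # Jaccard overlap of predicate-object paths: 1.0 means the two
    # subjects occur in identical relational contexts.
    pa, pb = po_paths(a), po_paths(b)
    return len(pa & pb) / len(pa | pb)

print(similarity("ITM Agent", "Unix O/S Agent"))  # 1.0
```

A score of 1.0 means identical contexts in the network, not identical meaning, which is exactly the caveat in the note above.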
  • The red line indicates the incorporation of a Bayesian belief model into the directed graph. There is a given degree of plausibility that "ITM agent" and "Unix O/S agent" are functionally synonymous; this belief could be updated in the light of future information.
  • Staying within the corpus of "Tivoli Monitoring Agent", this diagram shows a refinement of "event". So there is a high probability that when an event is being talked about, it's a Network Event, TEC Event, AIX Event, JMX Event or Omnibus Event (note: this is not the entire list; contents were constrained for diagram readability). So now we can say that: Tivoli Monitoring hasPart Tivoli Monitoring Agent, and Tivoli Monitoring Agent schedules/sends/receives events of type Network, AIX, JMX, etc.
  • Some of this is really just hierarchical classification. The link between “TEC Event” and “TEC Adapter” is uncertain; “TEC adapter” was simply found by searching for “TEC” in the proximity of “TEC event” hits. In such cases, semantic interpolation performed by an SME may be necessary. Also, within the context of “Tivoli Monitoring” it may be useful to understand how “events” and “adapters” work together; what verbs tend to connect these two tokens? The following hierarchy is trivial to construct automatically: TEC Adapter, with children Tivoli TEC Adapter, Netview TEC Adapter and Omegamon TEC Adapter. In each case, the “child” token adds a prefix to its parent.
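The "child token has a prefix" rule above is simple enough to sketch directly. This is a toy version under that single assumption; real compound-nominal handling would also need tokenization and normalization.

```python
# Sketch: build the trivial prefix hierarchy by selecting tokens that
# extend a parent phrase with a leading prefix.

def children_of(tokens, parent):
    """Return tokens that add a prefix to `parent` (e.g. 'Tivoli TEC Adapter')."""
    return sorted(t for t in tokens
                  if t != parent and t.endswith(" " + parent))

tokens = ["TEC Adapter", "Tivoli TEC Adapter",
          "Netview TEC Adapter", "Omegamon TEC Adapter"]
print(children_of(tokens, "TEC Adapter"))
```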
  • This is in the context of “Tivoli Monitoring Agent”. Lexeme (lemma highlighted): Tivoli Monitoring Agent, ITM Agent, itmagent. The top 100 documents containing a match from this lexeme were returned from the Lucene index. The documents were then searched for high-frequency compound nominals containing “agent”. It is presumed that these tokens are related to “Tivoli Monitoring Agent”, likely as sub-types.
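The compound-nominal frequency step can be sketched with a plain bigram counter. This is a stand-in for the Lucene-backed search described above, with a tiny illustrative corpus.

```python
# Sketch: count high-frequency two-word compounds ending in "agent"
# across a small document collection.

import re
from collections import Counter

def compound_counts(docs, head="agent"):
    """Count 'word head' bigrams, e.g. 'itm agent', across documents."""
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z0-9/]+", doc.lower())
        for a, b in zip(words, words[1:]):
            if b == head:
                counts[f"{a} {b}"] += 1
    return counts

docs = ["The ITM agent sends events", "restart the ITM agent",
        "the DB2 agent monitors the database"]
print(compound_counts(docs).most_common(2))
```

Sorting by frequency is what surfaces candidates like "itm agent" and "db2 agent" as probable sub-types.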
  • These are the verbs that occur in proximity to “agent” in the context of “Tivoli Monitoring Agent”. These are all things that an “agent” can “do”.
  • Partial list
  • When Johnny came home, Pasco waddled to the door, stamped his webbed feet, fluffed his wings, and sang "Quack ... Quack ... quaaack!" Question: What is Pasco? Answer: Pasco is a Thing (Entity). That is accurate, but not very precise. Can we be more precise? What do we know about Pasco? "waddled" is a type of walk; "webbed feet" is a body part; "fluffed" and "wings" suggest a body part (fluffed implies feathers); "Quack" is a sound. What do we know about pets in our Ontology? Duck walksLike Waddle, Shuffle, SillyWalk; soundsLike Quack; looksLike glossy green head, white neck ring, white tail, wings, yellow bill, orange webbed feet. What we are saying is that anything that has a silly walk *might* be a duck. Even these (http://www.youtube.com/watch?v=IqhlQfXUk7w) might be ducks. In our sample sentence, we can see that Pasco walksLike Waddle, soundsLike Quack, looksLike Webbed Feet, and that lines up with our Ontology definition of what a duck is. We can infer that Pasco is a duck within a degree of confidence expressed as 0 <= x <= 100. In this case, we might say we are 100% certain that Pasco is a duck, assuming that each property of a duck gives us an independent certainty of 33 1/3%. So let's say we only had this text to go on: "Hi Pasco!", said Johnny. "Quack Quack!", said Pasco. In this case, we only have Pasco soundsLike Quack, and again, assuming a 33 1/3% confidence per relationship, we are now 33 1/3% certain that Pasco is a duck. But he might be a parrot that sounds like a duck. Or Pasco could be Johnny's little brother imitating a duck. Or Pasco could be ... anyThing. And if we have this: Pasco ruffled his feathers and reiterated, rather dryly, "quack, quack, quack". Now we have Pasco soundsLike Quack, looksLike Feathers, and maybe that's enough to give us 66 2/3% certainty that Pasco is a duck. Note that we don't need to divide probability evenly among the relationships of an entity. Perhaps sound is more important to us. Or less important.
Maybe sound is only worth 10%, and looks are everything; then anything we find with an appearance property has a higher probability. The weighting belongs in the model and can be as simple or as complicated as it needs to be. The point is: blank nodes are a powerful mode of expression in RDF. We define what we do know (objective reality) in our Ontology, then parse our unstructured text to see if anything we know is in the text. And if parts of things we know are in the text, then we know at least something.
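The weighted-confidence scheme above can be sketched in a few lines. The equal 33 1/3% weights come from the example; as the notes say, the weighting belongs in the model and could just as well be uneven.

```python
# Sketch: confidence that an entity is a duck, as a weighted sum over
# the duck-properties observed in text. Weights are illustrative.

WEIGHTS = {"walksLike": 1 / 3, "soundsLike": 1 / 3, "looksLike": 1 / 3}

def confidence(observed):
    """Sum the weights of the duck-properties actually observed."""
    return sum(w for prop, w in WEIGHTS.items() if prop in observed)

print(round(confidence({"soundsLike"}) * 100, 1))               # quack only
print(round(confidence({"soundsLike", "looksLike"}) * 100, 1))  # quack + feathers
```

Swapping WEIGHTS for, say, {"soundsLike": 0.1, "looksLike": 0.9} implements the "sound is only worth 10%" variant without touching the scoring code.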
  • Blank Node: recognition of entities by virtue of the properties the entity has, rather than by an explicit identifier. This is a paradigm shift from RDBMS thought. In a relational database, we would identify a customer using an explicit identifier (the PK). This identifier has an important meaning throughout the database and in and of itself is representative of the customer. Our social security numbers are also a good example of this. A blank node is a means of representing an entity without an explicit identifier: the entity is identified by virtue of the properties associated with it. Here's a real-world example: "That person has a child". That person <Entity 1> has-a <Verb> Child <Entity 2>. By virtue of the relationship (has-a child) we know that the subject of the sentence is a parent. We've identified the entity by virtue of the property associated with it. That person <Entity 1> has-a <Verb> Child <Entity 2a> and a Husband <Entity 2b>. Now we can infer that Entity 1 is { Woman, Wife, Mother } on the basis of these associated properties. Blank nodes are heavily used in entity profiling. For example, terrorism: does someone have the attributes of a terrorist? Attributes in this case refer to properties. If a "blank node" (unidentified entity) has many candidate properties that fit the profile, it may fall into that sub-class. Blank nodes are a very powerful technique, because they allow the predicates within an Ontology to define the class that the node (RDF individual) belongs to. The idea of blank nodes has a basis in meronymy: "the semantic relation that holds between a part and the whole". Refined example: "JD was seen yesterday". What do we know about JD? Not much, if anything; not even enough to assume JD is a person. So we have this: BNODE<JD>. "JD picked up her child from school". Now we know JD has a child. This allows us to type the B-Node <JD> like this: BNODE<JD> a Person; a Parent; hasChild BNODE<C>.
BNODE<C> a Person; hasParent BNODE<JD>. We don't know the gender of BNODE<JD>, or very much about BNODE<C>. "JD called her husband at 3:15 PM". Now we can refine our information model this way: BNODE<JD> a Person; a Woman; a Parent; a Mother; hasChild BNODE<C>; hasHusband BNODE<P>. BNODE<P> a Person; a Man; a Parent; a Husband; hasWife BNODE<JD>; hasChild BNODE<C>. BNODE<C> hasParent BNODE<JD>; hasMother BNODE<JD>; hasParent BNODE<P>; hasFather BNODE<P>. Note that we don't remove any of our types even though some are clearly superseded by others. "a Person; a Man" is surely redundant, in the sense that our Ontology model would almost certainly classify Man rdfs:subClassOf Person and Man owl:disjointWith Woman, but there's no need to try to normalize properties and remove redundancy. Redundancy will not affect the model's integrity and is not a bad thing. Side-note: how would we model the phone call between husband and wife? There might be many ways, but try reification: ANON<COMMUNICATION> Subject: BNODE<JD> Predicate: calls Object: BNODE<P> rdfs:timestamp /timestamp/ References: http://www.w3.org/TR/rdf-primer/
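The incremental typing of BNODE<JD> can be sketched as rules that only ever add types as statements arrive. The two rules below are toy stand-ins for whatever the ontology would actually entail; names are illustrative.

```python
# Sketch: refining the types of a blank node as new statements arrive.
# Types are only ever added, never removed (redundancy is harmless).

from collections import defaultdict

store = defaultdict(set)   # subject -> set of (predicate, object)
types = defaultdict(set)   # subject -> inferred rdf:type set

def assert_triple(s, p, o):
    """Record a triple and apply two toy typing rules."""
    store[s].add((p, o))
    if p == "hasChild":
        types[s].update({"Person", "Parent"})
    if p == "hasHusband":
        types[s].update({"Person", "Woman", "Wife"})

assert_triple("_:JD", "hasChild", "_:C")    # "JD picked up her child"
assert_triple("_:JD", "hasHusband", "_:P")  # "JD called her husband"
print(sorted(types["_:JD"]))
```

Each new sentence narrows what the blank node can be, which is the predicate-driven identification described above.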
  • It is also useful to distinguish between anonymous nodes (existentially quantified variables) and nodes with identifiers to which it is useful to refer. The statement "Dan lives in some thing that is in Texas" can indeed be written "Dan lives in some thing X, which is in Texas", if one says nothing else about X. When switching between syntaxes, or simply when reformatting an RDF document, it may become impossible to let an anonymous node remain anonymous, so generated IDs (genids) have become the norm (though in an engineering system one might be tempted to keep them anonymous). It is also in practice much more readable to use newly generated local identifiers for anonymous nodes in the output of a system that has merged data from several sources. So I found I was tracking the bit which represented that an ID was arbitrarily generated.[4] So basically, the difference exists only in theory. In practice (in all the implementations I am familiar with), every anonymous node has an ID. (cmtrim) References: http://www.w3.org/DesignIssues/Anonymous.html
  • Contact sje@us.ibm.com for Jena API questions/support
  • Because of the inherently distributed knowledge model of the Semantic Web, OWL makes an open world assumption. This assumption has some significant impacts on how information is modeled and interpreted. The open world assumption states that the truth of a statement is independent of whether it is known; in other words, not knowing whether a statement is explicitly true does not imply that the statement is false. The closed world assumption, as you might expect, is the opposite: it states that any statement that is not known to be true can be assumed to be false. Under the open world assumption, new information must always be additive. It can be contradictory, but it cannot remove previously asserted information.[1] Most systems operate with a closed world assumption. They assume that information is complete and known. For many practical applications this is a safe, and often necessary, assumption to make. However, a closed world assumption can limit the expressivity of a system in some cases, because it is more difficult to differentiate between incomplete information and information that is known to be untrue. Returning to the example of Figure 4-6, there is no straightforward way to model the fact that Mike Smith may or may not be an employee. In a system that makes a closed world assumption, there are only two things in the world: employees and non-employees.[1] No Unique Names Assumption: the no unique names assumption states that unless explicitly stated otherwise, you cannot assume that resources identified by different URIs are different. Once again, this assumption is quite different from those of many traditional systems. In most database systems, for instance, all information is known, and assigning a unique identifier, such as a primary key that is consistently used throughout the system, is possible. Like the open world assumption, the no unique names assumption impacts inference capabilities related to the uniqueness of resources.
Redundant and ambiguous data is a common issue in information management systems, and the no unique names assumption makes these issues easier to handle, because resources can be made the same without destroying any information or dropping and updating database records.[1] Design note: Q: Should individuals be placed within an OWL model? A: Typically this is fine for demos with small datasets, but beyond that it is recommended to constrain the OWL model to classes/subclasses only, and to use a triple store as the place for storing instance data (individuals). Instance vs Subclass: we need to understand the difference between (a rdf:type b) and (a rdfs:subClassOf b). In set-theory terms, the first case states that "a is a member of the set b"; the second states that "a is a subset of the set b". To restate in OWL terms: "a is of type b" vs "a is a subclass of b". References: [1] Chapter 4, "Incorporating Semantics", Semantic Web Programming by John Hebeler, Matthew Fisher, Ryan Blace and Andrew Perez-Lopez
  • Triples in red are inferred. Because beta is a subclass of alpha, the members of beta are likewise members of alpha.
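That subclass inference amounts to walking up the rdfs:subClassOf chain. A minimal sketch, assuming the hierarchy is stored as child-to-parent pairs (a real reasoner would handle multiple parents and cycles):

```python
# Sketch: rdfs:subClassOf inference. A member of a class is also a
# member of every ancestor class.

SUBCLASS_OF = {"beta": "alpha", "alpha": "Thing"}

def all_classes(cls):
    """Walk up rdfs:subClassOf from a class to the root."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result

# c rdf:type beta, therefore c rdf:type alpha (and Thing) is inferred
print(all_classes("beta"))
```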
  • Inferred triples are italicized. There is a lot that this model doesn't say. We could further subclass Dignitary into royalty or various heads-of-state, or even down to the level of King, Queen, etc. for Queen Elizabeth. Also, President doesn't take into account which country is being spoken of, or whether the president is actively serving or has retired.
  • Inferred triples are italicized. There is a lot that this model doesn't say. We could further subclass Dignitary into royalty or various heads-of-state, or even down to the level of King, Queen, etc. for Queen Elizabeth. Also, President doesn't take into account which country is being spoken of, or whether the president is actively serving or has retired.
Classes:
  Country
    America
  Citizen
    AmericanCitizen (bornIn some America)
  President
    ActivePresident (rdfs:subClassOf ActiveEmployee)
    InactivePresident (rdfs:subClassOf InactiveEmployee)
    AmericanPresident (employedIn some America)
  Employee
    ActiveEmployee
      ActivePresident
    RetiredEmployee
      InactivePresident
Barack Obama rdf:type President; employedIn America .
Predicates:
  <Citizen> bornIn <Country>
  <Employee> employedIn <Country>
  • The specification of domain and range for a property does not act as a constraint. It just acts as an axiom in OWL.
  • When could something like this ever be useful? In unstructured text analytics! If I do “is-a” pattern extraction from NYT, and find out that Slovakia is-a place Slovakia is-a country Slovakia is-a land Then I might start to infer that (place == country == land) with some degree of confidence (DoC). The DoC depends on how often this inverse functional property holds true. The inverse functional property is a great way to extract synonyms or variations from text!
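The synonym-extraction idea above can be sketched by counting how often candidate hypernyms share the same subjects in extracted "is-a" pairs. This is a toy approximation of the inverse-functional intuition; the data is illustrative.

```python
# Sketch: treat shared is-a subjects as evidence that two hypernyms
# ("place", "country", "land") co-refer, yielding a crude overlap count
# from which a degree of confidence could be derived.

from collections import defaultdict
from itertools import combinations

def hypernym_overlap(is_a_pairs):
    """For each hypernym pair, count subjects asserted under both."""
    subjects = defaultdict(set)
    for subj, hyper in is_a_pairs:
        subjects[hyper].add(subj)
    return {(a, b): len(subjects[a] & subjects[b])
            for a, b in combinations(sorted(subjects), 2)}

pairs = [("Slovakia", "place"), ("Slovakia", "country"),
         ("Slovakia", "land"), ("Texas", "place")]
print(hypernym_overlap(pairs))
```

The more subjects two hypernyms share across a large corpus, the higher the degree of confidence that they are variants of each other.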
  • Synonyms are both Transitive and Symmetric
  • The OWL 2 construct AsymmetricObjectProperty allows it to be asserted that an object property expression is asymmetric; that is, if the property expression OPE holds between the individuals x and y, then it cannot hold between y and x. Note that asymmetric is stronger than simply not symmetric.[1] (By contrast, in mereology the partOf relation is defined to be transitive, reflexive, and antisymmetric.) References: http://www.w3.org/2007/OWL/wiki/New_Features_and_Rationale#F6:__Reflexive.2C_Irreflexive.2C_and_Asymmetric_Object_Properties p ∈ ICEXT(I(owl:AsymmetricProperty)) iff p ∈ IP, ∀x, y: (x, y) ∈ IEXT(p) implies (y, x) ∉ IEXT(p) http://owl.semanticweb.org/page/New-Feature-AsymmetricProperty-001-RDFXML
  • In a social network, Peter knows JimBob. Use of the reflexive property allows us to cover the obvious case: Peter knows Peter and JimBob knows JimBob. Or in partonomy, "a car is a part of a car". http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/simple-part-whole-relations-v1.3.html A property P is said to be reflexive when the property must relate an individual a to itself. In Figure 4.25 we can see an example of this: using the property knows, an individual George must have a relationship to itself along the property knows. In other words, George must know himself. In addition, it is possible for George to know other people; therefore the individual George can have a relationship with the individual Simon along the property knows.
  • Irreflexive If a property P is irreflexive, it can be described as a property that relates an individual a to individual b, where individual a and individual b are not the same. An example of this would be the property motherOf: an individual Alice can be related to individual Bob along the property motherOf, but Alice cannot be motherOf herself (Figure 4.26)
  • Property chains are used to relate various categories: the father of your father is your grandfather; the wife of your brother is your sister-in-law; the son of your sister is your nephew. This works great for genealogies, and I suspect that's what it was created for; I'm certain there are other uses too, as it is a convenient construct. How does this differ from a Transitive property? It is similar, but involves a semantic renaming of the property. Also of note, this is not limited to just two triples as shown above; a property chain can be enacted over 2..* triples. Can a property chain be enacted over an existing property chain? E.g. hasGrandfather o hasFather = hasGreatGrandfather. I'm not sure. For those with a maths background or bent, a property chain is similar to a functor: http://en.wikipedia.org/wiki/Functor. Please don't take away the wrong idea from this slide; there is no need to understand functors in order to understand quite simply what's happening here: hasFather o hasFather = hasGrandfather
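The two-step chain hasFather o hasFather = hasGrandfather can be sketched as relation composition over a triple set. A minimal illustration; longer chains compose the same way, one pair of properties at a time.

```python
# Sketch: evaluate the property chain hasFather o hasFather, producing
# inferred hasGrandfather triples.

def compose(triples, p1, p2, new_p):
    """Infer (a, new_p, c) wherever (a, p1, b) and (b, p2, c) hold."""
    inferred = set()
    for (a, pa, b) in triples:
        for (b2, pb, c) in triples:
            if pa == p1 and pb == p2 and b == b2:
                inferred.add((a, new_p, c))
    return inferred

triples = {("John III", "hasFather", "John JR"),
           ("John JR", "hasFather", "John SR")}
print(compose(triples, "hasFather", "hasFather", "hasGrandfather"))
```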
  • “Turn on” is set as a synonym for “power on”, and “switch on” for “turn on”. Given the predicate properties that are checked here, all of these words are now synonyms of each other. “Power on” and “switch on” have no direct relationship in the explicit world, but are related via “turn on” through symmetry and transitivity. Note that while a synonym is both transitive and symmetric, an acronym is neither. Digital Video Disc hasAcronym DVD. Acronyms are typically not transitive (that would imply there was an acronym that represented an acronym). If the acronym relation were symmetric, it would be the same as saying DVD hasAcronym Digital Video Disc, which would likewise be incorrect. It has been said that there are no exact synonyms in the English language; every variation has a subtle difference in meaning (perhaps owing to origins in Germanic-Saxon, Anglo-Norman or Latin). However, the predicate does not need to reflect this nuance (though it could if the modeler so chose).
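The effect of declaring hasSynonym both symmetric and transitive is that each assertion merges two synonym sets. A minimal sketch (a union-find structure would scale better; this version simply merges Python sets):

```python
# Sketch: equivalence classes induced by a symmetric + transitive
# hasSynonym predicate. Each asserted pair merges the sets containing
# its two members.

def synonym_sets(assertions):
    """Merge asserted pairs into equivalence classes."""
    sets = []
    for a, b in assertions:
        merged = {a, b}
        rest = []
        for s in sets:
            if s & merged:       # overlapping set: fold it in
                merged |= s
            else:
                rest.append(s)
        sets = rest + [merged]
    return sets

sets = synonym_sets([("power on", "turn on"), ("switch on", "turn on")])
print(sorted(sets[0]))
```

"power on" and "switch on" end up in the same set despite never being asserted together, which is exactly the indirect relationship described above.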
  • Relation to classic Mereology: the classic study of parts and wholes, mereology, has three axioms. The part-of relation is: Transitive ("parts of parts are parts of the whole": if A is part of B and B is part of C, then A is part of C); Reflexive ("everything is part of itself": A is part of A); Antisymmetric ("nothing is a part of its parts": if A is part of B and A != B, then B is not part of A). OWL (at the time of the referenced note) did not have built-in primitives for antisymmetric or reflexive properties, nor any work-around for them; OWL 2 later added reflexive, irreflexive and asymmetric properties, but still not antisymmetric. In most cases this causes no problems, but it does mean that if you create a cycle in the part-of hierarchy (usually by accident) it will go unnoticed by the classifier (although it may cause the classifier to run forever). Furthermore, in mereology, since everything is a part of itself, we have to define "proper parts" as "parts not equal to the whole", whereas in OWL we have to do the reverse: i.e. define "parts" (analogous to "proper parts") and then define "reflexive parts" in terms of "parts". A number of other relations follow the same pattern as faults, e.g. "repairs on a part are kinds of repairs on the whole". However, not all relations follow this pattern, e.g. "purchase of a part is not purchase of the whole" (you can buy the wheels off a car without buying the car): mechanic repairs wheels, so mechanic repairs car; but buyer purchases wheels does NOT imply buyer purchases car. References: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
  • Distinguishing parts from kinds: although both part-whole relations and subClassOf generate hierarchies, it is important not to confuse the part-whole hierarchy with the subClassOf hierarchy. This is easily done, because in many library and related applications, part-whole and subclass relations are deliberately conflated into a single "broader than / narrower than" axis. For example, consider the following: Vehicle > Car > Engine > Crankcase > Aluminum Crankcase. "Car" is a kind of "Vehicle", but "Engine" is a part of a "Car"; "Crankcase" is a part of an "Engine", but "Aluminum Crankcase" is a kind of "Crankcase". Such hierarchies serve well for navigation; however, they conflate the two relations (partOf and subClassOf). Statements about "all vehicles" do not necessarily, or even probably, hold for "all engines". Such hierarchies do need to be recreated in situations that obey the rule "a fault of the part is a kind of fault of the whole". References: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
  • What's wrong with this model? Nothing, if it's a taxonomy for hierarchical navigation, for example. But if this is an Ontology, there are some problems: the class hierarchy must be read as an "is-a" relationship, and "Engine is-a Car" is wrong (etc.). Example from reference [1]. References: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
  • Here’s a better way of creating an Ontology (from prior example)
  • What are common predicates used in the industry? Best would be to have a large number of OWL files and run a frequency analysis on these. It’s helpful to have suggestions to see what other people are doing and as a way of following best practices. This slide is not prescriptive – just making suggestions to help in modeling. Also, a knowledge of certain common predicates (hasPart/partOf) can even help avoid common pitfalls (in this case partonomy)[1] hasLocus[1] - the scene of any event or action In certain domains, most notably medicine, we generally understand that while body parts (e.g. a heart) can  exist  outside of a body, they do not normally do so. Thus it makes sense to say, in general, "A fault in the heart is a fault in the body," without having a particular heart or body in mind, and it makes sense to reason over classes defined that way. For other domains, most notably manufacturing, it is more common for parts to exist outside of some whole, and so it may not generally be true that a fault in an engine is a fault in a car (if the engine is not in a car), just as it may not be generally true that an engine is a car part. In these cases, the capability to reason over classes may not be that useful, and again the existential restriction on the direct properties may not make sense.[1] References: http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
  • *The Role of Semantic Models in Smarter Industrial Operations. 30 Mar. 2012. developerWorks. <http://www.ibm.com/developerworks/industry/library/ind-semanticmodels/ >.
  • Although every property could have an inverse, we choose one preferred direction to keep the model small and understandable. Providing all inverses could be done in a supplemental profile. One exception to this rule is prov:wasGeneratedBy's inverse: prov:generated, which is included because of goal 1. When an asserter is describing an Activity (a principal Element), they should be able to describe it as a subject. prov:generated is needed to do this. [1] References: http://www.w3.org/2011/prov/wiki/ProvRDF#ProvenanceOntology.owl
  • Ontology and semantic web (2011)

    1. 1. July 2011 cmtrim@us.ibm.com Ontologies and the Semantic Web © 2012 IBM Corporation
    2. 2. Outline Triples – Reification – Confidence Levels Ontology – Design – Architecture (big picture) – SPARQL – Inferencing Methodology – Creating a Semantic Network © 2012 IBM Corporation
    3. 3. © 2012 IBM Corporation
    4. 4. Triples Subject Predicate Object “The author of Hamlet is Shakespeare”  Shakespeare authorOf Hamlet  Hamlet hasAuthor Shakespeare © 2012 IBM Corporation
    5. 5. Triples “Shakespeare wrote Hamlet in 1876” Shakespeare authorOf Hamlet Hamlet writtenIn 1876 © 2012 IBM Corporation
    6. 6. Triples (Reification) Wikipedia states “Shakespeare wrote Hamlet in 1876” Wikipedia states Shakespeare Shakespeare authorOf Hamlet Hamlet writtenIn 1876 © 2012 IBM Corporation
    7. 7. Triples (Reification) Wikipedia states “Shakespeare wrote Hamlet in 1876” Wikipedia states (Hamlet writtenIn 1876) Shakespeare authorOf Hamlet © 2012 IBM Corporation
    8. 8. Triples (Confidence Levels) ShakespeareOnline states (Hamlet writtenIn 1599) Wikipedia states (Hamlet writtenIn 1876) When was Hamlet written? – 1599 – 1876 © 2012 IBM Corporation
    9. 9. Triples (Confidence Levels) Go from this: – ShakespeareOnline states (Hamlet writtenIn 1599) To this: – (ShakespeareOnline states (Hamlet writtenIn 1599)) hasConfidenceLevel 90 © 2012 IBM Corporation
    10. 10. Triples (Confidence Levels) © 2012 IBM Corporation
    11. 11. What is an Ontology? Description of the kinds of entities there are and how they are related (Chris Welty) © 2012 IBM Corporation
    12. 12. Ontology “Shakespeare wrote Hamlet in 1876” How many “types” of things are there in this statement? – Authors – Books – Plays – Years – Sources – Characters What relationships could exist between these types? © 2012 IBM Corporation
    13. 13. Ontology Author – Playwright {Shakespeare, Marlowe} Book – Play {Hamlet, Macbeth, Faustus} RDF: – Shakespeare a Playwright – Shakespeare a Author – Hamlet a Play – Hamlet a Book © 2012 IBM Corporation
    14. 14. © 2012 IBM Corporation
    15. 15. William Shakespeare en2:Playwright was an English poet and playwright, widely regarded as the greatest writer in the English language and the world's pre-eminent dramatist. © 2012 IBM Corporation
    16. 16. © 2012 IBM Corporation
    17. 17. © 2012 IBM Corporation
    18. 18. © 2012 IBM Corporation
    19. 19. © 2012 IBM Corporation
    20. 20. Semantic Chains AIX hasCommand topas monitors (process uses (CPU hasComponent resources)) © 2012 IBM Corporation
    21. 21. SPARQL SELECT ?command WHERE { AIX hasCommand ?command . ?command monitors/uses CPU } © 2012 IBM Corporation
    22. 22. © 2012 IBM Corporation
    23. 23. Inference Ontology Model (Classes): Product – SupportedProduct (x hasMaker IBM) Company – IBM – NonIBM (disjoint to IBM) • { Microsoft, Oracle, Teradata } Ontology Model (Predicates): <Product> hasMaker <Company> Triple Store data: Rational Software Architect hasMaker IBM Rational Software Architect a SupportedProduct © 2012 IBM Corporation
    24. 24. © 2012 IBM Corporation
    25. 25. © 2012 IBM Corporation
    26. 26. Tivoli Monitoring hasSynonym ITM © 2012 IBM Corporation
    27. 27. Tivoli Monitoring hasSynonym ITM ITM hasComponent ITM Agent © 2012 IBM Corporation
    28. 28. Tivoli Monitoring hasSynonym ITM ITM hasComponent ITM Agent Tivoli Monitoring hasComponent Tivoli Monitoring Agent Tivoli Monitoring Agent hasSynonym ITM Agent © 2012 IBM Corporation
    29. 29. © 2012 IBM Corporation
    30. 30. © 2012 IBM Corporation
    31. 31. © 2012 IBM Corporation
    32. 32. © 2012 IBM Corporation
    33. 33. © 2012 IBM Corporation
    34. 34. “Agent” analysis itm agent 54 db2 agent 32 os agent 32 ul agent 31 monitoring agent 29 oracle agent 22 agent needs 21 itm ul agent 16 windows os agent 15 agent left 14 agent system 14 citrix agent 14 mysap agent 14 unix os agent 13 © 2012 IBM Corporation
    35. 35. Proximal Verbs (normalized): monitor, support, configure, run, start, show, build, appear © 2012 IBM Corporation
    36. 36. Events: Situation Event, Omnibus Event, ITM Event, Minor Event, Triggering Event, Console Event, System Event, TBSM Event, JMX Event, TEC Event © 2012 IBM Corporation
    37. 37. Blank Nodes Explicit Characterization vs Implicit (Predicate-driven) Identification © 2012 IBM Corporation
    38. 38. Blank Nodes What are blank nodes? – A way of profiling entities – A way of identifying entities without explicit identification – Implicit identification – Predicate-driven identification of data (rather than explicit characterization) Examples: – “That person has a child” – “That person has a child and a husband” © 2012 IBM Corporation
    39. 39. Anonymous (Anon) Nodes What is the difference between an Anon Node and a Blank Node? An “anonymous node” is an existentially quantified variable ∃ A typical RDF node has an identifier to which it is useful to refer © 2012 IBM Corporation
    40. 40. Appendix A - Resources Glossary Books Common OWL Editors Triple Stores © 2012 IBM Corporation
    41. 41. Glossary OWL – Web Ontology Language RDF – Resource Description Framework SPARQL – SPARQL Protocol and RDF Query Language © 2012 IBM Corporation
    42. 42. Books Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL – Author(s): Dean Allemang and Jim Hendler – Second Edition © 2012 IBM Corporation
    43. 43. Common OWL Editors TopBraid Composer (TBC)  Free Edition (also Standard + Maestro Editions)  http://www.topquadrant.com/products/TB_Composer.html Protege  Free, open source ontology editor and knowledge-base framework  http://protege.stanford.edu/ © 2012 IBM Corporation
    44. 44. Triple Stores  Comparison and links here:  http://www.w3.org/wiki/LargeTripleStores  Sesame - scalable and transactional  May be more suited to web environments  Setup slightly more complex than Jena TDB  Jena TDB - scalable and very simple set up  Code Samples and API introduction here: − http://cattail.boulder.ibm.com/cattail/#view=cmtrim@us.ibm.com/files/53A1E4007F0F3DDB 8C12752E093F23B6  The latest version of Jena TDB (0.90) is transactional. Past versions of TDB were not transactional, and may not be suited for web environments.  DB2-RDF – builds on top of the Jena Graph SPI.  https://www.ibm.com/developerworks/mydeveloperworks/blogs/nlp/entry/db2_rdf _nosql_graph_support13 © 2012 IBM Corporation
    45. 45. Appendix B - OWL OWL (Web Ontology Language) – Built on top of RDF (same syntax as RDF) Open World vs Closed World assumption Parts of an Ontology: – Header – Classes and Individuals – Properties – Annotations – Datatypes Instance vs Subclass © 2012 IBM Corporation
    46. 46. OWL – Subclasses and Types alpha rdfs:subClassOf of Thing – a rdf:type alpha – b rdf:type alpha beta rdfs:subClassOf alpha – c rdf:type beta – d rdf:type beta – c rdf:type alpha – d rdf:type alpha © 2012 IBM Corporation
    47. 47. OWL – Subclasses and Types President rdfs:subClassOf Dignitary Dignitary rdfs:subClassOf Person This model states: – All dignitaries are people – All presidents are dignitaries (and thus, people) John Smith rdf:type Person Queen Elizabeth rdf:type Dignitary – Queen Elizabeth rdf:type Person GW Bush rdf:type President – GW Bush rdf:type Dignitary – GW Bush rdf:type Person Barack Obama rdf:type President – Barack Obama rdf:type Dignitary – Barack Obama rdf:type Person How do we expand this model to classify actively- serving American presidents? © 2012 IBM Corporation
    48. 48. OWL – Subclasses and Types President rdfs:subClassOf Dignitary Dignitary rdfs:subClassOf Person This model states: – All dignitaries are people – All presidents are dignitaries (and thus, people) John Smith rdf:type Person Queen Elizabeth rdf:type Dignitary – Queen Elizabeth rdf:type Person GW Bush rdf:type President – GW Bush rdf:type Dignitary – GW Bush rdf:type Person Barack Obama rdf:type President – Barack Obama rdf:type Dignitary – Barack Obama rdf:type Person How do we expand this model to classify actively- serving American presidents? © 2012 IBM Corporation
    49. 49. Appendix C – OWL Properties Transitive Property Functional Property Inverse Functional Property Symmetric Property Asymmetric Property Reflexive Property Irreflexive Property Property Chains Putting it all together Others © 2012 IBM Corporation
    50. 50. Transitive Property  hasVersion rdf:type owl:TransitiveProperty  Windows hasVersion Windows XP  Windows XP hasVersion Windows XP SP2  Windows hasVersion Windows XP SP2 © 2012 IBM Corporation
    51. 51. Functional Property ssn-name rdf:type owl:FunctionalProperty 123-45-6789 ssn-name Bob Smith 123-45-6789 ssn-name Robert Smythe Bob Smith owl:sameAs Robert Smythe © 2012 IBM Corporation
    52. 52. Inverse Functional Property hasSpeKey rdf:type owl:InverseFunctionalProperty File Net Web Services hasSpeKey 5724S03 FN WS hasSpeKey 5724S03 File Net Web Services owl:sameAs FN WS © 2012 IBM Corporation
    53. 53. Symmetric Property siblingOf rdf:type owl:SymmetricProperty Tim siblingOf Jim Jim siblingOf Tim © 2012 IBM Corporation
    54. 54. Asymmetric Property hasParent rdf:type owl:AsymmetricProperty Stewie hasParent Peter Peter does not have parent Stewie © 2012 IBM Corporation
    55. 55. Reflexive Property © 2012 IBM Corporation
    56. 56. Irreflexive Property © 2012 IBM Corporation
    57. 57. Property Chain hasGrandfather owl:propertyChainAxiom ( hasFather hasFather ) . John III hasFather John JR John JR hasFather John SR John III hasGrandfather John SR © 2012 IBM Corporation
    58. 58. Putting it all together … hasSynonym – Transitive, Symmetric © 2012 IBM Corporation
    59. 59. Appendix D - Classic Mereology Transitive Axiom Reflexive Axiom Antisymmetric Axiom © 2012 IBM Corporation
    60. 60. Transitive Axiom parts of parts are parts of the whole If A is part of B and B is part of C, then A is part of C © 2012 IBM Corporation
    61. 61. Reflexive Axiom everything is part of itself – A is part of A © 2012 IBM Corporation
    62. 62. Antisymmetric Axiom nothing is a part of its parts – if A is part of B and A != B then B is not part of A © 2012 IBM Corporation
    63. 63. Appendix E - Partonomy Can you distinguish parts from kinds? Why is this important? This is often the difference between a taxonomy and an ontology – A taxonomy doesn’t need to distinguish between parts and kinds – An ontology must make this distinction Vehicle -Car --Engine ---Crankcase ----Aluminum Crankcase © 2012 IBM Corporation
    64. 64. Partonomy © 2012 IBM Corporation
    65. 65. Partonomy © 2012 IBM Corporation
    66. 66. Appendix F – Common Predicates hasPart – hasPart owl:inverseOf partOf – hasPart rdf:type owl:TransitiveProperty – partOf rdf:type owl:TransitiveProperty hasLocus © 2012 IBM Corporation
    67. 67. Appendix G Blank nodes Anonymous (Anon) nodes Quads © 2012 IBM Corporation
    68. 68. Quads (Reference Jena Tutorial with TDB.ppt) © 2012 IBM Corporation
    69. 69. Maintenance* The relational model has relations between entities established through explicit keys (primary, foreign) and associative entities. – Changing relationships in this case is cumbersome, as it requires changes to the base model structure itself. – Changes in an RDBMS can be difficult for a populated database. Hierarchical models have similar limitations The graph model (RDF) makes it much easier to maintain the model once it is deployed. – A critical point is that relations are part of the data, not part of the database structure – If a new relationship needs to be added that was not anticipated, a new triple is simply added to the datastore. – A graph model can be traversed from any perspective. In contrast, other types of database designs might require structural changes to answer new questions that arise after initial implementation. © 2012 IBM Corporation
    70. 70. Design Styles Avoid proliferating owl:inverseOf [1] © 2012 IBM Corporation