Formal Aspects of Protege

For four years in the late 1990s and early 2000s I worked at Stanford University’s Section on Medical Informatics doing research in Artificial Intelligence. I was one of the primary architects on the Protege project (an open-source knowledge representation system) and spent quite a bit of time thinking about how to represent knowledge, the logical structure of knowledge, how to define constraints on information, and how to classify algorithms (a.k.a. “problem-solving methods”).

This talk, from 2001, describes the underlying architecture and formal knowledge model used in Protege, explains how "slot widgets" fit into the system, and goes on to describe PAL: the Protege Axiom Language. It's long, and really only for knowledge representation aficionados, but it's pretty complete.

Published in: Data & Analytics

  1. Formal Aspects of Protege. William Grosso, Stanford Medical Informatics, Stanford University
  2. Knowledge Models, William Grosso, Fourth International Protege Users Group Meeting. Overview • Interoperability is important – HPKB: DARPA project with many participants – Protégé-2000: Lots of developers in many locations • Ray can’t write code fast enough! – Interoperability requires common ground • Shared semantics for common constructs • The new Knowledge Model
  3. Proposed HPKB Scenario [Diagram labels: Knowledge Base(s) in a KB Server; Shared Ontologies; Situation Data; PSM, PSM, PSM]
  4. Knowledge Bases in HPKB • Ontologies are ways to share well-defined information – Define knowledge structure – Useful as a coupling mechanism • Knowledge Bases serve multiple roles – Repositories of shared knowledge – Community blackboards (with semantics)
  5. Interoperability requires Semantics • As long as all the developers are in the same building, things can be underspecified – Rely on “group knowledge” and “established practice” • Larger working groups (over time, space, or in numbers of people) can require more precise specifications
  6. Knowledge Models • Formal specification of the way knowledge is represented – Precise, human-readable definitions of structures in a language • Frequently unwritten – Implied by the documentation – Deduced via experience
  7. Knowledge Models at SMI • Work spurred by the OKBC Specification – Defining the Protégé Knowledge Model – Comparing it to other knowledge models • Goal: Enable Protégé tools to interoperate with knowledge-based systems from other labs – Goal is knowledge reuse • Implicit Hypothesis: understanding knowledge models will facilitate interoperation
  8. Example: Protégé and Loom • Protégé: A suite of tools to simplify knowledge base design and construction • Design ontologies, create KA tools to acquire instances • Explicitly adopts notion of external PSMs in order to focus on KA • Loom: An environment for knowledge-based system construction • Everything done inside the Loom environment
  9. Frame-Based Knowledge Models • Both Protégé and Loom use frame-based knowledge models – Classes, instances, slots, facets, … • We expect differences over things like default values and models of time • But the knowledge models differ on more mundane notions as well
  10. What’s a Slot? • Protégé/Win – Slots are not part of the global namespace • Define attributes of a frame • Cannot be referred to independently of either a class or an instance – Which slots are attached to an instance is part of the class definition • Loom – Slots are part of the global namespace • Defined by defrelation construct • Have attributes – domain, range, … – Slots can be reified • Instances of a slot class correspond to a specific relation (between two instances)
  11. What’s an Instance? • Protégé/Win – Every instance is a direct instance of a single specified class • Automatically has the own slots defined by the class • No other slots allowed – Direct instance typing cannot change. • To change type at all, need to do explicit operations on the class • Loom – Type of an instance does not have to be specified – Classifier deduces instance types • Types of instances can change (without being explicitly set) – Instances can be direct instances of more than one class
  12. Interoperation? • Two different development environments – Two different user models – Two different approaches to KA – Two different knowledge models • Both “frame based” • Disagree on the definitions of commonly used structures • Solution: ad{o,a}pt the OKBC knowledge model
  13. Protégé-2000 Is Like HPKB • Ray can’t write the code fast enough – Therefore someone else has to write it – Protégé-2000 allows everyone to customize it using Java components • If we glue together components written at multiple labs, and knowledge bases produced by many different people, we might inadvertently introduce the same issues
  14. Components [Diagram: a Central Framework, two Storage Models, and seven Widgets] • Central Framework: provided by SMI. “Plumbing” that cannot be replaced or augmented. • Storage Model: every running application uses a storage model for persistence. SMI currently provides two (CLIPS format and RDBMS format). • Widgets: mediate between the knowledge base and the user. They display small pieces of the knowledge base in a way that the user can understand and manipulate. SMI provides a generic set of default widgets.
  15. Widgets • Widgets can be added to the platform (using JavaBeans) • There is a well-defined Widget API for building new widgets and adding them to a project • Widgets can now be arbitrarily complex – Dialogs are used to configure widgets – State is stored into a separate knowledge base (the project knowledge base)
  16. Storage Models • Protege/Win stored knowledge bases in a CLIPS-compatible format • The goal for Protege-2000 is to use a wide variety of persistence mechanisms – CLIPS-format is still useful – OKBC servers are important – Relational databases could be useful • To do this, we need to isolate the persistence mechanism as a component
  17. Axioms and Constraints • Protege/Win used a frame-based language • Protege-2000 keeps the emphasis on frames, but adds in a constraint language – Based on KIF – Compatible with OKBC
  18. The Actual Knowledge Model
  19. Knowledge Models • Formal specification of the way knowledge is represented – Precise, human-readable definitions of structures in a language • Gives guarantees of what must hold in the knowledge base – Other things may be true, in addition to what the knowledge model guarantees • Protege ad{a,o}pts the OKBC knowledge model
  20. The Role of Logic • Frames are intuitive for humans – Concept / instance distinction dates back to Plato • But they’re not very well-defined – What Minsky meant by frame is not what Winograd meant by frame (and is certainly not what Plato meant by form) • We use logic to formalize the definitions – Make the underlying assumptions explicit
  21. KIF • Knowledge Interchange Format • Developed in the early 1990s as a standard syntax for first-order logic – entirely ASCII and somewhat LISPy • (forall ?x (exists ?y (......))) • Currently a “draft standard” • http://logic.stanford.edu/kif/dpans.html • Slight peculiarity: relations are multiple arity
  22. Frames • A Frame is simply a symbol – A symbol is simply a 0-ary relation • That is, it can be an argument to a function or a predicate – That is, it is something we can make assertions about • Types of frames include most of the traditional modelling constructs (classes, instances, slots, ...)
  23. Classes • Classes are frames (are symbols ....) • Classes are also unary predicates – KIF allows multiple arity predicates – That is, classes are sets (the set of instances) – Members of the set == instances of the class. • You can assert things about the class (using the fact that the class is a frame) • You can reason about the elements of the associated set
  24. Defining Subclasses • “Subclass” usually means two things: – All instances of the subclass are instances of the superclass – Anything that is true of the superclass (as a class) is true of the subclass • The first of these is simply “subset” (=> (subclass-of ?S ?P) (forall ?F (=> (?S ?F) (?P ?F))))
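The "subset" reading of the subclass axiom can be sketched in Python. This is only an illustration under stated assumptions: the class names, the instances, and the helper functions are hypothetical and not part of Protege or OKBC. Each class is modelled as the set of its instances, and the check tests the consequent of the axiom, (forall ?F (=> (?S ?F) (?P ?F))), over the frames the toy knowledge base knows about.

```python
# Hypothetical toy knowledge base: each class name maps to the set of its instances.
extensions = {
    "Person": {"alice", "bob", "carol"},
    "Patient": {"alice", "bob"},
    "Clinician": {"carol"},
}


def is_instance(cls, frame):
    """The class used as a unary predicate, i.e. (cls frame)."""
    return frame in extensions[cls]


def subset_condition_holds(sub, sup):
    """Check (forall ?F (=> (?S ?F) (?P ?F))) over all known frames."""
    all_frames = set().union(*extensions.values())
    return all(is_instance(sup, f) for f in all_frames if is_instance(sub, f))


# Every Patient is a Person, but not every Person is a Patient.
patient_ok = subset_condition_holds("Patient", "Person")
person_ok = subset_condition_holds("Person", "Patient")
print(patient_ok, person_ok)
```

Note that the axiom only says that subclass-of implies the subset condition. A set that happens to be a subset is not thereby declared a subclass.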
  25. Multiple Inheritance • Easy to define in this model • For Set-aspects, simply use “subclass == subset” – A set can be a subset of more than one class • As frames, enforce substitutability – Any sentence that can be asserted about the superclass, as a class, ought to be true of the subclass – Winds up being union of logical statements
  26. Slots • Slots are frames (are symbols ...) • Slots are also binary predicates (taking a frame and a value) • Slots also have associated predicates: – binary (take a slot and a frame, formalize the notion of attachment): template-slot-of, slot-of – ternary (take a slot, a frame, and a value): template-slot-value, slot-value
  27. Attaching a Slot • Slots are frames that get attached to other frames – Attaching a slot to a class, for example • You can attach a slot as either a template slot or an own slot – template slots define information that can be propagated to elements of a class (and via inheritance) – own slots are strictly local information
  28. Slots Propagation [Diagram: template (T) and own (O) slots along subclass-of and instance-of links, with some paths ending in /dev/null]
  29. Restating this in KIF (=> (template-slot-value ?S ?C ?V) (and (template-slot-of ?S ?C) (=> (instance-of ?I ?C) (holds ?S ?I ?V)) (=> (subclass-of ?X ?C) (template-slot-value ?S ?X ?V))))
  30. Restating this in English “If V is a template slot value of S on the class C, then we know the following three things: 1. S has been attached to C as a template slot 2. V is an own slot value for all instances I of C 3. V is a template slot value for all subclasses X of C”
  31. Restating this in Swedish “Om V är värdet på en mallegenskap S på klassen C, så vet vi följande tre saker: 1. S har kopplats till C som en mallegenskap 2. V är ett eget värde på egenskapen för alla instanser I av C 3. V är värdet på mallegenskapen för alla underklasser X av C”
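The three consequences of a template slot value can also be restated as a short Python sketch. It is a minimal illustration, not Protege's implementation: the hierarchy, the slot name `species`, and the function `assert_template_slot_value` are all hypothetical, and only direct subclass and instance links are stored.

```python
from collections import defaultdict

# Hypothetical toy hierarchy; all names are illustrative only.
subclasses = {"Person": ["Patient"], "Patient": []}   # direct subclass-of links
instances = {"Person": [], "Patient": ["alice"]}      # direct instance-of links

template_slots = defaultdict(set)    # class -> slots attached as template slots
template_values = defaultdict(set)   # (slot, class) -> template slot values
own_values = defaultdict(set)        # (slot, instance) -> own slot values


def assert_template_slot_value(slot, cls, value):
    """Apply the three consequences of (template-slot-value slot cls value)."""
    template_slots[cls].add(slot)                  # 1. the slot is attached to the class
    template_values[(slot, cls)].add(value)
    for inst in instances[cls]:                    # 2. own slot value on each instance
        own_values[(slot, inst)].add(value)
    for sub in subclasses[cls]:                    # 3. template slot value on each subclass
        assert_template_slot_value(slot, sub, value)


assert_template_slot_value("species", "Person", "human")
print(own_values[("species", "alice")])
```

Because the value is re-asserted on each subclass, instances of subclasses (here `alice`, a direct instance of `Patient`) pick it up as an own slot value as well.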
  32. Instances • An instance is a frame • The idea of “instance” is, more or less, a GUI notion (and has no implications for the knowledge model)
  33. Facets • Facets are frames (and symbols ...) • Facets are also ternary predicates (taking a frame, a slot, and a value) • Facets also have associated predicates: – ternary (take a slot, a frame, and a facet; formalize the notion of attachment): template-facet-of, facet-of – 4-ary (take a slot, a frame, a facet and a value)
  34. Facet Restrictions • Template facets can only be attached to template slots • Having a value implies attachment • Similarly for own slots (=> (template-facet-of ?F ?S ?C) (template-slot-of ?S ?C)) (=> (template-facet-value ?F ?S ?C ?V) (template-facet-of ?F ?S ?C))
  35. Facet Propagation • Facets are attached to (frame, slot) pairs • Whenever a slot propagates from one frame to another, the facets are carried along [Diagram: template (T) and own (O) slots along a subclass-of link, with one path ending in /dev/null]
  36. Canonical Facets • The standard facets are local constraints (i.e. they apply at a single (frame, slot) pair): :VALUE-TYPE, :CARDINALITY, :NUMERIC-MINIMUM, :NUMERIC-MAXIMUM (=> (:VALUE-TYPE ?S ?F ?C) (and (class ?C) (=> (holds ?S ?F ?V) (instance-of ?V ?C))))
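A :VALUE-TYPE facet can be read as a check that runs at one (slot, frame) pair. The Python sketch below illustrates that reading only. The dictionaries, the slot name `takes-drug`, and the function `value_type_violations` are hypothetical, and the `(class ?C)` conjunct of the axiom is not modelled.

```python
# Hypothetical toy data; names are illustrative only.
instance_of = {                      # frame -> classes it is an instance of
    "alice": {"Person"},
    "bob": {"Person"},
    "aspirin": {"Drug"},
}
slot_values = {                      # (slot, frame) -> values, i.e. (holds slot frame value)
    ("takes-drug", "alice"): {"aspirin"},
    ("takes-drug", "bob"): {"alice"},    # deliberately wrong: alice is not a Drug
}
value_type = {                       # (slot, frame) -> class, i.e. (:VALUE-TYPE slot frame class)
    ("takes-drug", "alice"): "Drug",
    ("takes-drug", "bob"): "Drug",
}


def value_type_violations():
    """Return (slot, frame, value) triples whose value is not an instance of the required class."""
    bad = []
    for (slot, frame), required in value_type.items():
        for value in slot_values.get((slot, frame), set()):
            if required not in instance_of.get(value, set()):
                bad.append((slot, frame, value))
    return bad


print(value_type_violations())
```

The check never looks beyond the (slot, frame) pair it is attached to, which is the sense in which the canonical facets are local.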
  37. OKBC Revisited • Protégé-2000 knowledge-bases are OKBC-compliant • Protégé-2000 is not OKBC generic – There are OKBC knowledge bases that Protégé-2000 cannot handle – It’s close, though! • Differences are KA related – Protégé instances have exactly one class – The role slot
  38. Desiderata for a Constraint Language. William Grosso, Stanford Medical Informatics, Stanford University
  39. Overview • Examples of Constraints • Design Desiderata • The Constraint Language • Implementation Decisions • The Default Implementation • Dimensions for Evolution
  40. Desiderata for the Language
  41. The Big Modular Picture of Protege [Diagram labels: Core Protege Framework; Storage Model; Widgets (six boxes); Constraint Engine; Actual KB]
  42. Full and formal semantics • Widgets can include “widgets for acquiring specific types of constraints” • Multiple constraint engines are possible – Performing different checks at different times – Replacing one engine with another • The entire kb gets stored out to some server • Without formal semantics (a logical theory), this is just not possible
  43. Compatibility with the OKBC knowledge model • OKBC does not specify an axiom language • OKBC is specified as a set of relations in KIF – Classes are unary predicates, slots are binary predicates, ... • All of these relations should immediately be accessible from within the constraint language – And the constraint engine should give them the right semantics
  44. Ease of Translation • Important goal: we want to be able to use Protege as a front-end to a wide variety of knowledge base servers • This means that the constraint language ought to be easily translated into a wide variety of constraint languages – At the very least, figuring out what can be translated ought to be easy
  45. Supported by a reasonable default implementation • KMG will provide a default implementation of the constraint language – Not very efficient – But good semantics for KA – Good enough to bootstrap the process • As we learn more about constraints, and how they are used, we hope that people with real expertise will step forward
  46. A Deficient Syllogism Major Premise: Interoperability requires formal semantics (and knowledge models based on mathematical logic) Minor Premise: Humans don’t easily adapt to formal languages Conclusion: Widgets !!!!!!!
  47. Human Readability is a Red Herring • The casual user interacts with forms – The expert user knows about classes and instances – Very few users know about the underlying logical formalism • If we design widgets for acquiring constraints, then the user will never see the constraint language
  48. The Constraint Language
  49. A Single Constraint Language • Constraint language is really an interlingua for communication – Between widgets and the framework – Between the framework and the storage model • If we want all the components to evolve independently and communicate gracefully, we need to fix a single constraint language
  50. Logic • We decided on a variant of KIF • We use the KIF connectives and the KIF syntax • Not all the KIF constants and predicates are included – Our theory of arithmetic is much smaller • (defrelation ...) is omitted – For now?
  51. Sorted Logic • Two new constructs in the language – defset: allows the user to define a “bag” of values. • Similar to notion of class, but with no support in the ontology tab • Useful for enumerated types – defrange: all variables must have their types declared • “types” can include things like “is a target of [slot name]”
  52. Reified Constraints • There is a knowledge-base for constraints – Acquiring a constraint is really “acquiring an instance of :Constraint” – You can annotate sentences and relations with useful information • You can store constraints out to a vanilla frame-based system – To a simple KB server, a constraint is just another frame
  53. The Constraint KB • To use constraints, you must include the constraint knowledge base – Will also contain default implementation of engine (as a tab widget) – Will also include java code for the standard relations – Will also include widgets for constraint acquisition – Won’t include any instances
  55. Constraints and Axioms • Constraints and Axioms use the syntax of logic but have different semantics – Axioms can be used to assert new knowledge – Constraints are restrictions on existing knowledge • (forall ?x (exists ?y (rel-name ?x ?y))) – Asserted as an axiom: it’s reasonable to create a Skolem constant and bind it to ?y – Asserted as a constraint: might not want to skolemize
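The two readings of (forall ?x (exists ?y (rel-name ?x ?y))) can be contrasted in a small Python sketch. It is a toy under stated assumptions: the relation `has_physician`, the domain, and both functions are hypothetical, and the "axiom" reading only invents witnesses for the frames already present rather than modelling a full Skolem function.

```python
import itertools


def check_as_constraint(domain, rel):
    """Constraint reading: report every x that has no y with (x, y) in rel."""
    return [x for x in domain if not any((x, y) in rel for y in domain)]


def assert_as_axiom(domain, rel):
    """Axiom reading: for each existing x lacking a witness, invent a Skolem constant."""
    counter = itertools.count(1)
    new_domain, new_rel = set(domain), set(rel)
    for x in sorted(domain):
        if not any((x, y) in rel for y in domain):
            sk = f"sk{next(counter)}"      # fresh symbol standing for the unknown ?y
            new_domain.add(sk)
            new_rel.add((x, sk))
    return new_domain, new_rel


domain = {"alice", "bob"}
has_physician = {("alice", "bob")}

# As a constraint, bob is reported; nothing is added to the knowledge base.
print(check_as_constraint(sorted(domain), has_physician))

# As an axiom, a new frame is created so that the sentence becomes true for bob.
grown_domain, grown_rel = assert_as_axiom(domain, has_physician)
print(sorted(grown_rel))
```

The constraint reading leaves the knowledge base untouched and flags the gap to the user, which matches the knowledge-acquisition use described on this slide.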
  56. Multiple Interpretations of a Single Theory: • No engine can return “true” when “OKBC” would return “false” • Model theoretic terms: If an engine thinks there is a model, then there must be one – But engines are free to overlook models
  57. New functions and predicates are implemented procedurally • KIF has the (defrelation ...) construct to define new relations • Our point of view: A relation is, almost always, something that should be defined in the ontology • The exceptions (mostly n-ary relations) should be annotated explicitly and defined procedurally
  59. Universal Implementation Decisions
  60. The Language is defined in a Knowledge-Base • PAL: Protege Axiom Language • The PAL knowledge-base contains – The constraint ontology – The default relations • And the java code that implements them – The default implementation • Once again, taking advantage of knowledge-base inclusion
  61. Enforcement of constraints is not necessarily real-time • When the user loads (or saves) a knowledge-base, it should be consistent • It’s not always possible for the user to have a consistent KB while editing – And, even if it were possible, it might be inconvenient. • Therefore, the user should decide when to check constraints
  62. Enforcement via plug-ins (and tabs) • The basic way users will interact with constraint engines will be via tabs and widgets – We want to enable special types and categories of constraints to be annotated • Basic mechanism: subclassing :Constraint – We want to have multiple possible engines, depending on context and user preference • Constraint tabs are just another way of interacting with the KB.
  63. Two Important Consequences of these Decisions
  64. What is a knowledge base? • Used to be classes and instances • Now also includes widgets – Java code! • Now also includes constraints – Instances with an “interpretation” beyond the standard meaning associated to frames – Custom pieces of java code that implement new relations (possibly domain specific) for the constraint language
  65. We have evolved from OKBC to some extent • If we use the ontology as a type system, it is convenient to have the types be mutually exclusive (instances are instances of a single class) • The “role” predicate
  66. The Default Implementation
  67. Model-checking, rather than theorem proving • Make strong “closed world” assumptions • Main goals: – Detect incomplete entry of information – Check entered information for inconsistencies
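The model-checking approach with a closed-world assumption can be illustrated with a short Python sketch. It is not the default PAL engine: the frames, the `required_slots` table, and the `check` function are hypothetical. The point is only that constraints are evaluated against exactly what is stored, so a missing value counts as a failure rather than as "unknown".

```python
# Hypothetical frames; slot names and the required-slot table are illustrative only.
kb = {
    "alice": {"class": "Patient", "age": 34, "physician": "carol"},
    "bob": {"class": "Patient", "age": -5},      # no physician entered, impossible age
}
required_slots = {"Patient": ["age", "physician"]}


def check(kb):
    """Evaluate constraints against exactly what is stored (closed world)."""
    problems = []
    for name, frame in sorted(kb.items()):
        # Goal 1: detect incomplete entry. Absent means false, not unknown.
        for slot in required_slots.get(frame["class"], []):
            if slot not in frame:
                problems.append((name, f"missing {slot}"))
        # Goal 2: check entered information for inconsistencies.
        if "age" in frame and frame["age"] < 0:
            problems.append((name, "negative age"))
    return problems


print(check(kb))
```

A theorem prover would instead ask whether a violation follows from the axioms. The checker above only inspects the one model that the user has actually entered.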
  68. Envisioned: Constraints are mostly Local • The “more false” this assumption is, the worse the engine will perform (the better a traditional theorem prover would perform?)
  69. Dimensions for Evolution
  70. Richer axiom ontology • Subclassing our ontology to provide more detailed information • “Hints” to enforcement engines – “This is best validated using [subroutine x]” or “This statement is complexity level gamma” • Statement could be generated by a widget • Your widget, in your domain, generating PAL statements for my engine to check – Formal Semantics necessary – Engines might let the user check a subset of the theory
  71. More Predicates and Functions • Not many are included in the default implementation – Mostly for reasoning about types, arithmetic, and slot values (taking transitive closures) • Over time, we hope that people will implement predicates and pass the code to us (for inclusion as part of the Protege distribution) • Note also that relations don’t have to be general -- you can add knowledge-base
  72. Other engines • In particular, a theorem prover? • Can GSAT be used as a preprocessing step? – How about the work on ALL?
  73. Support for Knowledge-Acquisition • The knowledge-model is done • The axiom language is done (as a spec) • Engines are “a mere matter of programming” (similar things have been done for 25 years now) • What’s left?
  74. Subclassing the PAL Ontology to provide hooks for widgets? • :CONSTRAINT only provides two slots (:pragmatics and :sentence) • How about other slots – Evaluation cost (for different engines)? – Evaluation hints? – What widget generated the axiom?
  75. “No A is a B” • A statement that is often enforced by defining separate classes • But often not: – No hemophiliac should be taking Lasix – Do we really want “Hemophiliac” as a subclass of “Person”? – Do we really want “Lasix_Taker” as a subclass of “Patient”?
  76. Let’s write it in PAL (forall ?P (=> (and (Person ?P) (has-disease ?P Hemophilia)) (not (taking-drug ?P Lasix))))
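To see what an engine would do with a sentence of this shape, here is a minimal Python sketch that evaluates it over a handful of instances. The instance data and the `violators` function are hypothetical, and the disease and drug are passed as parameters, so the sketch illustrates a "No A is a B" check in general rather than PAL's actual evaluator.

```python
# Hypothetical Person instances; slot names follow the PAL sentence above.
people = {
    "alice": {"has-disease": {"Hemophilia"}, "taking-drug": {"Lasix"}},
    "bob": {"has-disease": {"Hemophilia"}, "taking-drug": set()},
    "carol": {"has-disease": set(), "taking-drug": {"Lasix"}},
}


def violators(people, disease, drug):
    """Persons for whom both (has-disease ?P disease) and (taking-drug ?P drug) hold."""
    return sorted(
        name
        for name, slots in people.items()
        if disease in slots["has-disease"] and drug in slots["taking-drug"]
    )


# Only alice is in the intersection that the constraint says must be empty.
print(violators(people, "Hemophilia", "Lasix"))
```

No "Hemophiliac" or "Lasix_Taker" class is needed. The two sets are defined by slot values, and the constraint states that their intersection is empty.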
  77. [Figure labels: “Partially filled out instance defines matching” (twice); “This is really a Venn Diagram”; Person; Person; Empty Intersection]
  78. Widgets play a role here: • Widget is placed on screen to mediate between humans and KB • Widget generates PAL statements • Engine interprets PAL statements • User may or may not ever see PAL
  79. Things that are done: • The knowledge model is done • The constraint language is done • The default implementation is designed and (partially) implemented
  80. Things that we will do: • Finish the default implementation • Publish a full spec (as a Tech Report)? • Serve as a clearinghouse for engines and widgets
