UnBBayes-PRM - On Implementing Probabilistic Relational Models

UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports BN, ID, MSBN, OOBN, HBN, MEBN/PR-OWL, PRM, structure, parameter and incremental learning.

This presentation talks about UnBBayes-PRM, a plugin for UnBBayes that has a simple implementation of Probabilistic Relational Models.

This presentation was given by Shou Matsumoto from the University of Brasilia in Brazil, via web conference, to PhD students at George Mason University in the US, at the Friday seminar called Krypton (http://krypton.c4i.gmu.edu/) on October 29, 2010.

Transcript

  • 1. On Implementing Probabilistic Relational Models. October 29, 2010. Contact: Shou Matsumoto (cardialfly@[yahoo|gmail].com)
  • 2. Content
    - Purpose
    - Contextualization
      - E/R
      - PRM
      - Link uncertainty
    - A Java implementation
      - UnBBayes-PRM (project page: http://sourceforge.net/projects/unbbayes/)
  • 3. Objectives (section: Purpose)
    - What is this presentation for?
      - Overview of PRM and its underlying concepts
      - Overview of extensions of PRM
        - Link uncertainty
      - To present a simple implementation of PRM
        - UnBBayes-PRM
  • 4. Motivations
    - E/R models are heavily used
      - Most commercial databases are based on E/R models
    - PRM allows E/R with uncertainty
      - PRM is compatible with optimizations of BN and E/R
    - Implementations of PRM are rare
  • 5. Target
    - For whom is this presentation intended?
      - People interested in PRM
        - E.g. database architects willing to incorporate probabilistic reasoning
        - People looking for a BN extension with the expressiveness of relational calculus
      - People looking for a PRM tool
        - E.g. developers looking for a sample implementation
        - Learners willing to exercise PRM
    Note: We assume you have basic knowledge about Bayesian Networks.
  • 6. What is PRM? (section: Contextualization)
    - BN + E/R = PRM
  • 7. What is E/R?
    - E/R = Entity-Relationship
    - Abstract conceptual representation of data
      - Often used in relational database models
        - E.g. Oracle, MySQL, PostgreSQL...
    - Entities = “nouns”
      - A set of elements in a domain
    - Relationships = “verbs”
      - Capture how 2 or more entities are related
    - Attributes = “characteristics”
    Note: Attributes hold the actual data content.
  • 8. What is E/R?
    - Constraints
      - Cardinality
        - 1-1, 1-many, many-1, many-many
      - Primary Key (PK):
        - Minimal set of uniquely identifying attributes
      - Foreign Key (FK):
        - Attributes that refer to other attributes (PK)
          - This is used to implement relationships
      - Allowed values
      - Etc.
  • 9. What is E/R?
    - E/R can be represented as a set of tables
      - Entities → tables
      - Attributes → columns
      - Values of attributes → content of a cell
      - 1-1 and 1-many (many-1) relationships → FK
      - Many-many relationships → table + FK
    - Problem
      - Classic E/R models do not handle uncertainty
    Note: UnBBayes-PRM sees E/R as a set of tables.
  • 10. So, what is PRM?
    - Probabilistic Relational Models
      - Template for a probability distribution over a database (E/R model)
        - Compact graphical probabilistic model
          - Well-defined semantics
        - Natural domain modeling
          - Objects, properties, relations...
        - Attributes can depend on attributes of related entities
        - Generalization over a variety of situations
  • 11. So, what is PRM?
    - PRM learning algorithms
      - Capture relationships in Bayesian learning algorithms
        - There is no need to “flatten” the database
    - PRMs are composed of:
      - Relational schema,
      - Relational skeleton,
      - Probability distribution.
    Note: Machine learning is a major concern in PRM.
  • 12. Schema
    - Static part
      - Entities + Relationships + Attributes
      - PK, FK, possible (allowed) values...
    - Example (diagram): entity Person, related to itself through hasFather and hasMother
      - ID: PK
      - Father: FK to Person
      - Mother: FK to Person
      - BloodType: any of {A, B, AB, O}
  • 13. Skeleton
    - Dynamic part
      - Instantiation of a Schema
      - Actual objects
        - Attributes are filled with some values
    - Example (diagram):

      ID         Father     Mother  BloodType
      Augustine  NULL       NULL    O
      Mary       NULL       NULL    A
      George     Augustine  Mary    NULL
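
The Person schema and the three-object skeleton from slides 12 and 13 can be sketched in plain Java as follows. This is only an illustration: the names `PersonSkeletonSketch`, `Person`, `skeleton` and `unobserved` are hypothetical and are not part of the UnBBayes-PRM API, and `null` stands in for NULL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the UnBBayes-PRM API.
final class PersonSkeletonSketch {

    // Schema (static part): allowed values of the descriptive attribute.
    static final List<String> BLOOD_TYPES = Arrays.asList("A", "B", "AB", "O");

    // Schema (static part): one entity with a PK, two FKs and one attribute.
    static final class Person {
        final String id;        // PK
        final String father;    // FK to Person, null = NULL
        final String mother;    // FK to Person, null = NULL
        final String bloodType; // one of BLOOD_TYPES, null = unknown

        Person(String id, String father, String mother, String bloodType) {
            if (bloodType != null && !BLOOD_TYPES.contains(bloodType)) {
                throw new IllegalArgumentException("Value not allowed: " + bloodType);
            }
            this.id = id;
            this.father = father;
            this.mother = mother;
            this.bloodType = bloodType;
        }
    }

    // Skeleton (dynamic part): the three objects shown on the slide.
    static List<Person> skeleton() {
        return Arrays.asList(
                new Person("Augustine", null, null, "O"),
                new Person("Mary", null, null, "A"),
                new Person("George", "Augustine", "Mary", null));
    }

    // IDs of the objects whose BloodType is still NULL.
    static List<String> unobserved(List<Person> people) {
        List<String> ids = new ArrayList<>();
        for (Person p : people) {
            if (p.bloodType == null) {
                ids.add(p.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(unobserved(skeleton())); // prints [George]
    }
}
```

Keeping the allowed values with the schema and the rows with the skeleton mirrors the static/dynamic split described on the slides.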
  • 14. PRM's structure
    - Schema + probabilistic dependencies
    - Attributes have path expressions describing their parents
      - Path expression = slot chain
        - List of FKs
      - If the slot chain contains a 1-many relationship, the number of parents is unknown
    - Conditional Probability Distribution (CPD)
      - Conditional Probability Table (CPT)
      - Functions + parameters
    Note: (Slot chain = empty) := no parents | parents reside in the same table
  • 15. PRM's structure
    - Diagram: the Person class (PK; FK1 = Father; FK2 = Mother; BloodType) and its instantiation as the objects John Doe, Jane Doe and Me
      - At both levels, BloodType has an edge from the BloodType of the object referenced by FK1 and an edge from the BloodType of the object referenced by FK2
    - CPD of BloodType:

      Father |  A    A    A    ...
      Mother |  A    B    AB   ...
      -------+---------------------
      A      |  75%  25%  50%  ...
      B      |   0%  25%  25%  ...
      AB     |   0%  25%  25%  ...
      O      |  25%  25%   0%  ...
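
As an illustration of how such a CPT can be queried, here is a minimal Java sketch. It is not UnBBayes-PRM code: `BloodTypeCptSketch`, `CPT` and `probability` are hypothetical names, and only the three columns visible on the slide are filled in; the elided columns are deliberately left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the UnBBayes-PRM API.
final class BloodTypeCptSketch {

    // Row order of every column below.
    static final List<String> STATES = Arrays.asList("A", "B", "AB", "O");

    // Key: "father,mother". Value: P(child = A, B, AB, O | father, mother).
    // Only the columns shown on the slide are present.
    static final Map<String, double[]> CPT = new LinkedHashMap<>();
    static {
        CPT.put("A,A",  new double[] {0.75, 0.00, 0.00, 0.25});
        CPT.put("A,B",  new double[] {0.25, 0.25, 0.25, 0.25});
        CPT.put("A,AB", new double[] {0.50, 0.25, 0.25, 0.00});
    }

    // Looks up P(child | father, mother); fails for columns the slide elides.
    static double probability(String child, String father, String mother) {
        double[] column = CPT.get(father + "," + mother);
        int row = STATES.indexOf(child);
        if (column == null || row < 0) {
            throw new IllegalArgumentException(
                    "No CPT entry for " + child + " | " + father + "," + mother);
        }
        return column[row];
    }

    public static void main(String[] args) {
        System.out.println(probability("A", "A", "A")); // prints 0.75
    }
}
```

Because the CPD is declared once at schema level, the same table applies to every Person object whose Father and Mother slots are filled.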
  • 16. CPD with aggregation
    - How do we declare the CPD if the number of parents is unknown?
    - Approach 1: special-purpose scripts
      - E.g. UnBBayes-MEBN's CPD scripts
        - A set of IF-THEN-ELSE statements
    - Approach 2: aggregation
      - E.g. Mode, Max, Min, Average...
        - Equivalent to an intermediate “deterministic” node
    Note: UnBBayes-PRM uses approach 2.
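
The aggregation approach can be sketched as a function that reduces any number of parent values to a single value, so that the CPD only needs one parent column. The Java below is a hedged illustration with hypothetical names (`ModeAggregationSketch`, `mode`); breaking ties in favour of the value seen first is an assumption, not something the slides specify.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the UnBBayes-PRM API.
final class ModeAggregationSketch {

    // Returns the most frequent value among the parents.
    // Assumption: ties go to the value that appears first.
    static String mode(List<String> parentValues) {
        if (parentValues.isEmpty()) {
            throw new IllegalArgumentException("No parent values to aggregate");
        }
        // LinkedHashMap keeps first-seen order, which implements the tie rule.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : parentValues) {
            counts.merge(value, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mode(Arrays.asList("A", "O", "A"))); // prints A
    }
}
```

The aggregate plays the role of the intermediate deterministic node: the child attribute's CPT is then indexed by the aggregated value only.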
  • 17. Inference
    - Instantiation of a BN from the skeleton
    - Descriptive attributes become random variables
    - Once generated, further inference is done as in a normal BN (evidence propagation)
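
A minimal Java sketch of the instantiation step, using the Person example: every object contributes one random variable for its descriptive attribute, and every non-NULL FK contributes one edge. The names `GroundNetworkSketch`, `skeleton` and `groundEdges` are hypothetical, and the sketch only lists edges; it does not attach CPTs or run evidence propagation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the UnBBayes-PRM API.
final class GroundNetworkSketch {

    // Each row is {id, father, mother}; null stands for NULL.
    static List<String[]> skeleton() {
        return Arrays.asList(new String[][] {
                {"Augustine", null, null},
                {"Mary", null, null},
                {"George", "Augustine", "Mary"}});
    }

    // One node "<id>.BloodType" per object; one edge per non-NULL FK.
    static List<String> groundEdges(List<String[]> rows) {
        List<String> edges = new ArrayList<>();
        for (String[] row : rows) {
            String child = row[0] + ".BloodType";
            for (int fk = 1; fk <= 2; fk++) {
                if (row[fk] != null) {
                    edges.add(row[fk] + ".BloodType -> " + child);
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // prints [Augustine.BloodType -> George.BloodType, Mary.BloodType -> George.BloodType]
        System.out.println(groundEdges(skeleton()));
    }
}
```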
  • 18. Does the instantiated BN have cycles?
    - Case 1: check at PRM schema level
      - Schema has no cycle → instances have no cycle
    - Case 2: schema contains cycles, but the instantiated BN does not
      - Diagram: at schema level, BloodType of Person depends on BloodType of Person; at instance level, the BloodType of Augustine (father) and of Mary (mother) point to the BloodType of George Washington
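
The two cases can be checked with any standard cycle test on a directed graph. The Java sketch below uses Kahn's topological-sort algorithm, which is a choice made for this illustration and not necessarily what UnBBayes-PRM does; `AcyclicCheckSketch` and `isAcyclic` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the UnBBayes-PRM API.
final class AcyclicCheckSketch {

    // Each edge is {parent, child}. Returns true if the graph has no cycle.
    static boolean isAcyclic(List<String[]> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, List<String>> children = new HashMap<>();
        for (String[] edge : edges) {
            inDegree.putIfAbsent(edge[0], 0);
            inDegree.merge(edge[1], 1, Integer::sum);
            children.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
        }
        // Kahn's algorithm: repeatedly remove nodes with no remaining parents.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        int removed = 0;
        while (!ready.isEmpty()) {
            String node = ready.remove();
            removed++;
            for (String child : children.getOrDefault(node, Collections.<String>emptyList())) {
                if (inDegree.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        // Any node that could not be removed lies on or behind a cycle.
        return removed == inDegree.size();
    }

    public static void main(String[] args) {
        // Schema level: Person.BloodType depends on Person.BloodType.
        List<String[]> schema = Arrays.asList(new String[][] {
                {"Person.BloodType", "Person.BloodType"}});
        // Instance level: the parents' BloodType points to George's BloodType.
        List<String[]> ground = Arrays.asList(new String[][] {
                {"Augustine.BloodType", "George.BloodType"},
                {"Mary.BloodType", "George.BloodType"}});
        System.out.println(isAcyclic(schema)); // prints false
        System.out.println(isAcyclic(ground)); // prints true
    }
}
```

This reproduces Case 2: the dependency graph at schema level has a self-loop, while the graph instantiated from the family skeleton is acyclic.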
  • 19. Extension: link uncertainty
    - So far we only discussed distributions over attributes of the objects in a model
      - Only the values of the attributes were uncertain
    - Uncertainty over the relational structure of the domain was not addressed yet
      - Structure uncertainty
        - Values of FK are uncertain
          - Slot chains are uncertain
    - Reference uncertainty & existence uncertainty
    Note: Link uncertainty is not implemented in UnBBayes-PRM.
  • 20. Reference uncertainty
    - Slot (FK) values become random variables
      - Problem
        - Unknown number of possible values
          - It is difficult to declare a CPD at schema level
      - Solution
        - Create partitions based on “other attributes”
          - Assuming that ordinal attributes have a known number of possible values
  • 21. Reference uncertainty
    - Diagram, without partitions: Entity1 (PK, FKToEntity2) and Entity2 (PK, BooleanAttrib)
      - Possible values of FKToEntity2: PKs of Entity2 (unknown)
      - Link to a single instance of Entity2 based on the current value of PK
    - Diagram, with partitions: Entity1 (PK, FKToEntity2, Selector) and Entity2 (PK, BooleanAttrib)
      - Possible values of Selector: 2 (true/false)
      - Link to a set (partition) of instances of Entity2, based on the current value of BooleanAttrib
    Note: We can now specify parents of FKs and CPD.
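
The partitioning idea can be sketched as grouping the primary keys of Entity2 by the value of BooleanAttrib, so that the selector always has exactly two states regardless of how many Entity2 objects exist. The Java below is a hedged illustration; `ReferencePartitionSketch`, `sample` and `partition` are hypothetical names and the three sample rows are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: link uncertainty is not implemented in UnBBayes-PRM.
final class ReferencePartitionSketch {

    // Invented sample data: PK of Entity2 -> value of BooleanAttrib.
    static Map<String, Boolean> sample() {
        Map<String, Boolean> entity2 = new LinkedHashMap<>();
        entity2.put("e1", true);
        entity2.put("e2", false);
        entity2.put("e3", true);
        return entity2;
    }

    // Selector value -> PKs of the Entity2 objects in that partition.
    // Assumes BooleanAttrib is never null.
    static Map<Boolean, List<String>> partition(Map<String, Boolean> entity2) {
        Map<Boolean, List<String>> partitions = new LinkedHashMap<>();
        partitions.put(true, new ArrayList<String>());
        partitions.put(false, new ArrayList<String>());
        for (Map.Entry<String, Boolean> row : entity2.entrySet()) {
            partitions.get(row.getValue()).add(row.getKey());
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(partition(sample())); // prints {true=[e1, e3], false=[e2]}
    }
}
```

A CPD for the selector can then be declared at schema level over two states, and the FK is chosen among the members of the selected partition.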
  • 22. Reference uncertainty: instantiating BN
    - Edge types:
      - I: within a single object
      - II: between objects
      - III: from FKs of a slot chain
      - IV: from partition attributes to selectors
      - V: from selectors to FK
    Note: Extracted from Probabilistic Relational Models (Getoor et al., SRL07).
  • 23. Existence uncertainty
    - Creation of a Boolean attribute “Exists” in tables
      - Technically, entities also contain “Exists”
        - But we assume instances (objects) of entities “do exist” if they were instantiated
      - So, this mechanism is mainly for relationships
      - Because “Exists” is not an FK, we can use it as a normal random variable
        - No major changes to BN instantiation
    Note: Objects are related to every possible object, with a probability between 0% and 100%.
  • 24. UnBBayes-PRM (section: A Java Implementation)
    - Open-source Java software
      - GUI & inference machine
    - Features
      - Edit Schema and Skeleton as tables
      - Edit probabilistic dependencies as CPT
      - Edit constraints (PK, FK and allowed values)
      - Generate BN from Skeleton
      - Save/load projects from file
    - Developed as a plug-in for UnBBayes:
      - Alpha version (for internal use)
    Note: Project page: http://sourceforge.net/projects/unbbayes/
  • 25. UnBBayes-PRM. Note: A plugin descriptor is the main and minimal content of a plugin.
  • 26. UnBBayes-PRM. Note: A plugin descriptor is the main and minimal content of a plugin.
  • 27. UnBBayes-PRM
  • 28. UnBBayes-PRM - I/O

      /* Table and PK declaration */
      CREATE TABLE "Person" (
        "id" VARCHAR2(300) not null,
        "Father" VARCHAR2(300),
        "Mother" VARCHAR2(300),
        "BloodType" VARCHAR2(300)
      );
      ALTER TABLE "Person" ADD CONSTRAINT PK_Person PRIMARY KEY ("id");

      /* Possible values */
      ALTER TABLE "Person" ADD CONSTRAINT CK_BloodType
        CHECK ("BloodType" IN ('A', 'B', 'AB', 'O'));

      /* Foreign keys (relationships) */
      ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Father
        FOREIGN KEY ("Father") REFERENCES "Person" ("id");
      ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Mother
        FOREIGN KEY ("Mother") REFERENCES "Person" ("id");

    Note: PRM is currently stored as a SQL script. This is a temporary solution.
  • 29. UnBBayes-PRM - I/O
    - Dependencies are stored as in-table comments

      COMMENT ON COLUMN Person.BloodType IS
        'Person.BloodType() [ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ;
         { 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 (...) }';

    - Basic format:
      - <listOfParents>;{<listOfProbabilities>}
    - <listOfParents> := comma-separated list
      - <parentClass>.<parentColumn> (<aggregateFunction>){<listOfForeignKeys>}
        - <listOfForeignKeys> represents a slot chain
    Note: This is also a temporary solution.
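
To make the comment format concrete, here is a hedged Java sketch of a parser for it. It is not the parser used by UnBBayes-PRM; `DependencyCommentSketch`, `Parent` and `parse` are hypothetical names. The sketch follows the slide's example, where the foreign-key list is written in square brackets, and it assumes that foreign keys within a slot chain are separated by commas or whitespace. An elided probability list such as `(...)` is rejected rather than guessed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not the parser used by UnBBayes-PRM.
final class DependencyCommentSketch {

    // <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>]
    private static final Pattern PARENT = Pattern.compile(
            "(\\w+)\\.(\\w+)\\s*\\(\\s*(\\w*)\\s*\\)\\s*\\[([^\\]]*)\\]");

    static final class Parent {
        final String parentClass;
        final String column;
        final String aggregate;       // empty when no function is given
        final List<String> slotChain; // FK names, in order

        Parent(String parentClass, String column, String aggregate, List<String> slotChain) {
            this.parentClass = parentClass;
            this.column = column;
            this.aggregate = aggregate;
            this.slotChain = slotChain;
        }
    }

    final List<Parent> parents = new ArrayList<>();
    final List<Double> probabilities = new ArrayList<>();

    // Parses "<listOfParents>;{<listOfProbabilities>}".
    static DependencyCommentSketch parse(String comment) {
        int semicolon = comment.indexOf(';');
        int open = comment.indexOf('{', semicolon);
        int close = comment.indexOf('}', open);
        if (semicolon < 0 || open < 0 || close < 0) {
            throw new IllegalArgumentException(
                    "Expected <listOfParents>;{<listOfProbabilities>}");
        }
        DependencyCommentSketch result = new DependencyCommentSketch();
        Matcher m = PARENT.matcher(comment.substring(0, semicolon));
        while (m.find()) {
            String chain = m.group(4).trim();
            List<String> slotChain = chain.isEmpty()
                    ? new ArrayList<String>()
                    : Arrays.asList(chain.split("[,\\s]+"));
            result.parents.add(new Parent(m.group(1), m.group(2), m.group(3), slotChain));
        }
        String body = comment.substring(open + 1, close).trim();
        if (!body.isEmpty()) {
            for (String token : body.split("\\s+")) {
                // Throws NumberFormatException on non-numeric tokens such as "(...)".
                result.probabilities.add(Double.parseDouble(token));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DependencyCommentSketch parsed = parse(
                "Person.BloodType() [ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ;"
                + " { 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 }");
        System.out.println(parsed.parents.size());       // prints 2
        System.out.println(parsed.probabilities.size()); // prints 8
    }
}
```

The eight probabilities in the usage example are the ones visible on the slide, which correspond to the first two columns of the BloodType CPT.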
  • 30. UnBBayes-PRM: limitations
    - No support for link uncertainty
      - But existence uncertainty can be “simulated”
    - Only 1 attribute as PK
    - Only String types allowed
      - Thus, no sequences are allowed
    - No marginalization
      - Cannot delete dependencies
        - We must re-create the attribute or edit the SQL script
  • 31. UnBBayes-PRM: limitations
    - 2 edges (dependencies) to the same attribute are not allowed
      - Even using different slot chains
    - 3 aggregation functions:
      - mode, min, max
    - No machine learning
    - No direct access to an actual database (yet)
      - Only by means of a SQL script
  • 32. UnBBayes-PRM: (possible) future work (section: Conclusion)
    - Add extension points for plug-ins
    - Integration with DBMS
      - Constraints/rules can be delegated to the DBMS
        - Some of the limitations may be automatically fixed
    - Implement machine learning and link uncertainty
    - Edit E/R models as diagrams
    - PRM → MSBN compilation
    Note: DBMS = DataBase Management System
  • 33. UnBBayes-PRM: (possible) future work
    - Implement Dynamic PRM
      - Dynamic BN + E/R
    - Integration with PROXIMITY¹
      - RDN - Relational Dependency Network
        - Generalization of BN + E/R + Relational Markov Network
    ¹ A Java open-source tool from the University of Massachusetts Amherst
  • 34. Finally
    - PRM looks practical
      - Uncertainty on relational data
        - Immediate applicability in databases
      - Advanced DBMS can add advanced features
    - Machine learning seems to be PRM's major concern
      - It was not addressed by this presentation
  • 35. Finally
    - PRM cannot specify advanced rules and constraints on conditional probabilities
      - Some conditions must be fulfilled “manually”
      - Some may be fulfilled by DBMS features
    - UnBBayes-PRM provides an editor and inference engine for basic PRM
  • 36. Questions? Project page: http://sourceforge.net/projects/unbbayes/