UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports BN, ID, MSBN, OOBN, HBN, MEBN/PR-OWL and PRM, as well as structure, parameter and incremental learning.
This presentation talks about UnBBayes-PRM, a plugin for UnBBayes that has a simple implementation of Probabilistic Relational Models.
This presentation was given by Shou Matsumoto from the University of Brasilia in Brazil via web conference to PhD students at George Mason University in the US, at the Friday seminar called Krypton (http://krypton.c4i.gmu.edu/), on October 29, 2010.
3. Objectives
What is this presentation for?
– Overview of PRM and its underlying concepts
– Overview of extensions of PRM
• Link uncertainty
– To present a simple implementation of PRM
• UnBBayes-PRM
4. Motivations
E/R models are heavily used
– Most commercial databases are based on E/R models
PRM allows E/R with uncertainty
– PRM is compatible with optimizations of BN and E/R
Implementations of PRM are rare
5. Target
For whom is this presentation intended?
– People interested in PRM
• E.g. database architects willing to incorporate probabilistic reasoning
• People looking for a BN extension with the expressiveness of relational calculus
– People looking for a PRM tool
• E.g. developers looking for a sample implementation
• Learners willing to exercise PRM
We assume you have basic knowledge of Bayesian Networks.
7. What is E/R?
E/R = Entity-Relationship
Abstract conceptual representation of data
– Often used in relational database models
• E.g. Oracle, MySQL, PostgreSQL...
Entities = “nouns”
– A set of elements in a domain
Relationships = “verbs”
– Capture how two or more entities are related
Attributes = “characteristics”
– Attributes hold the actual data content
8. What is E/R?
Constraints
– Cardinality
• 1-1, 1-many, many-1, many-many
– Primary Key (PK):
• minimal set of uniquely identifying attributes
– Foreign Key (FK):
• Attributes that refer to other attributes (PK)
– These are used to implement relationships
– Allowed values
– Etc.
9. What is E/R?
E/R can be represented as a set of Tables
– Entities → tables
– Attributes → columns
– Values of attributes → content of a cell
– 1-1 and 1-many (many-1) relationships → FK
– Many-many relationships → table + FK
Problem
– Classic E/R models do not handle uncertainty
UnBBayes-PRM treats an E/R model as a set of tables.
10. So, what is PRM?
Probabilistic Relational Models
– Template for a probability distribution over a database (E/R model)
• Compact graphical probabilistic model
– Well-defined semantics
• Natural domain modeling
– Objects, properties, relations...
• Attributes can depend on attributes of related entities
• Generalization over a variety of situations
11. So, what is PRM?
PRM's learning algorithms
– Capture relationships in Bayesian learning algorithms
• There is no need to “flatten” the database
PRMs are composed of:
– Relational Schema,
– Relational Skeleton,
– Probability distribution.
Machine learning is a major concern in PRM.
12. Schema
Static part
– Entities + Relationships + Attributes
– PK, FK, possible (allowed) values...
[Diagram: entity Person with attribute BloodType and two self-relationships, hasFather and hasMother]
Person
– ID: PK
– Father: FK to Person
– Mother: FK to Person
– BloodType: any of {A, B, AB, O}
13. Skeleton
Dynamic part
– Instantiation of a Schema
– Actual objects
• Attributes are filled with some values
ID         Father     Mother  BloodType
Augustine  NULL       NULL    O
Mary       NULL       NULL    A
George     Augustine  Mary    NULL
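The split between schema (static) and skeleton (dynamic) can be sketched in a few lines of Python. This is only an illustration, not how UnBBayes-PRM stores its data: the `Person` class, the `skeleton` dictionary and the use of `None` for NULL are assumptions made for this sketch. The three objects are the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values of BloodType, as declared in the schema.
BLOOD_TYPES = ("A", "B", "AB", "O")


@dataclass
class Person:
    """Schema: one row of the Person table. None plays the role of NULL."""
    id: str                            # PK
    father: Optional[str] = None       # FK to Person
    mother: Optional[str] = None       # FK to Person
    blood_type: Optional[str] = None   # one of BLOOD_TYPES, or unknown

    def __post_init__(self):
        # Enforce the "allowed values" constraint of the schema.
        if self.blood_type is not None and self.blood_type not in BLOOD_TYPES:
            raise ValueError(f"invalid BloodType: {self.blood_type!r}")


# Skeleton: actual objects, indexed by primary key.
skeleton = {
    p.id: p
    for p in (
        Person("Augustine", blood_type="O"),
        Person("Mary", blood_type="A"),
        Person("George", father="Augustine", mother="Mary"),
    )
}

# Every non-NULL foreign key must reference an existing object.
for person in skeleton.values():
    for fk in (person.father, person.mother):
        assert fk is None or fk in skeleton
```

George's BloodType is left as `None`: it is the value a PRM would reason about.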
14. PRM's structure
Schema + probabilistic dependencies
Each attribute has path expressions describing its parents.
– Path expressions = slot chain
• List of FKs
– If a slot chain contains a 1-many relationship, the number of parents is unknown
Conditional Probability Distribution (CPD)
– Conditional Probability Table (CPT)
– Functions + parameters
An empty slot chain means that the attribute has no parents, or that its parents reside in the same table.
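As a rough sketch of how a slot chain selects parent objects, the function below walks a list of FK columns starting from one object. The dictionary layout and the name `follow_slot_chain` are hypothetical, and the sketch only follows FKs in the many-1 direction; a 1-many step (which is what makes the number of parents unknown) would need a reverse lookup that is not shown.

```python
# Hypothetical in-memory skeleton: each object is a dict of column values.
skeleton = {
    "Augustine": {"Father": None, "Mother": None, "BloodType": "O"},
    "Mary": {"Father": None, "Mother": None, "BloodType": "A"},
    "George": {"Father": "Augustine", "Mother": "Mary", "BloodType": None},
}


def follow_slot_chain(skeleton, object_id, slot_chain):
    """Return the ids of the objects reached from object_id via a list of FKs.

    An empty chain returns the object itself (parents in the same table).
    A NULL foreign key ends that path, so it contributes no parent.
    """
    current = [object_id]
    for fk_column in slot_chain:
        reached = []
        for oid in current:
            target = skeleton[oid][fk_column]
            if target is not None:
                reached.append(target)
        current = reached
    return current


fathers = follow_slot_chain(skeleton, "George", ["Father"])
same_table = follow_slot_chain(skeleton, "George", [])
```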
15. PRM's structure
[Diagram: table Person (PK, Father as FK1, Mother as FK2, BloodType) instantiated as the objects John Doe, Jane Doe and Me. The BloodType of an object has one edge from the BloodType of the object referenced by FK1 and one edge from the BloodType of the object referenced by FK2.]
CPD of BloodType
Father  A    A    A    ...
Mother  A    B    AB   ...
A       75%  25%  50%  ...
B       0%   25%  25%  ...
AB      0%   25%  25%  ...
O       25%  25%  0%   ...
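One simple way to hold such a CPT in code is a dictionary keyed by the parent states. The sketch below copies only the three columns visible on the slide (the others are elided there and are not filled in), and the helper `check_cpt` is a hypothetical name introduced here to verify that every column is a proper distribution.

```python
BLOOD_TYPES = ("A", "B", "AB", "O")

# P(BloodType | Father's BloodType, Mother's BloodType).
# Keys are (father, mother); only the columns shown on the slide are included.
cpt = {
    ("A", "A"):  {"A": 0.75, "B": 0.00, "AB": 0.00, "O": 0.25},
    ("A", "B"):  {"A": 0.25, "B": 0.25, "AB": 0.25, "O": 0.25},
    ("A", "AB"): {"A": 0.50, "B": 0.25, "AB": 0.25, "O": 0.00},
}


def check_cpt(cpt, states, tol=1e-9):
    """Raise ValueError unless every column covers all states and sums to 1."""
    for parents, dist in cpt.items():
        if set(dist) != set(states):
            raise ValueError(f"column {parents} does not cover all states")
        if abs(sum(dist.values()) - 1.0) > tol:
            raise ValueError(f"column {parents} does not sum to 1")


check_cpt(cpt, BLOOD_TYPES)
```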
16. CPD with aggregation
How do we declare the CPD if the number of parents is unknown?
Approach 1: special purpose scripts
– E.g. UnBBayes-MEBN's CPD scripts
• A set of IF-THEN-ELSE statements
Approach 2: aggregation
– E.g. Mode, Max, Min, Average...
• Equivalent to an intermediate “deterministic” node
UnBBayes-PRM uses approach 2.
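The aggregation approach can be sketched as a deterministic function that collapses any number of parent states into one state, so the CPT only needs a single column per aggregated value. The function name `aggregate`, the explicit `order` argument and the tie-breaking rule for the mode are assumptions of this sketch, and the states `low`/`medium`/`high` are a made-up example.

```python
from collections import Counter


def aggregate(values, function, order):
    """Collapse a list of parent states into a single state.

    order lists the states from smallest to largest; it defines min/max
    and is also used to break ties for the mode (smallest state wins).
    """
    if not values:
        raise ValueError("no parent values to aggregate")
    if function == "min":
        return min(values, key=order.index)
    if function == "max":
        return max(values, key=order.index)
    if function == "mode":
        counts = Counter(values)
        highest = max(counts.values())
        candidates = [v for v, c in counts.items() if c == highest]
        return min(candidates, key=order.index)
    raise ValueError(f"unknown aggregation function: {function!r}")


# Hypothetical ordered states of some parent attribute.
ORDER = ["low", "medium", "high"]
parent_states = ["high", "low", "high"]

most_common = aggregate(parent_states, "mode", ORDER)
smallest = aggregate(parent_states, "min", ORDER)
largest = aggregate(parent_states, "max", ORDER)
```

Whatever the number of parents, the child then depends on one aggregated value, which is why this is equivalent to inserting an intermediate deterministic node.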
17. Inference
Instantiation of a BN from skeleton
Descriptive attributes become random variables
Once the BN is generated, further inference is done as in a normal BN (evidence propagation)
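A minimal sketch of this instantiation step follows: one node per (object, descriptive attribute), and one edge per parent object reached through a slot chain. The data layout, the `dependencies` structure and the name `instantiate_bn` are hypothetical. Treating non-NULL attribute values as evidence is an assumption of this sketch, and the actual evidence propagation is left to a BN engine.

```python
# Hypothetical skeleton: each object is a dict of column values.
skeleton = {
    "Augustine": {"Father": None, "Mother": None, "BloodType": "O"},
    "Mary": {"Father": None, "Mother": None, "BloodType": "A"},
    "George": {"Father": "Augustine", "Mother": "Mary", "BloodType": None},
}

# Schema-level dependencies: child attribute -> [(parent attribute, slot chain)].
dependencies = {
    "BloodType": [("BloodType", ["Father"]), ("BloodType", ["Mother"])],
}


def instantiate_bn(skeleton, dependencies):
    """Build the nodes, edges and evidence of the ground BN."""
    nodes = [(oid, attr) for oid in skeleton for attr in dependencies]
    edges = []
    evidence = {}
    for oid, row in skeleton.items():
        for attr, parents in dependencies.items():
            if row[attr] is not None:
                evidence[(oid, attr)] = row[attr]
            for parent_attr, chain in parents:
                # Follow the slot chain; a NULL FK yields no parent.
                targets = [oid]
                for fk in chain:
                    targets = [
                        skeleton[t][fk]
                        for t in targets
                        if skeleton[t][fk] is not None
                    ]
                for target in targets:
                    edges.append(((target, parent_attr), (oid, attr)))
    return nodes, edges, evidence


nodes, edges, evidence = instantiate_bn(skeleton, dependencies)
```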
18. Does the instantiated BN have cycles?
Case 1: check at PRM schema level
– Schema has no cycle → instances have no cycle
Case 2: schema contains cycles, but the instantiated BN does not
[Diagram: Person Augustine (Father) and Person Mary (Mother) each have a BloodType node with an edge to the BloodType node of Person George Washington]
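The two cases can be contrasted with a generic acyclicity test. The sketch below uses Kahn's topological sort, which is a standard technique and not necessarily what UnBBayes-PRM does; the function name `is_acyclic` and the node labels are illustrative.

```python
from collections import defaultdict, deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {node: 0 for node in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == len(nodes)


# Schema level: BloodType depends on BloodType, so the schema graph has a cycle.
schema_ok = is_acyclic(
    ["Person.BloodType"],
    [("Person.BloodType", "Person.BloodType")],
)

# Instance level: the same dependency unrolled over three objects.
ground_nodes = ["Augustine.BloodType", "Mary.BloodType", "George.BloodType"]
ground_edges = [
    ("Augustine.BloodType", "George.BloodType"),
    ("Mary.BloodType", "George.BloodType"),
]
ground_ok = is_acyclic(ground_nodes, ground_edges)
```

Here `schema_ok` is `False` while `ground_ok` is `True`, which is exactly case 2.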
19. Extension: link uncertainty
So far we have only discussed distributions over attributes of the objects in a model
– Only the values of the attributes were uncertain
Uncertainty over the relational structure of the domain has not been addressed yet
– Structure uncertainty
• Values of FK are uncertain
– Slot chains are uncertain
Reference uncertainty & existence uncertainty
Note: link uncertainty is not implemented in UnBBayes-PRM.
20. Reference uncertainty
The values of slots (FKs) become random variables
– Problem
• Unknown number of possible values
– It is difficult to declare a CPD at the schema level
– Solution
• Create partitions based on “other attributes”
– Assuming that ordinal attributes have a known number of possible values
21. Reference uncertainty
[Diagram, without partitions: Entity1 (PK, FKToEntity2) references Entity2 (PK, BooleanAttrib)]
– Possible values of FKToEntity2: PKs of Entity2 (unknown)
– Link to a single instance of Entity2, based on the current value of PK
[Diagram, with partitions: Entity1 (PK, FKToEntity2, Selector) references Entity2 (PK, BooleanAttrib)]
– Possible values of Selector: 2 (true/false)
– Link to a set (partition) of instances of Entity2, based on the current value of BooleanAttrib
We can now specify the parents and the CPD of FKs.
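The partitioning idea can be sketched as follows. The Entity2 rows, the selector distribution and the function names are all made up for illustration. The sketch also assumes that, once a partition is selected, the referenced object is chosen uniformly within it, which is one common convention and is not stated on the slide.

```python
from collections import defaultdict

# Hypothetical instances of Entity2, indexed by PK.
entity2 = {
    "e2_1": {"BooleanAttrib": True},
    "e2_2": {"BooleanAttrib": False},
    "e2_3": {"BooleanAttrib": True},
}


def partition_by(instances, attribute):
    """Group primary keys by the value of one attribute."""
    partitions = defaultdict(list)
    for pk, row in instances.items():
        partitions[row[attribute]].append(pk)
    return dict(partitions)


def fk_distribution(selector_dist, partitions):
    """P(FK = pk), assuming a uniform choice inside the selected partition."""
    dist = {}
    for value, prob in selector_dist.items():
        members = partitions.get(value, [])
        if not members:
            if prob > 0:
                raise ValueError(f"partition {value!r} is empty")
            continue
        for pk in members:
            dist[pk] = prob / len(members)
    return dist


partitions = partition_by(entity2, "BooleanAttrib")

# Hypothetical distribution of the Selector: only 2 values, whatever the
# number of Entity2 instances, so it can be declared at schema level.
selector_dist = {True: 0.8, False: 0.2}
fk_dist = fk_distribution(selector_dist, partitions)
```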
22. Reference uncertainty: instantiating BN
Edge types:
– I: within single object
– II: between objects
– III: from FKs of a slot chain
– IV: from partition attributes to selectors
– V: from selectors to FK
Extracted from Probabilistic Relational Models (Getoor et al., SRL07)
23. Existence uncertainty
Creation of a Boolean attribute “Exists” in tables
– Technically, entities also contain “Exists”
• But we assume instances (objects) of entities “do exist” if they were instantiated
– So, this mechanism is mainly for relationships
– Because “Exists” is not an FK, we can use it as a normal random variable.
• No major changes to BN instantiation
Every object is related to every other possible object, with a probability between 0% and 100%.
24. UnBBayes-PRM
Open-source Java software
– GUI & inference machine
Features
– Edit Schema and Skeleton as tables
– Edit probabilistic dependencies as CPT
– Edit constraints (PK, FK and allowed values)
– Generate BN from Skeleton
– Save/load projects from file
Developed as a plug-in for UnBBayes:
– Alpha version (for internal use)
Project page: http://sourceforge.net/projects/unbbayes/
28. UnBBayes-PRM - I/O
/* Table and PK declaration */
CREATE TABLE "Person" (
    "id" VARCHAR2(300) not null,
    "Father" VARCHAR2(300),
    "Mother" VARCHAR2(300),
    "BloodType" VARCHAR2(300)
);
ALTER TABLE "Person" ADD CONSTRAINT PK_Person
    PRIMARY KEY ("id");
/* Possible values */
ALTER TABLE "Person" ADD CONSTRAINT CK_BloodType
    CHECK ("BloodType" IN ('A', 'B', 'AB', 'O'));
/* Foreign keys (relationships) */
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Father
    FOREIGN KEY ("Father") REFERENCES "Person" ("id");
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Mother
    FOREIGN KEY ("Mother") REFERENCES "Person" ("id");
PRM is currently stored as a SQL script. This is a temporary solution.
29. UnBBayes-PRM - I/O
Dependencies are stored as in-table comments
COMMENT ON COLUMN Person.BloodType IS 'Person.BloodType()[ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 (...) }';
Basic format:
– <listOfParents>;{<listOfProbabilities>}
<listOfParents> := comma separated list
– <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>]
• <listOfForeignKeys> represents a slot chain
This is also a temporary solution.
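To make the comment format concrete, here is a small parser sketch. It is not UnBBayes-PRM's own parser: the regular expression, the function name `parse_dependency` and the returned dictionary keys are assumptions, and it follows the square-bracket syntax of the stored example above. The sample string keeps only the first eight probabilities, because the rest is elided in the source.

```python
import re

# <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>]
PARENT_PATTERN = re.compile(r"(\w+)\.(\w+)\s*\(([^)]*)\)\s*\[([^\]]*)\]")


def parse_dependency(comment):
    """Split '<listOfParents>;{<listOfProbabilities>}' into its two parts."""
    parents_part, separator, probs_part = comment.partition(";")
    if not separator:
        raise ValueError("missing ';' between parents and probabilities")
    parents = []
    for match in PARENT_PATTERN.finditer(parents_part):
        cls, column, aggregate, fks = match.groups()
        parents.append({
            "class": cls,
            "column": column,
            # An empty "()" means no aggregation function.
            "aggregate": aggregate.strip() or None,
            # The FK separator inside the brackets is not documented;
            # this sketch accepts whitespace and/or commas.
            "slot_chain": fks.replace(",", " ").split(),
        })
    body = probs_part.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("probabilities must be enclosed in braces")
    probabilities = [float(token) for token in body[1:-1].split()]
    return parents, probabilities


# First two CPT columns of the example; the remainder is elided in the source.
comment = (
    "Person.BloodType()[ FK_Person_Father ] , "
    "Person.BloodType()[ FK_Person_Mother ] ; "
    "{ 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 }"
)
parents, probabilities = parse_dependency(comment)
```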
30. UnBBayes-PRM: limitations
No support for link uncertainty
– But existence uncertainty can be “simulated”
Only 1 attribute as PK
Only String types allowed
– Thus, no sequences are allowed
No marginalization
– Cannot delete dependencies
• We must re-create the attribute or edit the SQL script
31. UnBBayes-PRM: limitations
Two edges (dependencies) to the same attribute are not allowed
– Even when using different slot chains
3 aggregation functions:
– mode, min, max.
No machine learning
No direct access to an actual database (yet)
– Only by means of a SQL script.
32. UnBBayes-PRM: (possible) future work
Add extension points for plug-ins
Integration with DBMS
– Constraints/rules can be delegated to DBMS
• Some of the limitations may be automatically fixed
Implement machine learning and link uncertainty
Edit E/R models as diagrams
PRM → MSBN compilation
DBMS = Database Management System
33. UnBBayes-PRM: (possible) future work
Implement Dynamic PRM
– Dynamic BN + E/R
Integration with PROXIMITY¹
– RDN - Relational Dependency Network
• Generalization of BN + E/R + Relational Markov Network
¹A Java open-source tool from the University of Massachusetts Amherst
34. Finally
PRM looks practical
– Uncertainty on relational data
• Immediate applicability in databases
– Advanced DBMS can add advanced features
Machine learning seems to be PRM's major concern
– It was not addressed by this presentation
35. Finally
PRM cannot specify advanced rules and constraints on conditional probabilities
– Some conditions must be fulfilled “manually”
– Some may be fulfilled by DBMS' features
UnBBayes-PRM provides an editor and inference engine for basic PRM