UnBBayes-PRM - On Implementing Probabilistic Relational Models

On Implementing
Probabilistic Relational Models

2010, October 29th
Contact: Shou Matsumoto (cardialfly@[yahoo|gmail].com)

Content
Purpose
Contextualization
•E/R
•PRM
•Link Uncertainty
A Java implementation
•UnBBayes-PRM*

*Project page: http://sourceforge.net/projects/unbbayes/

Objectives
 What is this presentation for?
– Overview of PRM and its
underlying concepts
Purpose

– Overview of extensions of PRM
• Link uncertainty
– To present a simple
implementation of PRM
3
• UnBBayes-PRM

Motivations
 E/R models are heavily used
– Most of commercial databases are
based on E/R models
Purpose

 PRM allows E/R with uncertainty
– PRM is compatible with optimizations
of BN and E/R
 Implementations of PRM are rare

4

Target
 For whom is this presentation intended?
– People interested on PRM
• E.g. Database architects willing to incorporate
Purpose

probabilistic reasoning
• People looking for a BN extension with the
expressiveness of relational calculus
– People looking for a PRM tool
• E.g. Developers looking for a sample
implementation
• Learners willing to exercise PRM
5 We assume you have basic knowledge about Bayesian Networks

What is PRM?
Contextualization

BN + E/R

=
=

PRM
PRM
6

What is E/R?
 E/R = Entity-Relationship
Abstract conceptual representation of data
Contextualization


– Often used in relational database models
• E.g. Oracle, MySQL, PostgreSQL...
 Entities = “nouns”
– A set of elements in a domain
 Relationships = “verbs”
– Captures how 2 or more entities are related
 Attributes = “characteristics”
7 Attributes holds actual data content.

What is E/R?
 Constraints
Contextualization

– Cardinality
• 1-1, 1-many, many-1, many-many
– Primary Key (PK):
• minimal set of uniquely identifying attributes
– Foreign Key (FK):
• Attributes that refers to other attributes (PK)
– This is used to conduct relationships
– Allowed values
– Etc.
8

What is E/R?

 E/R can be represented as a set of Tables
Contextualization

– Entities → tables
– Attributes → columns
– Values of attributes → content of a cell
– 1-1 and 1-many (many-1) relationships → FK
– Many-many relationships → table + FK
 Problem
– Classic E/R models do not handle uncertainty
9 UnBBayes-PRM sees E/R as a set of tables.

So, what is PRM?
 Probabilistic Relational Models
Contextualization

– Template for probability distribution over a
database (E/R model)
• Compact graphical probabilistic model
– well defined semantics
• Natural domain modeling
– objects, properties, relations...
• Attributes can depend on attributes of related
entities
• Generalization over a variety of situations
10

So, what is PRM?
 PRM's learning algorithms
Contextualization

– Captures relationships in Bayesian learning
algorithms
• There's no need to “flatten” database
 PRM's are composed of:
– Relational Schema,
– Relational Skeleton,
– Probabilistic distribution.

11 Machine learning is a major concern in PRM

Schema
 Static part
Contextualization

– Entities + Relationships + Attributes
– PK, FK, possible (allowed) values...

hasFather

Person
ID: PK
Person BloodType Father : FK to Person
Mother: FK to Person
BloodType : any of {A,B,AB,O}

hasMother
12

Skeleton
 Dynamic part
Contextualization

– Instantiation of a Schema
– Actual objects
• Attributes are filled with some values

ID: Augustine ID: Mary
Father: NULL Father: NULL
Mother: NULL Mother: NULL
BloodType: O ID: George BloodType: A
Father: Augustine
Mother: Mary
13 BloodType: NULL

PRM's structure
 Schema + probabilistic dependencies
Contextualization

 Attributes have path expressions describing their
parents of that attribute.
– Path expressions = slot chain
• List of FK
– If slot chain contains 1-many relationship, the
number of parents is unknown
 Conditional Probability Distribution (CPD)
– Conditional Probability Table (CPT)
– Functions + parameters
14 (Slot chain = empty) := no parents | parents reside in the same table

PRM's structure
John Doe Jane Doe
Contextualization

Person Instantiation
Instantiation Me
Person
FK1 FK2
PK
Father
Mother
BloodType
CPD of BloodType
CPD of BloodType
Father A A A ...
Mother A B AB ...
Edge from
Edge from Edge from
Edge from A 75% 25% 50% ...
BloodType
BloodType BloodType
BloodType B 0% 25% 25% ...
of the object
of the object of the object
of the object AB 0% 25% 25% ...
referenced by FK1 referenced by FK2
referenced by FK1 referenced by FK2 O 25% 25% 0% ...
15

CPD with aggregation
 How do we declare the CPD if the number of parents is
unknown?
Contextualization

 Approach 1: special purpose scripts
– E.g. UnBBayes-MEBN's CPD scripts
• A set of IF-THEN-ELSE statements
 Approach 2: aggregation
– E.g. Mode, Max, Min, Average...
• Equivalent to an intermediate “deterministic” node

16 UnBBayes-PRM uses the approach 2

Inference
 Instantiation of a BN from skeleton
Contextualization

 Descriptive attributes become random
variables
 Once generated, further inference is done as
normal BN (evidence propagation)

17

Does the instantiated BN
have cycles?
 Case 1: check at PRM schema level
– Schema has no cycle → instances have no cycle
Contextualization

 Case 2: schema contains cycles, but the instantiated BN
does not

ID: Augustine ID: Mary
BloodType BloodType
Person Person

ID: George
(Father) (Mother)
Washington
BloodType
18 Person

Extension:
link uncertainty
 We only mentioned about distribution over attributes
of the objects in a model
Contextualization

– Only the values of the attributes were uncertain
 Uncertainty over relational structure of domain was
not addressed yet
– Structure uncertainty
• Values of FK are uncertain
– Slot chains are uncertain
 Reference uncertainty & existence uncertainty
19 OBS. Link uncertainty is not implemented in UnBBayes-PRM

Reference uncertainty
 Slots' (FK) values become a random variable
Contextualization

– Problem
• Unknown number of possible values
– It's difficult to declare CPD at schema level
– Solution
• Create partitions based on “other attributes”
– Assuming that ordinal attributes has a
known number of possible values

20

Reference uncertainty
Entity2
Entity2 Entity1
Entity1 Possible values:
Contextualization

PK PK PKs of Entity2
FKToEntity2 (unknown)
BooleanAttrib
Link to a single instance of Entity2
based on the current value of PK
Link to a set (partition) of instances of Entity2,
based on the current value of BooleanAttrib
Entity1
Entity1
Entity2
Entity2
PK Possible values:
PK FKToEntity2 2 (true/false)
BooleanAttrib Selector
21 We can now specify parents of FKs and CPD

Reference uncertainty:
instantiating BN
Contextualization

 Edge types:
– I: within single object
– II: between objects
– III: from FKs of a slot chain
– IV: from partition attributes to selectors
– V: from selectors to FK
22 Extracted from Probabilistic Relational Models (Getoor et al., SRL07)

Existence uncertainty
 Creation of a Boolean attribute “Exists” in tables
Contextualization

– Technically, entities also contain “Exists”
• But we assume instances (objects) of entities
“do exist” if they were instantiated
– So, this mechanism is mainly for
relationships
– Because “Exists” is not a FK, we can use it as a
normal random variable.
• No major changes on BN instantiation

23 Objects are related to every possible objects, with 0% ~ 100%

UnBBayes-PRM
 Open-source Java software
A Java Implementation

– GUI & inference machine
 Features
– Edit Schema and Skeleton as tables
– Edit probabilistic dependencies as CPT
– Edit constraints (PK, FK and allowed values)
– Generate BN from Skeleton
– Save/load projects from file
 Developed as a plug-in for UnBBayes:
– Alpha version (for internal use)
24 Project page: http://sourceforge.net/projects/unbbayes/

UnBBayes-PRM

25 A plugin descriptor is the main and minimal content of a plugin

UnBBayes-PRM

26 A plugin descriptor is the main and minimal content of a plugin

27
UnBBayes-PRM

UnBBayes-PRM - I/O
/* Table and PK declaration */

CREATE TABLE "Person" (
"id" VARCHAR2(300) not null,
"Father" VARCHAR2(300) ,
"Mother" VARCHAR2(300) ,
"BloodType" VARCHAR2(300)
);
ALTER TABLE "Person" ADD CONSTRAINT PK_Person
PRIMARY KEY ("id");
/* Possible values */
ALTER TABLE "Person" ADD CONSTRAINT CK_BloodType
CHECK ( "BloodType" IN ('A', 'B', 'AB', 'O'));
/* Foreign keys (relationships) */
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Father
FOREIGN KEY ("Father") REFERENCES "Person" ("id");
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Mother
FOREIGN KEY ("Mother") REFERENCES "Person" ("id");
28 PRM is currently stored as a SQL script. This is a temporary solution.

UnBBayes-PRM - I/O
 Dependencies are stored as in-table comments

COMMENT ON COLUMN Person.BloodType IS 'Person.BloodType()
[ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0
0.0 0.25 0.25 0.25 0.25 0.25 (...) }';

 Basic format:
– <listOfParents>;{<listOfProbabilities>}
 <listOfParents> := comma separated list
– <parentClass>.<parentColumn>
(<aggregateFunction>){<listOfForeignKeys>}
• <listOfForeignKeys> represents a slot chain
29 This is also a temporary solution.

UnBBayes-PRM:
limitations
 No support for link uncertainty

– But existence uncertainty can be “simulated”
 Only 1 attribute as PK
 Only String types allowed
– Thus, no sequences are allowed
 No marginalization
– Cannot delete dependencies
• We must re-create attribute or edit the SQL
script
30

UnBBayes-PRM:
limitations
 2 edges (dependencies) to a same attribute is

not allowed
– Even using different slot chains
 3 aggregation functions:
– mode, min, max.
 No machine learning
 No direct access to an actual database (yet)
– Only by means of a SQL script.

31

UnBBayes-PRM:
(possible) future works
 Add extension points for plug-ins
 Integration with DBMS
– Constraints/rules can be delegated to DBMS
Conclusion

• Some of the limitations may be automatically fixed
 Implement machine learning and link
uncertainty
 Edit E/R models as diagrams
 PRM → MSBN compilation

32 DBMS = DataBase Management System

UnBBayes-PRM:
(possible) future works
 Implement Dynamic PRM
– Dynamic BN + E/R
Conclusion

 Integration with PROXIMITY¹
– RDN - Relational Dependency Network
• Generalization of BN + E/R + Relational Markov
Network

33 ¹A Java open-source tool from University of Massachusetts Amherst

Finally
 PRM looks practical
– Uncertainty on relational data
• Immediate applicability in databases
Conclusion

– Advanced DBMS can add advanced
features
 Machine learning seems to be PRM's major
concern
– It was not addressed by this presentation
34

Finally
 PRM cannot specify advanced rules and
constraints on conditional probabilities
– Some conditions must be fulfilled “manually”
Conclusion

– Some may be fulfilled by DBMS' features
 UnBBayes-PRM provides an editor and inference
engine for basic PRM

35

Questions?

Project page: http://sourceforge.net/projects/unbbayes/

UnBBayes-PRM - On Implementing Probabilistic Relational Models

Recommended

Recommended

More Related Content

What's hot

What's hot (7)

Viewers also liked

Viewers also liked (6)

More from Rommel Carvalho

More from Rommel Carvalho (20)

UnBBayes-PRM - On Implementing Probabilistic Relational Models