The Art and Science of DDS Data Modelling

The Data Distribution Service (DDS) is a standard for ubiquitous, interoperable, secure, platform-independent, real-time data sharing across network-connected devices. DDS is used today in a large class of applications, such as Power Generation, Large Scale SCADA, Air Traffic Control and Management, Smart Cities, Smart Grids, Vehicles, Medical Devices, Simulation, Aerospace, Defense and Financial Trading.

Unlike traditional message-centric technologies, DDS is data-centric: the accent is on seamless (user-defined) data sharing as opposed to message delivery. Therefore, when embracing DDS and data-centricity, data modeling becomes a key step in the design of a distributed system.

This webcast will (1) explain the role and scope of data modeling in DDS, (2) introduce the techniques at the foundation of effective and extensible Data Models, and (3) summarize the most common DDS Data Modeling Idioms.

Published in: Technology

  1. 1. The Art and Science of DDS Data Modelling • Angelo Corsaro, PhD, Chief Technology Officer, PrismTech • OMG DDS SIG Co-Chair • angelo.corsaro@prismtech.com
  2. 2. A Recurring Question • People new to DDS recurrently ask a question: what are the techniques and patterns that we can use to design DDS-based systems? • My answer is usually: start with the powerful tools and techniques provided by relational data modelling and then add some DDS-specific spice • I’ve come to the conclusion that many people are not very familiar with relational data modelling, or perhaps it has been too long since they studied or reviewed these concepts • This webcast provides a fairly thorough introduction to the relational data model • Copyright PrismTech, 2014
  3. 3. The Relational Model
  4. 4. Relational Model • Introduced by Edgar F. Codd in 1970 as a way of representing data models for databases • Simple and elegant: a database becomes a collection of one or more relations, where each relation is a table with rows and columns
  5. 5. Relation • The relation is the construct used for representing data in the relational model; it consists of a two-dimensional table • The columns of a relation are called attributes • The name of the relation along with the set of attributes defines the relation schema • The rows of the relation, other than the header containing the attribute names, are called tuples
  6. 6. Relation’s Schema • The relation schema specifies: - the relation’s name - the name of each field/attribute, i.e. column - the domain of each field, i.e. the type of the field • Example: Student(sid: string, name: string, age: integer, gpa: real)
  7. 7. Tuples • An instance of a relation is a set of tuples (records) in which each tuple has the same number of fields as in the relation schema • A relation’s instance can be visualised as a table where each tuple is a row and all rows have the same number of fields (columns):
       sid   name          age  gpa
       1234  Peter Parker  21   4.0
       2345  Tony Stark    15   4.0
       3456  Bruce Wayne   23   3.5
     • Notice that rows are all different. This is a requirement of the relational model, as a relation instance is a collection of unique tuples (or rows)
  8. 8. Cardinality and Degree • The cardinality of a relation R is defined as the number of tuples belonging to the relation • The degree, or arity, of a relation R is defined as the number of its fields
  9. 9. Keys • The key of a relation is a set of fields that uniquely identifies a tuple • A superkey is a set of attributes that contains a key • Example: the sid field is the key for the Students relation:
       sid   name          age  gpa
       1234  Peter Parker  21   4.0
       2345  Tony Stark    15   4.0
       3456  Bruce Wayne   23   3.5
  10. 10. Foreign Keys • A foreign key introduces a link between two relations • For instance, the sid in the Courses relation is a foreign key: it refers to the Students relation and introduces an integrity constraint on it
       Courses:
       cid          sid   grade
       Physics303   1234  A+
       Robotics323  2345  A+
       Calculus343  2345  A
       Students:
       sid   name          age  gpa
       1234  Peter Parker  21   4.0
       2345  Tony Stark    15   4.0
       3456  Bruce Wayne   23   3.5
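
The definitions of cardinality, degree, keys and foreign keys above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names (`cardinality`, `degree`, `is_unique_on`, `dangling_references`) are assumptions introduced here, relations are assumed to be non-empty sets of named tuples, and the uniqueness check inspects one relation instance, whereas a key is a constraint that must hold on every legal instance.

```python
from typing import NamedTuple


class Student(NamedTuple):
    sid: int
    name: str
    age: int
    gpa: float


class Course(NamedTuple):
    cid: str
    sid: int  # foreign key referring to Student.sid
    grade: str


# A relation instance is a set of tuples, so duplicates cannot occur.
students = {
    Student(1234, "Peter Parker", 21, 4.0),
    Student(2345, "Tony Stark", 15, 4.0),
    Student(3456, "Bruce Wayne", 23, 3.5),
}
courses = {
    Course("Physics303", 1234, "A+"),
    Course("Robotics323", 2345, "A+"),
    Course("Calculus343", 2345, "A"),
}


def cardinality(relation):
    """Number of tuples in the relation instance."""
    return len(relation)


def degree(relation):
    """Number of fields (arity); assumes a non-empty relation."""
    return len(next(iter(relation))._fields)


def is_unique_on(relation, fields):
    """True if no two tuples of this instance agree on all the given fields."""
    projected = {tuple(getattr(t, f) for f in fields) for t in relation}
    return len(projected) == len(relation)


def dangling_references(referencing, fk, referenced, key):
    """Tuples whose foreign key value does not identify any referenced tuple."""
    known = {getattr(t, key) for t in referenced}
    return {t for t in referencing if getattr(t, fk) not in known}
```

With the sample data from the slides, `is_unique_on(students, ["sid"])` holds while `is_unique_on(students, ["gpa"])` does not, and `dangling_references(courses, "sid", students, "sid")` is empty because every course row refers to an existing student.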
  11. 11. Quick DDS Intro
  12. 12. Data Distribution Service (DDS) • DDS provides a Global Data Space abstraction that allows applications to autonomously, anonymously, securely and efficiently share data • DDS’ Global Data Space is fully distributed, highly efficient and scalable
  13. 13. Data Distribution Service (DDS) • DataWriters and DataReaders are automatically and dynamically matched by DDS Discovery • A rich set of QoS policies makes it possible to control existential, temporal, and spatial properties of data
  14. 14. Information Definition
  15. 15. Topic • A Topic defines a domain-wide class of information • A Topic is defined by means of a (name, type, qos) tuple, where - name: identifies the topic within the domain - type: is the programming language type associated with the topic. Types are extensible and evolvable - qos: is a collection of policies that express the non-functional properties of this topic, e.g. reliability, persistence, etc.
  16. 16. Topic and Instances • As explained in the previous slide, a topic defines a class/type of information • Topics can be defined as Singleton or can have multiple Instances • Topic Instances are identified by means of the topic key • A Topic Key is identified by a tuple of attributes, like in databases • Remarks: - A Singleton topic has a single domain-wide instance - A “regular” Topic can have as many instances as the number of different key values, e.g., if the key is an 8-bit character then the topic can have 256 different instances
  17. 17. Topic Example • A Topic type can be defined in different syntaxes • IDL is the most commonly used syntax • Example:
       struct Student {
           long   sid;
           string name;
           long   age;
           float  gpa;
       };
       #pragma keylist Student sid
  18. 18. Topics as Relations • A Topic can be seen as defining a relation
       struct Student {
           long   sid;
           string name;
           long   age;
           float  gpa;
       };
       #pragma keylist Student sid
     Student(sid, name, age, gpa):
       sid   name          age  gpa
       1234  Peter Parker  21   4.0
       2345  Tony Stark    15   4.0
       3456  Bruce Wayne   23   3.5
  19. 19. Mapping DDS to the Relational Model • Topic Type => Relation Schema • Topic Instance => Key • Topic Sample => Tuple
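
The mapping above (type as schema, instance as key value, sample as tuple) can be illustrated with a small Python sketch. It models the idea only and does not use any DDS API: `KEY_FIELDS`, `key_of` and `latest_per_instance` are hypothetical names, and keeping only the most recent sample per instance is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:  # topic type, playing the role of the relation schema
    sid: int
    name: str
    age: int
    gpa: float


KEY_FIELDS = ("sid",)  # mirrors "#pragma keylist Student sid"


def key_of(sample, key_fields=KEY_FIELDS):
    """The key value that identifies the topic instance a sample belongs to."""
    return tuple(getattr(sample, f) for f in key_fields)


def latest_per_instance(samples):
    """Fold a stream of samples (tuples) into {key: most recent sample}."""
    instances = {}
    for sample in samples:
        instances[key_of(sample)] = sample
    return instances


# Three samples, but only two instances: sid 1234 is updated once.
stream = [
    Student(1234, "Peter Parker", 21, 3.9),
    Student(2345, "Tony Stark", 15, 4.0),
    Student(1234, "Peter Parker", 21, 4.0),
]
view = latest_per_instance(stream)
```

Here `view` holds one entry per key value, which is the sense in which a topic instance corresponds to a key in the relational model.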
  20. 20. Relational Design • Start identifying coarse relations and properties of data • Start decomposing based on properties • Apply a normal form - Functional Dependencies => Boyce-Codd Normal Form - Multivalued Dependencies => Fourth Normal Form
  21. 21. UML Data Modelling
  22. 22. UML Data Modelling • A subset of UML can be used to model Data Models • The resulting model can be easily translated into a relational model and then used in a DBMS or DDS • The allowed subset of UML is: - Classes (with only attributes) - Associations - Association Classes - Subclasses - Composition and Aggregation • UML Data Models can be automatically translated into a relational model as long as each “regular” class defines a primary key
  23. 23. Class • A UML class is mapped to a relation that has the same name as the class and shares its key and attributes • Example: class Student (sid: int, name: string, age: int, gpa: float) maps to Student(sid, name, age, gpa)
  24. 24. Association • By default an association can be mapped as follows; depending on the multiplicity of the association, different mappings may be possible or desirable • Classes C1 (K1: PK, O1) and C2 (K2: PK, O2) linked by association A map to C1(K1, O1), C2(K2, O2), A(K1, K2) • The key definition in the association depends on the multiplicity
  25. 25. 1-to-many Association • There are two ways of mapping a 1-to-many association to the relational model: - M1: use a relation to capture the association - M2: embed the association on the many side of the association • For classes C1 (K1: PK, O1) and C2 (K2: PK, O2) linked by association A, with multiplicity 0..1 on the C1 side and * on the C2 side: - M1: C1(K1, O1), C2(K2, O2), A(K1, K2) - M2: C1(K1, O1), C2(K2, O2, K1)
  26. 26. many-to-many Associations • Classes C1 (K1: PK, O1) and C2 (K2: PK, O2) linked by a many-to-many (* to *) association A map to C1(K1, O1), C2(K2, O2), A(K1, K2)
  27. 27. Relationships arity • [Figure: key of the association relation over K1 and K2 for One to One, One to Many and Many to Many relationships; labels recovered from the figure: Key = K2, Key = K1, K2]
  28. 28. Association Classes • Classes C1 (K1: PK, O1) and C2 (K2: PK, O2) linked by association A, described by the association class A, map to C1(K1, O1), C2(K2, O2), A(K1, K2, a1, a2)
  29. 29. Self Association • Self associations are modelled as traditional relations, with the only difference that attributes may be conserved • Example: class Student (sid: int, name: string, age: int, gpa: float) with a many-to-many (* to *) self association Sibling maps to Student(sid, name, age, gpa), Sibling(sidParent, sidSibling)
  30. 30. Subclasses • Three ways of mapping subclassing to the relational model: - T1: subclass relations contain the superclass key and the specialised attributes - T2: subclass relations contain all attributes - T3: one relation containing all superclass and subclass attributes • For superclass A (K: PK, X) with subclasses B (Y) and C (Z): - T1: A(K, X), B(K, Y), C(K, Z) - T2: A(K, X), B(K, X, Y), C(K, X, Z) - T3: A(K, X, Y, Z) • The best translation may depend on the context, e.g. T3 is good for heavily overlapping subclasses, T2 is good for disjoint and complete subclasses
  31. 31. Composition and Aggregation • The precondition to easily map composition to the relational model is for the part not to have a key • Example: Whole (K: PK, W) composed of Part (P) maps to Whole(K, W), Part(P, K) • When mapping aggregation (unfilled diamond), the key K on the Part should have a domain that allows for null values
  32. 32. Summing Up • A subset of UML can be used to model relational data models • The mapping rules can be used to help translate existing Object Oriented data models into their relational counterpart
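
As one example of how these mapping rules can be mechanised, the sketch below encodes the two 1-to-many mappings (M1 and M2) from the earlier slide. The function name `map_one_to_many` and the `(name, key attributes, other attributes)` input format are assumptions made for illustration, not part of any tool mentioned in the deck.

```python
def map_one_to_many(one, many, association, embed=False):
    """Map a 1-to-many association to relation schemas.

    `one` and `many` are (name, key_attrs, other_attrs) triples.
    embed=False gives M1: a separate relation captures the association.
    embed=True gives M2: the key of the "one" side is embedded in the
    relation of the "many" side.
    """
    one_name, one_key, one_other = one
    many_name, many_key, many_other = many
    relations = {one_name: one_key + one_other}
    if embed:
        relations[many_name] = many_key + many_other + one_key
    else:
        relations[many_name] = many_key + many_other
        relations[association] = one_key + many_key
    return relations


c1 = ("C1", ["K1"], ["O1"])
c2 = ("C2", ["K2"], ["O2"])
m1 = map_one_to_many(c1, c2, "A")              # C1(K1, O1), C2(K2, O2), A(K1, K2)
m2 = map_one_to_many(c1, c2, "A", embed=True)  # C1(K1, O1), C2(K2, O2, K1)
```

M2 saves one relation (and, in DDS terms, one topic) at the cost of coupling the "many" side to the key of the "one" side.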
  33. 33. Refinement
  34. 34. Why Relation Refinement? • The UML/ER Data Models usually provide a good starting point toward the data model that we’ll actually use in the system • The relations implied by the UML/ER Data Model often need to be normalised and re-organised to address performance and workload criteria • The goal of relation refinement is to remove redundancy and/or decompose a relation into smaller relations • Normal forms provide a way of measuring the amount of redundancy that may be in our data model
  35. 35. Redundancy • Redundant Storage: information may be stored multiple times, leading to space, and perhaps time, inefficiencies • Update Anomalies: if one copy of the redundant information is updated, this may create inconsistencies with the other copies, unless all copies are updated at the same time • Insertion Anomalies: it may not be possible to store some information unless some other information is stored as well • Deletion Anomalies: it may not be possible to delete some information without losing some other information as well
  36. 36. Decomposition • Unconsidered decomposition can lead to more problems than benefits, thus when decomposing you always want to ensure that: - You really need to decompose the relation - You fully understand the implications of the decomposition (lossless join, dependency preservation) • Normal Forms provide good guidelines for relation decomposition, as they guarantee that certain classes of problems cannot be introduced • Notice that decomposition can have a performance impact as it may lead to an increase in joins
  37. 37. Functional Dependencies • A Functional Dependency (FD) is a kind of Integrity Constraint (IC) that generalises the concept of a key • Given a relation R along with two nonempty sets of attributes X and Y in R, we say that R satisfies the FD X ⟶ Y if the following holds for every pair of tuples t1 and t2 in R: if t1.X = t2.X then t1.Y = t2.Y • In other terms, the FD says that if two tuples agree on the set of attributes in X they also agree on the set of attributes in Y • Notice that a primary key constraint is a special kind of FD
  38. 38. Example • Let’s assume our Student relation now includes a new attribute that measures the percentile of the student GPA, i.e. the percentage of students that have a GPA smaller than or equal to it:
       sid   name          age  gpa  percentile
       1234  Peter Parker  21   4.0  100
       2345  Tony Stark    15   4.0  100
       3456  Bruce Wayne   23   3.5  75
     • Clearly the percentile attribute functionally depends on gpa, or equivalently gpa ⟶ percentile
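
The FD definition can be checked mechanically against a relation instance. The sketch below is a minimal illustration, assuming rows are represented as Python dicts; `satisfies_fd` is a name introduced here. A passing check only shows that the instance does not violate the FD; whether the FD holds in general is a property of the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows agreeing on `lhs` also agrees on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value; any later
        # row with the same X must carry the same Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0, "percentile": 100},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0, "percentile": 100},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5, "percentile": 75},
]

gpa_determines_percentile = satisfies_fd(students, ["gpa"], ["percentile"])
gpa_determines_name = satisfies_fd(students, ["gpa"], ["name"])
```

On the slide's data the first check succeeds and the second fails, because two students share the GPA 4.0 but have different names.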
  39. 39. Normal Forms
  40. 40. Normal Forms • Different Normal Forms (NF) exist that provide guidance on how to decompose relations • If a relation is in a given normal form then we are guaranteed that some anomalies, e.g. update anomalies, cannot arise • The normal forms based on functional dependencies are the first normal form (1NF), second normal form (2NF), third normal form (3NF) and the Boyce-Codd normal form (BCNF) • Every relation in BCNF is also in 3NF, every relation in 3NF is also in 2NF and finally every relation in 2NF is also in 1NF • The 2NF and 3NF have only historical interest, while the BCNF has important practical applicability
  41. 41. 1NF • A relation is in 1NF if every field contains only atomic values, that is, no lists or sets
  42. 42. Boyce-Codd Normal Form (BCNF) • Let R be a relation, X a subset of attributes of R and a an attribute of R. R is in Boyce-Codd Normal Form (BCNF) if for every FD X ⟶ {a} that holds over R, one of the following is true: - a ∊ X, that is, it is a trivial FD, or - X is a superkey • Intuitively, in a BCNF relation the only nontrivial dependencies are those in which a key determines some attributes. Each attribute must describe the key, the whole key, and nothing but the key • [Figure: Functional Dependencies in BCNF; the key determines attr 1, attr 2, ..., attr k]
  43. 43. BCNF Decomposition Algorithm
       Input: relation R and FDs for R
       Output: decomposition of R into BCNF relations with lossless join
       Compute keys for R
       Repeat until all relations are in BCNF:
           Choose a relation Ri with A ⟶ B that violates BCNF
           Decompose Ri into R1(A, B) and R2(A, rest)
           Compute FDs for R1 and R2
           Compute keys for R1 and R2
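
The algorithm can be sketched in Python as follows. This is one possible rendering under stated assumptions, not the presenter's implementation: FDs are given as `(lhs, rhs)` pairs of frozensets, violations are found by brute force over all attribute subsets (exponential, so only suitable for small schemas), and R1 takes the full closure of A rather than just B. The function names are introduced here for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def find_violation(relation, fds):
    """Return (X, closure of X within the relation) for a nontrivial
    dependency whose left side X is not a superkey, or None if in BCNF."""
    attrs = sorted(relation)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            determined = closure(x, fds) & relation
            # x < determined: X determines something new (nontrivial).
            # determined < relation: X is not a superkey of the relation.
            if x < determined < relation:
                return x, determined
    return None


def bcnf_decompose(relation, fds):
    """Recursively split `relation` until every part is in BCNF."""
    relation = frozenset(relation)
    violation = find_violation(relation, fds)
    if violation is None:
        return {relation}
    x, determined = violation
    r1 = determined                   # R1(A, B)
    r2 = x | (relation - determined)  # R2(A, rest)
    return bcnf_decompose(r1, fds) | bcnf_decompose(r2, fds)


student = {"sid", "name", "age", "gpa", "percentile"}
fds = [
    (frozenset({"sid"}), frozenset({"name", "age", "gpa"})),
    (frozenset({"gpa"}), frozenset({"percentile"})),
]
decomposition = bcnf_decompose(student, fds)
```

For the Student relation extended with percentile, gpa ⟶ percentile violates BCNF because gpa is not a superkey, so the sketch splits the relation into (gpa, percentile) and (sid, name, age, gpa).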
  44. 44. 3NF • Let R be a relation schema, X a subset of attributes of R and a an attribute of R. R is in Third Normal Form if for every FD X ⟶ {a} that holds over R, one of the following is true: - a ∊ X, that is, it is a trivial FD, or - X is a superkey, or - a is part of some key for R • The definition of 3NF is similar to that of BCNF, with the difference that a may be part of a key for R
  45. 45. Multivalued Dependencies • For a relation R we say that A ↠ B (A multi-determines B), where A and B are sets of fields in R, if: ∀ t,u ∈ R: if t.A = u.A then ∃ v ∈ R: v.A = t.A and v.B = t.B and v.rest = u.rest • Multivalued dependencies are sometimes called tuple-generating dependencies
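
The MVD definition translates almost literally into a check over a relation instance. In the sketch below, `satisfies_mvd` is a name introduced here, rows are assumed to be dicts, A and B are assumed to be disjoint, and the Enrolment data (a student's courses and hobbies) is hypothetical and not taken from the slides.

```python
def satisfies_mvd(rows, a, b):
    """Check A ->> B on an instance: for all t, u with t.A = u.A there must
    be a row v with v.A = t.A, v.B = t.B and v.rest = u.rest."""
    if not rows:
        return True
    rest = [n for n in sorted(rows[0]) if n not in a and n not in b]

    def proj(row, names):
        return tuple(row[n] for n in names)

    present = {(proj(r, a), proj(r, b), proj(r, rest)) for r in rows}
    for t in rows:
        for u in rows:
            if proj(t, a) == proj(u, a):
                if (proj(t, a), proj(t, b), proj(u, rest)) not in present:
                    return False
    return True


# Hypothetical relation Enrolment(sid, course, hobby).
incomplete = [
    {"sid": 1234, "course": "Physics303", "hobby": "chess"},
    {"sid": 1234, "course": "Robotics323", "hobby": "chess"},
    {"sid": 1234, "course": "Physics303", "hobby": "golf"},
]
# The MVD "generates" the missing combination of course and hobby.
complete = incomplete + [
    {"sid": 1234, "course": "Robotics323", "hobby": "golf"},
]
```

The `incomplete` instance violates sid ↠ course because the combination (Robotics323, golf) is missing; adding that tuple, as in `complete`, satisfies it. This is why MVDs are called tuple-generating.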
  46. 46. Fourth Normal Form (4NF) • A relation R with multivalued dependencies (MVD) is in 4NF if for each nontrivial A ↠ B, A is a key • The 4NF implies the BCNF
  47. 47. 4NF Decomposition Algorithm
       Input: relation R and FDs and MVDs for R
       Output: decomposition of R into 4NF relations with lossless join
       Compute keys for R
       Repeat until all relations are in 4NF:
           Choose a relation Ri with a nontrivial A ↠ B that violates 4NF
           Decompose Ri into R1(A, B) and R2(A, rest)
           Compute FDs and MVDs for R1 and R2
           Compute keys for R1 and R2
  48. 48. Shortcomings of BCNF and 4NF • Dependency enforcement may require joins • Query workload, due to excessive joins • Over-decomposition
  49. 49. Relational Algebra
  50. 50. Selection and Projection • Relational algebra provides operators to select rows (σ) and to project columns (π) from a relation • These operations operate on a single relation • Examples:
       Student:
       sid   name          age  gpa
       1234  Peter Parker  21   4.0
       2345  Tony Stark    15   4.0
       3456  Bruce Wayne   23   3.5
       σage<20(Student):
       sid   name          age  gpa
       2345  Tony Stark    15   4.0
       πname,gpa(Student):
       name          gpa
       Peter Parker  4.0
       Tony Stark    4.0
       Bruce Wayne   3.5
  51. 51. Joins • Join is one of the most useful operators in relational algebra and is most commonly used to combine/reassemble information from two or more relations • Join is conceptually a cross product followed by a selection and projection
  52. 52. Condition Joins • Condition joins are the most general form of joins. This operation takes a condition and two relations and is defined as follows: R ⋈c S = σc(R × S)
  53. 53. Equijoin • Equijoin is a special case of the Condition Join, where the condition is a predicate on attribute equality
  54. 54. Natural Join • A Natural Join is a special Equijoin that operates on all the attributes having the same name in R and S
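
The operators in this section can be sketched over lists of dicts in Python. The function names are assumptions introduced here, and the sketch ignores efficiency; it only shows how selection, projection, the condition join (a filtered cross product) and the natural join relate to each other.

```python
def select(relation, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """π: keep only the given columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def condition_join(r, s, condition):
    """R ⋈c S = σc(R × S). If both rows share an attribute name, the value
    from S wins, which is harmless when the condition equates them."""
    return [{**x, **y} for x in r for y in s if condition(x, y)]


def natural_join(r, s):
    """Equijoin on all attributes that have the same name in R and S."""
    def same_on_shared(x, y):
        return all(x[a] == y[a] for a in x.keys() & y.keys())
    return condition_join(r, s, same_on_shared)


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5},
]
courses = [
    {"cid": "Physics303", "sid": 1234, "grade": "A+"},
    {"cid": "Robotics323", "sid": 2345, "grade": "A+"},
    {"cid": "Calculus343", "sid": 2345, "grade": "A"},
]

minors = select(students, lambda row: row["age"] < 20)  # σage<20(Student)
names_and_gpas = project(students, ["name", "gpa"])     # πname,gpa(Student)
enrolled = natural_join(courses, students)              # joined on sid
```

`enrolled` reassembles each course row with the matching student row, which is the typical use of a join after a relation has been decomposed.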
  55. 55. Back to DDS
  56. 56. Relational Design in DDS • Start identifying coarse relations and properties of data • Start decomposing based on properties (can use UML for this) • Apply a normal form - Functional Dependencies => Boyce-Codd Normal Form - Multivalued Dependencies => Fourth Normal Form • Define QoS for the resulting relations and further decompose if you incur some QoS Mix (more later)
  57. 57. Relational Algebra • DDS Supports: - Selection for a given Topic via DDS queries and filters - Conditional Joins across multiple Topics via Multi-Topics • DDS uses a subset of SQL-92 to express selections, projections and joins
  58. 58. DDS Specific Decomposition • In some instances you may find that a topic (relation) R has two disjoint sets of attributes X and Y that have conflicting temporal, reliability or durability requirements • In this case the relation has to be further decomposed
  59. 59. Frequency Mix • Suppose you have a relation R(K, X, Y) where the set of attributes X changes far more frequently than the set of attributes Y (e.g. position vs. velocity) • In this case you should decompose the relation R into: R1(K, X), R2(K, Y) • This will reduce the resource usage in your system, e.g. bandwidth as well as CPU, but may introduce consistency issues. If consistency is essential then coherent updates should be used to atomically update R1 and R2
  60. 60. Reliability Mix • Suppose you have a relation R(K, X, Y) where the set of attributes Y represents some soft-state • In this case you should decompose the relation R into: R1(K, X), R2(K, Y) • This decomposition makes it possible to use reliable distribution only for R1 and best-effort for R2, thus reducing resource usage in the system
  61. 61. Durability Mix • Suppose you have a relation R(K, X, Y) where the set of attributes X requires a different durability than the set of attributes Y, e.g. X needs to be persistent while Y is volatile • In this case you should decompose the relation R into: R1(K, X), R2(K, Y) • This will reduce the resource usage in your system and reduce the pressure on the Durability Service
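
The three QoS mixes share one rule: attributes with different QoS requirements go into different topics, each carrying the key K. The Python sketch below illustrates that rule only. The function `decompose_by_qos`, the QoS labels and the Vehicle topic are all hypothetical and do not correspond to a DDS API or to an example in the deck.

```python
def decompose_by_qos(name, key, attribute_qos):
    """Split topic `name` into one topic per distinct QoS profile.

    `attribute_qos` maps each non-key attribute to a hashable QoS profile.
    Every resulting topic carries the key so that samples can be re-joined.
    """
    groups = {}
    for attr, qos in attribute_qos.items():
        groups.setdefault(qos, []).append(attr)
    if len(groups) == 1:
        # No QoS mix: the topic is kept as it is.
        (qos, attrs), = groups.items()
        return {name: {"fields": list(key) + attrs, "qos": qos}}
    return {
        f"{name}{i}": {"fields": list(key) + attrs, "qos": qos}
        for i, (qos, attrs) in enumerate(groups.items(), start=1)
    }


# Hypothetical topic Vehicle(vid, position, speed, model, owner) where the
# soft-state attributes and the static attributes need different QoS.
topics = decompose_by_qos(
    "Vehicle",
    ["vid"],
    {
        "position": ("best_effort", "volatile"),
        "speed": ("best_effort", "volatile"),
        "model": ("reliable", "persistent"),
        "owner": ("reliable", "persistent"),
    },
)
```

The result has the R1(K, X), R2(K, Y) shape described above: `Vehicle1(vid, position, speed)` and `Vehicle2(vid, model, owner)`, each with a single, consistent QoS profile.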
  62. 62. Summing Up
  63. 63. Concluding Remarks • The relational model provides the right set of tools for designing DDS-based systems • DDS Topics are relations and DDS supports a subset of relational algebra to manipulate these relations (topics) • The design process is as follows: - Start modelling your system using the UML Data Modelling subset - Ensure your model is in BCNF or 4NF; make sure you understand why some violations are necessary/desirable for your system - Add QoS to your relations - Evaluate if further decomposition is required due to QoS mixes (if your data model is properly normalised)
  64. 64. Learn More…
  65. 65. Books • A First Course in Database Systems (3rd edition), Ullman and Widom • Database Management Systems (3rd edition), Ramakrishnan and Gehrke
  66. 66. coursera.org • Jennifer Widom, Stanford, Introduction to Databases - A very, very good course on databases in general and specifically on relational data modelling
  67. 67. Extras
  68. 68. ER Modelling
  69. 69. Entity Relationship (ER) Data Model • Relational Data Models are commonly expressed using some variation of Entity-Relationship (ER) Data Models • The ER Data Model is built around the concepts of entities, attributes and relationships (not to be confused with relations!)
  70. 70. Entities, Attributes and Entity Sets • An entity is an object in the real world that is distinguishable from other objects - e.g. the iPhone, the Samsung Galaxy Note, etc. • An entity is described through a set of attributes • An entity set identifies a collection of similar entities - e.g., Mobile Phones • Each attribute associated with an entity set must identify its domain • An entity has a primary key and potentially several candidate keys
  71. 71. Mapping • An entity set is mapped to a relation • Example: the Student entity set, with attributes sid, name, age and gpa, maps to the relation:
       sid   name          age  gpa
       1234  Peter Parker  21   4.0
       2345  Tony Stark    15   4.0
       3456  Bruce Wayne   23   3.5
  72. 72. Relationships • A relationship is an association between two or more entities - e.g., a student is enrolled in a course • A relationship can have descriptive attributes to record information about the relationship
  73. 73. Mapping • A relationship set is mapped to a relation • The attributes of the resulting relation are: - the primary key of each participating entity as foreign keys - descriptive attributes as fields of the relation • The primary key of the resulting relation depends on the arity of the relationship
  74. 74. Entity Hierarchies • In some cases it is natural to introduce (type) hierarchies among entities • These hierarchies are represented through the ISA relationship • [Figure: Employees (ssn, name) with an ISA relationship to HourlyEmpls and ContractEmpls; subtype attributes shown: hourlyWages, hoursWorked, contractId]
  75. 75. Mapping • ISA relationships can be mapped in two ways: - Map each entity to a distinct relation - Create relations only for the concrete types • Notice that while the first approach is always applicable, the second is not
