Querying UML Class Diagrams


                   Georg Gottlob
             Department of Computer Science
                   University of Oxford




joint work with Andrea Calì, Giorgio Orsi and Andreas Pieris
Major Modeling Formalisms for Data and Objects
 Competes
                                                                                                        person: SSN  Name
               0..1
                                                                     Stock                             employee[1]  person[1]
          Company                 Issues
                                                             Index[0..1]:Str
0..1                      1..1                   1..1
                                                             getIndex():List
                                                                                                 Relational Data Dependencies
               1..1
                                                                      0..1
   Member

                                                                      Owns
               2..1

          Executive                          Person
                                                              1..1



              UML Class Diagrams                                                                           XML Schemas

                                                                                                          student v member
           m_name                        since                         g_name                              leads¡ v works
          member
                      (1,1)        1
                                         works
                                                     2       (1,N)
                                                                      group
                                                                                                          student v :professor
                                                                             (1,1)
                                                     [1,2]
                                                                                                       Description Logics
                                 (1,N)
student          professor                           leads
                                                 1             2
                                                                                     Context C1 inv: C1.allInstances -> forAll ( x1: C1 | C2.allInstances ->
                                                                                     forAll ( x2: C2 | x1=x2 implies x2.oclIsTyeOf(C)))
                        ER Diagrams
                                                                                             Object Constraint Language (OCL)
Datalog± : A Unifying Logical Framework


                            Datalog±

       Description Logics
          (DL-Lite, EL,…)       Relational Constraints
                                    (IDs, FKDs,…)


             Datalog
                              Conceptual Models
                                  (UML, ER,…)




 … providing: Logical foundations, semantics, decidability and
   complexity results for reasoning and query-answering,
           identification of tractable fragments, …
Datalog§
                                8X8Y (X,Y)  (X)

•   Extend Datalog with additional features such as:
     •   Existential quantification (9): TGDs      8X8Y (X,Y)  9Z (X,Z)

     •   Equality atoms (=): EGDs          8X (X)  Xi = Xj

     •   Constant false (?): Negative constraints       8X (X)  ?


•   But query answering under Datalog[9] is undecidable
    [see, e.g., Beeri & Vardi, ICALP 81]



•   Datalog[9,=,?] is syntactically restricted ! Datalog§
Restriction: Guardedness

• All 8-variables occur in one body atom - guard atom


       8X8Y8Z R(X,Y,Z) ^ S(Y) ^ P(X,Z)  9W Q(X,W)


                   guard


• Models of finite treewidth ) decidability of query answering
   [Calì, G. & Kifer, KR 08] related to work by [Andréka, Németi & van Benthem] and [Grädel]




• Query answering is PTIME-complete in data complexity
   [Calì, G. & Lukasiewicz, PODS 09]
Reasoning over UML Class Diagrams

   Competes
                0..1                                 Stock
          Company             Issues
   0..1                1..1
                                                Index[0..1]:Str
                                        1..1
                1..1                            getIndex():List
                                                          0..1
      Member
                                                          Owns
                2..1
          Executive                    Person
                                                   1..1




  Satisfiability: Does the diagram admit at least one instantiation?
Reasoning over UML Class Diagrams



• Satisfiability - the diagram has a (possibly infinite) non-empty
  instantiation



• Full Satisfiability - the diagram has a (possibly infinite) instantiation
  where each class and association is non-empty



• Finite Satisfiability - the diagram has a finite instantiation
Reasoning over UML Class Diagrams


                             Person


                                  {disjoint}



               Student                         Worker




      finitely satisfiable, e.g., {Worker(john),Person(john)}

        but not fully - student class is necessarily empty
Complexity of Reasoning over UML Class Diagrams


     • Satisfiability is EXPTIME-complete
        [Berardi, Calvanese & De Giacomo, Artificial Intelligence 05]




     • Full Satisfiability is EXPTIME-complete
       [Artale, Calvanese & Ibánez-García, ER 10]




     • Finite Satisfiability is EXPTIME-complete
        [implicit in Berardi, Calvanese & De Giacomo, Artificial Intelligence 05]
Querying UML Class Diagrams

 Executive(john)           Competes
Member(john,LU)                        0..1
                                                                          Stock
  Stock(BAY)                      Company            Issues
                           0..1               1..1            1..1
                                                                     Index[0..1]:Str
Issues(BA,BAY)
                                       1..1                          getIndex():List
Owns(john,BAY)                                                               0..1
Competes(LU,BA)             Member
                                                                             Owns
                                       2..1
                                  Executive               Person
                                                                      1..1




            Which persons have a potential conflict of interest?
Querying UML Class Diagrams

  Executive(john)           Competes
 Member(john,LU)                       0..1
                                                                          Stock
   Stock(BAY)                     Company            Issues
                           0..1               1..1            1..1
                                                                     Index[0..1]:Str
  Issues(BA,BAY)
                                       1..1                          getIndex():List
 Owns(john,BAY)                                                              0..1
 Competes(LU,BA)             Member
                                                                             Owns
                                       2..1
                                  Executive               Person
                                                                      1..1




Conflict(P)  Person(P), Company(C1), Company(C2), Stock(S),
                Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
Querying UML Class Diagrams

 Executive(john)           Competes
Member(john,LU)                       0..1
                                                                         Stock
  Stock(BAY)                     Company            Issues
                          0..1               1..1            1..1
                                                                    Index[0..1]:Str
Issues(BA,BAY)
                                      1..1                          getIndex():List
Owns(john,BAY)                                                              0..1
Competes(LU,BA)             Member
                                                                            Owns
                                      2..1
                                 Executive               Person
                                                                     1..1




            Does anybody have a potential conflict of interest?
Querying UML Class Diagrams

 Executive(john)          Competes
Member(john,LU)                      0..1
                                                                        Stock
  Stock(BAY)                    Company            Issues
                         0..1               1..1            1..1
                                                                   Index[0..1]:Str
Issues(BA,BAY)
                                     1..1                          getIndex():List
Owns(john,BAY)                                                             0..1
Competes(LU,BA)           Member
                                                                           Owns
                                     2..1
                                Executive               Person
                                                                    1..1




 Conflict  Person(P), Company(C1), Company(C2), Stock(S),
             Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
Querying UML Class Diagrams

 Executive(john)          Competes
Member(john,LU)                      0..1
                                                                        Stock
  Stock(BAY)                    Company            Issues
                         0..1               1..1            1..1
                                                                   Index[0..1]:Str
Issues(BA,BAY)
                                     1..1                          getIndex():List
Owns(john,BAY)                                                             0..1
Competes(LU,BA)           Member
  Person(john)                                                             Owns
                                     2..1
                                Executive               Person
                                                                    1..1




 Conflict  Person(P), Company(C1), Company(C2), Stock(S),
             Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
Querying UML Class Diagrams

 Executive(john)          Competes
Member(john,LU)                      0..1
                                                                        Stock
  Stock(BAY)                    Company            Issues
                         0..1               1..1            1..1
                                                                   Index[0..1]:Str
Issues(BA,BAY)
                                     1..1                          getIndex():List
Owns(john,BAY)                                                             0..1
Competes(LU,BA)           Member
  Person(john)                                                             Owns
                                     2..1
 Company(LU)                    Executive               Person
 Company(BA)                                                        1..1




 Conflict  Person(P), Company(C1), Company(C2), Stock(S),
             Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
Querying UML Class Diagrams

 Executive(john)          Competes
Member(john,LU)                      0..1
                                                                        Stock
  Stock(BAY)                    Company            Issues
                         0..1               1..1            1..1
                                                                   Index[0..1]:Str
Issues(BA,BAY)
                                     1..1                          getIndex():List
Owns(john,BAY)                                                             0..1
Competes(LU,BA)           Member
  Person(john)                                                             Owns
                                     2..1
 Company(LU)                    Executive               Person
 Company(BA)                                                        1..1


                                {P ! john, C1 ! LU, C2 ! BA, S ! BAY}

 Conflict  Person(P), Company(C1), Company(C2), Stock(S),
             Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
Querying UML Class Diagrams: Existential Rules

  Group(DB)
                                   Member   3..1             WorksIn

                                                                               1..1
                             {disjoint}                                  Group

                                                                               0..1
                         Student          Professor
                                                      1..1    Leads

                                                                      CLeads
                                                                 since: Date




       Is there a professor who works in the database group?
Querying UML Class Diagrams: Existential Rules

  Group(DB)
                                 Member   3..1            WorksIn

                                                                            1..1
                          {disjoint}                                  Group

                                                                            0..1
                       Student         Professor
                                                   1..1    Leads

                                                                   CLeads
                                                              since: Date




              Ans  Professor(P), WorksIn(P,DB)
Querying UML Class Diagrams: Existential Rules

  Group(DB)
 Leads(z1,DB)                       Member   3..1            WorksIn
 Professor(z1)                                                                 1..1
                             {disjoint}                                  Group
 R*(z1,DB,z2)
  CLeads(z2)                                                                   0..1
                          Student         Professor
                                                      1..1    Leads

                                                                      CLeads
                                                                 since: Date




                 Ans  Professor(P), WorksIn(P,DB)
Querying UML Class Diagrams: Existential Rules

  Group(DB)
 Leads(z1,DB)                       Member   3..1            WorksIn
 Professor(z1)                                                                 1..1
                             {disjoint}                                  Group
 R*(z1,DB,z2)
  CLeads(z2)                                                                   0..1
WorksIn(z1,DB)            Student         Professor
                                                      1..1    Leads

                                                                      CLeads
                                                                 since: Date




                 Ans  Professor(P), WorksIn(P,DB)
Querying UML Class Diagrams: Existential Rules

  Group(DB)
 Leads(z1,DB)                       Member   3..1            WorksIn
 Professor(z1)                                                                 1..1
                             {disjoint}                                  Group
 R*(z1,DB,z2)
  CLeads(z2)                                                                   0..1
WorksIn(z1,DB)            Student         Professor
                                                      1..1    Leads
      …
                                                                      CLeads
                                                                 since: Date



                            {P ! z1, DB ! DB}


                 Ans  Professor(P), WorksIn(P,DB)
Querying UML Class Diagrams




                     D


                     



                           Q


        D[²Q   ,   8M (M ² D [  ! M ² Q)
Querying UML Class Diagrams




                     D


                     



                           Q


        D[²Q   ,   8M (M ² D [  ! M ² Q)

                     M¶D Æ M²
From Diagrams to First-Order Logic (Datalog§)
    Competes
               0..1                                Stock
          Company            Issues
   0..1               1..1
                                               Index[0..1]:Str
                                       1..1
               1..1                            getIndex():List
                                                         0..1
      Member
                                                         Owns
               2..1
          Executive                   Person
                                                  1..1
From Diagrams to First-Order Logic (Datalog§)
    Competes
               0..1                                Stock
          Company            Issues
   0..1               1..1
                                               Index[0..1]:Str
                                       1..1
               1..1                            getIndex():List
                                                         0..1
      Member
                                                         Owns
               2..1
          Executive                   Person
                                                  1..1



  8X8Y Member(X,Y)  Company(X) ^ Executive(Y)
  8X Company(X)  9Y9Z Member(X,Y) ^ Member(X,Z) ^ Y ≠ Z
  8X Executive(X)  9Y Member(Y,X)
                                                    [Satoh & Kaneiwa, TCS 10
                                                     and Berardi et al., AI 05]
From Diagrams to First-Order Logic (Datalog§)
    Competes
                0..1                                Stock
          Company             Issues
   0..1                1..1
                                                Index[0..1]:Str
                                         1..1
                1..1                            getIndex():List
                                                          0..1
      Member
                                                          Owns
                2..1
          Executive                    Person
                                                   1..1


 8X Company(X)  9Y Issues(X,Y)        8X Stock(X)  9Y Issues(Y,X)
 8X8Y8Z Stock(X) ^ Issues(Y,X) ^ Issues(Z,X)  Y = Z
 8X8Y Stock(X) ^ Index(X,Y)  Str(Y)
                                                     [Satoh & Kaneiwa, TCS 10
 8X8Y Stock(X) ^ getIndex(X,Y)  List(Y)              and Berardi et al., AI 05]
From Diagrams to First-Order Logic (Datalog§)
    Competes
               0..1                                Stock
          Company            Issues
   0..1               1..1
                                               Index[0..1]:Str
                                       1..1
               1..1                            getIndex():List
                                                         0..1
      Member
                                                         Owns
               2..1
          Executive                   Person
                                                  1..1




                 8X Executive(X)  Person(X)

                                                    [Satoh & Kaneiwa, TCS 10
                                                     and Berardi et al., AI 05]
Complexity of Query Answering


• EXPTIME-complete in combined complexity (everything is part of the input)
   [implicit in Berardi et al., AI 05 and Lutz, IJCAR 08]




• coNP-complete in data complexity (the diagram and the query are fixed)
   [implicit in Ortiz, Calvanese & Eiter, AAAI 06]




• Undecidable when diagrams are combined with arbitrary OCL
  (Object Constraint Language) constraints
   [folklore]
Research Challenge: Reduce High Complexity


   • Diagrams often have very large instantiations




   • Some applications require very large diagrams




   • OCL constraints, that are not expressible diagrammatically,
     lead to undecidability or high complexity
Our Goals


  • Restrict UML class diagrams to achieve tractability of query
    answering in data complexity




  • Better understanding of combined complexity




  • Add relevant OCL constraints without losing tractability of
    query answering in data complexity
Lean UML Class Diagrams

• For each attribute assertion   Attribute[ i..j ]:Type   j 2 {1,1}
Lean UML Class Diagrams

• For each attribute assertion   Attribute[ i..j ]:Type   j 2 {1,1}




                                                      mL..mU   A   nL..nU
• For each association A:                       C1                          C2

        - upper bounds mU ,nU 2 {1,1},

       - if A generalizes some other association, then mU = nU = 1
Lean UML Class Diagrams

• For each attribute assertion   Attribute[ i..j ]:Type   j 2 {1,1}




                                                      mL..mU   A   nL..nU
• For each association A:                       C1                          C2

        - upper bounds mU ,nU 2 {1,1},

       - if A generalizes some other association, then mU = nU = 1


                                                               C
• Completeness constraints are forbidden
                                                               {complete}

             8X C(X)  C1(X) _ C2(X)                 C1                     C2
Lean UML Class Diagrams

• For each attribute assertion   Attribute[ i..j ]:Type   j 2 {1,1}




                                                      mL..mU   A   nL..nU
• For each association A:                       C1                          C2

        - upper bounds mU ,nU 2 {1,1},

       - if A generalizes some other association, then mU = nU = 1


                                                               C
• Completeness constraints are forbidden
                                                               {complete}

             8X C(X)  C1(X) _ C2(X)                 C1                     C2
Lean UML Class Diagrams: Example



   Competes
               0..1                                Stock
          Company            Issues
   0..1               1..1
                                               Index[0..1]:Str
                                       1..1
               1..1                            getIndex():List
                                                        0..1
      Member
                                                        Owns
               2..1
          Executive                   Person
                                                 1..1
Lean UML Class Diagrams: Example


                        3..1             WorksIn
             Member
                                                             1..1
      {disjoint}                                       Group

                                                            0..1
  Student             Professor
                                  1..1    Leads


                                                   CLeads
                                              since: Date
Add Some Non-Diagrammatic Constraints (OCL)


              C




      C1                C2            8X C2(X) ^ C3(X)  ?
              C3



            disjoint classes



           We need negative constraints of the form

                   8X C1(X) ^ … ^ Cn(X)  ?
Add Some Non-Diagrammatic Constraints (OCL)


       C1                 C2




                 C                       8X C1(X) ^ C2(X)  C(X)


        most-specific class




        We need most-specific class constraints of the form

                  8X C1(X) ^ … ^ Cn(X)  C(X)
Add Some Non-Diagrammatic Constraints (OCL)

              Student
     Enrolled[0..1]:Course


                                    type the domain of Enrolled

            CS-Student
     Enrolled[0..1]:CS-Course


       8X8Y CS-Course(X) ^ Enrolled(Y,X)  CS-Student(Y)


            We need domain-type constraints of the form

                    8X8Y C(X) ^ Attr(Y,X)  T(Y)
Add Some Non-Diagrammatic Constraints (OCL)

              Student
     Enrolled[0..1]:Course


                                    type the domain of Enrolled

            CS-Student
     Enrolled[0..1]:CS-Course


       8X8Y CS-Course(X) ^ Enrolled(Y,X)  CS-Student(Y)


            We need domain-type constraints of the form

                    8X8Y C(X) ^ Attr(Y,X)  T(Y)

                pullback rule
Add Some Non-Diagrammatic Constraints (OCL)

              Student
     Enrolled[0..1]:Course


                                    type the domain of Enrolled

            CS-Student
     Enrolled[0..1]:CS-Course


       8X8Y CS-Course(X) ^ Enrolled(Y,X)  CS-Student(Y)


            We need domain-type constraints of the form

                    8X8Y C(X) ^ Attr(Y,X)  T(Y)

                pullback rule             will make a difference!
OCL (Object Constraint Language)

                            Context C1 inv:
                                C1.allInstances -> forAll ( x1: C1 |
 8X C1(X) ^ C2(X)  ?               C2.allInstances -> forAll ( x2: C2 |
                                        x1<>x2
                                    )
                                )



                            Context C1 inv:
                                C1.allInstances -> forAll ( x1: C1 |
                                    C2.allInstances -> forAll ( x2: C2 |
8X C1(X) ^ C2(X)  C(X)                 x1=x2 implies x2.oclIsTypeOf(C)
                                    )
                                )




                            Context Object inv:
8X8Y C(X) ^ a(Y,X)  T(Y)       Object.allInstances -> forAll ( y: Object |
                                     y.a.oclIsTypeOf(C) implies y.oclTypeOf(T)
                                )
Lean UML Class Diagrams as Rules
8X8Y C(X) ^ Attr(X,Y)  T(Y)
8X C(X)  9Y1…9Yn Attr(X,Y1) ^ … ^ Attr(X,Yn)       1 ≤ i < j ≤ n Yi    ≠ Yj
8X8Y8Z C(X) ^ Attr(X,Y) ^ Attr(X,Z)  Y = Z
8X8Y1…8Yn8Z C(X) ^ Op(X,Y1, …,Yn,Z)  T1(Y1) ^ … ^ Tn(Yn) ^ T(Z)
8X8Y1…8Yn8Z18Z2 C(X) ^ Op(X,Y1, …,Yn,Z1) ^ Op(X,Y1, …,Yn,Z2)  Z1 = Z2
8X C1(X)  C2(X)
8X C1(X) ^ … ^ Cn(X)  ?
8X1…8Xn A(X1,…,Xn)  C1(X1) ^ … ^ Cn(Xn)                               8X18X2 A(X1,X2)  C1(X1) ^ C2(X2)
8X1…8Xn8Y A(X1,…,Xn) ^ R*(X1,…,Xn,Y)  CA(Y)                           8X18X28Y A(X1,X2) ^ R*(X1,X2,Y)  CA(Y)
8X1…8Xn A(X1,…,Xn)  9Y R*(X1,…,Xn,Y)                                  8X18X2 A(X1,X2)  9Y R*(X1,X2,Y)
8X1…8Xn8Y8Z A(X1,…,Xn) ^ R*(X1,…,Xn,Y) ^ R*(X1,…,Xn,Z)  Y = Z
8X1…8Xn8Y1…8Yn8Z R*(X1,…,Xn,Z) ^ R*(Y1,…,Yn,Z) ^ CA(Z)  X1 = Y1 ^ … ^ Xn = Yn
8X C(X)  9Y1…9Yn A(X,Y1) ^ … ^ A(X,Yn)       1 ≤ i < j ≤ n Yi   ≠ Yj
8X8Y8Z C(X) ^ A(X,Y) ^ A(X,Z)  Y = Z
8X C(X)  9Y1…9Yn A(Y1,X) ^ … ^ A(Yn,X)       1 ≤ i < j ≤ n Yi   ≠ Yj
8X8Y8Z C(X) ^ A(Y,X) ^ A(Z,X)  Y = Z
8X1…8Xn A1(X1,…,Xn)  A2(X1,…, Xn)                                     8X18X2 A1(X1,X2)  A2(X1,X2)
Lean UML Class Diagrams as Rules
8X8Y C(X) ^ Attr(X,Y)  T(Y)
8X C(X)  9Y1…9Yn Attr(X,Y1) ^ … ^ Attr(X,Yn)       1 ≤ i < j ≤ n Yi    ≠ Yj
                                                                                          pre-process
8X8Y8Z C(X) ^ Attr(X,Y) ^ Attr(X,Z)  Y = Z
8X8Y1…8Yn8Z C(X) ^ Op(X,Y1, …,Yn,Z)  T1(Y1) ^ … ^ Tn(Yn) ^ T(Z)
8X8Y1…8Yn8Z18Z2 C(X) ^ Op(X,Y1, …,Yn,Z1) ^ Op(X,Y1, …,Yn,Z2)  Z1 = Z2                   check via query
8X C1(X)  C2(X)
8X C1(X) ^ … ^ Cn(X)  ?
8X1…8Xn A(X1,…,Xn)  C1(X1) ^ … ^ Cn(Xn)                               8X18X2 A(X1,X2)  C1(X1) ^ C2(X2)
8X1…8Xn8Y A(X1,…,Xn) ^ R*(X1,…,Xn,Y)  CA(Y)                           8X18X28Y A(X1,X2) ^ R*(X1,X2,Y)  CA(Y)
8X1…8Xn A(X1,…,Xn)  9Y R*(X1,…,Xn,Y)                                  8X18X2 A(X1,X2)  9Y R*(X1,X2,Y)
8X1…8Xn8Y8Z A(X1,…,Xn) ^ R*(X1,…,Xn,Y) ^ R*(X1,…,Xn,Z)  Y = Z
8X1…8Xn8Y1…8Yn8Z R*(X1,…,Xn,Z) ^ R*(Y1,…,Yn,Z) ^ CA(Z)  X1 = Y1 ^ … ^ Xn = Yn
8X C(X)  9Y1…9Yn A(X,Y1) ^ … ^ A(X,Yn)       1 ≤ i < j ≤ n Yi   ≠ Yj
8X8Y8Z C(X) ^ A(X,Y) ^ A(X,Z)  Y = Z
8X C(X)  9Y1…9Yn A(Y1,X) ^ … ^ A(Yn,X)       1 ≤ i < j ≤ n Yi   ≠ Yj
8X8Y8Z C(X) ^ A(Y,X) ^ A(Z,X)  Y = Z
8X1…8Xn A1(X1,…,Xn)  A2(X1,…, Xn)                                     8X18X2 A1(X1,X2)  A2(X1,X2)
Lean UML Class Diagrams as Rules

 8X8Y C(X) ^ Attr(X,Y)  T(Y)

 8X C(X)  9Y1…9Yn Attr(X,Y1) ^ … ^ Attr(X,Yn)             classes
 8X C1(X)  C2(X)

 8X18X2 A(X1,X2)  C1(X1) ^ C2 (X2)

 8X18X28Y A(X1,X2) ^ R*(X1,X2,Y)  CA(Y)

 8X18X2 A(X1,X2)  9Y R*(X1,X2,Y)
                                                      associations
 8X C(X)  9Y1…9Yn A(X,Y1) ^ … ^ A(X,Yn)

 8X C(X)  9Y1…9Yn A(Y1,X) ^ … ^ A(Yn,X)

 8X18X2 A1(X1,X2)  A2(X1,X2)

 8X C1(X) ^ … ^ Cn(X)  C(X)
                                             additional constraints
 8X8Y C(X) ^ Attr(Y,X)  T(Y)
Lean UML Class Diagrams as Rules

 8X8Y C(X) ^ Attr(X,Y)  T(Y)

 8X C(X)  9Y1…9Yn Attr(X,Y1) ^ … ^ Attr(X,Yn)

 8X C1(X)  C2(X)

 8X18X2 A(X1,X2)  C1(X1) ^ C2 (X2)

 8X18X28Y A(X1,X2) ^ R*(X1,X2,Y)  CA(Y)

 8X18X2 A(X1,X2)  9Y R*(X1,X2,Y)
                                                 guard atoms
 8X C(X)  9Y1…9Yn A(X,Y1) ^ … ^ A(X,Yn)

 8X C(X)  9Y1…9Yn A(Y1,X) ^ … ^ A(Yn,X)

 8X18X2 A1(X1,X2)  A2(X1,X2)

 8X C1(X) ^ … ^ Cn(X)  C(X)

 8X8Y C(X) ^ Attr(Y,X)  T(Y)
Data Complexity of Query Answering


Theorem: Query answering under Lean UML class diagrams + negative
& most-specific class & domain-type constraints is PTIME-complete
Data Complexity of Query Answering


Theorem: Query answering under Lean UML class diagrams + negative
& most-specific class & domain-type constraints is PTIME-complete


Proof:

• in PTIME: reduction to assertions 8X8Y body(X,Y)  9Z head(X,Z),
  where body(X,Y) has a guard-atom [Calì, G. & Lukasiewicz, PODS 09]



• PTIME-hardness (even without domain-type constraints): reduction
  from Path System Accessibility
Combined Complexity of Query Answering



  Theorem: Query answering under Lean UML class diagrams +
  negative & most-specific class constraints is PSPACE-complete
Combined Complexity of Query Answering
        Proof:
        •          in PSPACE: there exists a tree-like universal model

                       D
                                                                              8X8Y C(X) ^ A(X,Y)  T(Y)
                                                                   C(x)
                                                                              8X8Y A(X,Y)  C1(X) ^ C2(Y)
polynomial depth
 Relevant part:




                                                                              8X8Y A(X,Y)  B(X,Y)
                                                              A(y,z)



                                                          B(y,z)       T(z)
                                      R(a,v)   S(v,w)
Combined Complexity of Query Answering
        Proof:
        •          in PSPACE: there exists a tree-like universal model

                       D
                                                                              8X8Y C(X) ^ A(X,Y)  T(Y)
                                                                   C(x)
                                                                              8X8Y A(X,Y)  C1(X) ^ C2(Y)
polynomial depth
 Relevant part:




                                                                              8X8Y A(X,Y)  B(X,Y)
                                                              A(y,z)



                                                          B(y,z)       T(z)
                                      R(a,v)   S(v,w)



                            Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5)
Combined Complexity of Query Answering
        Proof:
        •          in PSPACE: there exists a tree-like universal model (Chase)

                       D
                                                                             8X8Y C(X) ^ A(X,Y)  T(Y)
                                                                  C(x)
                                                                             8X8Y A(X,Y)  C1(X) ^ C2(Y)
polynomial depth
 Relevant part:




                                                                             8X8Y A(X,Y)  B(X,Y)
                                                             A(y,z)



                                                         B(y,z)       T(z)
                                      R(a,v)   S(v,w)



                            Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5)
Combined Complexity of Query Answering
 Proof:
 •   in PSPACE: there exists a tree-like universal model

          D
                                                                 8X8Y C(X) ^ A(X,Y)  T(Y)
                                                      C(x)
                                                                 8X8Y A(X,Y)  C1(X) ^ C2(Y)
                                                                 8X8Y A(X,Y)  B(X,Y)
                                                 A(y,z)



                                             B(y,z)       T(z)
                         R(a,v)   S(v,w)



               Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5)

With each current atom A we need to compute and memorize its type(A). This is in NP
Combined Complexity of Query Answering
 Proof:
 •   in PSPACE: there exists a tree-like universal model

          D
                                                                 8X8Y C(X) ^ A(X,Y)  T(Y)
                                                      C(x)
                                                                 8X8Y A(X,Y)  C1(X) ^ C2(Y)
                                                                 8X8Y A(X,Y)  B(X,Y)
                                                 A(y,z)



                                             B(y,z)       T(z)
                         R(a,v)   S(v,w)



               Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5)

With each current atom A we need to compute and memorize its type(A). This is in NP
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                         R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)

With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                         R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)

With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                         R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)

With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                         R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)

With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                         R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)

With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                         R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)

With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                         R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)

With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                         R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)

With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering
  Proof:
  •   in PSPACE: there exists a tree-like universal model

           D

                                                       C(x)


                                                  A(y,z)



                                              B(y,z)       T(z)
                          R(a,v)   S(v,w)



                                                                  8X8Y C(X) ^ A(X,Y)  T(Y)
               Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5)
With each current atom A we need to compute                       8X8Y A(X,Y)  C1(X) ^ C2(Y)
and memorize its type(A). This is in NP                           8X8Y A(X,Y)  B(X,Y)
Combined Complexity of Query Answering

Proof:
•     PSPACE-hardness: simulation of the computation of a PSPACE Turing machine
      on input I = α0α1...αn-1, assuming that it uses m = nk cells


        •    Initialization rules     8X initial(X)  initial-state(X)
                                       8X initial(X)  cell0[α0,1](X)
                                       8X initial(X)  celli[αi,0](X), for each i 2 {1,…,n-1}
                                       8X initial(X)  celli[0,0](X), for each i 2 {n,…,m-1}


    initial-state    cell0[α0,1]    cell1[α1,0]   …   celln-1[αn-1,0]   celln[0,0]   …   cellm-1[0,0]




                                                  initial
Combined Complexity of Query Answering

Proof:
•    PSPACE-hardness: simulation of the computation of a PSPACE Turing machine
     on input I = α0α1...αn-1, assuming that it uses m = nk cells


      •    Configuration generation rules



                                                                    config
          8X initial(X)  config(X)                            succ[1..1]:config

          8X config(X)  9Y succ(X,Y)

          8X8Y config(X) ^ succ(X,Y)  config(Y)
                                                                    initial
Combined Complexity of Query Answering

Proof:
•    PSPACE-hardness: simulation of the computation of a PSPACE Turing machine
     on input I = α0α1...αn-1, assuming that it uses m = nk cells


      •   Rules to describe the transition from one configuration to another,
          e.g., state transition rules

          for each δ(hs1,α1i) = hs2,α2,di:

          8X8Y s1-celli [α1,1](X) ^ succ(X,Y)  state-s2(Y), for each i 2 {0,…,m-1}


    in configuration X, which has state s1, the i-th
     cell contains α1, and the cursor is over cell i                s1-celli [α1,1]
                                                              succ[0..1]:state-s2
Combined Complexity of Query Answering

Proof:
•    PSPACE-hardness: simulation of the computation of a PSPACE Turing machine
     on input I = α0α1...αn-1, assuming that it uses m = nk cells


      •   Acceptance rule     8X accept-state(X)  accept(X)

                                            accept



                                         accept-state



      •   Initial database   D = {initial(c)} - c is the initial configuration

      •   Boolean CQ         Q= accept(X)

      •   Turing machine accepts I iff D [  ² Q
Combined Complexity of Query Answering


Theorem: Query answering under Lean UML class diagrams + negative &
most-specific class & domain-type constraints is EXPTIME-complete
Combined Complexity of Query Answering


Theorem: Query answering under Lean UML class diagrams + negative &
most-specific class & domain-type constraints is EXPTIME-complete



Proof:


•    in EXPTIME: reduction to assertions 8X8Y body(X,Y)  9Z head(X,Z), where
     body(X,Y) has a guard-atom, and all the predicates are of bounded arity
     [Calì, G. & Kifer, KR 08]
Combined Complexity of Query Answering
Proof:
•    EXPTIME-hardness: simulation of an alternating PSPACE Turing machine
      •   Acceptance rules - q 2 {9,} and i 2 {1,2}:

    8X q-accept-state(X)  accept(X)               accept



                                               q-accept-state



    8X8Y accept(X) ^ succ1(Y,X)  accept1(Y)            domain-type constraints
                                                           pullback rules
    8X8Y accept(X) ^ succ2(Y,X)  accept2(Y)

    8X 9-state(X) ^ accept1(X)  accept(X)
                                                                 most-specific class
    8X 9-state(X) ^ accept2(X)  accept(X)                          constraints
    8X -state(X) ^ accept1(X) ^ accept2(X)  accept(X)
Further Restrictions

              Student
    Enrolled [0..1]:Course




            CS-Student
    Enrolled [0..1]:CS-Course
Further Restrictions

              Student
    Enrolled [0..1]:Course

                                different classes have disjoint sets
                                    of attributes and operations

            CS-Student
    Enrolled [0..1]:CS-Course



     instead of 8X8Y Student(X) ^ Enrolled(X,Y)  Course(Y)

        we have 8X8Y Student-Enrolled(X,Y)  Course(Y)



   Data complexity in AC0 and combined complexity NP-complete
Complexity of Querying UML Class Diagrams


  UML          Additional           Data            Combined
Formalism      Constraints        Complexity        Complexity

   Full          none           coNP-complete    EXPTIME-complete

             + negative
  Lean       + specific-class   PTIME-complete   EXPTIME-complete
             + domain-type

             + negative
  Lean                          PTIME-complete   PSPACE-complete
             + specific-class

Restricted   + negative
                                    in AC0         NP-complete
  Lean       + specific-class
Datalog± : A Unifying Logical Framework


                 Datalog±


      Description Logics
        (DL-Lite, EL,…)         Relational Constraints
                                    (IDs, FKDs,…)


           Datalog
                              Conceptual Models
                                   (UML, ER,…)




                           … without losing tractable data complexity
Datalog± : A Unifying Logical Framework


                 Datalog±


      Description Logics
        (DL-Lite, EL,…)         Relational Constraints
                                    (IDs, FKDs,…)


           Datalog
                              Conceptual Models
                                   (UML, ER,…)




                           … without losing tractable data complexity
Ontological Reasoning and Datalog

         DL Assertion                   Datalog Rule
 Concept Inclusion
   emp v person               emp(X)  person(X)
 Concept Product
   sen-emp £ emp v moreThan   sen-emp(X) ^ emp(Y)  moreThan(X,Y)
 (Inverse) Role Inclusion
    reports¡ v mgr            reports(X,Y)  mgr(Y,X)
 Role Transitivity
   trans(mgr)                 mgr(X,Y) ^ mgr(Y,Z)  mgr(X,Z)
Ontological Reasoning and Datalog

         DL Assertion                    Datalog Rule
 Concept Inclusion
   emp v person               emp(X)  person(X)
 Concept Product
   sen-emp £ emp v moreThan   sen-emp(X) ^ emp(Y)  moreThan(X,Y)
 (Inverse) Role Inclusion
    reports¡ v mgr            reports(X,Y)  mgr(Y,X)
 Role Transitivity
   trans(mgr)                 mgr(X,Y) ^ mgr(Y,Z)  mgr(X,Z)

 Participation
   emp v 9report              emp(X)  9Y report(X,Y)
 Disjointness
    emp u customer v ?        emp(X) ^ customer(X)  ?
 Functionality
   funct(reports)             reports(X,Y) ^ reports(X,Z)  Y = Z
Guardedness

 • All 8-variables occur in one body atom - guard atom


         8X8Y8Z R(X,Y,Z) ^ S(Y) ^ P(X,Z)  9W Q(X,W)


                     guard


 • Models of finite treewidth )         decidability of query answering
    [Calì, G. & Kifer, KR 08]



 • Query answering is PTIME-complete in data complexity
    [Calì, G. & Lukasiewicz, PODS 09]



 • Properly extends ELH (same data complexity)
Ontology Querying

ELH: Popular DL with PTIME data complexity
[Baader, IJCAI 03 and Rosati, DL 07]



                 ELH TBox              Datalog§ Representation

                   AvB                 8X A(X)  B(X)

                   Au Bv C             8X A(X) ^ B(X)  C(X)

                   A v 9R.B            8X A(X)  9Y R(X,Y) ^ B(Y)

                   9R.A v B            8X8Y R(X,Y) ^ A(Y)  B(X)

                   RvP                 8X8Y R(X,Y)  P(X,Y)
First-Order Rewritable Datalog± Fragments


          Q         



 compilation
               Q
First-Order Rewritable Datalog± Fragments


       Q         




                                       D
First-Order Rewritable Datalog± Fragments


          Q          



 compilation
               Q                  Q*
               FO+
                     translation   SQL



                                         D
First-Order Rewritable Datalog± Fragments


          Q            



 compilation
               Q                     Q*
                                             evaluation
                        translation   SQL



                                              D



                    8D: D [  ² Q , D ² Q*
Linearity

• Just one atom in the body         8X8Y R(X,Y) ! 9Z (X,Z)


                                        guard

• Linear TGDs are trivially guarded


• Linear TGDs are first-order rewritable [Calì, G. & Lukasiewicz, PODS 09]
    Query answering in AC0 data complexity


• Polynomial-size rewriting (SQL DL = NR Datalog) [G. & Schwentick, KR 12]


• Properly extends DL-Lite (same data complexity)
Ontology Querying

DL-Lite: Popular family of DLs with AC0 data complexity (OWL 2 QL)
[Calvanese, De Giacomo, Lembo, Lenzerini & Rosati, JAR 07]



              DL-Lite TBox             Datalog§ Representation


                   AvB                   8X A(X)  B(X)


                   A v 9R                8X A(X)  9Y R(X,Y)


                   9R v A                8X8Y R(X,Y)  A(X)


                   RvP                   8X8Y R(X,Y)  P(X,Y)
Finite Controllability

                                     ?
                  D[²Q              ,   D [  ²fin Q


   • Holds for inclusion dependencies
      [Rosati, PODS 06]



   • Holds for guarded Datalog± (in fact, for the guarded fragment)
      [Bárány, G. & Otto, LICS 10]




   • Different from finite-model property of the guarded fragment:
     If D [  [ Q has a model, then D [  [ Q it has a finite one.
Finite Controllability and Lean UML Class Diagrams

• For each attribute assertion Attribute[ i..j ]:Type: j = 1


                                              C1   mL..mU      A   nL..nU
• For each association A: mU = nU = 1                                       C2



• Disjointness and negative constraints are forbidden


• Most-specific class and domain-type constraints are allowed




           set of guarded TGDs )        finite controllability holds
Datalog§: Overview




   Guarded           Linear




8X8Y R(X,Y,Y) ! 9Z P(X,Z)
Datalog§: Overview




   Guarded           Linear               Sticky-join




8X8Y R(X,Y,Y) ! 9Z P(X,Z)     8X8Y8Z R(X,Y) ^ S(Y,Z) ! 9W P(Y,W)
Datalog§: Overview
                                  DL-Lite



   PTIME-complete                           FO-rewritable

  Guarded           Linear        Sticky       Sticky-join




 ELH
                             IDs + FKDs
              Lean UCDs
Datalog§: Summary of Complexity Results


                     Data            Fixed            Combined

Guarded        PTIME-complete      NP-complete    2EXPTIME-complete



Linear              in AC0         NP-complete     PSPACE-complete



Sticky-join         in AC0         NP-complete     EXPTIME-complete



  Same complexity with negative constraints and non-conflicting EGDs
Datalog§: Next Steps


                      PTIME-complete




   PTIME-complete                             FO-rewritable


   Guarded          Linear                        Sticky-join




                         ?
                             …with disjunction in the head

                             … finite-model reasoning
Thank you!
But…
   • What about joins in rule bodies?
       8A8D8P runs(D,P) ^ area(P,A)  9E employee(E,D,P,A)



   • What about the DL assertion concept product?
         8E8M elephant(E) ^ mouse(M)  biggerThan(E,M)
But…
   • What about joins in rule bodies?
       8A8D8P runs(D,P) ^ area(P,A)  9E employee(E,D,P,A)



   • What about the DL assertion concept product?
         8E8M elephant(E) ^ mouse(M)  biggerThan(E,M)


   No tree-like models guaranteed

   8X8Y R(X,Y)  9Z R(Y,Z)
                                 Infinitely many symbols in S
   8X8Y R(X,Y)  S(X)


   8X8Y S(X) ^ S(Y)  P(X,Y)     P forms an infinite clique
Stickiness



 8A8D8P runs(D,P) ^ area(P,A)  9E employee(E,D,P,A)




  8E8M elephant(E) ^ mouse(M)  biggerThan(E,M)
Stickiness


    8X8Y8Z R(X,Y) ^ P(Y,Z)  9W T(X,Y,W)

     8X8Y8Z T(X,Y,Z)  9W S(Y,W)




    8X8Y8Z R(X,Y) ^ P(Y,Z)  9W T(X,Y,W)

     8X8Y8Z T(X,Y,Z)  9W S(X,W)
Stickiness



• Properly generalize inclusion dependencies


• Backward-resolution terminates


• Query answering is in AC0 in data complexity (first-order rewritability)
   [Calì, G. & Pieris, VLDB 10]



• Properly extends DL-Lite (same data complexity)
Additional Features
   • EGDs, e.g., 8X8Y8Z reports(X,Y) ^ reports(X,Z) ! Y = Z
            Non-Conflicting EGDs: do not interact with TGDs
            Preliminary check without adding complexity


   • Negative constraints, e.g., 8X emp(X) ^ customer(X) ! ?
            Check without adding complexity


    Finite controllability does not hold
    D = {R(a,b)}
                                                 D[²Q
           8X8Y R(X,Y)  9Z R(Y,Z)                  but
    =
           8X8Y8Z R(Y,X) ^ R(Z,X)  Y = Z       D [  ²fin Q

    Q  R(A,a)
Additional Features
   • EGDs, e.g., 8X8Y8Z reports(X,Y) ^ reports(X,Z) ! Y = Z
          Non-Conflicting EGDs: do not interact with TGDs
          Preliminary check without adding complexity


   • Negative constraints, e.g., 8X emp(X) ^ customer(X) ! ?
          Check without adding complexity
Comparison with ER§ Schemata
ER§: Extended ER formalism with AC0 data complexity
[Calì, G. & Pieris, Information Systems 2012]


                                                Lean UML      ER§

IS-A among classes

IS-A among associations

¸n participation                                 n¸0       n 2 {0,1}

·1 participation

Permutation on IS-A

Values of attributes                            complex    atomic
Operations

Attribute re-use
The Chase Procedure

 Input: Database D, set of TGDs 

 Output: A model of D [ 

                          D
                              person(john)


 
       person(P)  9F father(F,P)        father(F,P)  person(F)




     chase(D,) = D [ ?
The Chase Procedure

 Input: Database D, set of TGDs 

 Output: A model of D [ 

                          D
                              person(john)


 
        person(P)  9F father(F,P)       father(F,P)  person(F)




     chase(D,) = D [ {father(z1,john)
The Chase Procedure

 Input: Database D, set of TGDs 

 Output: A model of D [ 

                           D
                               person(john)


 
        person(P)  9F father(F,P)        father(F,P)  person(F)




     chase(D,) = D [ {father(z1,john), person(z1)
The Chase Procedure

 Input: Database D, set of TGDs 

 Output: A model of D [ 

                           D
                               person(john)


 
        person(P)  9F father(F,P)         father(F,P)  person(F)




     chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1)
The Chase Procedure

 Input: Database D, set of TGDs 

 Output: A model of D [ 

                           D
                               person(john)


 
        person(P)  9F father(F,P)         father(F,P)  person(F)




     chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1), …}
The Chase Procedure

 Input: Database D, set of TGDs 

 Output: A model of D [ 

                           D
                               person(john)


 
        person(P)  9F father(F,P)         father(F,P)  person(F)




     chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1), …}

                                                           infinite instance
Query Answering via Chase
             Q    h

                      C = chase(D,)

                            D

                 h1                     h2



                                                        h2(C)
     h1(C)             .    .   .
      M1
                                                         M2


             D[²Q         ,    chase(D,) ² Q
                                [see, e.g., Deutsch, Nash & Remmel, PODS 08]
Bounded Derivation-Depth Property (BDDP)

                           D
Q


                                                    constant depth
                                                       w.r.t. D



                                   P

                      chase(D,)


              chase(D,) ² Q   )   P²Q

                                       [Calì, G. & Lukasiewitcz, PODS 09]
Bounded Derivation-Depth Property (BDDP)

                          D
Q


                                                     constant depth
                                                        w.r.t. D



                                    P

                      chase(D,)


           BDDP   )    First-Order Rewritability

                                        [Calì, G. & Lukasiewitcz, PODS 09]
First-Order Rewritable TGDs

                                                     Q


                            rewriting




                               Q

                  8D: D [  ² Q , D ² Q



    Query answering is in AC0 in data complexity   [Vardi, PODS 95]
OMG UML (Unified Modeling Language)
Standard conceptual modeling tool for software design


                      Competes
                                 0..1                                   Stock
                            Company            Issues
                                                                 Index[0..1]:Str
                     0..1               1..1             1..1
                                 1..1                            getIndex():List
                                                                         0..1
                      Member
                                                                         Owns
                                 2..1
                            Executive                   Person
                                                                 1..1


                                    Class Diagrams




Reasoning over UML models
   Model checking
   Specification recovery
   Software maintenance

Querying UML Class Diagrams - FoSSaCS 2012

  • 1.
    Querying UML ClassDiagrams Georg Gottlob Department of Computer Science University of Oxford joint work with Andrea Calì, Giorgio Orsi and Andreas Pieris
  • 2.
    Major Modeling Formalismsfor Data and Objects Competes person: SSN  Name 0..1 Stock employee[1]  person[1] Company Issues Index[0..1]:Str 0..1 1..1 1..1 getIndex():List Relational Data Dependencies 1..1 0..1 Member Owns 2..1 Executive Person 1..1 UML Class Diagrams XML Schemas student v member m_name since g_name leads¡ v works member (1,1) 1 works 2 (1,N) group student v :professor (1,1) [1,2] Description Logics (1,N) student professor leads 1 2 Context C1 inv: C1.allInstances -> forAll ( x1: C1 | C2.allInstances -> forAll ( x2: C2 | x1=x2 implies x2.oclIsTyeOf(C))) ER Diagrams Object Constraint Language (OCL)
  • 3.
    Datalog± : AUnifying Logical Framework Datalog± Description Logics (DL-Lite, EL,…) Relational Constraints (IDs, FKDs,…) Datalog Conceptual Models (UML, ER,…) … providing: Logical foundations, semantics, decidability and complexity results for reasoning and query-answering, identification of tractable fragments, …
  • 4.
    Datalog§ 8X8Y (X,Y)  (X) • Extend Datalog with additional features such as: • Existential quantification (9): TGDs 8X8Y (X,Y)  9Z (X,Z) • Equality atoms (=): EGDs 8X (X)  Xi = Xj • Constant false (?): Negative constraints 8X (X)  ? • But query answering under Datalog[9] is undecidable [see, e.g., Beeri & Vardi, ICALP 81] • Datalog[9,=,?] is syntactically restricted ! Datalog§
  • 5.
    Restriction: Guardedness • All8-variables occur in one body atom - guard atom 8X8Y8Z R(X,Y,Z) ^ S(Y) ^ P(X,Z)  9W Q(X,W) guard • Models of finite treewidth ) decidability of query answering [Calì, G. & Kifer, KR 08] related to work by [Andréka, Németi & van Benthem] and [Grädel] • Query answering is PTIME-complete in data complexity [Calì, G. & Lukasiewicz, PODS 09]
  • 6.
    Reasoning over UMLClass Diagrams Competes 0..1 Stock Company Issues 0..1 1..1 Index[0..1]:Str 1..1 1..1 getIndex():List 0..1 Member Owns 2..1 Executive Person 1..1 Satisfiability: Does the diagram admit at least one instantiation?
  • 7.
    Reasoning over UMLClass Diagrams • Satisfiability - the diagram has a (possibly infinite) non-empty instantiation • Full Satisfiability - the diagram has a (possibly infinite) instantiation where each class and association is non-empty • Finite Satisfiability - the diagram has a finite instantiation
  • 8.
    Reasoning over UMLClass Diagrams Person {disjoint} Student Worker finitely satisfiable, e.g., {Worker(john),Person(john)} but not fully - student class is necessarily empty
  • 9.
    Complexity of Reasoningover UML Class Diagrams • Satisfiability is EXPTIME-complete [Berardi, Calvanese & De Giacomo, Artificial Intelligence 05] • Full Satisfiability is EXPTIME-complete [Artale, Calvanese & Ibánez-García, ER 10] • Finite Satisfiability is EXPTIME-complete [implicit in Berardi, Calvanese & De Giacomo, Artificial Intelligence 05]
  • 10.
    Querying UML ClassDiagrams Executive(john) Competes Member(john,LU) 0..1 Stock Stock(BAY) Company Issues 0..1 1..1 1..1 Index[0..1]:Str Issues(BA,BAY) 1..1 getIndex():List Owns(john,BAY) 0..1 Competes(LU,BA) Member Owns 2..1 Executive Person 1..1 Which persons have a potential conflict of interest?
  • 11.
    Querying UML ClassDiagrams Executive(john) Competes Member(john,LU) 0..1 Stock Stock(BAY) Company Issues 0..1 1..1 1..1 Index[0..1]:Str Issues(BA,BAY) 1..1 getIndex():List Owns(john,BAY) 0..1 Competes(LU,BA) Member Owns 2..1 Executive Person 1..1 Conflict(P)  Person(P), Company(C1), Company(C2), Stock(S), Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
  • 12.
    Querying UML ClassDiagrams Executive(john) Competes Member(john,LU) 0..1 Stock Stock(BAY) Company Issues 0..1 1..1 1..1 Index[0..1]:Str Issues(BA,BAY) 1..1 getIndex():List Owns(john,BAY) 0..1 Competes(LU,BA) Member Owns 2..1 Executive Person 1..1 Does anybody have a potential conflict of interest?
  • 13.
    Querying UML ClassDiagrams Executive(john) Competes Member(john,LU) 0..1 Stock Stock(BAY) Company Issues 0..1 1..1 1..1 Index[0..1]:Str Issues(BA,BAY) 1..1 getIndex():List Owns(john,BAY) 0..1 Competes(LU,BA) Member Owns 2..1 Executive Person 1..1 Conflict  Person(P), Company(C1), Company(C2), Stock(S), Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
  • 14.
    Querying UML ClassDiagrams Executive(john) Competes Member(john,LU) 0..1 Stock Stock(BAY) Company Issues 0..1 1..1 1..1 Index[0..1]:Str Issues(BA,BAY) 1..1 getIndex():List Owns(john,BAY) 0..1 Competes(LU,BA) Member Person(john) Owns 2..1 Executive Person 1..1 Conflict  Person(P), Company(C1), Company(C2), Stock(S), Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
  • 15.
    Querying UML ClassDiagrams Executive(john) Competes Member(john,LU) 0..1 Stock Stock(BAY) Company Issues 0..1 1..1 1..1 Index[0..1]:Str Issues(BA,BAY) 1..1 getIndex():List Owns(john,BAY) 0..1 Competes(LU,BA) Member Person(john) Owns 2..1 Company(LU) Executive Person Company(BA) 1..1 Conflict  Person(P), Company(C1), Company(C2), Stock(S), Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
  • 16.
    Querying UML ClassDiagrams Executive(john) Competes Member(john,LU) 0..1 Stock Stock(BAY) Company Issues 0..1 1..1 1..1 Index[0..1]:Str Issues(BA,BAY) 1..1 getIndex():List Owns(john,BAY) 0..1 Competes(LU,BA) Member Person(john) Owns 2..1 Company(LU) Executive Person Company(BA) 1..1 {P ! john, C1 ! LU, C2 ! BA, S ! BAY} Conflict  Person(P), Company(C1), Company(C2), Stock(S), Owns(P,S), Member(P,C1), Issues(C2,S), Competes(C1,C2)
  • 17.
    Querying UML ClassDiagrams: Existential Rules Group(DB) Member 3..1 WorksIn 1..1 {disjoint} Group 0..1 Student Professor 1..1 Leads CLeads since: Date Is there a professor who works in the database group?
  • 18.
    Querying UML ClassDiagrams: Existential Rules Group(DB) Member 3..1 WorksIn 1..1 {disjoint} Group 0..1 Student Professor 1..1 Leads CLeads since: Date Ans  Professor(P), WorksIn(P,DB)
  • 19.
    Querying UML ClassDiagrams: Existential Rules Group(DB) Leads(z1,DB) Member 3..1 WorksIn Professor(z1) 1..1 {disjoint} Group R*(z1,DB,z2) CLeads(z2) 0..1 Student Professor 1..1 Leads CLeads since: Date Ans  Professor(P), WorksIn(P,DB)
  • 20.
    Querying UML ClassDiagrams: Existential Rules Group(DB) Leads(z1,DB) Member 3..1 WorksIn Professor(z1) 1..1 {disjoint} Group R*(z1,DB,z2) CLeads(z2) 0..1 WorksIn(z1,DB) Student Professor 1..1 Leads CLeads since: Date Ans  Professor(P), WorksIn(P,DB)
  • 21.
    Querying UML ClassDiagrams: Existential Rules Group(DB) Leads(z1,DB) Member 3..1 WorksIn Professor(z1) 1..1 {disjoint} Group R*(z1,DB,z2) CLeads(z2) 0..1 WorksIn(z1,DB) Student Professor 1..1 Leads … CLeads since: Date {P ! z1, DB ! DB} Ans  Professor(P), WorksIn(P,DB)
  • 22.
    Querying UML ClassDiagrams D  Q D[²Q , 8M (M ² D [  ! M ² Q)
  • 23.
    Querying UML ClassDiagrams D  Q D[²Q , 8M (M ² D [  ! M ² Q) M¶D Æ M²
  • 24.
    From Diagrams toFirst-Order Logic (Datalog§) Competes 0..1 Stock Company Issues 0..1 1..1 Index[0..1]:Str 1..1 1..1 getIndex():List 0..1 Member Owns 2..1 Executive Person 1..1
  • 25.
    From Diagrams toFirst-Order Logic (Datalog§) Competes 0..1 Stock Company Issues 0..1 1..1 Index[0..1]:Str 1..1 1..1 getIndex():List 0..1 Member Owns 2..1 Executive Person 1..1 8X8Y Member(X,Y)  Company(X) ^ Executive(Y) 8X Company(X)  9Y9Z Member(X,Y) ^ Member(X,Z) ^ Y ≠ Z 8X Executive(X)  9Y Member(Y,X) [Satoh & Kaneiwa, TCS 10 and Berardi et al., AI 05]
  • 26.
    From Diagrams toFirst-Order Logic (Datalog§) Competes 0..1 Stock Company Issues 0..1 1..1 Index[0..1]:Str 1..1 1..1 getIndex():List 0..1 Member Owns 2..1 Executive Person 1..1 8X Company(X)  9Y Issues(X,Y) 8X Stock(X)  9Y Issues(Y,X) 8X8Y8Z Stock(X) ^ Issues(Y,X) ^ Issues(Z,X)  Y = Z 8X8Y Stock(X) ^ Index(X,Y)  Str(Y) [Satoh & Kaneiwa, TCS 10 8X8Y Stock(X) ^ getIndex(X,Y)  List(Y) and Berardi et al., AI 05]
  • 27.
    From Diagrams toFirst-Order Logic (Datalog§) Competes 0..1 Stock Company Issues 0..1 1..1 Index[0..1]:Str 1..1 1..1 getIndex():List 0..1 Member Owns 2..1 Executive Person 1..1 8X Executive(X)  Person(X) [Satoh & Kaneiwa, TCS 10 and Berardi et al., AI 05]
  • 28.
    Complexity of QueryAnswering • EXPTIME-complete in combined complexity (everything is part of the input) [implicit in Berardi et al., AI 05 and Lutz, IJCAR 08] • coNP-complete in data complexity (the diagram and the query are fixed) [implicit in Ortiz, Calvanese & Eiter, AAAI 06] • Undecidable when diagrams are combined with arbitrary OCL (Object Constraint Language) constraints [folklore]
  • 29.
    Research Challenge: ReduceHigh Complexity • Diagrams often have very large instantiations • Some applications require very large diagrams • OCL constraints, that are not expressible diagrammatically, lead to undecidability or high complexity
  • 30.
    Our Goals • Restrict UML class diagrams to achieve tractability of query answering in data complexity • Better understanding of combined complexity • Add relevant OCL constraints without losing tractability of query answering in data complexity
  • 31.
    Lean UML ClassDiagrams • For each attribute assertion Attribute[ i..j ]:Type j 2 {1,1}
  • 32.
    Lean UML ClassDiagrams • For each attribute assertion Attribute[ i..j ]:Type j 2 {1,1} mL..mU A nL..nU • For each association A: C1 C2 - upper bounds mU ,nU 2 {1,1}, - if A generalizes some other association, then mU = nU = 1
  • 33.
    Lean UML ClassDiagrams • For each attribute assertion Attribute[ i..j ]:Type j 2 {1,1} mL..mU A nL..nU • For each association A: C1 C2 - upper bounds mU ,nU 2 {1,1}, - if A generalizes some other association, then mU = nU = 1 C • Completeness constraints are forbidden {complete} 8X C(X)  C1(X) _ C2(X) C1 C2
  • 34.
    Lean UML ClassDiagrams • For each attribute assertion Attribute[ i..j ]:Type j 2 {1,1} mL..mU A nL..nU • For each association A: C1 C2 - upper bounds mU ,nU 2 {1,1}, - if A generalizes some other association, then mU = nU = 1 C • Completeness constraints are forbidden {complete} 8X C(X)  C1(X) _ C2(X) C1 C2
  • 35.
    Lean UML ClassDiagrams: Example Competes 0..1 Stock Company Issues 0..1 1..1 Index[0..1]:Str 1..1 1..1 getIndex():List 0..1 Member Owns 2..1 Executive Person 1..1
  • 36.
    Lean UML ClassDiagrams: Example 3..1 WorksIn Member 1..1 {disjoint} Group 0..1 Student Professor 1..1 Leads CLeads since: Date
  • 37.
    Add Some Non-DiagrammaticConstraints (OCL) C C1 C2 8X C2(X) ^ C3(X)  ? C3 disjoint classes We need negative constraints of the form 8X C1(X) ^ … ^ Cn(X)  ?
  • 38.
    Add Some Non-DiagrammaticConstraints (OCL) C1 C2 C 8X C1(X) ^ C2(X)  C(X) most-specific class We need most-specific class constraints of the form 8X C1(X) ^ … ^ Cn(X)  C(X)
  • 39.
    Add Some Non-DiagrammaticConstraints (OCL) Student Enrolled[0..1]:Course type the domain of Enrolled CS-Student Enrolled[0..1]:CS-Course 8X8Y CS-Course(X) ^ Enrolled(Y,X)  CS-Student(Y) We need domain-type constraints of the form 8X8Y C(X) ^ Attr(Y,X)  T(Y)
  • 40.
    Add Some Non-DiagrammaticConstraints (OCL) Student Enrolled[0..1]:Course type the domain of Enrolled CS-Student Enrolled[0..1]:CS-Course 8X8Y CS-Course(X) ^ Enrolled(Y,X)  CS-Student(Y) We need domain-type constraints of the form 8X8Y C(X) ^ Attr(Y,X)  T(Y) pullback rule
  • 41.
    Add Some Non-DiagrammaticConstraints (OCL) Student Enrolled[0..1]:Course type the domain of Enrolled CS-Student Enrolled[0..1]:CS-Course 8X8Y CS-Course(X) ^ Enrolled(Y,X)  CS-Student(Y) We need domain-type constraints of the form 8X8Y C(X) ^ Attr(Y,X)  T(Y) pullback rule will make a difference!
  • 42.
    OCL (Object ConstraintLanguage) Context C1 inv: C1.allInstances -> forAll ( x1: C1 | 8X C1(X) ^ C2(X)  ? C2.allInstances -> forAll ( x2: C2 | x1<>x2 ) ) Context C1 inv: C1.allInstances -> forAll ( x1: C1 | C2.allInstances -> forAll ( x2: C2 | 8X C1(X) ^ C2(X)  C(X) x1=x2 implies x2.oclIsTypeOf(C) ) ) Context Object inv: 8X8Y C(X) ^ a(Y,X)  T(Y) Object.allInstances -> forAll ( y: Object | y.a.oclIsTypeOf(C) implies y.oclTypeOf(T) )
  • 43.
    Lean UML ClassDiagrams as Rules 8X8Y C(X) ^ Attr(X,Y)  T(Y) 8X C(X)  9Y1…9Yn Attr(X,Y1) ^ … ^ Attr(X,Yn) 1 ≤ i < j ≤ n Yi ≠ Yj 8X8Y8Z C(X) ^ Attr(X,Y) ^ Attr(X,Z)  Y = Z 8X8Y1…8Yn8Z C(X) ^ Op(X,Y1, …,Yn,Z)  T1(Y1) ^ … ^ Tn(Yn) ^ T(Z) 8X8Y1…8Yn8Z18Z2 C(X) ^ Op(X,Y1, …,Yn,Z1) ^ Op(X,Y1, …,Yn,Z2)  Z1 = Z2 8X C1(X)  C2(X) 8X C1(X) ^ … ^ Cn(X)  ? 8X1…8Xn A(X1,…,Xn)  C1(X1) ^ … ^ Cn(Xn) 8X18X2 A(X1,X2)  C1(X1) ^ C2(X2) 8X1…8Xn8Y A(X1,…,Xn) ^ R*(X1,…,Xn,Y)  CA(Y) 8X18X28Y A(X1,X2) ^ R*(X1,X2,Y)  CA(Y) 8X1…8Xn A(X1,…,Xn)  9Y R*(X1,…,Xn,Y) 8X18X2 A(X1,X2)  9Y R*(X1,X2,Y) 8X1…8Xn8Y8Z A(X1,…,Xn) ^ R*(X1,…,Xn,Y) ^ R*(X1,…,Xn,Z)  Y = Z 8X1…8Xn8Y1…8Yn8Z R*(X1,…,Xn,Z) ^ R*(Y1,…,Yn,Z) ^ CA(Z)  X1 = Y1 ^ … ^ Xn = Yn 8X C(X)  9Y1…9Yn A(X,Y1) ^ … ^ A(X,Yn) 1 ≤ i < j ≤ n Yi ≠ Yj 8X8Y8Z C(X) ^ A(X,Y) ^ A(X,Z)  Y = Z 8X C(X)  9Y1…9Yn A(Y1,X) ^ … ^ A(Yn,X) 1 ≤ i < j ≤ n Yi ≠ Yj 8X8Y8Z C(X) ^ A(Y,X) ^ A(Z,X)  Y = Z 8X1…8Xn A1(X1,…,Xn)  A2(X1,…, Xn) 8X18X2 A1(X1,X2)  A2(X1,X2)
  • 44.
    Lean UML ClassDiagrams as Rules 8X8Y C(X) ^ Attr(X,Y)  T(Y) 8X C(X)  9Y1…9Yn Attr(X,Y1) ^ … ^ Attr(X,Yn) 1 ≤ i < j ≤ n Yi ≠ Yj pre-process 8X8Y8Z C(X) ^ Attr(X,Y) ^ Attr(X,Z)  Y = Z 8X8Y1…8Yn8Z C(X) ^ Op(X,Y1, …,Yn,Z)  T1(Y1) ^ … ^ Tn(Yn) ^ T(Z) 8X8Y1…8Yn8Z18Z2 C(X) ^ Op(X,Y1, …,Yn,Z1) ^ Op(X,Y1, …,Yn,Z2)  Z1 = Z2 check via query 8X C1(X)  C2(X) 8X C1(X) ^ … ^ Cn(X)  ? 8X1…8Xn A(X1,…,Xn)  C1(X1) ^ … ^ Cn(Xn) 8X18X2 A(X1,X2)  C1(X1) ^ C2(X2) 8X1…8Xn8Y A(X1,…,Xn) ^ R*(X1,…,Xn,Y)  CA(Y) 8X18X28Y A(X1,X2) ^ R*(X1,X2,Y)  CA(Y) 8X1…8Xn A(X1,…,Xn)  9Y R*(X1,…,Xn,Y) 8X18X2 A(X1,X2)  9Y R*(X1,X2,Y) 8X1…8Xn8Y8Z A(X1,…,Xn) ^ R*(X1,…,Xn,Y) ^ R*(X1,…,Xn,Z)  Y = Z 8X1…8Xn8Y1…8Yn8Z R*(X1,…,Xn,Z) ^ R*(Y1,…,Yn,Z) ^ CA(Z)  X1 = Y1 ^ … ^ Xn = Yn 8X C(X)  9Y1…9Yn A(X,Y1) ^ … ^ A(X,Yn) 1 ≤ i < j ≤ n Yi ≠ Yj 8X8Y8Z C(X) ^ A(X,Y) ^ A(X,Z)  Y = Z 8X C(X)  9Y1…9Yn A(Y1,X) ^ … ^ A(Yn,X) 1 ≤ i < j ≤ n Yi ≠ Yj 8X8Y8Z C(X) ^ A(Y,X) ^ A(Z,X)  Y = Z 8X1…8Xn A1(X1,…,Xn)  A2(X1,…, Xn) 8X18X2 A1(X1,X2)  A2(X1,X2)
  • 45.
    Lean UML ClassDiagrams as Rules 8X8Y C(X) ^ Attr(X,Y)  T(Y) 8X C(X)  9Y1…9Yn Attr(X,Y1) ^ … ^ Attr(X,Yn) classes 8X C1(X)  C2(X) 8X18X2 A(X1,X2)  C1(X1) ^ C2 (X2) 8X18X28Y A(X1,X2) ^ R*(X1,X2,Y)  CA(Y) 8X18X2 A(X1,X2)  9Y R*(X1,X2,Y) associations 8X C(X)  9Y1…9Yn A(X,Y1) ^ … ^ A(X,Yn) 8X C(X)  9Y1…9Yn A(Y1,X) ^ … ^ A(Yn,X) 8X18X2 A1(X1,X2)  A2(X1,X2) 8X C1(X) ^ … ^ Cn(X)  C(X) additional constraints 8X8Y C(X) ^ Attr(Y,X)  T(Y)
  • 46.
    Lean UML ClassDiagrams as Rules 8X8Y C(X) ^ Attr(X,Y)  T(Y) 8X C(X)  9Y1…9Yn Attr(X,Y1) ^ … ^ Attr(X,Yn) 8X C1(X)  C2(X) 8X18X2 A(X1,X2)  C1(X1) ^ C2 (X2) 8X18X28Y A(X1,X2) ^ R*(X1,X2,Y)  CA(Y) 8X18X2 A(X1,X2)  9Y R*(X1,X2,Y) guard atoms 8X C(X)  9Y1…9Yn A(X,Y1) ^ … ^ A(X,Yn) 8X C(X)  9Y1…9Yn A(Y1,X) ^ … ^ A(Yn,X) 8X18X2 A1(X1,X2)  A2(X1,X2) 8X C1(X) ^ … ^ Cn(X)  C(X) 8X8Y C(X) ^ Attr(Y,X)  T(Y)
  • 47.
    Data Complexity ofQuery Answering Theorem: Query answering under Lean UML class diagrams + negative & most-specific class & domain-type constraints is PTIME-complete
  • 48.
    Data Complexity ofQuery Answering Theorem: Query answering under Lean UML class diagrams + negative & most-specific class & domain-type constraints is PTIME-complete Proof: • in PTIME: reduction to assertions 8X8Y body(X,Y)  9Z head(X,Z), where body(X,Y) has a guard-atom [Calì, G. & Lukasiewicz, PODS 09] • PTIME-hardness (even without domain-type constraints): reduction from Path System Accessibility
  • 49.
    Combined Complexity ofQuery Answering Theorem: Query answering under Lean UML class diagrams + negative & most-specific class constraints is PSPACE-complete
  • 50.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D 8X8Y C(X) ^ A(X,Y)  T(Y) C(x) 8X8Y A(X,Y)  C1(X) ^ C2(Y) polynomial depth Relevant part: 8X8Y A(X,Y)  B(X,Y) A(y,z) B(y,z) T(z) R(a,v) S(v,w)
  • 51.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D 8X8Y C(X) ^ A(X,Y)  T(Y) C(x) 8X8Y A(X,Y)  C1(X) ^ C2(Y) polynomial depth Relevant part: 8X8Y A(X,Y)  B(X,Y) A(y,z) B(y,z) T(z) R(a,v) S(v,w) Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5)
  • 52.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model (Chase) D 8X8Y C(X) ^ A(X,Y)  T(Y) C(x) 8X8Y A(X,Y)  C1(X) ^ C2(Y) polynomial depth Relevant part: 8X8Y A(X,Y)  B(X,Y) A(y,z) B(y,z) T(z) R(a,v) S(v,w) Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5)
  • 53.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D 8X8Y C(X) ^ A(X,Y)  T(Y) C(x) 8X8Y A(X,Y)  C1(X) ^ C2(Y) 8X8Y A(X,Y)  B(X,Y) A(y,z) B(y,z) T(z) R(a,v) S(v,w) Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5) With each current atom A we need to compute and memorize its type(A). This is in NP
  • 54.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D 8X8Y C(X) ^ A(X,Y)  T(Y) C(x) 8X8Y A(X,Y)  C1(X) ^ C2(Y) 8X8Y A(X,Y)  B(X,Y) A(y,z) B(y,z) T(z) R(a,v) S(v,w) Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5) With each current atom A we need to compute and memorize its type(A). This is in NP
  • 55.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 56.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 57.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 58.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 59.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 60.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 61.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 62.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 63.
    Combined Complexity ofQuery Answering Proof: • in PSPACE: there exists a tree-like universal model D C(x) A(y,z) B(y,z) T(z) R(a,v) S(v,w) 8X8Y C(X) ^ A(X,Y)  T(Y) Q= R(X1,X2) ^ S(X2,X3) ^ B(X4,X5) ^ T(X5) With each current atom A we need to compute 8X8Y A(X,Y)  C1(X) ^ C2(Y) and memorize its type(A). This is in NP 8X8Y A(X,Y)  B(X,Y)
  • 64.
    Combined Complexity ofQuery Answering Proof: • PSPACE-hardness: simulation of the computation of a PSPACE Turing machine on input I = α0α1...αn-1, assuming that it uses m = nk cells • Initialization rules 8X initial(X)  initial-state(X) 8X initial(X)  cell0[α0,1](X) 8X initial(X)  celli[αi,0](X), for each i 2 {1,…,n-1} 8X initial(X)  celli[0,0](X), for each i 2 {n,…,m-1} initial-state cell0[α0,1] cell1[α1,0] … celln-1[αn-1,0] celln[0,0] … cellm-1[0,0] initial
  • 65.
    Combined Complexity ofQuery Answering Proof: • PSPACE-hardness: simulation of the computation of a PSPACE Turing machine on input I = α0α1...αn-1, assuming that it uses m = nk cells • Configuration generation rules config 8X initial(X)  config(X) succ[1..1]:config 8X config(X)  9Y succ(X,Y) 8X8Y config(X) ^ succ(X,Y)  config(Y) initial
  • 66.
    Combined Complexity ofQuery Answering Proof: • PSPACE-hardness: simulation of the computation of a PSPACE Turing machine on input I = α0α1...αn-1, assuming that it uses m = nk cells • Rules to describe the transition from one configuration to another, e.g., state transition rules for each δ(hs1,α1i) = hs2,α2,di: 8X8Y s1-celli [α1,1](X) ^ succ(X,Y)  state-s2(Y), for each i 2 {0,…,m-1} in configuration X, which has state s1, the i-th cell contains α1, and the cursor is over cell i s1-celli [α1,1] succ[0..1]:state-s2
  • 67.
    Combined Complexity ofQuery Answering Proof: • PSPACE-hardness: simulation of the computation of a PSPACE Turing machine on input I = α0α1...αn-1, assuming that it uses m = nk cells • Acceptance rule 8X accept-state(X)  accept(X) accept accept-state • Initial database D = {initial(c)} - c is the initial configuration • Boolean CQ Q= accept(X) • Turing machine accepts I iff D [  ² Q
  • 68.
    Combined Complexity ofQuery Answering Theorem: Query answering under Lean UML class diagrams + negative & most-specific class & domain-type constraints is EXPTIME-complete
  • 69.
    Combined Complexity ofQuery Answering Theorem: Query answering under Lean UML class diagrams + negative & most-specific class & domain-type constraints is EXPTIME-complete Proof: • in EXPTIME: reduction to assertions 8X8Y body(X,Y)  9Z head(X,Z), where body(X,Y) has a guard-atom, and all the predicates are of bounded arity [Calì, G. & Kifer, KR 08]
  • 70.
    Combined Complexity ofQuery Answering Proof: • EXPTIME-hardness: simulation of an alternating PSPACE Turing machine • Acceptance rules - q 2 {9,} and i 2 {1,2}: 8X q-accept-state(X)  accept(X) accept q-accept-state 8X8Y accept(X) ^ succ1(Y,X)  accept1(Y) domain-type constraints pullback rules 8X8Y accept(X) ^ succ2(Y,X)  accept2(Y) 8X 9-state(X) ^ accept1(X)  accept(X) most-specific class 8X 9-state(X) ^ accept2(X)  accept(X) constraints 8X -state(X) ^ accept1(X) ^ accept2(X)  accept(X)
  • 71.
    Further Restrictions Student Enrolled [0..1]:Course CS-Student Enrolled [0..1]:CS-Course
  • 72.
    Further Restrictions Student Enrolled [0..1]:Course different classes have disjoint sets of attributes and operations CS-Student Enrolled [0..1]:CS-Course instead of 8X8Y Student(X) ^ Enrolled(X,Y)  Course(Y) we have 8X8Y Student-Enrolled(X,Y)  Course(Y) Data complexity in AC0 and combined complexity NP-complete
  • 73.
    Complexity of QueryingUML Class Diagrams UML Additional Data Combined Formalism Constraints Complexity Complexity Full none coNP-complete EXPTIME-complete + negative Lean + specific-class PTIME-complete EXPTIME-complete + domain-type + negative Lean PTIME-complete PSPACE-complete + specific-class Restricted + negative in AC0 NP-complete Lean + specific-class
  • 74.
    Datalog± : AUnifying Logical Framework Datalog± Description Logics (DL-Lite, EL,…) Relational Constraints (IDs, FKDs,…) Datalog Conceptual Models (UML, ER,…) … without losing tractable data complexity
  • 75.
    Datalog± : AUnifying Logical Framework Datalog± Description Logics (DL-Lite, EL,…) Relational Constraints (IDs, FKDs,…) Datalog Conceptual Models (UML, ER,…) … without losing tractable data complexity
  • 76.
    Ontological Reasoning andDatalog DL Assertion Datalog Rule Concept Inclusion emp v person emp(X)  person(X) Concept Product sen-emp £ emp v moreThan sen-emp(X) ^ emp(Y)  moreThan(X,Y) (Inverse) Role Inclusion reports¡ v mgr reports(X,Y)  mgr(Y,X) Role Transitivity trans(mgr) mgr(X,Y) ^ mgr(Y,Z)  mgr(X,Z)
  • 77.
    Ontological Reasoning andDatalog DL Assertion Datalog Rule Concept Inclusion emp v person emp(X)  person(X) Concept Product sen-emp £ emp v moreThan sen-emp(X) ^ emp(Y)  moreThan(X,Y) (Inverse) Role Inclusion reports¡ v mgr reports(X,Y)  mgr(Y,X) Role Transitivity trans(mgr) mgr(X,Y) ^ mgr(Y,Z)  mgr(X,Z) Participation emp v 9report emp(X)  9Y report(X,Y) Disjointness emp u customer v ? emp(X) ^ customer(X)  ? Functionality funct(reports) reports(X,Y) ^ reports(X,Z)  Y = Z
  • 78.
    Guardedness • All8-variables occur in one body atom - guard atom 8X8Y8Z R(X,Y,Z) ^ S(Y) ^ P(X,Z)  9W Q(X,W) guard • Models of finite treewidth ) decidability of query answering [Calì, G. & Kifer, KR 08] • Query answering is PTIME-complete in data complexity [Calì, G. & Lukasiewicz, PODS 09] • Properly extends ELH (same data complexity)
  • 79.
    Ontology Querying ELH: PopularDL with PTIME data complexity [Baader, IJCAI 03 and Rosati, DL 07] ELH TBox Datalog§ Representation AvB 8X A(X)  B(X) Au Bv C 8X A(X) ^ B(X)  C(X) A v 9R.B 8X A(X)  9Y R(X,Y) ^ B(Y) 9R.A v B 8X8Y R(X,Y) ^ A(Y)  B(X) RvP 8X8Y R(X,Y)  P(X,Y)
  • 80.
    First-Order Rewritable Datalog±Fragments Q  compilation Q
  • 81.
  • 82.
    First-Order Rewritable Datalog±Fragments Q  compilation Q Q* FO+ translation SQL D
  • 83.
    First-Order Rewritable Datalog±Fragments Q  compilation Q Q* evaluation translation SQL D 8D: D [  ² Q , D ² Q*
  • 84.
    Linearity • Just oneatom in the body 8X8Y R(X,Y) ! 9Z (X,Z) guard • Linear TGDs are trivially guarded • Linear TGDs are first-order rewritable [Calì, G. & Lukasiewicz, PODS 09]  Query answering in AC0 data complexity • Polynomial-size rewriting (SQL DL = NR Datalog) [G. & Schwentick, KR 12] • Properly extends DL-Lite (same data complexity)
  • 85.
    Ontology Querying DL-Lite: Popularfamily of DLs with AC0 data complexity (OWL 2 QL) [Calvanese, De Giacomo, Lembo, Lenzerini & Rosati, JAR 07] DL-Lite TBox Datalog§ Representation AvB 8X A(X)  B(X) A v 9R 8X A(X)  9Y R(X,Y) 9R v A 8X8Y R(X,Y)  A(X) RvP 8X8Y R(X,Y)  P(X,Y)
  • 86.
    Finite Controllability ? D[²Q , D [  ²fin Q • Holds for inclusion dependencies [Rosati, PODS 06] • Holds for guarded Datalog± (in fact, for the guarded fragment) [Bárány, G. & Otto, LICS 10] • Different from finite-model property of the guarded fragment: If D [  [ Q has a model, then D [  [ Q it has a finite one.
  • 87.
    Finite Controllability andLean UML Class Diagrams • For each attribute assertion Attribute[ i..j ]:Type: j = 1 C1 mL..mU A nL..nU • For each association A: mU = nU = 1 C2 • Disjointness and negative constraints are forbidden • Most-specific class and domain-type constraints are allowed set of guarded TGDs ) finite controllability holds
  • 88.
    Datalog§: Overview Guarded Linear 8X8Y R(X,Y,Y) ! 9Z P(X,Z)
  • 89.
    Datalog§: Overview Guarded Linear Sticky-join 8X8Y R(X,Y,Y) ! 9Z P(X,Z) 8X8Y8Z R(X,Y) ^ S(Y,Z) ! 9W P(Y,W)
  • 90.
    Datalog§: Overview DL-Lite PTIME-complete FO-rewritable Guarded Linear Sticky Sticky-join ELH IDs + FKDs Lean UCDs
  • 91.
    Datalog§: Summary ofComplexity Results Data Fixed  Combined Guarded PTIME-complete NP-complete 2EXPTIME-complete Linear in AC0 NP-complete PSPACE-complete Sticky-join in AC0 NP-complete EXPTIME-complete Same complexity with negative constraints and non-conflicting EGDs
  • 92.
    Datalog§: Next Steps PTIME-complete PTIME-complete FO-rewritable Guarded Linear Sticky-join ? …with disjunction in the head … finite-model reasoning
  • 93.
  • 94.
    But… • What about joins in rule bodies? 8A8D8P runs(D,P) ^ area(P,A)  9E employee(E,D,P,A) • What about the DL assertion concept product? 8E8M elephant(E) ^ mouse(M)  biggerThan(E,M)
  • 95.
    But… • What about joins in rule bodies? 8A8D8P runs(D,P) ^ area(P,A)  9E employee(E,D,P,A) • What about the DL assertion concept product? 8E8M elephant(E) ^ mouse(M)  biggerThan(E,M) No tree-like models guaranteed 8X8Y R(X,Y)  9Z R(Y,Z) Infinitely many symbols in S 8X8Y R(X,Y)  S(X) 8X8Y S(X) ^ S(Y)  P(X,Y) P forms an infinite clique
  • 96.
    Stickiness 8A8D8P runs(D,P)^ area(P,A)  9E employee(E,D,P,A) 8E8M elephant(E) ^ mouse(M)  biggerThan(E,M)
  • 97.
    Stickiness 8X8Y8Z R(X,Y) ^ P(Y,Z)  9W T(X,Y,W) 8X8Y8Z T(X,Y,Z)  9W S(Y,W) 8X8Y8Z R(X,Y) ^ P(Y,Z)  9W T(X,Y,W) 8X8Y8Z T(X,Y,Z)  9W S(X,W)
  • 98.
    Stickiness • Properly generalizeinclusion dependencies • Backward-resolution terminates • Query answering is in AC0 in data complexity (first-order rewritability) [Calì, G. & Pieris, VLDB 10] • Properly extends DL-Lite (same data complexity)
  • 99.
    Additional Features • EGDs, e.g., 8X8Y8Z reports(X,Y) ^ reports(X,Z) ! Y = Z Non-Conflicting EGDs: do not interact with TGDs Preliminary check without adding complexity • Negative constraints, e.g., 8X emp(X) ^ customer(X) ! ? Check without adding complexity Finite controllability does not hold D = {R(a,b)} D[²Q 8X8Y R(X,Y)  9Z R(Y,Z) but = 8X8Y8Z R(Y,X) ^ R(Z,X)  Y = Z D [  ²fin Q Q  R(A,a)
  • 100.
    Additional Features • EGDs, e.g., 8X8Y8Z reports(X,Y) ^ reports(X,Z) ! Y = Z Non-Conflicting EGDs: do not interact with TGDs Preliminary check without adding complexity • Negative constraints, e.g., 8X emp(X) ^ customer(X) ! ? Check without adding complexity
  • 101.
    Comparison with ER§Schemata ER§: Extended ER formalism with AC0 data complexity [Calì, G. & Pieris, Information Systems 2012] Lean UML ER§ IS-A among classes IS-A among associations ¸n participation n¸0 n 2 {0,1} ·1 participation Permutation on IS-A Values of attributes complex atomic Operations Attribute re-use
  • 102.
    The Chase Procedure Input: Database D, set of TGDs  Output: A model of D [  D person(john)  person(P)  9F father(F,P) father(F,P)  person(F) chase(D,) = D [ ?
  • 103.
    The Chase Procedure Input: Database D, set of TGDs  Output: A model of D [  D person(john)  person(P)  9F father(F,P) father(F,P)  person(F) chase(D,) = D [ {father(z1,john)
  • 104.
    The Chase Procedure Input: Database D, set of TGDs  Output: A model of D [  D person(john)  person(P)  9F father(F,P) father(F,P)  person(F) chase(D,) = D [ {father(z1,john), person(z1)
  • 105.
    The Chase Procedure Input: Database D, set of TGDs  Output: A model of D [  D person(john)  person(P)  9F father(F,P) father(F,P)  person(F) chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1)
  • 106.
    The Chase Procedure Input: Database D, set of TGDs  Output: A model of D [  D person(john)  person(P)  9F father(F,P) father(F,P)  person(F) chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1), …}
  • 107.
    The Chase Procedure Input: Database D, set of TGDs  Output: A model of D [  D person(john)  person(P)  9F father(F,P) father(F,P)  person(F) chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1), …} infinite instance
  • 108.
    Query Answering viaChase Q h C = chase(D,) D h1 h2 h2(C) h1(C) . . . M1 M2 D[²Q , chase(D,) ² Q [see, e.g., Deutsch, Nash & Remmel, PODS 08]
  • 109.
    Bounded Derivation-Depth Property(BDDP) D Q constant depth w.r.t. D P chase(D,) chase(D,) ² Q ) P²Q [Calì, G. & Lukasiewitcz, PODS 09]
  • 110.
    Bounded Derivation-Depth Property(BDDP) D Q constant depth w.r.t. D P chase(D,) BDDP ) First-Order Rewritability [Calì, G. & Lukasiewitcz, PODS 09]
  • 111.
    First-Order Rewritable TGDs  Q rewriting Q 8D: D [  ² Q , D ² Q Query answering is in AC0 in data complexity [Vardi, PODS 95]
  • 112.
    OMG UML (UnifiedModeling Language) Standard conceptual modeling tool for software design Competes 0..1 Stock Company Issues Index[0..1]:Str 0..1 1..1 1..1 1..1 getIndex():List 0..1 Member Owns 2..1 Executive Person 1..1 Class Diagrams Reasoning over UML models Model checking Specification recovery Software maintenance