Ontology Evolution and Data Quality
An Empirical Analysis
Nandana Mihindukulasooriya, María Poveda Villalón,
Raúl García-Castro, and Asunción Gómez-Pérez
Ontology Engineering Group (OEG)
Universidad Politécnica de Madrid
Acknowledgments:
4V: Volumen, Velocidad, Variedad y Validez en la gestión innovadora de
datos (TIN2013-46238-C4-2-R)
http://loupe.linkeddata.es
Research Questions
• How have collaborative ontologies evolved in
practice?
• How have classes and properties changed?
• How do different communities handle those
changes?
• What is the impact of ontology changes on
data quality?
• What data quality issues are caused by ontology
evolution?
Ontology Selection Criteria
• Popularity (wide use) in the LOD Cloud datasets
• LOD Cloud State – Widely deployed vocabularies
• Availability of multiple versions of the ontology
• at least 5 versions
• at least a 2-year time span
• Collaborative development
• A large number of participants
• Different communities
• W3C, Academic, Industrial, etc.
Ontologies Studied
Ontology     Versions (count)  Versions (range)   Timespan
DBpedia      12                3.2 ~ 2016-04      2008/10 ~ 2016/10
Schema.org   24                0.91 ~ 2.2         2012/04 ~ 2015/11
W3C PROV-O   7                 Initial – W3C Rec  2012/05 ~ 2015/01
FOAF         10                Initial – 0.99     2005/04 ~ 2014/01
• 4 ontologies
• 53 versions
Ontologies Studied - Size
Ontology Size (class and property counts per ontology)
Ontology          Classes  Properties
DBpedia v2016-04  754      2848
Schema.org v2.2   652      992
PROV-O Rec        50       68
FOAF v0.99        13       62
Data extraction
http://loupe.linkeddata.es/
https://lov.okfn.org/
DBpedia
• Process
• Wiki-based approach
• Guidance on how to add classes and properties
• Not much tracking until recently
• Community
• Several sub-communities (language chapters)
• 488 users with editor rights, 14 active in the last 30 days
DBpedia Classes
[Figure: bar chart of total count, additions, and deletions of DBpedia classes per ontology version (3.3 to 2016-04); x-axis: ontology version, y-axis: number of classes. The per-version values are listed in the backup slide "Example – DBpedia Class Changes Table".]
Quality issues
Class            Last version  LOD Cache instances  Italian DBpedia 2016-04 triples
dbo:Bullfighter  2015-04       2                    -
dbo:Comics       2014          256                  2241
dbo:Imdb         2015-04       3                    -
dbo:Installment  2015-04       601                  -
dbo:Pornstar     3.9           2                    -
• Instances of deleted classes
• Redundant classes
• AdultActor and PornStar
Unstable classes
[Figure: presence of each class across ontology versions 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 2014, 2015-04, 2015-10, 2016-04]
• dbo:Area
• dbo:Municipality
DBpedia properties
Number of properties per ontology version
Version  Total count  Additions  Deletions
3.2      720
3.4      2168         1719       -271
3.5      1274         304        -1198
3.6      1335         98         -37
3.7      1643         325        -17
3.8      1775         135        -3
3.9      2333         566        -8
2014     2795         508        -46
2015-04  2819         127        -103
2015-10  2833         23         -9
2016-04  2849         18         -2
Quality Problems
Property            Removed in version  esDBpedia 2016-04 triples  itDBpedia 2016-04 triples
dbo:buriedPlace     2015-04             4519                       0
dbo:diseasesdb      2015-04             4346                       0
dbo:emedicineTopic  2015-04             1977                       0
dbo:foundingPerson  2015-10             2158                       0
dbo:medlineplus     2015-04             3300                       0
dbo:coordinates     2016-04             0                          180
dbo:score           2016-04             0                          26873
• Triples containing deleted properties
• Redundant properties
• dbo:color / dbo:colour
• dbo:foundingDate / dbo:formationDate
Unstable properties
[Figure: presence of each property across ontology versions 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 2014, 2015-04, 2015-10, 2016-04]
dbo:activeYears
dbo:bloodType
dbo:classis
dbo:coordinates
dbo:currentTeam
dbo:established
dbo:father
dbo:foundationDate
dbo:greekName
dbo:hasAnnotation
dbo:mother
dbo:nickname
dbo:organisation
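A minimal sketch of how such unstable terms could be found automatically. It assumes that "unstable" means a term that is absent in some version between two versions in which it is present (the slide does not define the term precisely), and the presence data below is hypothetical, not taken from the figure.

```python
from typing import Dict, List


def unstable_terms(presence: Dict[str, List[bool]]) -> List[str]:
    """Return terms that disappear in some version and reappear later.

    presence maps a term to one boolean per ontology version, in
    chronological order (True = the term exists in that version).
    """
    unstable = []
    for term, flags in presence.items():
        if True not in flags:
            continue  # the term never existed, nothing to report
        first = flags.index(True)
        last = len(flags) - 1 - flags[::-1].index(True)
        # A gap strictly between the first and last presence means the
        # term was removed and later re-added.
        if not all(flags[first:last + 1]):
            unstable.append(term)
    return unstable


# Hypothetical presence data for illustration only.
example = {
    "ex:stableTerm": [True, True, True, True],
    "ex:unstableTerm": [True, False, True, True],
    "ex:removedTerm": [True, True, False, False],
}
print(unstable_terms(example))
```

A term that is removed once and never returns is not flagged here; it would show up as a plain deletion instead.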
Schema.org
• Process
• Issue tracking + mailing lists
• ‘pending’, a staging area for new terms
• Community + Steering Group Review
• Community
• 48 contributors on GitHub
• Mainly industrial participants
Schema.org Classes
Number of classes per ontology version
Version  Total count  Additions  Deletions
0.91     302
0.95     391          89         0
0.97     393          2          0
0.99     416          23         0
1.0a     428          12         0
1.0b     428          0          0
1.0c     531          103        0
1.0d     552          21         0
1.0e     558          6          0
1.0f     558          0          0
1.1      582          25         -1
1.2      585          3          0
1.4      585          0          0
1.5      585          0          0
1.6      588          3          0
1.7      589          1          0
1.8      590          1          0
1.9      593          3          0
1.91     593          0          0
1.92     618          25         0
1.93     620          2          0
2.0      638          18         0
2.1      645          7          0
2.2      652          7          0
Deprecating classes
4320 instances in LOD Cache
Schema.org properties
Number of properties per ontology version
Version  Total count  Additions  Deletions
0.91     286
0.95     465          179        0
0.97     466          1          0
0.99     544          78         0
1.0a     581          37         0
1.0b     582          1          0
1.0c     627          45         0
1.0d     675          48         0
1.0e     711          36         0
1.0f     711          0          0
1.1      777          66         0
1.2      792          15         0
1.4      794          3          -1
1.5      798          4          0
1.6      803          5          0
1.7      806          3          0
1.8      806          0          0
1.9      816          10         0
1.91     816          1          -1
1.92     878          62         0
1.93     891          14         -1
2.0      965          74         0
2.1      976          12         -1
2.2      992          16         0
Deleting properties
0 triples in LOD Cache
FOAF
• Process
• Mailing list discussions
• Term stages (“unstable”, “testing”, “stable”, and “archaic”)
• Community
• 2 editors, numerous contributors
FOAF Classes
Number of classes per ontology version
Version     Total count  Additions
03/04/2005  12
19/05/2005  12           0
03/06/2005  12           0
14/01/2007  12           0
0.9         12           0
0.91        12           0
0.96        12           0
0.97        12           0
0.98        13           1
0.99        13           0
Evolution of FOAF classes
Classes 0.97 0.98 0.99
foaf:Agent
foaf:Person
foaf:Group
foaf:Organization
foaf:Document
foaf:Image
foaf:PersonalProfileDocument
foaf:OnlineAccount
foaf:Project
foaf:OnlineChatAccount
foaf:OnlineEcommerceAccount
foaf:OnlineGamingAccount
foaf:LabelProperty N/A
Term status legend: Unstable, Testing, Stable
FOAF Properties
[Figure: bar chart of total count, additions, and deletions of FOAF properties per ontology version (03/04/2005, 19/05/2005, 03/06/2005, 14/01/2007, 0.9, 0.91, 0.96, 0.97, 0.98, 0.99); x-axis: ontology version, y-axis: number of properties. The total count grows from 52 to 62, with no deletions.]
Evolution of FOAF properties
PROV-O
• Process
• Developed by the W3C Provenance Working Group
• Weekly teleconferences, F2F meetings
• W3C issue tracker
• Community
• 3 editors, 7 contributors, 59 WG members
• A mix of academic and industrial participants
PROV-O classes
Number of classes per ontology version
Version     Total count  Additions  Removals
03/05/2012  38
24/07/2012  30           8          -16
11/12/2012  30           1          -1
12/03/2013  30           0          0
30/04/2013  50           20         0
07/06/2014  50           0          0
11/01/2015  50           0          0
PROV-O properties
Number of properties per ontology version
Version     Total count  Additions  Deletions
03/05/2012  60
24/07/2012  52           14         -22
11/12/2012  50           1          -3
12/03/2013  50           0          0
30/04/2013  68           18         0
07/06/2014  68           0          0
11/01/2015  68           0          0
Conclusions
• Selected ontologies serve different types of use cases
• Top-down vs bottom-up methodologies
• Level of manual curation
• One approach does not fit all
• It’s not easy to find practical guidelines & best practices for
community-driven ontology evolution
• Schema.org manages ontology evolution better than the DBpedia
ontology
• Some differences between DBpedia and Schema.org
• Monolithic vs modular approach
• A good editorial process and governance
• Tracking changes and communication
• None of them seems to follow the theoretical
methodologies or tool frameworks found in the literature
Questions?
Backup Slides
Ontology Changes
• Class
• Class Hierarchy
• Property
• Property Hierarchy
• Property Domain
• Property Range
• Symmetric Property
• Inverse Property
• Transitive Property
• Min Cardinality
• Max Cardinality
• Functional Property
• Inverse Functional Property
• Annotations
• Class Equivalence
• Property Equivalence
Ontology Changes Analyzed
• Class
• Class Hierarchy
• Property
• Property Hierarchy
• Property Domain
• Property Range
• Symmetric Property
• Inverse Property
• Transitive Property
• Min Cardinality
• Max Cardinality
• Functional Property
• Inverse Functional Property
• Annotations
• Class Equivalence
• Property Equivalence
Ontology Changes
Example – Class changes from v1.0f to v1.1
Schema.org
Version N: v1.0f (2014-02-05), 558 classes
Version N + 1: v1.1 (2014-04-04), 582 classes
Deleted (-1): OnSitePickup
Classes in both versions: 557
Added (+25): Answer, Airline, BusReservation, BusTrip, Car, EmailMessage, Flight, FlightReservation, EventReservation, Seat, Taxi, Reservation, TaxiReservation, Ticket, Vehicle, TrainTrip, …
∆ +24
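The comparison between two versions can be sketched with plain set operations. This is an illustrative sketch only: the function name `diff_terms` and the tiny term sets are hypothetical, and extracting the term sets from the actual ontology files is not shown.

```python
from typing import Set, Tuple


def diff_terms(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str], int]:
    """Compare the term sets of two ontology versions.

    Returns (added terms, deleted terms, net change in term count).
    """
    added = new - old      # terms only in the newer version
    deleted = old - new    # terms only in the older version
    return added, deleted, len(new) - len(old)


# Tiny hypothetical excerpts of two versions, for illustration only.
version_n = {"Thing", "Event", "OnSitePickup"}
version_n_plus_1 = {"Thing", "Event", "Flight", "Airline"}

added, deleted, delta = diff_terms(version_n, version_n_plus_1)
print(sorted(added), sorted(deleted), delta)

# The slide's own figures follow the same arithmetic: 558 - 1 + 25 = 582.
assert 558 - 1 + 25 == 582
```

Comparing by term IRI treats a renamed term as one deletion plus one addition, which matches how the counts on the slide add up.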
Example – DBpedia Class Changes Table
DBpedia
Ver.     Date        Count  ∆    +    -
3.3      2009-07-03  174
3.4      2009-11-11  204    30   32   -2
3.5      2010-04-12  255    51   57   -6
3.6      2011-01-17  272    17   17   0
3.7      2011-09-11  319    47   48   -1
3.8      2012-08-06  359    40   41   -1
3.9      2013-09-17  529    170  171  -1
2014     2014-09-09  683    154  159  -5
2015-04  2015-09-04  735    52   57   -5
2015-10  2016-03-31  739    4    9    -5
2016-04  2016-10-15  754    15   15   0
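A quick consistency check for a change table like the one above, as a sketch: each version's count should equal the previous count plus additions minus deletions. The figures are copied from the table; the function name `inconsistent_versions` is hypothetical, and deletions are stored as positive magnitudes.

```python
# (version, total count, additions, deletions) from the DBpedia class table.
# The first version has no change figures, so zeros are used as placeholders.
ROWS = [
    ("3.3", 174, 0, 0),
    ("3.4", 204, 32, 2),
    ("3.5", 255, 57, 6),
    ("3.6", 272, 17, 0),
    ("3.7", 319, 48, 1),
    ("3.8", 359, 41, 1),
    ("3.9", 529, 171, 1),
    ("2014", 683, 159, 5),
    ("2015-04", 735, 57, 5),
    ("2015-10", 739, 9, 5),
    ("2016-04", 754, 15, 0),
]


def inconsistent_versions(rows):
    """Return versions whose count != previous count + additions - deletions."""
    bad = []
    for (_, prev_count, _, _), (ver, count, added, deleted) in zip(rows, rows[1:]):
        if prev_count + added - deleted != count:
            bad.append(ver)
    return bad


print(inconsistent_versions(ROWS))  # prints [] because every row adds up
```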
Comparable metric for changes
• Not easy to compare because of the variation in
ontology size
• DBpedia 2848 properties vs FOAF 62 properties
• A comparable measure
• \[ \left( \frac{\sum_{v=0}^{n} \frac{\mathit{TermsAdded}_v}{\mathit{TotalTermsCount}_v}}{n} \right) \times 100 \]
• \[ \left( \frac{\sum_{v=0}^{n} \frac{\mathit{TermsDeleted}_v}{\mathit{TotalTermsCount}_v}}{n} \right) \times 100 \]
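A minimal sketch of this measure applied to the DBpedia class figures from the backup table. It rests on two assumptions that the slide does not spell out: the sum runs over the n version transitions for which change figures exist, and each change is divided by the total count of the newer version. The function name `average_change_rate` is hypothetical.

```python
def average_change_rate(changes, totals):
    """Average of (terms changed / total terms) over versions, in percent."""
    if len(changes) != len(totals) or not changes:
        raise ValueError("changes and totals must be non-empty and equal length")
    n = len(changes)
    return sum(c / t for c, t in zip(changes, totals)) / n * 100


# DBpedia classes, versions 3.4 to 2016-04 (figures from the class changes table).
totals = [204, 255, 272, 319, 359, 529, 683, 735, 739, 754]
additions = [32, 57, 17, 48, 41, 171, 159, 57, 9, 15]
deletions = [2, 6, 0, 1, 1, 1, 5, 5, 5, 0]

print(f"additions: {average_change_rate(additions, totals):.2f}% per version")
print(f"deletions: {average_change_rate(deletions, totals):.2f}% per version")
```

Because each ratio is relative to the ontology's own size, the result can be compared across ontologies as different as DBpedia and FOAF.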
Ontology Term Lifetime
Term Lifetime (DBpedia properties example)
[Figure: timeline of the ontology versions (3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 2014, 2015-04, 2015-10) in which each example property is present: dbo:county, dbo:rating, dbo:pages, dbo:priceMoney]
Term Duration
Property        Versions  Months
dbo:county      10        81
dbo:rating      7         63
dbo:pages       4         18
dbo:priceMoney  5         49
Term Lifetime (DBpedia properties example)
Property        Versions  Versions (normalized)  Months  Months (normalized)
dbo:county      10        1.0                    81      1.00
dbo:rating      7         0.7                    63      0.78
dbo:pages       4         0.4                    18      0.22
dbo:priceMoney  5         0.5                    49      0.60
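The normalized columns can be reproduced with a short sketch. It assumes that normalization divides each value by the largest value in the column (10 versions, 81 months), which matches the figures on the slide; the slide itself does not state the divisor. The function name `normalize` is hypothetical.

```python
def normalize(values):
    """Scale each value to the range (0, 1] by dividing by the maximum."""
    largest = max(values.values())
    return {term: round(v / largest, 2) for term, v in values.items()}


# Term durations from the slide.
versions = {"dbo:county": 10, "dbo:rating": 7, "dbo:pages": 4, "dbo:priceMoney": 5}
months = {"dbo:county": 81, "dbo:rating": 63, "dbo:pages": 18, "dbo:priceMoney": 49}

print(normalize(versions))
print(normalize(months))
```

Normalizing makes lifetimes comparable between ontologies with very different numbers of versions or very different ages.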
Term Lifetime Composition
DBpedia 2016-04 properties example
Average Term (Property) Lifetime
Lifetime (versions)  Number of terms
1                    15
2                    23
3                    122
4                    453
5                    509
6                    127
7                    308
8                    97
9                    287
10                   522
11                   385
\[ \mathit{AvgTermLifetime} = \frac{\sum_{k=1}^{n} \mathit{Lifetime}_k \times \mathit{NumberOfTerms}_k}{\mathit{TotalNumberOfTerms}} \]
Average Term (Property) Lifetime of DBpedia 2016-04 = 7.20
Average Term (Property) Lifetime
Normalized lifetime (versions)  Number of terms
0.09                            15
0.18                            23
0.27                            122
0.36                            453
0.45                            509
0.55                            127
0.64                            308
0.73                            97
0.82                            287
0.91                            522
1.00                            385
Average Term (Property) Lifetime of DBpedia 2016-04 (normalized) = 0.65
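Both averages can be recomputed from the lifetime distribution as a weighted mean. This is a sketch using the figures from the two slides; the function name `average_lifetime` is hypothetical, and the normalized value is assumed to be the plain average divided by the maximum lifetime (11 versions).

```python
# Lifetime in versions -> number of DBpedia 2016-04 properties with that lifetime.
LIFETIME_COUNTS = {
    1: 15, 2: 23, 3: 122, 4: 453, 5: 509, 6: 127,
    7: 308, 8: 97, 9: 287, 10: 522, 11: 385,
}


def average_lifetime(counts):
    """Weighted mean lifetime: sum(lifetime * terms) / total terms."""
    total_terms = sum(counts.values())
    return sum(k * n for k, n in counts.items()) / total_terms


avg = average_lifetime(LIFETIME_COUNTS)
normalized = avg / max(LIFETIME_COUNTS)  # divide by the longest lifetime (11)

print(f"average lifetime: {avg:.2f} versions")
print(f"normalized average lifetime: {normalized:.2f}")
```

The term counts sum to 2848, the number of DBpedia 2016-04 properties reported on the ontology size slide.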
Potential Quality Issues
Removing vocabulary terms (classes and properties)
• What happens to the downstream ontologies that use
or extend the vocabulary term?
• Instances of removed concepts
• Triples containing removed properties
• How are these changes communicated to the users of the
ontology?
• Can these be detected automatically using test-driven
ontology engineering and Linked Data generation?
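One way such a check could look, as a sketch only: scan a dataset for triples that still use removed classes or properties. Triples are modeled as plain string tuples and the data is hypothetical; a real check would read the dataset with an RDF library or a SPARQL endpoint, which is not shown here.

```python
RDF_TYPE = "rdf:type"


def find_stale_triples(triples, removed_classes, removed_properties):
    """Return triples that instantiate a removed class or use a removed property."""
    stale = []
    for subject, predicate, obj in triples:
        uses_removed_property = predicate in removed_properties
        uses_removed_class = predicate == RDF_TYPE and obj in removed_classes
        if uses_removed_property or uses_removed_class:
            stale.append((subject, predicate, obj))
    return stale


# Hypothetical dataset excerpt; the removed terms are examples from the slides.
triples = [
    ("ex:a", RDF_TYPE, "dbo:Bullfighter"),
    ("ex:b", "dbo:buriedPlace", "ex:place"),
    ("ex:c", RDF_TYPE, "dbo:Person"),
]
stale = find_stale_triples(
    triples,
    removed_classes={"dbo:Bullfighter"},
    removed_properties={"dbo:buriedPlace"},
)
print(stale)
```

Run as part of a release pipeline, a check like this would flag affected data as soon as a term is removed from the ontology.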
Adding terms (Classes and properties)
• Less problematic than deletions
• Can add duplicates, lowering conciseness
• dbo:color / dbo:colour
• dbo:foundingDate / dbo:formationDate
• dbo:AdultActor / dbo:PornStar
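Spelling variants such as color / colour could be flagged with a simple string-similarity pass before a new term is accepted. This is a sketch using Python's standard `difflib`; the threshold of 0.85 is an arbitrary assumption, and the function name `near_duplicates` is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(names, threshold=0.85):
    """Return pairs of term names whose similarity ratio reaches the threshold."""
    pairs = []
    for a, b in combinations(sorted(names), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


print(near_duplicates(["colour", "score", "color"]))
```

String similarity only catches lexical variants. Synonym pairs such as AdultActor / PornStar share little spelling and would need another signal, for example overlapping instances or labels.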
Comparison to Software Engineering
• Staging pre-releases
• alpha, beta
• Major changes
• Major / minor versions
• Test-driven development
• Detection of the impact of version changes automatically and
immediately
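To illustrate the major / minor analogy, a sketch of a versioning rule borrowed from common software practice: deletions are treated as breaking changes and trigger a major bump, while pure additions trigger a minor bump. This rule is an assumption for illustration, not something the studied ontologies are reported to follow, and the function name `suggest_bump` is hypothetical.

```python
from typing import Tuple


def suggest_bump(version: Tuple[int, int], added: int, deleted: int) -> Tuple[int, int]:
    """Suggest the next (major, minor) version from the number of term changes."""
    major, minor = version
    if deleted > 0:
        # Removing terms can break downstream data and ontologies.
        return (major + 1, 0)
    if added > 0:
        # Additions are backward compatible.
        return (major, minor + 1)
    return (major, minor)


# Change from Schema.org v1.0f to v1.1 on the backup slide: 25 added, 1 deleted.
print(suggest_bump((1, 0), added=25, deleted=1))
print(suggest_bump((1, 0), added=3, deleted=0))
```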
Future work
• Analyze the correlation between data quality issues
and term lifetime
• H1: More errors when using terms with shorter term
lifetime?
• H2: More errors in using ontologies with shorter average
term lifetime?
• Correlation between ontology changes and conciseness?
• Correlation between ontology changes and
development methodology and tools used?
