MANAGING CHANGES IN CLASSIFICATION:
the case of UDC
Aida Slavic
Editor-in-Chief, UDC
aida.slavic@udcc.org
Aida Slavic Managing KOS: Evolution of concepts and their representation

Aida Slavic (UDC), "Managing KOS: Evolution of concepts and their representation".
Presentation at the KnowEscape workshop "Evolution and variation of classification systems", March 4-5, 2015, Amsterdam.

  1. MANAGING CHANGES IN CLASSIFICATION: the case of UDC
     Aida Slavic, Editor-in-Chief, UDC, aida.slavic@udcc.org
  2. FOCUS
     •  Bibliographic classification in the linked data environment
     •  Practical issues to do with changes in classification scheme
        §  Consequences these changes have on information exchange
        §  Importance of publishing historical classification data as linked data
  3. ALPHABETICAL vs SYSTEMATIC
     Systematic order:
     539.1 Nuclear physics. Atomic physics. Molecular physics
       539.12 Elementary and simple particles
         539.123/.124 Leptons. Including: Muons
           539.123 Neutrinos
             539.123.6 Antineutrinos
           539.124 Electrons (including beta-particles)
             539.124.6 Positrons
         539.125/.126 Hadrons. Baryons and mesons
           539.125 Nucleons
             539.125.4 Protons
               539.125.46 Antiprotons
             539.125.5 Neutrons
               539.125.56 Antineutrons
           539.126.3 Mesons
           539.126.4 Resonances
           539.126.6 Hyperons
     Alphabetical order: Antineutrinos, Antineutrons, Antiprotons, Atomic physics, Baryons, Beta-particles, Bosons, Electrons, Hadrons, Hyperons, Leptons, Mesons, Molecular physics, Muons, Neutrinos, Neutrons, Nuclear physics, Nuclei, Nucleons, Positrons, Protons, Resonances
     Words alone can only be arranged or ordered alphabetically. Classification orders concepts systematically.
  4. NOTATION
     Systematic order (semantic relationships fixed by notation):
     539.1 Nuclear physics. Atomic physics. Molecular physics
       539.12 Elementary and simple particles
         539.123/.124 Leptons. Including: Muons
           539.123 Neutrinos
             539.123.6 Antineutrinos
           539.124 Electrons (including beta-particles)
             539.124.6 Positrons
         539.125/.126 Hadrons. Baryons and mesons
           539.125 Nucleons
             539.125.4 Protons
               539.125.46 Antiprotons
             539.125.5 Neutrons
               539.125.56 Antineutrons
           539.126.3 Mesons
           539.126.4 Resonances
           539.126.6 Hyperons
     Alphabetical order: Antineutrinos, Antineutrons, Antiprotons, Atomic physics, Baryons, Beta-particles, Bosons, Electrons, Hadrons, Hyperons, Leptons, Mesons, Molecular physics, Muons, Neutrinos, Neutrons, Nuclear physics, Nuclei, Nucleons, Positrons, Protons, Resonances
     NOTATION – enables mechanical ordering of subjects
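The contrast between the two orderings on this slide can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: the `filing_key` helper is hypothetical and handles just plain decimal notations and `/` ranges as they appear on the slide, not the full UDC filing rules.

```python
# Sketch only: a simplified filing key, not the official UDC filing order.
def filing_key(notation: str) -> tuple[str, int]:
    """Sort a range such as '539.123/.124' just before its first member."""
    start, sep, _ = notation.partition("/")
    return (start, 0 if sep else 1)


# A few classes taken from the slide, deliberately stored out of order.
classes = {
    "539.124": "Electrons",
    "539.123/.124": "Leptons",
    "539.12": "Elementary and simple particles",
    "539.123.6": "Antineutrinos",
    "539.123": "Neutrinos",
    "539.1": "Nuclear physics. Atomic physics. Molecular physics",
}

# Captions alone can only be ordered alphabetically.
alphabetical = sorted(classes.values())

# Notations give a mechanical ordering that follows the hierarchy.
systematic_notations = sorted(classes, key=filing_key)
systematic = [classes[n] for n in systematic_notations]
```

Sorting by caption scatters related particles across the alphabet, while sorting by notation keeps each class next to its broader and narrower classes without any knowledge of the captions' language.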
  5. NOTATION - LANGUAGE INDEPENDENT
     Class =162.3 Czech (SKOS export from UDC Summary)
  6. CLASS vs CONCEPT
     §  Class notation rarely represents a single concept
     §  Sometimes the notation serves for practical grouping of phenomena
     §  This causes many issues when it comes to using ontology-based standards as vehicles for presenting and managing classification schemes
  7. NOTATION: A PLACE HOLDER
     598.2 Aves (Birds)
       598.24 Gruiformes. Charadriiformes. Ciconiiformes
         598.244 Ciconiiformes
           598.244.2 Ciconiidae
             Including: Storks (genera Ciconia and Mycteria); the Jabiru (genus Jabiru); openbill storks (genus Anastomus) and adjutants (genus Leptoptilos)
     Note: "storks" (in English) can be roughly taken as a common term for most of the extant species of class Ciconiidae. In other languages the species in this class do not share a common name, e.g. the English word 'storks' cannot be translated accurately into other languages.
  8. NOTATION: A CONTAINER OF INFORMATION
     582.53 Alismatales
       Including: Strictly extinct genus Heleophyton
       SN: Class here also Alismatidae (scientifically outdated)
     ...
     597.2/.5 Pisces (fishes) (scientifically outdated)
     Note: Bibliographic classifications often have to retain concepts even after these cease to be scientifically relevant.
  9. BIBLIOGRAPHIC CLASSIFICATIONS
     §  deal with recorded knowledge, i.e. after it has been embodied in documents
     §  organize literature about entities and not entities themselves
     §  have to fulfil additional requirements with respect to the context in which knowledge may be created, presented, recorded or used
  10. NOT AN ONTOLOGY…
      §  Bibliographic classifications are primarily concerned with subjects
          Subject = systematized body of ideas
          Concept = an idea
      §  What is the subject (forms of knowledge)? Mining, Chemistry, Medicine
      §  What is the subject about (topics)? Mining of gold, physical properties of water, angina pectoris
  11. BIBLIOGRAPHIC CLASSIFICATIONS
      Two dominant characteristics:
      §  disciplinary organization - organize the universe of knowledge by disciplines, i.e. forms of knowledge - based on some scientific and educational consensus
      §  aspect classification - groups phenomena according to the way they are researched, described and studied in documents
  12. POLYHIERARCHY
      §  in the universe of knowledge one concept can belong to more than one broader category
      Domestic animals > Pets > Dog
      Carnivora > Canidae > Dog
  13. "DISTRIBUTED RELATIVES"
      Chemical industry > Pest-control chemicals > Chemicals for controlling rodents. Rodenticides > Mouse
      Agriculture > Animal husbandry > Rodents kept for fur > Mouse
      Zoology > Mammals > Rodentia. Lagomorpha > Myomorpha > Muridae. Mice and rats > Mouse
      Agriculture > Plant protection > Control of plant diseases and pests > Destruction of vertebrate pests > Mouse
      The four occurrences of Mouse are connected by "see also" links.
  14. LINKING CONCEPTS ACROSS KNOWLEDGE: Sharks
      Natural Sciences > Biology > Animals > Vertebrata > Pisces (Fishes) > Elasmobranchii > Sharks
      Arts. Recreation. Entertainment. Sport > Film. Cinema (motion pictures) > Film genres > Documentary films > Documentaries about sharks
      Social Sciences > Economic science > Economic sectors > Tourism > Adventure tourism > Swimming with sharks
      Arts. Recreation. Entertainment. Sport > Sport > Sport fishing > Sea fishing > Shark fishing
      Applied Sciences > Agriculture > Fishing > Fishing for deep-sea species > Shark fishing
      Applied Sciences > Industries > Leather industry > Fish skin > Sharkskin
  15. NEW KNOWLEDGE EMERGES
      681 PRECISION MECHANISMS AND INSTRUMENTS
        681.1 Apparatus with wheel or motor mechanisms
        681.2 Instrument-making in general. Instrumentation
        681.3 Computers (first placed here, before the 1980s)
        681.5 Automatic control engineering
        681.6 Graphic reproduction machines and equipment
        681.7 Optical apparatus and instruments
        681.8 Technical acoustics. Musical instruments
      Relocated to a new class: 004 (UDC), 004/006 (Dewey)
  16. IT HAPPENS ALL THE TIME... STARTS AS ONE CONCEPT...
      §  Finding logical place for new and pervasive concepts
      NANOTECHNOLOGY: medicine, technology, industry, computer technology, agriculture
      BIOTECHNOLOGY: agriculture, biology, genetics, industry, medicine
  17. REMOVING BIAS
      =2 Western languages
      =20 English
      =3 Germanic languages
      =4 Romance or Neo-Latin languages
      =50 Italian
      =60 Spanish
      =690 Portuguese
      =7 Classic languages. Latin and Greek
      =81 Slavonic languages
      =88 Baltic languages
      =9 Oriental, African and other languages
      =91 Various Indo-European languages
      =92 Semitic languages
      =94 Hamitic languages
      ...
      Wrong classification of languages causes wrong classification of: peoples, literatures, philology
  18. CORRECTED 25 YEARS AGO (UDC)
      Old scheme (caused wrong classification of peoples, literatures, linguistics):
      =2 Western languages
      =20 English
      =3 Germanic languages
      =4 Romance or Neo-Latin languages
      =50 Italian
      =60 Spanish
      =690 Portuguese
      =7 Classic languages. Latin and Greek
      =81 Slavonic languages
      =88 Baltic languages
      =9 Oriental, African and other languages
      =91 Various Indo-European languages
      =92 Semitic languages
      =94 Hamitic languages
      ...
      Change to new scientific classification (1980s):
      =1/=2 Indo-European languages
      =3 Caucasian & other languages. Basque
      =4 Afro-Asiatic, Nilo-Saharan, Congo-Kordofanian, Khoisan
      =5 Ural-Altaic, Japanese, Korean, Ainu, Palaeo-Siberian, Eskimo-Aleut, Dravidian, Sino-Tibetan
      =6 Austro-Asiatic. Austronesian
      =7 Indo-Pacific, Australian
      =8 American Indian (Amerindian) languages
      =9 Artificial languages
  19. MORE CULTURAL BIAS…
      2 RELIGION. FAITHS
      21/28 CHRISTIANITY
        21 Natural theology. Theodicy. De Deo
        22 The Bible. Holy scripture
        23 Dogmatic theology
        24 Practical theology
        25 Pastoral theology
        26 Christian church in general
        27 General history of the Christian church
        28 Christian churches, sects
      29 NON CHRISTIAN RELIGIONS
  20. EXAMPLE 3: CORRECTED 15 YEARS AGO
      Before:
      2 RELIGION. FAITHS
      21/28 Christianity
        21 Natural theology. Theodicy. De Deo
        22 The Bible. Holy scripture
        23 Dogmatic theology
        24 Practical theology
        25 Pastoral theology
        26 Christian church in general
        27 General history of the Christian church
        28 Christian churches, sects
      29 NON CHRISTIAN RELIGIONS
      Now:
      2 RELIGION. FAITHS
        21 Prehistoric and primitive religions
        22 Religions of the Far East
        23 Religions of the Indian subcontinent
        24 Buddhism
        25 Religions of antiquity
        26 Judaism
        27 Christianity
        28 Islam
        29 Modern spiritual movements
      Special auxiliaries:
        -1 Theory, nature of religion
        -2 Evidence of religion
        -3 Persons in religion
        -4 Religious practice
        -5 Worship. Rites. Cult
        -6 Processes in religion
        -7 Religious organization
        -8 Various properties
        -9 History of the faith, religion, denomination or church
  21. GEO-POLITICAL ENTITIES
      §  new entities are being created, many entities become 'historical'
      §  administrative subdivisions of modern countries change (approximately every 20 years)
          §  counties, districts, administrative units
      §  at the same time...
      §  'old' subjects have and will continue to have literature written about them
          §  Roman Empire, Venetian Republic, Austro-Hungarian Empire (Bukowina, Galizia), British Empire, USSR, Czechoslovakia, Yugoslavia
      §  Living and inanimate objects and cultural artefacts are studied and written about long after they are extinct, out of use or practice
  22. TYPE OF CHANGES IN SCHEME
      §  Relocation: moving/introducing entire hierarchies from one place of classification structure to another, e.g. 40% of UDC changed between 1990 and 2008
      §  class is cancelled
      §  new classes added
      §  class scope may change
      §  description may change, references may change
  23. TRADITIONAL APPROACH IN HANDLING CHANGES
      Changes are published in the Extensions and Corrections to the UDC.
      More information about the semantics of changes is kept in the UDC database (in addition to revision field indicators, date of changes, date of introduction, source of change).
  24. NOTATION BECOMES AMBIGUOUS
      Bible: was represented by 22; now represented by 27-23 and 26-23
      22: reused for Religions originating in the Far East
      Reuse of a notation for different concepts:
      §  unpopular but unavoidable
      §  can happen 10-50 years apart (desirable) or instantly (to avoid)
  25. CANCELLATION MAPPING DATA
      Managing notation history in the UDC database:
      CLASS ID: 16544
      NOTATION: 22
      CAPTION: Religions originating in the Far East
      INTRODUCED/DATE: 0012
      REPLACES ID 15991: 299.1 Religions of Oriental Peoples
      NOTATION HISTORY: yes
      USED FOR: ID:17054: Bible
      REPLACED BY: ID:17355: Christian Bible
  26. UDC CHANGES AND LIBRARIES
      §  libraries continue to use classification numbers for 20-50 years or longer – few libraries have resources to reclassify
      §  libraries rarely record the UDC number provenance – if they do, this may represent a particular language edition
      §  consequence: new and old concept representations are used side by side, causing many issues in managing/mapping changes to facilitate information exchange
  27. COMPLEX CLASSIFICATION STRINGS
      Any part of the complex subject description can change over time.
      Such complex UDC codes are typical of bibliographic databases/library catalogues.
  28. GOOD PRACTICE IN MANAGING SUBJECT ACCESS
      Diagram labels: DOCUMENT, IsDescribedBy, IsDescribedIn
  29. CAN LINKED DATA SOLVE THE PROBLEM?
  30. LINKED DATA THAT CANNOT BE LINKED
      §  National library of Hungary
          <bibo:Document rdf:about="http://nektar.oszk.hu/resource/manifestation/2645471">
            <dcterms:subject>
              <rdf:Description>
                <dcam:memberOf rdf:resource="http://purl.org/dc/terms/UDC"/>
                <rdf:value>894.511-32</rdf:value>
              </rdf:Description>
            </dcterms:subject>
      §  Trondheim - Library of the Norwegian University of Science and Technology (NTNU) – TEKORD (http://ckan.net/package/tekord)
      •  all sets contain obsolete records cancelled from UDC 25 years ago or longer
      •  all sets contain complex UDC numbers that need to be parsed in order to be validated and linked
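To give a feel for what "parsing" a complex UDC number involves, here is a deliberately minimal Python sketch. It is an assumption-laden illustration, not a UDC parser: the two helper functions are hypothetical, they only split on the `+` and `:` connectors and on the first hyphen, and they ignore the rest of UDC syntax (ranges, brackets, common auxiliaries in parentheses or quotes, and so on). The string `894.511-32` comes from the Hungarian record above; the combined string in the usage note is constructed purely for illustration.

```python
import re
from typing import Optional


def split_components(udc_string: str) -> list[str]:
    """Split a complex string on the '+' and ':' connectors only (simplified)."""
    return [part for part in re.split(r"[+:]", udc_string) if part]


def split_hyphen_auxiliary(component: str) -> tuple[str, Optional[str]]:
    """Separate a main number from a hyphen auxiliary, if one is present."""
    main, sep, aux = component.partition("-")
    return (main, "-" + aux if sep else None)


# The subject value found in the catalogue record above.
main_number, auxiliary = split_hyphen_auxiliary("894.511-32")
```

With these helpers, `split_components("894.511-32:27-23")` yields the two components `894.511-32` and `27-23`, each of which would then have to be validated separately against the scheme, including cancelled numbers.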
  31. ON THE OTHER HAND…
      §  UDC archive contains historical data and tracks changes of UDC numbers (from 1900-1990 in paper form)
      §  from 1990-2014 changes in UDC are recorded in the database – these can be accessed in the UDC Online service
      §  UDC Online can be used as a vehicle for proper support to libraries – allowing for validation, parsing and number building, but also for storing and downloading UDC strings as authority records
  32. URI
      §  Option 1: using the unique database ID for the class (avoiding notation as an identifier, as it can have different meanings over time):
          ....//UDCMRF/17054 [Bible]
          ...//UDCMRF/16554 [Religions originating in the Far East]
          This approach was used in UDC Summary LD http://udcdata.info/
      §  Option 2: notation + database ID
          ....//UDCMRF/22_17054 [Bible]
          ...//UDCMRF/22_16554 [Religions originating in the Far East]
      §  Option 3: notation + 'release stamp' - problem: does a notation introduced in UDC MRF93 continue to mean the same in the MRF00 release?
          ....//UDCMRF/MRF93/22 [Bible]
          ....//UDCMRF/MRF99/22 [Bible]
          ....//UDCMRF/MRF00/22 [Religions originating in the Far East]
          Together with the 'absolute' MRF ….UDCMRF/22
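The three URI patterns can be compared side by side with a small Python sketch. The slide elides the host part of the URIs, so `BASE` below is a placeholder on `example.org`, not the real namespace, and the function names are hypothetical. Percent-encoding the notation is an added assumption, on the grounds that UDC notations can contain characters such as `/` that are significant in URI paths.

```python
from urllib.parse import quote

# Placeholder namespace: the slide elides the real host, so this is an assumption.
BASE = "http://example.org/UDCMRF/"


def uri_by_id(class_id: int) -> str:
    """Option 1: database ID only; the notation plays no part in the identifier."""
    return f"{BASE}{class_id}"


def uri_by_notation_and_id(notation: str, class_id: int) -> str:
    """Option 2: notation plus database ID."""
    return f"{BASE}{quote(notation, safe='')}_{class_id}"


def uri_by_release(release: str, notation: str) -> str:
    """Option 3: release stamp plus notation."""
    return f"{BASE}{release}/{quote(notation, safe='')}"


bible_v1 = uri_by_id(17054)
bible_v2 = uri_by_notation_and_id("22", 17054)
bible_v3 = uri_by_release("MRF93", "22")
far_east_v3 = uri_by_release("MRF00", "22")
```

Under options 1 and 2 the two meanings of notation 22 get distinct URIs because their database IDs differ; under option 3 they are distinguished only by the release stamp, which is why the slide asks whether a notation keeps its meaning between releases.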
  33. STANDARDS: LACKING APPROPRIATE SOLUTION
      §  SKOS lacks a solution to represent historical data or to track historical changes, and one has to look for solutions in other ontology-type standards for representing vocabularies
      §  Solution 1 (by A. Isaac): extending SKOS using dc terms to model changes as isVersionOf and isReplacedBy relationships – introducing notation as a udcmrf:reference that can aggregate different concepts - but most importantly to allow for the introduction of concept into UDC (an empty node)
      §  Solution 2 (by C. Guéret): extending SKOS/MRF data with either
          §  event ontology (LODE http://linkedevents.org/ontology/)
          §  PROV ontology (provenance)
          which would allow publishing/sharing information about what is …
      But it is not only about indicating the relationship; rather it is about documenting the change. Hence a more complex model would be needed.
  34. SOLUTION 1: TOWARDS UDC CONCEPT (A. Isaac)
      Diagram nodes:
      udcmrf:reference/22
      udcmrf:22_17054: skos:notation "22"^^udc:notation; skos:prefLabel "Bible"@en
      udcmrf:22_16544: skos:notation "22"^^udc:notation; skos:prefLabel "Far East religions"@en
      udcmrf:299.1_15999: skos:notation "299.1"^^udc:notation; skos:prefLabel "Religions of Oriental Peoples"@en
      udc:concept-FarEastReligion
      Diagram links: ore:aggregates (two, from udcmrf:reference/22), dct:isReplacedBy (two), dct:isVersionOf (two, to udc:concept-FarEastReligion)
  35. SOLUTION 2: CLASS CHANGES AS AN EVENT (C. Guéret)
      §  this would allow publishing/sharing of all UDC classes that ever existed, with all data related to the class lifecycle as well as with the various attributes relevant for automatic linking or replacement
      §  Such an approach would have to be supported with an appropriate service model
      §  Works with a URI that is based on a 'release stamp' and notation
      dc:Creation rdfs:subClassOf lode:Event
      udc:Replacement rdfs:subClassOf lode:Event
      udc:Reuse rdfs:subClassOf lode:Event
      to get something similar to the following:
      udc:class/22 ical:hasEvent udc:event/1
      udc:event/1 rdf:type udc:Creation
      udc:event/1 lode:involved udc:release/MRF10
      udc:event/1 rdfs:comment "new class"
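The event pattern on this slide can be mimicked without any RDF library by treating each statement as a (subject, predicate, object) tuple. The sketch below is illustrative only: the `class_event` and `events_of` helpers are hypothetical, the prefixed names are copied from the slide as plain strings rather than resolved to real namespaces, and a real implementation would more likely sit on a triple store.

```python
Triple = tuple[str, str, str]


def class_event(notation: str, event_id: int, event_type: str,
                release: str, comment: str) -> list[Triple]:
    """Build the four statements the slide uses to describe one class event."""
    event = f"udc:event/{event_id}"
    return [
        (f"udc:class/{notation}", "ical:hasEvent", event),
        (event, "rdf:type", event_type),
        (event, "lode:involved", f"udc:release/{release}"),
        (event, "rdfs:comment", f'"{comment}"'),
    ]


def events_of(triples: list[Triple], notation: str) -> list[str]:
    """List the event nodes attached to a class."""
    subject = f"udc:class/{notation}"
    return [o for s, p, o in triples if s == subject and p == "ical:hasEvent"]


# Reproduces the example statements from the slide.
graph = class_event("22", 1, "udc:Creation", "MRF10", "new class")
```

Further lifecycle steps, such as a `udc:Reuse` or `udc:Replacement` event, would simply append more statements for the same class node, so the full history of a notation stays attached to one URI.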
  36. DATA NEED SHARING: NOTATION & CONCEPT HISTORY
      §  Whenever UDC notation is re-used, e.g.
          §  notation used for: term describing concept for which the notation was previously used
          §  old concept moved to: ID of the class to which the concept was moved
          §  date of concept move
          §  source of concept move
      §  Whenever a concept is moved from one class to another
          §  concept that moved: term representing concept
          §  concept previously at: ID of the class at which concept was before
          §  date of move
          §  source of move
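Why this history data matters becomes clearer with a small lookup sketch: given a notation and the release a library was using, which concept did the notation stand for? The Python below is a hedged illustration. The release names and the two meanings of 22 are taken from the URI slide; the `RELEASES` ordering, the `HISTORY` table layout and the `resolve` function are assumptions made for the example, including treating MRF93 as the release where "22 = Bible" first appears.

```python
from typing import Optional

# Releases in chronological order (two-digit years do not sort correctly as text).
RELEASES = ["MRF93", "MRF99", "MRF00"]

# Hypothetical history table: notation -> (release of introduction, caption),
# listed oldest first.
HISTORY = {
    "22": [
        ("MRF93", "Bible"),
        ("MRF00", "Religions originating in the Far East"),
    ],
}


def resolve(notation: str, release: str) -> Optional[str]:
    """Return the caption a notation carried in a given release, if known."""
    target = RELEASES.index(release)
    meaning = None
    for introduced, caption in HISTORY.get(notation, []):
        if RELEASES.index(introduced) <= target:
            meaning = caption  # later entries override earlier ones
    return meaning
```

A catalogue record classed with 22 under MRF99 resolves to "Bible", while the same string under MRF00 resolves to "Religions originating in the Far East"; without provenance for the number, the lookup cannot choose between the two.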
  37. DATA NEED SHARING: CANCELLATION
      §  UDC number may be cancelled but its record and its ID stay permanently
      §  cancellation date (date of cancellation)
      §  cancellation source (issue of Extensions & Corrections in which this is published)
      §  replaced by: ID of the record to which the UDC number is redirected
      §  replacement type: controlled list of types, expressing what the cancelled number is replaced with: new class; colon combination; combination with common auxiliary; combination with special auxiliary; other
      §  replacement (semantic) alignment: controlled list: exact match, to broader, to narrower, approximation
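The cancellation fields listed above map naturally onto a small record type with two controlled lists. The following Python sketch is one possible shape, not the UDC database schema: the class and field names are hypothetical, the date and source are left empty because the slides do not give them, and the replacement type and alignment chosen for the Bible example are illustrative guesses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReplacementType(Enum):
    NEW_CLASS = "new class"
    COLON_COMBINATION = "colon combination"
    COMMON_AUXILIARY = "combination with common auxiliary"
    SPECIAL_AUXILIARY = "combination with special auxiliary"
    OTHER = "other"


class Alignment(Enum):
    EXACT = "exact match"
    TO_BROADER = "to broader"
    TO_NARROWER = "to narrower"
    APPROXIMATION = "approximation"


@dataclass(frozen=True)
class Cancellation:
    class_id: int                     # ID of the cancelled record (kept permanently)
    replaced_by: int                  # ID of the record the number is redirected to
    replacement_type: ReplacementType
    alignment: Alignment
    date: Optional[str] = None        # cancellation date, unknown here
    source: Optional[str] = None      # Extensions & Corrections issue, unknown here


def follow(class_id: int, cancellations: dict[int, Cancellation]) -> int:
    """Follow 'replaced by' links until reaching a record that is not cancelled."""
    seen: set[int] = set()
    while class_id in cancellations and class_id not in seen:
        seen.add(class_id)            # guard against accidental cycles
        class_id = cancellations[class_id].replaced_by
    return class_id


# IDs from the notation-history slide; type and alignment are illustrative guesses.
bible = Cancellation(17054, 17355, ReplacementType.SPECIAL_AUXILIARY, Alignment.TO_NARROWER)
current_id = follow(17054, {bible.class_id: bible})
```

Keeping the replacement type and alignment as controlled values is what would let a look-up service decide automatically whether a redirect is safe to apply or needs human review.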
  38. UDC LINKED DATA ARCHITECTURE WILL GET MORE COMPLICATED
      Towards look-up service based on classification RDF triple store… (C. Guéret)
  39. CONCLUDING REMARKS
      §  UDC RDF triple store should contain all data necessary to resolve and interpret strings coming from library catalogues (including historical UDC data)
      §  libraries should not need to worry about resolving the semantics of UDC codes
      §  UDC linked data should be supported by a front-end service (number look-up/resolution service) – which would enable parsing, validating and resolving URI for UDC codes
  40. CFP closes on 8th March http://seminar.udcc.org/2015/
      THANK YOU
