Presentation by Dr. Getaneh Alemu (Solent University, United Kingdom) at the II Congreso de Información, Comunicación e Investigación (CICI 2018), “Metadatos y Organización de la Información”. Facultad de Filosofía y Letras, Universidad Autónoma de Chihuahua, Mexico. Event organised by the academic group 'Estudios de la Información' and the disciplinary group ‘Información, Lenguaje, Comunicación y Desarrollo Sostenible’. 29 October 2018.
1. A THEORY OF
METADATA ENRICHING & FILTERING
GETANEH ALEMU, PHD
2ND INFORMATION, COMMUNICATION
& RESEARCH CONFERENCE
UNIVERSIDAD AUTÓNOMA DE CHIHUAHUA
(MÉXICO)
OCTOBER 29TH 2018
2. WHAT IS METADATA?
• Metadata is “data about data”
• Metadata = about-ness
• Metadata is what you enter into a search engine, such as Google or your
library catalogue (the author of a book, a song title, a product name, etc.)
• Metadata is your key-word in the sea of information
• Metadata is the tags, likes, dislikes, ratings, recommendations, reviews
• Metadata is the naming of people, things, places and objects
• Metadata is a language for finding, re-finding and discovering
3. WHY METADATA?
• Because I simply can’t imagine life without metadata
• Without it, we lose our sense of direction, compass, navigation,
search, exploration and discovery in the ocean of data and
information
• It is by using metadata that we filter, sift through, prioritise, choose,
buy and sell
4. EXISTING METADATA CHALLENGES
Growing library collections
Ever changing technologies
Changing users’ expectations
Limitations of contemporary standards-based metadata approaches
The social space of documents is missing (Otlet, 1934)
Scant use of theories/theoretical frameworks in the inclusion of socially-
constructed metadata
5. GROWING COLLECTIONS
• The Library of Congress > 164 million information objects
• The British Library > 150 million items
• Europeana.eu > 51,533,591 artworks, artefacts, books
• The Digital Public Library of America > 20,597,354 items
• Project Gutenberg > 56,000 free and public domain e-books
• World Digital Library > 19,147 items
• The Internet Archive > 15 petabytes of webpages
6. PRINT ERA CATALOGUING PRINCIPLES
• The principle of sufficiency and necessity (“Keep It Simple”)
• The principle of user convenience (the cataloguer knows what is best for
you)
• The principle of representation (the title page is all that matters)
• The principle of standardisation (coalescing into a single standard)
(Svenonius, 2000; IFLA, 2009)
7. RESEARCH METHODOLOGY
A social constructivist
approach
Cultural artefacts very often
lend themselves to various
interpretations and contexts
Constructivist grounded
theory method (Charmaz,
2006)
Theory building rather than
testing
8. RESEARCH METHODOLOGY
Interviewees

Study         Total   Profession                                 Sub-total
Study One     11      LIS MSc Students                           8
                      LIS PhD Student                            1
                      LIS Lecturers                              2
Study Two     21      Librarians                                 10
                      LIS Researchers                            5
                      LIS Lecturers                              2
                      Metadata Consultants                       4
Study Three   25      Under-graduate Students (BSc)              5
                      Post-graduate Students (MSc=4 & PhD=6)     10
                      Lecturers (other than LIS)                 10
Total         57
14. METADATA DIVERSITY
• Expert-created metadata fails to adequately represent users’ terminologies
• Metadata experts might not anticipate the diverse interpretations inherent in users
• Disparity between controlled terminologies and terminologies used by users
• Human beings by nature do not always agree on a single about-ness, interpretation and
classification of things (Shirky, 2008; Weinberger, 2007)
• Classification and metadata are affected by socio-cultural, linguistic and political factors
(Bowker & Star, 1999)
• Whilst people, places, objects and events are real, objective (verifiable) facts, the metadata that
describes them is a social construct and hence can be intensely subjective (Gartner, 2016)
19. ENRICH THEN FILTER
Separation of metadata content (enriching) and interface (filtering)
Enriching as a continuous process
From user-centred to user-driven metadata enriching and filtering
Metadata diversity better conforming to users’ needs
Seamless linking
‘Useful’ rather than ‘perfect’ metadata
Post-hoc user-driven filtering
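The separation of enriching from filtering can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind the theory: the `Record` class, the `enrich` and `filter_view` names and the sample terms are all hypothetical. The point it tries to show is that contributions are accumulated without being pruned, and that selection happens later, per user, at display time.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One catalogue record whose metadata keeps growing over time (hypothetical)."""
    title: str
    # Each entry is (source, term), e.g. ("librarian", "Information organization").
    terms: list = field(default_factory=list)

    def enrich(self, source: str, term: str) -> None:
        """Enriching: append a contribution, never overwrite earlier ones."""
        if (source, term) not in self.terms:
            self.terms.append((source, term))


def filter_view(record: Record, wanted_sources: set) -> list:
    """Filtering: pick, at display time, only the terms from the chosen sources."""
    return [term for source, term in record.terms if source in wanted_sources]


book = Record("Everything is Miscellaneous")
book.enrich("librarian", "Information organization")
book.enrich("user", "tagging")
book.enrich("user", "folksonomy")

# Two different views over the same enriched record.
print(filter_view(book, {"librarian"}))
print(filter_view(book, {"librarian", "user"}))
```

Because the stored record is never trimmed, a new kind of view can be added later without touching the metadata itself.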
20. WHAT IS LINKED DATA?
• Linked Data is a data model
• Identifies data
• Describes data
• Links/relations between data elements
• Structured data elements
• Analogous to the way relational database systems function
• But Linked Data is intended to operate at web scale
• Web-scale data linking
22. WHY LINKED DATA?
• Making sense of data / annotating data
• Re‐usability
• Cross‐linking
• Integration and sharing of data (Berners‐Lee, 2009; Shadbolt,
2010; W3C, 2011).
“Adding a page provides content, but adding a link provides the organization,
structure and endorsement to information on the Web which turn the content as a
whole into something of great value” (Berners‐Lee, 2007)
Linked Data is expressed in several overarching technological frameworks
including RDF, RDFS, OWL, SPARQL and URI.
23. CHALLENGES TO ADOPT LINKED DATA
TECHNOLOGIES
• Document-centric rather than data-centric protocols
• Lack of scalability
• Portability issues
• Lack of interoperability
• Incompatible formats
24. LINKED DATA PRINCIPLES
https://www.w3.org/DesignIssues/LinkedData.html
1. Use URIs to name (identify) things.
2. Use HTTP URIs so that these things can be
looked up (interpreted, "dereferenced").
3. Provide useful information about what a name
identifies when it's looked up, using open
standards such as RDF, SPARQL, etc.
4. Refer to other things using their HTTP URI-
based names when publishing data on the Web.
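A small Python sketch may help make principles 1, 2 and 4 concrete. It is an assumption-laden illustration: the `example.org` URIs, the `vocab/` property names and both helper functions are hypothetical, and principle 3 (returning useful data when a URI is looked up) is not shown because it needs a live server.

```python
from urllib.parse import urlparse


def is_http_uri(name: str) -> bool:
    """Principles 1 and 2: a thing is named with an HTTP(S) URI."""
    parts = urlparse(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def outgoing_links(description: dict) -> list:
    """Principle 4: values that are themselves HTTP URIs point to other things."""
    return [value for value in description.values() if is_http_uri(value)]


# Hypothetical URIs under example.org, used only for illustration.
BOOK = "http://example.org/book/everything-is-miscellaneous"
description = {
    "http://example.org/vocab/title": "Everything is Miscellaneous",
    "http://example.org/vocab/author": "http://example.org/person/david-weinberger",
}

print(is_http_uri(BOOK))
print(outgoing_links(description))
```

The title is a plain string (a literal), while the author is a URI that another description can pick up, which is what makes the data linked.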
25. HOW LINKED DATA?
Linked Data is expressed in several overarching technological frameworks including RDF, RDFS,
OWL, SPARQL and URI.
Resource Description Framework (RDF)
RDF is a data model to describe any concept or object (physical and abstract) using simple
Subject‐Predicate‐Object (also called triple) statements (Allemang and Hendler, 2008).
It helps to describe an object through a set of self‐describing attributes (properties) and relations.
Unlike contemporary metadata schemas, RDF properties and relations are uniquely identified and
explicitly described in a manner that is machine processable. It is a simple, but robust and scalable
data model aimed at web scale rather than limited to a specific domain or applications.
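The triple model can be sketched with plain Python tuples. This is a minimal illustration, not an RDF library: the `Triple` type, the `match` function and every URI under the `http://example.org/` namespace are assumptions made for the example. It shows how uniquely identified subjects, predicates and objects let a program follow a link from one description to another.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


EX = "http://example.org/"  # hypothetical namespace

graph = {
    Triple(EX + "book/eim", EX + "vocab/title", "Everything is Miscellaneous"),
    Triple(EX + "book/eim", EX + "vocab/isAuthoredBy", EX + "person/weinberger"),
    Triple(EX + "person/weinberger", EX + "vocab/name", "David Weinberger"),
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    )


# Follow the link from the book to its author, then read the author's name.
author = match(graph, s=EX + "book/eim", p=EX + "vocab/isAuthoredBy")[0].object
print(match(graph, s=author, p=EX + "vocab/name")[0].object)  # David Weinberger
```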
26. HOW LINKED DATA?
Resource Description Framework (RDF)
https://www.w3.org/TR/rdf-schema/
<RDF> <Description about="http://www.yourdomainname.com/RDF"> <book>Everything is
miscellaneous</book> <author>David Weinberger</author> </Description> </RDF>
RDF triple (Subject --> Predicate/relation --> Object)
Everything is miscellaneous --> isAuthoredBy --> David Weinberger
27. HOW LINKED DATA?
Resource Description Framework (RDF)
Subject Predicate Object
rdf:Statement is an instance of rdfs:Class. It is intended to represent the class of RDF
statements. An RDF statement is the statement made by a token of an RDF triple. The subject of
an RDF statement is the instance of rdfs:Resource identified by the subject of the triple. The
predicate of an RDF statement is the instance of rdf:Property identified by the predicate of the
triple. The object of an RDF statement is the instance of rdfs:Resource identified by the object
of the triple. rdf:Statement is in the domain of the
properties rdf:predicate, rdf:subject and rdf:object. Different individual rdf:Statement instances
may have the same values for their rdf:predicate, rdf:subject and rdf:object properties.
https://www.w3.org/TR/rdf-schema/#ch_reificationvocab
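The reification vocabulary quoted above can be illustrated with a short Python sketch. The `rdf:` terms and their namespace come from the W3C vocabulary; the `reify` helper, the `http://example.org/` URIs and the `contributedBy` property are hypothetical and only serve the example. The idea shown is that a triple becomes a named resource, so further metadata (such as who contributed it) can be attached to the statement itself.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace


def reify(statement_id: str, subject: str, predicate: str, obj: str) -> list:
    """Describe one triple as an rdf:Statement resource, using four triples."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


triples = reify(
    EX + "statement/1",
    EX + "book/eim",
    EX + "vocab/isAuthoredBy",
    EX + "person/weinberger",
)

# Once the statement has a name, metadata about it can be attached,
# for example who contributed it (a hypothetical property).
triples.append((EX + "statement/1", EX + "vocab/contributedBy", EX + "user/42"))

for triple in triples:
    print(triple)
```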
28. HOW LINKED DATA?
http://w3schools.sinsixx.com/rdf/rdf_rules.asp.htm
<?xml version="1.0"?><RDF> <Description about="http://www.w3schools.com/RDF"> <author>Jan Egil
Refsnes</author> <homepage>http://www.w3schools.com</homepage> </Description> </RDF>
RDF Statements
The combination of a Resource, a Property, and a Property value forms a Statement (known as the subject, predicate and object of a Statement).
Let's look at some example statements to get a better understanding:
Statement: "The author of http://www.w3schools.com/RDF is Jan Egil Refsnes".
• The subject of the statement above is: http://www.w3schools.com/RDF
• The predicate is: author
• The object is: Jan Egil Refsnes
Statement: "The homepage of http://www.w3schools.com/RDF is http://www.w3schools.com".
• The subject of the statement above is: http://www.w3schools.com/RDF
• The predicate is: homepage
• The object is: http://www.w3schools.com
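The two statements above can also be extracted mechanically. The following Python sketch parses the simplified XML from this slide with the standard library and lists one (subject, predicate, object) tuple per property. The `statements` function name is an assumption for illustration, and the snippet handles only this namespace-free teaching form; real RDF/XML uses namespaced elements such as rdf:RDF and rdf:Description and would need a proper RDF parser.

```python
import xml.etree.ElementTree as ET

# The simplified example from the slide, reproduced as a string.
DOC = (
    '<?xml version="1.0"?><RDF> '
    '<Description about="http://www.w3schools.com/RDF"> '
    "<author>Jan Egil Refsnes</author> "
    "<homepage>http://www.w3schools.com</homepage> "
    "</Description> </RDF>"
)


def statements(xml_text: str) -> list:
    """Return one (subject, predicate, object) tuple per property element."""
    root = ET.fromstring(xml_text)
    found = []
    for description in root.iter("Description"):
        subject = description.get("about")  # the resource being described
        for prop in description:
            value = " ".join((prop.text or "").split())  # normalise whitespace
            found.append((subject, prop.tag, value))
    return found


for statement in statements(DOC):
    print(statement)
```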
32. THE THEORY OF METADATA ENRICHING & FILTERING
• From expert-provided metadata to a mixed metadata approach where both
experts and users continually enhance metadata
• From the principle of metadata simplicity to the principle of metadata
enriching
• From human-readable metadata to structured, uniquely identified and
interlinked metadata (metadata linking)
• From metadata silos to metadata openness enabling metadata sharing and
re-use (metadata openness)
• From a single interface to a user-led, re-configurable interface (metadata
filtering)
33. THE THEORY OF METADATA ENRICHING & FILTERING
34. PRACTICAL IMPLICATIONS
The balancing act of metadata enriching versus quality
‘Useful’ rather than ‘perfect’ metadata
Controlled vocabularies: taxonomies, thesauri, ontologies
Ontologies and thesauri allow us to create open and scalable metadata
structures
This lets us incorporate multiple interpretations of things
Incorporating multiple access points
35. THE FUTURE OF METADATA:
ENRICHED, LINKED, OPEN AND FILTERED
Alemu, G., Stevens, B., & Ross, P. (2012). Towards a conceptual framework for user-driven semantic metadata interoperability in digital libraries: A social constructivist approach. New Library World, 113(1/2), 38-54.
Alemu, G., Stevens, B., & Ross, P. (2011). A constructivist grounded theory approach to semantic metadata interoperability in digital libraries: preliminary reflections. Paper presented at QQML 2011, Athens.
Alemu, G., Stevens, B., Ross, P., & Chandler, J. (2015). The Use of a Constructivist Grounded Theory Method to Explore the Role of Socially-Constructed Metadata (Web 2.0) Approaches. QQML Journal, September 2015 Issue (pp. 517-540).
"A rose by any other name would smell as sweet" is a popular reference to William Shakespeare's play Romeo and Juliet, in which Juliet seems to argue that it does not matter that Romeo is from her family's rival house of Montague, that is, that he is named "Montague". The reference is often used to imply that the names of things do not affect what they really are. Source: https://en.wikipedia.org/wiki/A_rose_by_any_other_name_would_smell_as_sweet
Shakespeare is right when it comes to the identity of a person, but for librarians and search engine experts, what you call a thing affects its findability, searchability and discoverability.
As part of my PhD, which I completed in June 2014, I used a constructivist grounded theory method to develop a theory of metadata enriching and filtering. The theory includes four overarching principles: metadata enriching, linking, openness and filtering. My PhD can be summed up in two words: enriching and filtering.
The theory of metadata enriching and filtering holds that metadata should be enriched through both standardised and socially-constructed metadata approaches. ... In the theory, metadata creation and enhancement (metadata enriching) is a continuous process that involves authors, publishers, suppliers, librarians and users.