AN IN-DEPTH ANALYSIS OF TAGS AND CONTROLLED
METADATA FOR BOOK SEARCH
TOINE BOGERS
VIVIEN PETRAS
MARCH 23, 2017
iCONFERENCE 2017
OUTLINE
▸ Introduction
▸ Methodology & Experimental Setup
▸ Analysis
– Tags vs. Controlled Vocabularies
– Book Search Requests
– Failure Analysis
▸ Conclusions & Future Work
INTRODUCTION
MOTIVATION
▸ Readers often struggle with existing systems (e.g., library
catalogs, Amazon, eBook sellers) to discover new books
– Information needs are contextual, personal & complex
– Book metadata does not contain the necessary information
EARLIER WORK
▸ iConference 2015
– Tags outperform controlled vocabularies for search, but
sometimes controlled vocabularies are better.
– Controlled vocabularies contain more unique terms; tags
contain more repetition of terms.
▸ Why?
– Terminology
– Popularity / frequency
– Type of request
STUDY OBJECTIVES
▸ Why are tags better than controlled vocabularies for book
search?
– Which types of book search requests are better addressed
using tags and which using CV?
– Which book search requests fail completely and what
characterizes such requests?
METHODOLOGY &
EXPERIMENTAL SETUP
EXPERIMENTAL SETUP
▸ Controlled Vocabulary content (CV)
– DDC class labels
– Subjects
– Geographic names
– Category labels
– LCSH terms
▸ Tags
– Each tag occurs as many times as it has been assigned by
the users
▸ Unique tags
– Each tag occurs only once
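The difference between the Tags and Unique tags representations can be illustrated with a minimal sketch. The record, its tag counts, and the helper names below are invented for illustration; they are not taken from the collection or from the authors' indexing code.

```python
from collections import Counter

# Hypothetical tag assignments for one book record:
# tag -> number of users who assigned it (made-up values).
tag_counts = Counter({"fantasy": 3, "dragons": 2, "young adult": 1})

def tags_field(counts):
    """Tags: each tag is repeated once per user assignment."""
    return [tag for tag, n in sorted(counts.items()) for _ in range(n)]

def unique_tags_field(counts):
    """Unique tags: each tag appears exactly once, frequency is dropped."""
    return sorted(counts)

tags = tags_field(tag_counts)
unique_tags = unique_tags_field(tag_counts)
```

In this sketch the Tags field carries frequency information through repetition, while the Unique tags field keeps only the vocabulary.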
AMAZON/LIBRARYTHING COLLECTION
[Figure: annotated book record. Labels: Tags; Controlled Vocabulary
Content (CV): DDC class labels, subjects, geographic names, category
labels, LCSH terms; Unique Tags per record]
ANNOTATED LT TOPIC
[Figure: annotated LibraryThing (LT) forum topic. Labels: Topic title;
Narrative; Recommended books]
EXPERIMENTAL SETUP
▸ Amazon / LibraryThing collection of book records
– 2 million records
▸ LibraryThing forum topics for search requests
– 334 search requests for testing
▸ Relevance judgements
– Recommendations from LT members with graded relevance scoring
(highest relevance if book is added by searcher)
▸ Evaluation metric
– Normalized Discounted Cumulated Gain (NDCG@10)
▸ IR system
– Indri 5.4 toolkit
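For readers unfamiliar with the evaluation metric, the following is a minimal sketch of NDCG@10 in one common formulation (linear gain, log2(rank + 1) discount). The slides do not state which variant or which relevance grades were used, so the formula choice, the function names, and the example grades are assumptions.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulated gain over the first k graded relevance values."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_gains, all_gains, k=10):
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(all_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical request: relevance grades of the returned books in rank order,
# and the grades of all books judged relevant for this request.
ranked = [0, 2, 0, 1]
judged = [2, 1, 1]
score = ndcg_at_k(ranked, judged)
```

A score of 1.0 means the top 10 matches the ideal ordering of the judged books; a score of 0.0 means no relevant book was returned in the top 10.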
ANALYSIS
TAGS vs. CONTROLLED VOCABULARIES
▸ Question 1: Is there a difference in performance between
CV and Tags in retrieval?
▸ Answer
– Tags perform significantly better than CV
– The combination of both performs even better than tags
alone, but not significantly so
– Losing tag frequency information helps rather than hurts
performance (also not significantly)
TAGS vs. CONTROLLED VOCABULARIES
▸ Question 2: Do tags outperform CV because of the so-called
popularity effect?
▸ Answer
– No, there does not seem to be a popularity effect
– Types = unique words in a record
– Tokens = all instances of words in a record
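The type/token distinction can be made concrete with a short sketch. Whitespace tokenization and lowercasing are assumptions made here for simplicity, and the record text is invented; the slides do not describe the actual tokenization.

```python
def types_and_tokens(text):
    """Return (number of types, number of tokens) for one record's text."""
    words = text.lower().split()   # assumed tokenization: lowercase, split on whitespace
    return len(set(words)), len(words)

# Hypothetical tag text of one record, with repeated tags.
record = "fantasy fantasy dragons fantasy dragons magic"
n_types, n_tokens = types_and_tokens(record)
```

A record with many tokens but few types is dominated by repeated terms, which is the situation the popularity-effect question is about.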
TAGS vs. CONTROLLED VOCABULARIES
▸ Question 3: Do Tags and CV complement or cancel each other out?
▸ Answer
– Tags and CV complement each other: they are successful on
different sets of requests
– But most zero-difference requests (74.0%) actually fail
completely! When and why?
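A per-request comparison of this kind could be sketched as follows. The scores are invented, the function name is hypothetical, and treating "fail completely" as an NDCG@10 of zero on both collections is an assumption, not something the slides define.

```python
def compare_runs(tags_scores, cv_scores):
    """Group request IDs by which collection achieves the higher NDCG@10."""
    summary = {"tags_better": [], "cv_better": [], "tie_success": [], "tie_fail": []}
    for request_id, tag_score in tags_scores.items():
        cv_score = cv_scores[request_id]
        if tag_score > cv_score:
            summary["tags_better"].append(request_id)
        elif cv_score > tag_score:
            summary["cv_better"].append(request_id)
        elif tag_score == 0.0:
            # Zero difference and zero score: assumed meaning of "fail completely".
            summary["tie_fail"].append(request_id)
        else:
            summary["tie_success"].append(request_id)
    return summary

# Hypothetical per-request NDCG@10 scores for four requests.
tags_scores = {"r1": 0.40, "r2": 0.00, "r3": 0.00, "r4": 0.25}
cv_scores = {"r1": 0.10, "r2": 0.30, "r3": 0.00, "r4": 0.25}
summary = compare_runs(tags_scores, cv_scores)
```

Requests that land in different "better" groups indicate complementarity, while the share of ties that also score zero corresponds to the zero-difference failures highlighted above.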
REQUESTS – RELEVANCE ASPECTS
▸ What makes a suggested book relevant to the user?
– Distinguish between eight relevance aspects (Reuter, 2007;
Koolen et al., 2015)
REQUESTS – RELEVANCE ASPECTS
Aspect | Description | % of requests (N = 87)
Accessibility | Language, length, or level of difficulty of a book | 9.2 %
Content | Topic, plot, genre, style, or comprehensiveness | 79.3 %
Engagement | Fit a certain mood or interest, are considered high quality, or provide a certain reading experience | 25.3 %
Familiarity | Similar to known books or related to a previous experience | 47.1 %
Known-item | The user is trying to identify a known book, but cannot remember the metadata that would locate it | 12.6 %
Metadata | With a certain title or by a certain author or publisher, in a particular format, or certain year | 23.0 %
Novelty | Unusual or quirky, or containing novel content | 3.4 %
Socio-cultural | Related to the user's socio-cultural background or values; popular or obscure | 13.8 %
REQUESTS – RELEVANCE ASPECTS
▸ Question 4: What types of book requests are best served
by the Unique tags and CV collections?
▸ Answer
– CV terms show a tendency to work best for requests that
touch upon aspects of engagement
– Other requests are best served by Unique tags
REQUESTS – RELEVANCE ASPECTS
Performance grouped by relevance aspect (NDCG@10)
Aspect | N | Unique tags | CV
Socio-cultural | 10 | 0.1127 | 0.0428
Novelty | 2 | 0.5304 | 0.0000
Metadata | 17 | 0.2454 | 0.1259
Known-item | 11 | 0.3593 | 0.1818
Familiarity | 36 | 0.1833 | 0.0701
Engagement | 21 | 0.1121 | 0.1425
Content | 63 | 0.1965 | 0.0821
Accessibility | 7 | 0.1235 | 0.0749
REQUESTS – TYPE OF BOOK
▸ Question 5: What types of book requests (fiction or non-fiction)
are best served by Unique tags or CV?
▸ Answer
– Unique tags work significantly better for fiction
– CV work better for non-fiction (but not significantly so)
FAILURE ANALYSIS
▸ Question 6: Do failed book search requests fail because of
data sparsity, a lower recall base, or a lack of examples?
▸ Answer
– Neither sparsity nor the size of the recall base is the
reason for retrieval failure
– The number of examples provided by the requester has a
significant positive influence on performance
[Chart: groups labeled N = 247, N = 87, and N = 334]
FAILURE ANALYSIS
▸ Question 7: Do book search requests fail because of their
relevance aspects?
▸ Answer
– No, relevance aspects are distributed equally for
successful & failed requests
– Only Accessibility- and Metadata-related search requests
seem to fail more often
FAILURE ANALYSIS
▸ Question 8: Does the type of book that is being requested
(fiction vs. non-fiction) have an influence on whether
requests succeed or fail?
▸ Answer
– Requests for works of fiction fail significantly more often
CONCLUSIONS &
FUTURE WORK
FINDINGS
▸ Tags outperform CV...
– ...probably because their terminology is closer to the user's
language (not because of the popularity effect)
▸ Sometimes CV are better, for example, for non-fiction books...
– ...whereas tags are better for fiction and for content-related,
familiarity or known-item searches
▸ We believe that tags are simply better able to match the user's
language when looking for books
– Although they are still not that great at it!
– Book search is still hard, especially for fiction books
OPEN QUESTIONS
▸ How can book metadata be adapted to be closer to the
vocabulary used in real-world book search requests?
▸ What other aspects (besides type of requested book or
relevance aspect of search request) contribute to request
difficulty?
▸ Our question to you:
– What other questions can we ask of this data?
QUESTIONS?
Paper URL: http://bit.ly/iconf2017
