JCDL’18 Doctoral Consortium
A Semantically Enriched
Recommendation & Visualization
Approach for Academic Literature
Corinna Breitinger*
Information Science Group &
Human Computer Interaction Group
University of Konstanz
*Sponsored by SIGIR Student Travel Grant
Short Bio
National Institute of Informatics
Tokyo
(2014)
Bachelor of Science
UC Berkeley, California
(2011)
(2011-2014)
6/7/18
Master of Science
Linnaeus University, Sweden
(2016)
PhD Research: Information Science /
Human Computer Interaction
University of Konstanz
@BreitingerC – Semantically Enriched Recommendations for Academic Literature
PhD Stipend
Outline
• Introduction & Problem Setting
• Research Objective & Research Tasks
• Background
• Prior Work
• Planned Research
Recommender Systems
Academic Recommender Systems
Annual publication volume on
“Research Paper Recommender Systems”
Prior Work: J. Beel, B. Gipp, S. Langer, and C. Breitinger, “Research-paper recommender
systems: a literature survey,” International Journal on Digital Libraries, vol. 17, iss. 4, pp. 305–338, 2016.
Academic Recommender Systems
Problem
1. Existing content-based filtering recommendation approaches
focus on:
• textual similarity
• bibliometrics (citation counts, venue’s citation counts, etc.)
→ Do not take into account the variety of semantic markers that
are especially prevalent in STEM literature:
• Academic citations
• Mathematical identifiers and equations
• Figures, graphs, and images
2. List-based visualizations of recommendation sets fail to communicate
the presence of such semantic features
Academic Literature Contains Valuable
Text-independent Semantic Information
[Figure: example panels for citation-based similarity, mathematical expressions, and figures / graphs / tables. Source of one panel: https://citeplag.org/compare/110389/136117]
Text-based measures
• Identifying textual similarity (string-based similarity measures)
• Considering semantic similarity, e.g. synonyms (knowledge-based similarity)
Citation-based measures
• Citation Proximity Analysis (CPA)
[Gipp & Beel, 2009]
• Citation-based Plagiarism Detection
(CbPD) [Gipp, Meuschke, Breitinger,
2014]
Mathematical Language
• Measures from Mathematical
Information Retrieval (MIR) –
extracting semantic meaning from
mathematical content
[Schubotz et al., 2016]
Figures / Images / Graphs
• Measures to assess the similarity of
images & diagram understanding
• Feature-point methods
• SIFT, SURF, BRISK
• Perceptual hashing
• pHash, minHash
Semantic Similarity Measures to be
Considered
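To make the perceptual-hashing idea concrete, here is a minimal sketch. It uses a plain average hash, which is a simpler stand-in and not pHash itself (pHash works on a DCT of the image), and it is not the method of any cited paper. The function names and the 4x4 grayscale thumbnails are assumptions made for illustration.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Hash a small grayscale thumbnail: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits; small distances suggest visually similar images."""
    return bin(h1 ^ h2).count("1")


# Hypothetical 4x4 thumbnails (a real pipeline would first downscale the image).
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
brighter = [[p + 20 for p in row] for row in original]   # uniform brightness shift
inverted = [[255 - p for p in row] for row in original]  # light and dark swapped

print(hamming_distance(average_hash(original), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(original), average_hash(inverted)))  # 16
```

The brightness-shifted copy keeps the same hash because every pixel is compared against the image's own mean, which is the kind of robustness to representation changes that perceptual hashes aim for.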
Research Objective
Conceive, implement, and evaluate a
semantically-enriched recommendation
approach that considers text-independent
semantic markers (e.g. academic citations,
mathematical formulae, and figures) to
improve the recommendation of academic
literature in the STEM fields.
Research Tasks
1. Review today’s literature recommendation approaches:
• What is being done to improve recommendation
for less text-heavy literature from the STEM fields?
• What are the special requirements for literature
recommendation in the STEM field?
2. Conceive & design a recommendation approach that
takes into account semantic markers prevalent in STEM
literature (formulas, figures, and citations).
3. Implement the novel approach in a literature
recommender system.
Research Tasks
4. Derive appropriate weights for the different similarity
measures depending on the user (their information
need) and the research discipline
• e.g. math-heavy publications should place a higher weight
on formulas and a lower weight on text-based similarity
5. Conceive visualization concepts to support the user in
sense-making of the recommended literature sets.
6. Evaluate the recommender system
• User studies
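Task 4 can be illustrated with a weighted combination of per-measure similarity scores. The sketch below assumes each measure already yields a score in [0, 1]; the score values, the weight profiles, and the function name are hypothetical and only show how a math-heavy profile would shift the ranking signal toward formulas.

```python
from typing import Dict


def combined_similarity(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-measure similarity scores (missing scores count as 0)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total


# Hypothetical scores for one candidate document against a query document.
scores = {"text": 0.2, "citation": 0.5, "formula": 0.9, "image": 0.4}

# Hypothetical discipline profiles; real weights would have to be derived empirically.
math_heavy = {"text": 0.1, "citation": 0.2, "formula": 0.6, "image": 0.1}
text_heavy = {"text": 0.6, "citation": 0.2, "formula": 0.1, "image": 0.1}

print(round(combined_similarity(scores, math_heavy), 2))  # 0.7
print(round(combined_similarity(scores, text_heavy), 2))  # 0.35
```

Normalising by the weight total keeps the combined score in [0, 1] even if a profile's weights do not sum to exactly one.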
Background: Recommender Systems
• Recommendation approaches:
• User-based approaches, e.g. collaborative filtering (CF)
• Content-based filtering (CBF)
• Combination of approaches
• Academic Literature Recommendation:
• In a review of 62 approaches for research paper recommendation,
we found the majority of reviewed systems (55%) used content-based
approaches for recommending related academic literature [Beel et al.,
2016]
Prior Work
Beel, J., Gipp, B., Langer, S., and Breitinger, C. Research-paper Recommender Systems: A Literature Survey. International
Journal on Digital Libraries 17, 4 (2016), 305–338.
Prior Work
B. Gipp, N. Meuschke, and C. Breitinger, “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus” Journal
of the American Society for Information Science and Technology (JASIST), vol. 65, iss. 8, pp. 1527-1540, 2014.
N. Meuschke, C. Gondek, D. Seebacher, C. Breitinger, D. Keim, and B. Gipp, “An Adaptive Image-based Plagiarism Detection Approach”
in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Fort Worth, USA, 2018.
→ Applied to: academic plagiarism detection (PD) use case
→ Potential for: Recommending semantically relevant academic literature
Background: Semantic Similarity
Measures
Prior Work: Citation-based Approaches
[Figure: two example documents, Document A and Document B, whose in-text citations [1], [2], and [3] point to Docs C, D, and E. Extracted citation patterns: EDC (Doc A) and DECDC (Doc B). The pattern comparison aligns the two sequences and marks insertions (Ins.).]
Prior Work
Gipp, B., Meuschke, N., and Breitinger, C. Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus.
Journal of the American Society for Information Science and Technology (JASIST) 65, 8 (2014), 1527–1540.
→ Applied to the PD use case
• Citation Proximity
Analysis (CPA)
• CbPD Approaches
• Longest Common
Citation Sequence
(LCCS)
• Greedy Citation
Tiling (GCT)
• Citation Chunking
(Cit-Chunk)
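As a rough illustration of the LCCS idea, the sketch below treats it as a longest common subsequence over the order of in-text citations, so matching citations may be interrupted by non-matching ones. That reading, the function name, and the dynamic-programming implementation are assumptions for illustration, not the published algorithm's code. The inputs are the two citation patterns shown on the slide, EDC and DECDC.

```python
from typing import List, Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest sequence of citations that appears in the same order in both documents."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back through the table to recover one longest sequence.
    result: List[str] = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


doc_a = list("EDC")    # citation pattern of Doc A
doc_b = list("DECDC")  # citation pattern of Doc B
print(longest_common_citation_sequence(doc_a, doc_b))  # ['E', 'D', 'C']
```

Here all three citations of Doc A reappear in Doc B in the same order, so the shared sequence has length 3 even though Doc B contains two extra citations.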
Prior Work
B. Gipp, N. Meuschke, C. Breitinger, M. Lipinski, and A. Nürnberger. Demonstration of the First Citation-based Plagiarism Detection
Prototype. Proc. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR), pages 1119–1120, 2013.
B. Gipp, N. Meuschke, and C. Breitinger, “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific
Corpus” Journal of the American Society for Information Science and Technology (JASIST), vol. 65, iss. 8, pp. 1527-1540, 2014.
• A visualization concept for citation-based similarity: CitePlag
Prior Work: Citation-based Approaches
Prior Work
M. Schwarzer, C. Breitinger, M. Schubotz, N. Meuschke, and B. Gipp, “Citolytics - A Link-based Recommender System for Wikipedia”
in Proceedings of the 11th ACM Conference on Recommender Systems (RecSys), 2017.
Citolytics: A link-based
Recommender System
• CPA approach applied to the
recommendation use case
for the first time
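The intuition behind CPA is that two items cited (or linked) close together in the same document are more likely to be related than items cited far apart. The sketch below assumes citations are given as (word offset, cited item) pairs and uses an inverse-distance weight with a hypothetical exponent `alpha`; the concrete weighting used in CPA or Citolytics may differ, so treat the scoring function as an assumption.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Citation = Tuple[int, str]  # (word offset in the citing document, cited item)


def cpa_scores(citing_docs: List[List[Citation]], alpha: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Score item pairs by how closely they are co-cited, summed over citing documents."""
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for citations in citing_docs:
        # Keep only the closest co-occurrence of each item pair within this document.
        closest: Dict[Tuple[str, str], int] = {}
        for (pos_a, item_a), (pos_b, item_b) in combinations(citations, 2):
            if item_a == item_b:
                continue
            first, second = sorted((item_a, item_b))
            distance = max(abs(pos_a - pos_b), 1)
            pair = (first, second)
            if pair not in closest or distance < closest[pair]:
                closest[pair] = distance
        for pair, distance in closest.items():
            scores[pair] += 1.0 / distance ** alpha
    return dict(scores)


# Two hypothetical citing documents.
doc_1 = [(10, "C"), (12, "D"), (300, "E")]
doc_2 = [(5, "C"), (9, "D")]

scores = cpa_scores([doc_1, doc_2])
print(scores[("C", "D")])                       # 0.75
print(scores[("C", "D")] > scores[("C", "E")])  # True
```

Items C and D are cited within a few words of each other in both documents, so their pair score clearly exceeds that of C and E, which co-occur only once and far apart.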
Ongoing: Citation-based Approaches
• Currently: User study of
link-based recommendation
performance
• Findings:
→ Link-based vs. text-based
recommendations address
different information needs
→ Serendipitous vs. expected
→ Expert vs. novice
Citolytics: A link-based
Recommender System
Prior Work: Image-based Similarity
Prior Work
N. Meuschke, C. Gondek, D. Seebacher, C. Breitinger, D. Keim, and B. Gipp, “An Adaptive Image-based Plagiarism Detection
Approach” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Fort Worth, USA, 2018.
• Images / charts / diagrams are highly valuable text-independent features [Kembhavi et al., 2016]
• Variations in the representation of images must still be perceived as semantically similar
• PD approach by Meuschke et al. integrates perceptual hashing with newly developed similarity assessments for images, such as ratio hashing and position-aware OCR text matching
• Photographs & diagrams → perceptual hashing (+ OCR matching)
• Bar charts → ratio hashing
Prior Work: Image-based Similarity
• ‘Adaptive process’ for image-based semantic similarity identification:
Prior Work
N. Meuschke, C. Gondek, D. Seebacher, C. Breitinger, D. Keim, and B. Gipp, “An Adaptive Image-based Plagiarism Detection
Approach” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Fort Worth, USA, 2018.
Related Work
[1] N. Meuschke, M. Schubotz, F. Hamborg, T. Skopal, and B. Gipp. 2017. Analyzing Mathematical Content to Detect Academic
Plagiarism. In Proc. CIKM.
[2] M. Schubotz, L. Kraemer, N. Meuschke, F. Hamborg, and B. Gipp, “Evaluating and Improving the Extraction of Mathematical
Identifier Definitions” in Proceedings of the 8th International Conference of the CLEF Association (CLEF 2017), 2017.
Related Work: Mathematical Formula
Analysis
Mathematical Information Retrieval (MIR)
• Meuschke et al. demonstrated the benefit of analyzing the
similarity of mathematical expressions to improve detection
of heavily disguised academic plagiarism [1]
• Schubotz et al. improved the extraction of math identifier
definitions from the surrounding text → results in a better
understanding of the semantic meaning of a given formula [2]
Applying Mathematical Formula
Analysis to Recommendations
• Consider Formula Semantics
• the semantic meaning of a mathematical identifier or an entire
mathematical equation (e.g. defined by the accepted name for
the identifier or equation)
• Consider Formula Patterns
• Sequence of several mathematical formulas contained in a STEM
document, which holds meaning (e.g. a derivation or proof)
→ The consideration of mathematical formula concepts and patterns
can improve content-based recommendation approaches for STEM
literature
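One simple way to turn formula semantics into a similarity signal is to compare the sets of identifier concepts that two documents use. The sketch below assumes an upstream step has already mapped identifiers to concept names (for example, E to "energy"); the Jaccard measure, the function name, and the example sets are illustrative assumptions rather than the approach of the cited work.

```python
from typing import Set


def identifier_jaccard(concepts_a: Set[str], concepts_b: Set[str]) -> float:
    """Share of identifier concepts common to both documents (0.0 if both are empty)."""
    union = concepts_a | concepts_b
    if not union:
        return 0.0
    return len(concepts_a & concepts_b) / len(union)


# Hypothetical identifier concepts extracted from two physics papers.
doc_a = {"energy", "mass", "speed of light"}
doc_b = {"energy", "mass", "momentum", "speed of light"}

print(identifier_jaccard(doc_a, doc_b))  # 0.75
```

Formula patterns, i.e. ordered sequences of formulas such as a derivation, would need an order-aware comparison instead of a set-based one.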
Semantically-enriched Literature
Recommendations
Task 1: Creation of suitable recommendation corpus
• PMC OAS
• Life Sciences and Biomedical field
• arXiv
• Physics, Mathematics, Statistics, Quantitative Biology,
Computer Science, and Electrical Engineering
→ ensure a broad and representative spectrum of STEM fields
Semantically-enriched Literature
Recommendations
Task 2: Adapt and combine the presented semantic
similarity analysis approaches (currently applied to
the PD use case) to suit the recommendation use
case
• Adjust parameters and test weightings to tailor
existing approaches to STEM literature
recommendation (and visualization)
Semantically-enriched Literature
Recommendations
Related Work
J. Beel, A. Aizawa, C. Breitinger, and B. Gipp, “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia,” in Proceedings of the
ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.
J. Beel, B. Gipp, S. Langer, M. Genzmehr, E. Wilde, A. Nuernberger, and J. Pitman, “Introducing Mr. DLib, a Machine-readable Digital
Library” in Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL`11), Ottawa, Canada, 2011.
Task 3: Implement the semantically-enriched approach in a
literature recommendation system
• Make use of framework
provided by Mr.DLib
Semantically-enriched Literature
Recommendations
Task 4:
Conceive a suitable
visualization concept
• Quick identification &
navigation of relevant
semantic markers
Semantically-enriched Literature
Recommendations
Task 5: Evaluation
• Hypothesis: taking into consideration semantic markers (citations,
formulas) and similarity of images improves content-based literature
recommendation & exploration for STEM literature
• Comparison of different weighting schemes (textual similarity, citation-
based similarity, formula-based similarity, image-based similarity)
• Expert user study:
• Usefulness of recommendations given pre-defined information needs
• Ease of exploring and identifying relevant semantic features
Completed Research
• Implementation of a link-based recommendation approach
as first proof of concept
• Citolytics - applied to the Wikipedia corpus
• System requirements engineering study completed
• to be submitted: Conference on Human Information Interaction
and Retrieval (CHIIR)
• Current:
• creation of a representative STEM literature dataset to be used
for recommendation generation
Summary
Contributions:
1) Improve content-based recommendations by taking into
consideration the similarity of semantic markers (citations,
formulas) and figures
→ valuable for STEM literature recommendation!
2) Conceive a visualization concept, with the help of user
studies, to aid users in more quickly identifying and
navigating through these otherwise hard-to-identify forms
of semantic relevance contained in academic literature
recommendations
Prior Work
Semantic document similarity analysis
B. Gipp, N. Meuschke, and C. Breitinger, “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific
Corpus.” Journal of the American Society for Information Science and Technology (JASIST), vol. 65, iss. 8, pp. 1527-1540,
2014.
B. Gipp, N. Meuschke, C. Breitinger, M. Lipinski, and A. Nuernberger, “Demonstration of Citation Pattern Analysis for
Plagiarism Detection.” In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in
Information Retrieval, Dublin, Ireland, 2013.
Meuschke, Gipp, and Breitinger. CitePlag: A Citation-based PDS Prototype. Proc. Int. Plagiarism Conf.,2012.
User Modeling
J. Beel, S. Langer, G. M. Kapitsaki, C. Breitinger, and B. Gipp, “Exploring the Potential of User Modeling based on Mind
Maps,” in User Modeling, Adaptation and Personalization – 23rd International Conference, UMAP 2015, Dublin, Ireland,
2015, pp. 3-17.
Recommender Systems
J. Beel, B. Gipp, S. Langer, and C. Breitinger, “Research-paper recommender systems: a literature survey,” International
Journal on Digital Libraries, vol. 17, iss. 4, pp. 305–338, 2016.
Trusted timestamping
C. Breitinger. “Using the Blockchain of Cryptocurrencies to Encourage Open Discussion and Sharing of Ideas”. Master
thesis: Linnaeus University, Sweden. June 2016.
References
[Beel et al. 2016] Beel, J., Gipp, B., Langer, S., and Breitinger, C. Research-paper Recommender Systems: A
Literature Survey. International Journal on Digital Libraries 17, 4 (2016), 305–338.
[Gipp & Beel, 2009] B. Gipp and J. Beel, “Citation Proximity Analysis (CPA) – A New Approach for Identifying Related
Work Based on Co-Citation Analysis,” in Proceedings of the 12th International Conference on
Scientometrics and Informetrics (ISSI’09), Rio de Janeiro, Brazil, 2009.
[Gipp, Meuschke, Breitinger, 2014] Gipp, Meuschke, Breitinger. “Citation-based Plagiarism Detection: Practicability
on a Large-scale Scientific Corpus”. Journal of the American Society for Information Science
and Technology, 65 (8): 1527–1540, 2014.
[Kembhavi et al., 2016] Kembhavi, Aniruddha, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, and Ali
Farhadi. "A Diagram Is Worth A Dozen Images." arXiv preprint arXiv:1603.07396 (2016).
[Meuschke et al. 2018] N. Meuschke, C. Gondek, D. Seebacher, C. Breitinger, D. Keim, and B. Gipp, “An Adaptive
Image-based Plagiarism Detection Approach” in Proceedings of the ACM/IEEE-CS Joint
Conference on Digital Libraries (JCDL), Fort Worth, USA, 2018.
[Schubotz et al. 2016] M. Schubotz, A. Grigorev, M. Leich, H. S. Cohl, N. Meuschke, B. Gipp, A. S. Youssef, and V.
Markl, “Semantification of Identifiers in Mathematics for Better Math Information Retrieval,”
in Proceedings of the 39th Int. ACM SIGIR Conference on Research and Development in
Information Retrieval, 2016.
Thank you!
Contact:
corinna.breitinger@uni.kn
@BreitingerC

A Semantically Enriched Recommendation & Visualization Approach for Academic Literature

  • 1.
    JCDL’18 Doctoral Consortium ASemantically Enriched Recommendation & Visualization Approach for Academic Literature Corinna Breitinger* Information Science Group & Human Computer Interaction Group University of Konstanz *Sponsored by SIGIR Student Travel Grant
  • 2.
    Short Bio 2 National Instituteof Informatics Tokyo (2014) Bachelor of Science UC Berkeley, California (2011) (2011-2014) 6/7/18 Master of Science Linnaeus University, Sweden (2016) PhD Research: Information Science / Human Computer Interaction University of Konstanz @BreitingerC – Semantically Enriched Recommendations for Academic Literature PhD Stipend
  • 3.
    Outline • Introduction &Problem Setting • Research Objective & Research Tasks • Background • Prior Work • Planned Research 6/7/18 3@BreitingerC – Semantically Enriched Recommendations for Academic Literature
  • 4.
    Recommender Systems 46/7/18 @BreitingerC– Semantically Enriched Recommendations for Academic Literature
  • 5.
    Academic Recommender Systems 56/7/18@BreitingerC – Semantically Enriched Recommendations for Academic Literature Annual publication volume on “Research Paper Recommender Systems’’ Prior Work: J. Beel, B. Gipp, S. Langer, and C. Breitinger, “Research-paper recommender systems: a literature survey’’ International Journal on Digital Libraries, pp. 1-34, 2015.
  • 6.
    Academic Recommender Systems 66/7/18@BreitingerC – Semantically Enriched Recommendations for Academic Literature
  • 7.
    76/7/18 Academic Recommender Systems @BreitingerC– Semantically Enriched Recommendations for Academic Literature
  • 8.
    Problem 1. Existing content-basedfiltering recommendation approaches focus on: • textual similarity • bibliometrics (citation counts, venue’s citation counts, etc.) à Do not take into account the variety of semantic markers that are especially prevalent in STEM literature: • Academic citations • Mathematical identifiers and equations • Figures, graphs, and images 2. List-based visualizations of recommendation sets fail to communicate the presence of such semantic features 86/7/18 @BreitingerC – Semantically Enriched Recommendations for Academic Literature
  • 9.
    Academic Literature ContainsValuable Text-independent Semantic Information 96/7/18 (a)1 (b) Mathematical expressions: Citation-based similarity: 1 Source: https://citeplag.org/compare/110389/136117 (a) (b) Figures / Graphs / Tables: @BreitingerC – Semantically Enriched Recommendations for Academic Literature
  • 10.
    Text-based measures • Identifyingtextual similarity (string- based similarity measures) • Considering semantic similarity, e.g. synonyms (knowledge-based similarity) Citation-based measures • Citation Proximity Analysis (CPA) [Gipp & Beel, 2009] • Citation-based Plagiarism Detection (CbPD) [Gipp, Meuschke, Breitinger, 2014] 106/7/18 @BreitingerC – Semantically Enriched Recommendations for Academic Literature Mathematical Language • Measures from Mathematical Information Retrieval (MIR) – extracting semantic meaning from mathematical content [Schubotz et al., 2016] Figures/ Images/ Graphs • Measures to assess the similarity of images & diagram understanding • Feature-point methods • SIFT, SURF, BRISK • Perceptual hashing • pHash, minHash Semantic Similarity Measures to be Considered
  • 11.
    Research Objective Conceive, implement,and evaluate a semantically-enriched recommendation approach that considers text-independent semantic markers (e.g. academic citations, mathematical formulae, and figures) to improve the recommendation of academic literature in the STEM fields. 116/7/18 @BreitingerC – Semantically Enriched Recommendations for Academic Literature
  • 12.
    Research Tasks 1. Reviewtoday’s literature recommendation approaches: • What is being done to improve recommendation for less text-heavy literature from the STEM fields? • What are the special requirements for literature recommendation in the STEM field? 2. Conceive & design a recommendation approach that takes into account semantic markers prevalent in STEM literature (formulas, figures, and citations). 3. Implement the novel approach in a literature recommender system. 6/7/18 @BreitingerC – Semantically Enriched Recommendations for Academic Literature 12
  • 13.
    Research Tasks 4. Deriveappropriate weights for the different similarity measures depending on the user (his/ her information need) and the research discipline • e.g. math-heavy publications should place a higher weight on formulas and a lower weight on text-based similarity 5. Conceive visualization concepts to support the user in sense-making of the recommended literature sets. 6. Evaluate the recommender system • User studies 6/7/18 @BreitingerC – Semantically Enriched Recommendations for Academic Literature 13
  • 14.
    Background: Recommender Systems 6/7/1814@BreitingerC – Semantically Enriched Recommendations for Academic Literature • Recommendation approaches: • User-based approaches, e.g. collaborative filtering (CF) • Content-based filtering (CBF) • Combination of approaches • Academic Literature Recommendation: • In a review of 62 approaches for research paper recommendation, we found the majority of reviewed systems (55%) used content-based approaches for recommending related academic literature [Beel et al., 2016] Prior Work Beel, J., Gipp, B., Langer, S., and Breitinger, C. Research-paper Recommender Systems: A Literature Survey. International Journal on Digital Libraries 17, 4 (2016), 305–338.
  • 15.
    6/7/18 15 Prior Work B.Gipp, N. Meuschke, and C. Breitinger, “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus” Journal of the American Society for Information Science and Technology (JASIST), vol. 65, iss. 2, pp. 1527-1540, 2014. N. Meuschke, C. Gondek, D. Seebacher, C. Breitinger, D. Keim, and B. Gipp, “An Adaptive Image-based Plagiarism Detection Approach” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Fort Worth, USA, 2018. à Applied to: academic plagiarism detection (PD) use case à Potential for: Recommending semantically relevant academic literature @BreitingerC – Semantically Enriched Recommendations for Academic Literature Background: Semantic Similarity Measures
  • 16.
    Prior Work: Citation-basedApproaches 6/7/18 @BreitingerC – Semantically Enriched Recommendations for Academic Literature 16 Doc C Doc E Doc D Section 1 This is an exampl etext withreferences to different documents for illustratingtheusageof citation analysis for plagiari sm detection. This is an exampl etext withreferences to different documents for illustrati ng the usage of citationanalysis forplagiarism detection . This is ain-text citation [1].This is an exampl etext withreferences to different documents for illustrating the usage of citation analysis for plagiari sm detection . This is an exampl e text withreferenc es to differentdocuments fori llustratingthe usage ofci tation analysis for plagiarism detection. Section 2 Another in-text citation [2].tThis is anexample text with references todifferent documents for illustrati ng the usage of citationanalysis forplagiarism detection. This is an ex ampletext with references to different documents for illustrati ng the usageof citation anal ysis for plagiarism detection. This is arepeated in-text citation [1]. This is an exampl etext withreferences to different documents for illustratingtheusageof citation analysis for plagiari sm detection. This is an exampl etext withreferences to different documents for illustrati ng the usage of citationanalysis forplagiarism detection . Setion 3 A third in-text citation [3].This is an exampl etext withreferences to different documents for illustrating the usage of citation analysis for plagiari sm detection . This is an exampl e text withreferenc es to differentdocuments fori llustratingthe usage ofci tation analysis for plagiarism detection. a final i n-text-citation[2]. References [1] [2] [3] Document B This is an exampl etext withreferences to different documents for illustratingtheusage ofci tation analysis for plagi arism detection. 
This is ain-text citation [1].This is an ex ampletext with references to different documents for illustrati ng the usageof citation anal ysis for plagiarism detection. Another exampl efor ani n-text citation [2]. This is an exampl etext withreferences to different documents for illustratingtheusage ofci tation analysis for plagi arism detection. This is an exampl etext withreferences to different documents for illustratingtheusage ofci tation analysis for plagi arism detection. This is an exampl etext withreferences to different documents for illustrati ng the usage of citationanalysis forplagiarism detection. This is an exampl etext withreferences to different documents for illustrating the usage ofcitation analysi s for pl agiarism detection. This is an exampl etext withreferences to different documents for illustratingtheusage ofci tation analysis for plagi arism detection. This is an exampl etext withreferences to different documents for illustrati ng the usage of citationanalysis forplagiarism detection. Here’s a third in-text citation [3].This is an exampl etext withreferences to different documents for illustrati ng the usage of citationanalysis forplagiarism detection. This is an exampl etext withreferences to different documents for illustratingtheusage ofci tation analysis for plagi arism detection. Document A References [1] [2] [3] EDC DECDC Citation Pattern Citation Pattern Doc A Doc B Ins.EIns.DC DECDC Pattern Comparison Doc A Doc B Prior Work Gipp, B., Meuschke, N., and Breitinger, C. Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus. Journal of the American Society for Information Science and Technology (JASIST) 65, 8 (2014), 1527–1540. à Applied to the PD use case • Citation Proximity Analysis (CPA) • CbPD Approaches • Longest Common Citation Sequence (LCCS) • Greedy Citation Tiling (GCT) • Citation Chunking (Cit-Chunk)
  • 17.
    6/7/18 @BreitingerC –Semantically Enriched Recommendations for Academic Literature 17 Prior Work B. Gipp, N. Meuschke, C. Breitinger, M. Lipinski, and A. Nürnberger. Demonstration of the First Citation-based Plagiarism Detection Prototype. Proc. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR), pages 1119–1120, 2013. B. Gipp, N. Meuschke, and C. Breitinger, “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus” Journal of the American Society for Information Science and Technology (JASIST), vol. 65, iss. 2, pp. 1527-1540, 2014. • A visualization concept for citation-based similarity: CitePlag Prior Work: Citation-based Approaches
  • 18.
    Prior Work: Citation-basedApproaches 6/7/18 @BreitingerC – Semantically Enriched Recommendations for Academic Literature 18 Prior Work M. Schwarzer, C. Breitinger, M. Schubotz, N. Meuschke, and B. Gipp, “Citolytics - A Link-based Recommender System for Wikipedia” in Proceedings of the 11th ACM Conference on Recommender Systems (RecSys), 2017. Citolytics: A link-based Recommender System • CPA approach for the first time applied to the recommendation use case
  • 19.
    Ongoing: Citation-based Approaches
    Citolytics: A link-based Recommender System
    • Currently: User study of link-based recommendation performance
    • Findings:
        → link-based vs. text-based recommendations address different information needs
        → Serendipitous vs. expected
        → Expert vs. novice
  • 20.
    Prior Work: Image-based Similarity
    • Images / charts / diagrams are highly valuable text-independent features [Kembhavi et al., 2016]
    • Variations in the representation of images must still be perceived as semantically similar
    • PD approach by Meuschke et al. integrates perceptual hashing with newly developed similarity assessments for images, such as ratio hashing and position-aware OCR text matching
        • Photographs & diagrams → perceptual hashing (+ OCR matching)
        • Bar charts → ratio hashing
    Prior Work: N. Meuschke, C. Gondek, D. Seebacher, C. Breitinger, D. Keim, and B. Gipp, “An Adaptive Image-based Plagiarism Detection Approach” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Fort Worth, USA, 2018.
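The following sketch shows why a perceptual hash tolerates variations in how an image is rendered. It uses average hashing, one of the simplest perceptual hashes, on a tiny grayscale grid. This is a stand-in chosen for brevity and is an assumption: the hashing variant used by Meuschke et al. may differ, and a real pipeline would first downscale the image to a fixed size.

```python
def average_hash(pixels):
    """Hash a grayscale image given as a list of rows of 0-255 values.

    Each bit records whether a pixel is brighter than the image mean,
    so uniform brightness changes leave the hash untouched.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count differing bits; a small distance means similar images."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))


# Made-up 2x2 images: an original, a brightened copy, and a negative.
original = [[10, 200], [220, 30]]
brightened = [[value + 20 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

same = hamming_distance(average_hash(original), average_hash(brightened))
different = hamming_distance(average_hash(original), average_hash(inverted))
```

The brightened copy hashes identically to the original, while the negative differs in every bit.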
  • 21.
    Prior Work: Image-based Similarity
    • ‘Adaptive process’ for image-based semantic similarity identification:
    Prior Work: N. Meuschke, C. Gondek, D. Seebacher, C. Breitinger, D. Keim, and B. Gipp, “An Adaptive Image-based Plagiarism Detection Approach” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Fort Worth, USA, 2018.
  • 22.
    Related Work: Mathematical Formula Analysis
    Mathematical Information Retrieval (MIR)
    • Meuschke et al. demonstrated the benefit of analyzing the similarity of mathematical expressions to improve detection of heavily disguised academic plagiarism [1]
    • Schubotz et al. improved the extraction of math identifier definitions from the surrounding text → results in a better understanding of the semantic meaning of a given formula [2]
    Related Work:
    [1] N. Meuschke, M. Schubotz, F. Hamborg, T. Skopal, and B. Gipp. 2017. Analyzing Mathematical Content to Detect Academic Plagiarism. In Proc. CIKM.
    [2] M. Schubotz, L. Kraemer, N. Meuschke, F. Hamborg, and B. Gipp, “Evaluating and Improving the Extraction of Mathematical Identifier Definitions” in Proceedings of the 8th International Conference of the CLEF Association (CLEF 2017), 2017.
  • 23.
    Applying Mathematical Formula Analysis to Recommendations
    • Consider Formula Semantics
        • The semantic meaning of a mathematical identifier or an entire mathematical equation (e.g. defined by the accepted name for the identifier or equation)
    • Consider Formula Patterns
        • Sequence of several mathematical formulas contained in a STEM document, which holds meaning (e.g. a derivation or proof)
    → The consideration of mathematical formula concepts and patterns can improve content-based recommendation approaches for STEM literature
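A small sketch of why formula semantics can matter for recommendations: two documents may use different symbols for the same concepts, so comparing raw identifiers finds no overlap while comparing identifier definitions does. The identifier-to-definition mappings and the choice of Jaccard similarity are assumptions for illustration; in practice the definitions would come from an extraction step such as the one described by Schubotz et al.

```python
def jaccard(items_a, items_b):
    """Jaccard similarity of two collections, treated as sets."""
    set_a, set_b = set(items_a), set(items_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical extraction output: identifier -> definition.
doc_a = {"E": "energy", "m": "mass", "c": "speed of light"}
doc_b = {"W": "energy", "M": "mass", "v": "velocity"}

# Comparing raw symbols misses the relationship entirely.
symbol_similarity = jaccard(doc_a.keys(), doc_b.keys())

# Comparing what the symbols mean reveals shared concepts.
concept_similarity = jaccard(doc_a.values(), doc_b.values())
```

Here the symbol-level similarity is zero, whereas the concept-level similarity reflects that both documents discuss energy and mass.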
  • 24.
    Semantically-enriched Literature Recommendations
    Task 1: Creation of suitable recommendation corpus
    • PMC OAS
        • Life Sciences and Biomedical field
    • arXiv
        • Physics, Mathematics, Statistics, Quantitative Biology, Computer Science, and Electrical Engineering
    → Ensure a broad and representative spectrum of STEM fields
  • 25.
    Semantically-enriched Literature Recommendations
    Task 2: Adapt and combine the presented semantic similarity analysis approaches (currently applied to the PD use case) to suit the recommendation use case
    • Adjust parameters and test weightings to tailor existing approaches to STEM literature recommendation (and visualization)
  • 26.
    Semantically-enriched Literature Recommendations
    Task 3: Implement the semantically-enriched approach in a literature recommendation system
    • Make use of the framework provided by Mr. DLib
    Related Work:
    J. Beel, A. Aizawa, C. Breitinger, and B. Gipp, “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia,” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.
    J. Beel, B. Gipp, S. Langer, M. Genzmehr, E. Wilde, A. Nuernberger, and J. Pitman, “Introducing Mr. DLib, a Machine-readable Digital Library” in Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL’11), Ottawa, Canada, 2011.
  • 27.
    Semantically-enriched Literature Recommendations
    Task 4: Conceive a suitable visualization concept
    • Quick identification & navigation of relevant semantic markers
  • 28.
    Semantically-enriched Literature Recommendations
    Task 5: Evaluation
    • Hypothesis: taking into consideration semantic markers (citations, formulas) and similarity of images improves content-based literature recommendation & exploration for STEM literature
    • Comparison of different weighting schemes (textual similarity, citation-based similarity, formula-based similarity, image-based similarity)
    • Expert user study:
        • Usefulness of recommendations given pre-defined information needs
        • Ease of exploring and identifying relevant semantic features
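One simple way to compare weighting schemes is a normalized weighted sum over the four similarity measures named above. The sketch below is an assumption about how such a combination could look; the feature names, scores, and weights are invented for illustration and are not results or parameters from this research.

```python
def combined_similarity(scores, weights):
    """Weighted average of per-feature similarity scores in [0, 1].

    scores:  feature name -> similarity between two documents
    weights: feature name -> non-negative weight; features missing
             from `scores` contribute a similarity of 0
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


# Made-up similarities for one candidate recommendation.
pair_scores = {"text": 0.2, "citation": 0.8, "formula": 0.5, "image": 0.0}

# Two hypothetical weighting schemes to compare.
text_only = {"text": 1.0}
equal_mix = {"text": 1.0, "citation": 1.0, "formula": 1.0, "image": 1.0}

baseline = combined_similarity(pair_scores, text_only)
enriched = combined_similarity(pair_scores, equal_mix)
```

With these invented numbers, a candidate that looks weak under text similarity alone scores higher once citation and formula similarity are included, which is the kind of difference the evaluation would need to test with users.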
  • 29.
    Completed Research
    • Implementation of a link-based recommendation approach as first proof of concept
        • Citolytics - applied to the Wikipedia corpus
    • System requirements engineering study completed
        • To be submitted: Conference on Human Information Interaction and Retrieval (CHIIR)
    • Current:
        • Creation of a representative STEM literature dataset to be used for recommendation generation
  • 30.
    Summary
    Contributions:
    1) Improve content-based recommendations by taking into consideration the similarity of semantic markers (citations, formulas) and figures
        → valuable for STEM literature recommendation!
    2) Conceive a visualization concept, with the help of user studies, to aid users in more quickly identifying and navigating through these otherwise hard-to-identify forms of semantic relevance contained in academic literature recommendations
  • 31.
    Prior Work
    Semantic document similarity analysis
    • B. Gipp, N. Meuschke, and C. Breitinger, “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus.” Journal of the American Society for Information Science and Technology (JASIST), vol. 65, iss. 8, pp. 1527-1540, 2014.
    • B. Gipp, N. Meuschke, C. Breitinger, M. Lipinski, and A. Nuernberger, “Demonstration of Citation Pattern Analysis for Plagiarism Detection.” In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 2013.
    • Meuschke, Gipp, and Breitinger. CitePlag: A Citation-based PDS Prototype. Proc. Int. Plagiarism Conf., 2012.
    User Modeling
    • J. Beel, S. Langer, G. M. Kapitsaki, C. Breitinger, and B. Gipp, “Exploring the Potential of User Modeling based on Mind Maps,” in User Modeling, Adaptation and Personalization – 23rd International Conference, UMAP 2015, Dublin, Ireland, 2015, pp. 3-17.
    Recommender Systems
    • J. Beel, B. Gipp, S. Langer, and C. Breitinger, “Research-paper recommender systems: a literature survey,” International Journal on Digital Libraries, pp. 1-34, 2015.
    Trusted timestamping
    • C. Breitinger. “Using the Blockchain of Cryptocurrencies to Encourage Open Discussion and Sharing of Ideas”. Master thesis: Linnaeus University, Sweden. June 2016.
  • 32.
    References
    [Beel et al., 2016] Beel, J., Gipp, B., Langer, S., and Breitinger, C. Research-paper Recommender Systems: A Literature Survey. International Journal on Digital Libraries 17, 4 (2016), 305–338.
    [Gipp & Beel, 2009] B. Gipp and J. Beel, “Citation Proximity Analysis (CPA) – A New Approach for Identifying Related Work Based on Co-Citation Analysis,” in Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), Rio de Janeiro, Brazil, 2009.
    [Gipp, Meuschke, Breitinger, 2014] Gipp, Meuschke, Breitinger. “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus”. Journal of the American Society for Information Science and Technology, 65 (8): 1527–1540, 2014.
    [Kembhavi et al., 2016] Kembhavi, Aniruddha, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, and Ali Farhadi. "A Diagram Is Worth A Dozen Images." arXiv preprint arXiv:1603.07396 (2016).
    [Meuschke et al., 2018] N. Meuschke, C. Gondek, D. Seebacher, C. Breitinger, D. Keim, and B. Gipp, “An Adaptive Image-based Plagiarism Detection Approach” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Fort Worth, USA, 2018.
    [Schubotz et al., 2016] M. Schubotz, A. Grigorev, M. Leich, H. S. Cohl, N. Meuschke, B. Gipp, A. S. Youssef, and V. Markl, “Semantification of Identifiers in Mathematics for Better Math Information Retrieval,” in Proceedings of the 39th Int. ACM SIGIR Conference on Research and Development in Information Retrieval, 2016.
  • 33.
    Thank you!
    Contact: corinna.breitinger@uni.kn
    @BreitingerC