An initial Analysis of
Topic-based Similarity
among Scientific Documents
based on their
Rhetorical Discourse Parts
ocorcho@fi.upm.es
@ocorcho ISWC’17
oeg-upm.net
Carlos Badenes-Olmedo
Jose Luis Redondo-Garcia
Oscar Corcho
Ontology Engineering Group
Universidad Politécnica de Madrid
Spain
Motivation
How representative is an abstract?
Scientific Research
Practitioners
Reviewers
Motivation
How representative are summaries based on scientific discourse categories?
Scientific Research
Practitioners
Reviewers
approach
challenge
background
outcome
future work
Representativeness
Full-Paper vs. Summary
Internal: describing main ideas
External: finding related items
Probabilistic Topic Models
• Each document is a mixture of corpus-wide topics
• Each topic is a distribution over words
• Each word is drawn from one of those topics
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research,
Latent Dirichlet Allocation (LDA)
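The three bullets above are LDA's generative story. The following is a minimal sketch of that story using only Python's standard library, assuming a symmetric Dirichlet prior `alpha` on each document's topic proportions; `sample_dirichlet`, `generate_document` and the toy topics are hypothetical names and data chosen for illustration. It shows how a document would be generated under the model, not how the model is fitted to a corpus.

```python
import random
from typing import Dict, List, Sequence


def sample_dirichlet(alphas: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one sample from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(
    topics: List[Dict[str, float]],  # each topic is a distribution over words
    alpha: float,                    # symmetric Dirichlet prior on topic proportions (assumed)
    length: int,
    rng: random.Random,
) -> List[str]:
    """Generate one bag of words following LDA's generative process."""
    # 1. Each document is a mixture of corpus-wide topics: draw its proportions theta.
    theta = sample_dirichlet([alpha] * len(topics), rng)
    words = []
    for _ in range(length):
        # 2. Each word is drawn from one of those topics: first pick a topic z ~ theta ...
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # 3. ... then pick a word from topic z's distribution over words.
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return words


if __name__ == "__main__":
    toy_topics = [  # hypothetical topics, for illustration only
        {"orbit": 0.5, "satellite": 0.3, "solar": 0.2},
        {"ontology": 0.6, "rdf": 0.3, "query": 0.1},
    ]
    print(generate_document(toy_topics, alpha=0.5, length=8, rng=random.Random(42)))
```

Sampling a Dirichlet by normalising independent Gamma draws is a standard construction; NumPy's `Generator.dirichlet` offers it directly, but the standard library keeps the sketch dependency-free.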
Representativeness Measure
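Internal: JSD-based similarity between the topic distribution of the full paper [d1, d2, d3, ..., dn] and that of its summary [s1, s2, s3, ..., sn]
External: precision / recall / f-measure between the related papers found for the full paper and those found for the summary, where related papers (topic distributions [h1, ..., hn], [j1, ..., jn], [k1, ..., kn], [m1, ..., mn]) are retrieved by JSD-based similarity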
• Feature vectors in topic models are topic distributions, expressed as vectors of probabilities
• The similarity measure used in our analysis is based on the Jensen-Shannon Divergence (JSD)
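The slides state only that similarity is based on the Jensen-Shannon Divergence. A minimal sketch, assuming base-2 logarithms (so that JSD lies in [0, 1]) and the common conversion similarity = 1 − JSD; the authors' exact conversion is not given on the slides and may differ. With M = ½(P + Q), JSD(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M).

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric, bounded divergence between two topic distributions (in [0, 1] in bits)."""
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same number of topics")
    # Mixture M = (P + Q) / 2; m_i > 0 whenever p_i > 0 or q_i > 0, so no division by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jsd_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed similarity 1 - JSD: identical distributions score 1, disjoint ones score 0."""
    return 1.0 - jensen_shannon_divergence(p, q)


if __name__ == "__main__":
    full_paper = [0.7, 0.2, 0.1]  # hypothetical topic distribution of a full paper
    abstract = [0.5, 0.3, 0.2]    # hypothetical topic distribution of its abstract
    print(jsd_similarity(full_paper, abstract))
```

Using JSD rather than plain KL keeps the measure symmetric and finite even when one distribution assigns zero probability to a topic.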
Evaluation
• Corpora: Advances in Space Research, Procedia Chemistry, Journal of Pharmaceutical Analysis and Journal of Web Semantics, retrieved through the Elsevier API: 1000 papers (+ abstracts)
• Discover rhetorical parts: 1000 papers (+ abstracts, + discourse parts)
• Topic Model: training (only full-papers) and inference
• Result: network of related papers (+ abstracts + discourse parts)
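In this pipeline the topic model is trained on full papers only, and topic distributions for the other texts are then inferred from it. The slides do not say which LDA implementation or inference algorithm librAIry uses, so the sketch below swaps in a plain collapsed Gibbs sampler with a simple fold-in step (topic-word counts frozen after training), written with the standard library only. `TinyLDA`, its hyperparameters and the iteration counts are hypothetical and chosen for illustration.

```python
import random
from typing import Dict, List


class TinyLDA:
    """Collapsed Gibbs sampling LDA: fit on full papers, then fold in other texts."""

    def __init__(self, num_topics: int, alpha: float = 0.1, beta: float = 0.01, seed: int = 0):
        self.k, self.alpha, self.beta = num_topics, alpha, beta
        self.rng = random.Random(seed)
        self.vocab: Dict[str, int] = {}
        self.topic_word: List[List[int]] = []  # n[t][w]: tokens of word w assigned to topic t
        self.topic_total: List[int] = []       # n[t]: tokens assigned to topic t

    def _sample(self, word: int, doc_counts: List[int]) -> int:
        """Draw a topic from the collapsed full conditional p(z = t | all other assignments)."""
        v = len(self.vocab)
        weights = [
            (doc_counts[t] + self.alpha)
            * (self.topic_word[t][word] + self.beta)
            / (self.topic_total[t] + v * self.beta)
            for t in range(self.k)
        ]
        return self.rng.choices(range(self.k), weights=weights)[0]

    def fit(self, docs: List[List[str]], iterations: int = 200) -> None:
        """Train on tokenised full papers only."""
        for doc in docs:
            for w in doc:
                self.vocab.setdefault(w, len(self.vocab))
        self.topic_word = [[0] * len(self.vocab) for _ in range(self.k)]
        self.topic_total = [0] * self.k
        corpus = [[self.vocab[w] for w in doc] for doc in docs]
        assignments, doc_counts = [], []
        for doc in corpus:  # random initial topic for every token
            z = [self.rng.randrange(self.k) for _ in doc]
            counts = [0] * self.k
            for w, t in zip(doc, z):
                counts[t] += 1
                self.topic_word[t][w] += 1
                self.topic_total[t] += 1
            assignments.append(z)
            doc_counts.append(counts)
        for _ in range(iterations):
            for doc, z, counts in zip(corpus, assignments, doc_counts):
                for i, w in enumerate(doc):
                    t = z[i]  # remove the token's current assignment ...
                    counts[t] -= 1
                    self.topic_word[t][w] -= 1
                    self.topic_total[t] -= 1
                    t = self._sample(w, counts)  # ... and resample it
                    z[i] = t
                    counts[t] += 1
                    self.topic_word[t][w] += 1
                    self.topic_total[t] += 1

    def infer(self, doc: List[str], iterations: int = 50) -> List[float]:
        """Fold-in: topic distribution of an unseen text, topic-word counts kept frozen."""
        words = [self.vocab[w] for w in doc if w in self.vocab]  # unseen words are ignored
        z = [self.rng.randrange(self.k) for _ in words]
        counts = [0] * self.k
        for t in z:
            counts[t] += 1
        for _ in range(iterations):
            for i, w in enumerate(words):
                counts[z[i]] -= 1
                z[i] = self._sample(w, counts)
                counts[z[i]] += 1
        total = len(words) + self.k * self.alpha
        return [(c + self.alpha) / total for c in counts]


if __name__ == "__main__":
    full_papers = [  # hypothetical tokenised full papers
        "orbit satellite solar orbit satellite launch".split(),
        "ontology rdf query ontology sparql rdf".split(),
    ]
    lda = TinyLDA(num_topics=2, seed=1)
    lda.fit(full_papers, iterations=100)
    print(lda.infer("satellite orbit launch".split()))  # e.g. an abstract or discourse part
```

Freezing the topic-word counts at inference time keeps the topics identical for full papers, abstracts and discourse parts, so their distributions can be compared topic by topic.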
Evaluation
• Advances in Space Research Corpus: http://librairy.linkeddata.es/resources/domains/aisr
• Procedia Chemistry Corpus: http://librairy.linkeddata.es/resources/domains/pc
• Journal of Pharmaceutical Analysis Corpus: http://librairy.linkeddata.es/resources/domains/jopa
• Journal of Web Semantics Corpus: http://librairy.linkeddata.es/resources/domains/jows
• Test Corpus: http://librairy.linkeddata.es/resources/domains/group1
Explore a Corpus
• Topics in a Corpus:
http://librairy.linkeddata.es/resources/domains/group1/topics?words=10
• Papers in a Corpus:
http://librairy.linkeddata.es/resources/domains/group1/items?size=10
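The endpoints listed in this evaluation follow a regular pattern under http://librairy.linkeddata.es/resources. A small URL builder, assuming that pattern holds for any domain, item or part identifier; the helper names are hypothetical. The service dates from 2017 and may no longer be online, so `fetch` simply returns the raw response body without assuming its format.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://librairy.linkeddata.es/resources"


def resource_url(*path: str, **params: object) -> str:
    """Join path segments under the base URL and append any query parameters in order."""
    url = "/".join([BASE_URL, *path])
    return f"{url}?{urlencode(params)}" if params else url


def corpus_topics_url(domain: str, words: int = 10) -> str:
    """Topics of a corpus (domain), each described by its top `words` words."""
    return resource_url("domains", domain, "topics", words=words)


def corpus_items_url(domain: str, size: int = 10) -> str:
    """First `size` papers of a corpus (domain)."""
    return resource_url("domains", domain, "items", size=size)


def similarity_url(domain: str, kind: str, resource_id: str, **params: object) -> str:
    """Similarity relations of an item (full paper) or part (summary); kind is "items" or "parts"."""
    return resource_url("domains", domain, kind, resource_id, "relations", type="similarity", **params)


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a resource as text; needs network access and a live service."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(corpus_topics_url("group1"))
    print(similarity_url("group1", "parts", "83f2b9722953034d7b6b50cbead4ec6b", resourceType="item", size=5))
```

Building URLs with `urlencode` keeps parameter escaping correct if identifiers ever contain reserved characters.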
Evaluation
Full-Paper
• Info:
http://librairy.linkeddata.es/resources/items/2-s2.0-84924147106?content=true
• Parts:
http://librairy.linkeddata.es/resources/items/2-s2.0-84924147106/parts
• abstract:
http://librairy.linkeddata.es/resources/parts/adfe85d9634654e4cfd7148be7cd2b29?content=true
• approach:
http://librairy.linkeddata.es/resources/parts/83f2b9722953034d7b6b50cbead4ec6b?content=true
• outcome:
http://librairy.linkeddata.es/resources/parts/61452a5ec420c8926160ae748c12a826?content=true
• challenge:
http://librairy.linkeddata.es/resources/parts/8858ef323fc09efbdcd46b9de45f146c?content=true
• background:
http://librairy.linkeddata.es/resources/parts/d118ef60d5e874d69d92c6b07be68b61?content=true
• future-work:
http://librairy.linkeddata.es/resources/parts/92be5400df5bb331e5f7f692e6b05bca?content=true
• Topic Distribution of Full-Paper:
http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/topics?words=15
• Topic Distribution of Abstract:
http://librairy.linkeddata.es/resources/domains/group1/parts/adfe85d9634654e4cfd7148be7cd2b29/topics?words=15
• Similarity between Full-Paper and Abstract:
http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/relations?type=similarity&relatedId=adfe85d9634654e4cfd7148be7cd2b29
• Similarity between Full-Paper and Approach content:
http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/relations?type=similarity&relatedId=83f2b9722953034d7b6b50cbead4ec6b
Internal Representativeness
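Internal representativeness scores each summary by how close its topic distribution is to the full paper's. A sketch that ranks a paper's summaries this way, assuming the topic distributions are already available (for example from the topic-distribution endpoints listed above) and that similarity is 1 − JSD with base-2 logarithms; the distributions in the usage block are made-up numbers, not results from this study.

```python
import math
from typing import Dict, List, Sequence, Tuple


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl_to_m(x: Sequence[float]) -> float:
        # Terms with x_i = 0 contribute nothing; m_i > 0 whenever x_i > 0.
        return sum(xi * math.log2(xi / mi) for xi, mi in zip(x, m) if xi > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def internal_representativeness(
    full_paper: Sequence[float], summaries: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank summaries by assumed similarity (1 - JSD) to the full paper, best first."""
    scores = {name: 1.0 - jsd(full_paper, dist) for name, dist in summaries.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    full = [0.6, 0.3, 0.1]  # made-up topic distribution of a full paper
    parts = {               # made-up distributions of three of its summaries
        "abstract": [0.3, 0.3, 0.4],
        "approach": [0.55, 0.35, 0.10],
        "future-work": [0.1, 0.2, 0.7],
    }
    for name, score in internal_representativeness(full, parts):
        print(f"{name:12s} {score:.3f}")
```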
Evaluation
• Similar papers to Full-Paper:
http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/relations?type=similarity&resourceType=item&size=5
• Similar papers to Abstract:
http://librairy.linkeddata.es/resources/domains/group1/parts/adfe85d9634654e4cfd7148be7cd2b29/relations?type=similarity&resourceType=item&size=5
• Similar papers to Approach content:
http://librairy.linkeddata.es/resources/domains/group1/parts/83f2b9722953034d7b6b50cbead4ec6b/relations?type=similarity&resourceType=item&size=5
• Similar summaries to a Full-Paper:
http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/relations?type=similarity&resourceType=part&size=5
• Similar summaries to an Abstract:
http://librairy.linkeddata.es/resources/domains/group1/parts/adfe85d9634654e4cfd7148be7cd2b29/relations?type=similarity&resourceType=part&size=5
• Similar summaries to Approach:
http://librairy.linkeddata.es/resources/domains/group1/parts/83f2b9722953034d7b6b50cbead4ec6b/relations?type=similarity&resourceType=part&size=5
External Representativeness
Results: Size of Summaries
The approach, background and outcome content of a paper generate more accurate topic distributions than those created from other texts, such as the abstract.
Since LDA treats documents as bags of words, text length affects the accuracy of the topic distributions inferred by the model: shorter texts give it fewer word occurrences to infer from.
Relative size of summaries with respect to the full paper
Absolute size of summaries (in number of characters)
Results: Internal Representativeness
• The Internal Representativeness of a summary measures the similarity of this summary to the original full-text research paper
• This similarity is based on the JSD between their topic distributions
• Results suggest that the topic distribution of the text created from the approach content is the most similar to that of the full content of the paper
internal-representativeness
Results: External Representativeness
• The External Representativeness of a summary measures how much the set of related documents obtained from it differs from the set derived from the original text
• Similarity thresholds from 0.5 to 0.99 were considered in the experiments
precision recall
• In terms of recall, the upward trend followed by the approach, outcome and background content supports the assumption that summaries containing key words allow more similar papers to be discovered than other summaries do
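External representativeness compares the papers found related to a summary with those found related to the full paper. The slides do not spell out the exact protocol, so this sketch assumes that the full paper's related set is the reference, that a paper counts as related when its similarity reaches the threshold, and that similarity scores are already computed; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def related(similarities: Dict[str, float], threshold: float) -> Set[str]:
    """Papers whose similarity to the query text is at least the threshold."""
    return {paper for paper, score in similarities.items() if score >= threshold}


def precision_recall_f1(retrieved: Set[str], reference: Set[str]) -> Tuple[float, float, float]:
    """Compare papers found via a summary against those found via the full paper.

    By convention here, an empty set yields 0 for the corresponding score.
    """
    hits = len(retrieved & reference)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def external_representativeness(
    full_paper_sims: Dict[str, float],
    summary_sims: Dict[str, float],
    thresholds: List[float],
) -> List[Tuple[float, float, float, float]]:
    """(threshold, precision, recall, f-measure) for each similarity threshold."""
    rows = []
    for t in thresholds:
        p, r, f = precision_recall_f1(related(summary_sims, t), related(full_paper_sims, t))
        rows.append((t, p, r, f))
    return rows
```

A sweep over the thresholds used in the experiments can be written as `external_representativeness(full_sims, summary_sims, [round(0.5 + 0.01 * i, 2) for i in range(50)])`, which covers 0.50 to 0.99; `full_sims` and `summary_sims` stand for hypothetical dictionaries mapping paper identifiers to similarity scores.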
Results: External Representativeness
f-measure
• For higher similarity thresholds, i.e. for strongly related papers, the recommendations discovered by using the approach content are more precise than those discovered by using the abstract.
Conclusions
• We have studied topic-based similarities among scientific documents, comparing their abstracts with summaries built from their scientific discourse categories.
• Two novel measures have been proposed: (1) internal-representativeness and (2) external-representativeness.
• Results show that summaries created from the approach, outcome or background content of a paper describe its full content more accurately than abstracts, in terms of both overall ideas and related documents.
• To prevent the size of the summaries from influencing the accuracy of the results, in future work we plan to describe texts with probabilistic topic models designed for short texts, such as BTM (Biterm Topic Model).

More Related Content

What's hot

Compressed full text indexes
Compressed full text indexesCompressed full text indexes
Compressed full text indexesunyil96
 
Perspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textPerspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textJennifer D'Souza
 
LOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataLOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataFilip Ilievski
 
Lotus: Linked Open Text UnleaShed - ISWC COLD '15
Lotus: Linked Open Text UnleaShed - ISWC COLD '15Lotus: Linked Open Text UnleaShed - ISWC COLD '15
Lotus: Linked Open Text UnleaShed - ISWC COLD '15Filip Ilievski
 
Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...CSCJournals
 
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Antoine Isaac
 
Linking Folksonomies to Knowledge Organization Systems
Linking Folksonomies to Knowledge Organization SystemsLinking Folksonomies to Knowledge Organization Systems
Linking Folksonomies to Knowledge Organization SystemsJakob .
 
Open minted content_provision
Open minted content_provisionOpen minted content_provision
Open minted content_provisionLucas anastasiou
 
Chapter 10 Data Mining Techniques
 Chapter 10 Data Mining Techniques Chapter 10 Data Mining Techniques
Chapter 10 Data Mining TechniquesHouw Liong The
 
Copy of 10text (2)
Copy of 10text (2)Copy of 10text (2)
Copy of 10text (2)Uma Se
 
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...infoclio.ch
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrievalunyil96
 
Open Research Knowledge Graph (ORKG) - an overview
Open Research Knowledge Graph (ORKG) - an overview   Open Research Knowledge Graph (ORKG) - an overview
Open Research Knowledge Graph (ORKG) - an overview Jennifer D'Souza
 
Project Proposal Topics Modeling (Ir)
Project Proposal    Topics Modeling (Ir)Project Proposal    Topics Modeling (Ir)
Project Proposal Topics Modeling (Ir)Svitlana volkova
 
A Survey on Text Mining-techniques and application
A Survey on Text Mining-techniques and applicationA Survey on Text Mining-techniques and application
A Survey on Text Mining-techniques and applicationRyota Eisaki
 

What's hot (18)

Compressed full text indexes
Compressed full text indexesCompressed full text indexes
Compressed full text indexes
 
Perspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textPerspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from text
 
LOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataLOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked Data
 
Lotus: Linked Open Text UnleaShed - ISWC COLD '15
Lotus: Linked Open Text UnleaShed - ISWC COLD '15Lotus: Linked Open Text UnleaShed - ISWC COLD '15
Lotus: Linked Open Text UnleaShed - ISWC COLD '15
 
Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...
 
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
 
Linking Folksonomies to Knowledge Organization Systems
Linking Folksonomies to Knowledge Organization SystemsLinking Folksonomies to Knowledge Organization Systems
Linking Folksonomies to Knowledge Organization Systems
 
Open minted content_provision
Open minted content_provisionOpen minted content_provision
Open minted content_provision
 
Chapter 10 Data Mining Techniques
 Chapter 10 Data Mining Techniques Chapter 10 Data Mining Techniques
Chapter 10 Data Mining Techniques
 
Copy of 10text (2)
Copy of 10text (2)Copy of 10text (2)
Copy of 10text (2)
 
Web and text
Web and textWeb and text
Web and text
 
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrieval
 
Data Science Workshop
Data Science WorkshopData Science Workshop
Data Science Workshop
 
Open Research Knowledge Graph (ORKG) - an overview
Open Research Knowledge Graph (ORKG) - an overview   Open Research Knowledge Graph (ORKG) - an overview
Open Research Knowledge Graph (ORKG) - an overview
 
Project Proposal Topics Modeling (Ir)
Project Proposal    Topics Modeling (Ir)Project Proposal    Topics Modeling (Ir)
Project Proposal Topics Modeling (Ir)
 
Pipe dreams
Pipe dreamsPipe dreams
Pipe dreams
 
A Survey on Text Mining-techniques and application
A Survey on Text Mining-techniques and applicationA Survey on Text Mining-techniques and application
A Survey on Text Mining-techniques and application
 

Similar to An initial analysis of topic-based similarity among scientific documents based on their rhetorical discourse parts

20131005_Reviewing the literature.pdf
20131005_Reviewing the literature.pdf20131005_Reviewing the literature.pdf
20131005_Reviewing the literature.pdfOsmanAli92
 
Writing research thesis literature review
Writing research thesis literature reviewWriting research thesis literature review
Writing research thesis literature reviewMuhammad Riaz
 
Lecture 6 - Literature Review.pptx
Lecture 6 - Literature Review.pptxLecture 6 - Literature Review.pptx
Lecture 6 - Literature Review.pptxHafeezUllah783173
 
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...Michael Levine-Clark
 
Experimental psychology spring 2015
Experimental psychology   spring 2015Experimental psychology   spring 2015
Experimental psychology spring 2015k-baril
 
Study design & anatomy of scientific research
Study design & anatomy of scientific researchStudy design & anatomy of scientific research
Study design & anatomy of scientific researchDocIbrahimAbdelmonaem
 
Literature Review.ppt
Literature Review.pptLiterature Review.ppt
Literature Review.pptHadiTak1
 
MELJUN CORTES research seminar_1__preparing_your_paper_summer_1516
MELJUN CORTES research seminar_1__preparing_your_paper_summer_1516MELJUN CORTES research seminar_1__preparing_your_paper_summer_1516
MELJUN CORTES research seminar_1__preparing_your_paper_summer_1516MELJUN CORTES
 
Unit 6. Literature Review & Synthesis.pptx
Unit 6. Literature Review & Synthesis.pptxUnit 6. Literature Review & Synthesis.pptx
Unit 6. Literature Review & Synthesis.pptxshakirRahman10
 
Literature Search and Review
Literature Search and ReviewLiterature Search and Review
Literature Search and Review Dave Marcial
 
Literature Review - How to write effectively.pptx
Literature Review - How to write effectively.pptxLiterature Review - How to write effectively.pptx
Literature Review - How to write effectively.pptxnguyenlekhanhx02
 
Literature Review and Research Related Problems
Literature Review and Research Related ProblemsLiterature Review and Research Related Problems
Literature Review and Research Related ProblemsChris Okiki
 
كيفية كتابة المسح الأدبي
كيفية كتابة المسح الأدبيكيفية كتابة المسح الأدبي
كيفية كتابة المسح الأدبيresearchcenterm
 
PSYC 3401
PSYC 3401PSYC 3401
PSYC 3401Traciwm
 

Similar to An initial analysis of topic-based similarity among scientific documents based on their rhetorical discourse parts (20)

20131005_Reviewing the literature.pdf
20131005_Reviewing the literature.pdf20131005_Reviewing the literature.pdf
20131005_Reviewing the literature.pdf
 
Writing research thesis literature review
Writing research thesis literature reviewWriting research thesis literature review
Writing research thesis literature review
 
Lecture 6 - Literature Review.pptx
Lecture 6 - Literature Review.pptxLecture 6 - Literature Review.pptx
Lecture 6 - Literature Review.pptx
 
Review.pdf
Review.pdfReview.pdf
Review.pdf
 
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
 
Experimental psychology spring 2015
Experimental psychology   spring 2015Experimental psychology   spring 2015
Experimental psychology spring 2015
 
Study design & anatomy of scientific research
Study design & anatomy of scientific researchStudy design & anatomy of scientific research
Study design & anatomy of scientific research
 
Literature Review.ppt
Literature Review.pptLiterature Review.ppt
Literature Review.ppt
 
MELJUN CORTES research seminar_1__preparing_your_paper_summer_1516
MELJUN CORTES research seminar_1__preparing_your_paper_summer_1516MELJUN CORTES research seminar_1__preparing_your_paper_summer_1516
MELJUN CORTES research seminar_1__preparing_your_paper_summer_1516
 
Unit 6. Literature Review & Synthesis.pptx
Unit 6. Literature Review & Synthesis.pptxUnit 6. Literature Review & Synthesis.pptx
Unit 6. Literature Review & Synthesis.pptx
 
Literature Search and Review
Literature Search and ReviewLiterature Search and Review
Literature Search and Review
 
Chapter-2-1.pptx
Chapter-2-1.pptxChapter-2-1.pptx
Chapter-2-1.pptx
 
Literature Review - How to write effectively.pptx
Literature Review - How to write effectively.pptxLiterature Review - How to write effectively.pptx
Literature Review - How to write effectively.pptx
 
Literature Review and Research Related Problems
Literature Review and Research Related ProblemsLiterature Review and Research Related Problems
Literature Review and Research Related Problems
 
محاضرة 2
محاضرة 2محاضرة 2
محاضرة 2
 
كيفية كتابة المسح الأدبي
كيفية كتابة المسح الأدبيكيفية كتابة المسح الأدبي
كيفية كتابة المسح الأدبي
 
3.rm the literature review
3.rm the literature review3.rm the literature review
3.rm the literature review
 
08. EDT 513 2023 Week 8.pptx
08. EDT 513 2023 Week 8.pptx08. EDT 513 2023 Week 8.pptx
08. EDT 513 2023 Week 8.pptx
 
PPT on literature review.pdf
PPT on literature review.pdfPPT on literature review.pdf
PPT on literature review.pdf
 
PSYC 3401
PSYC 3401PSYC 3401
PSYC 3401
 

More from Oscar Corcho

Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOrganisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOscar Corcho
 
Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Oscar Corcho
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management Oscar Corcho
 
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosAdiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosOscar Corcho
 
Ontology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOntology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOscar Corcho
 
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Oscar Corcho
 
STARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaSTARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaOscar Corcho
 
Towards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experienceTowards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experienceOscar Corcho
 
Publishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyPublishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyOscar Corcho
 
Linked Statistical Data 101
Linked Statistical Data 101Linked Statistical Data 101
Linked Statistical Data 101Oscar Corcho
 
Aplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMETAplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMET Oscar Corcho
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Oscar Corcho
 
Educando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadEducando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadOscar Corcho
 
STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016Oscar Corcho
 
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaGeneración de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaOscar Corcho
 
Presentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesPresentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesOscar Corcho
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Oscar Corcho
 
Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Oscar Corcho
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Oscar Corcho
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
 

More from Oscar Corcho (20)

Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOrganisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de Madrid
 
Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management
 
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosAdiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
 
Ontology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOntology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data Sharing
 
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
 
STARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaSTARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación Lumínica
 
Towards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experienceTowards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experience
 
Publishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyPublishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case study
 
Linked Statistical Data 101
Linked Statistical Data 101Linked Statistical Data 101
Linked Statistical Data 101
 
Aplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMETAplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMET
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016
 
Educando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadEducando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidad
 
STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016
 
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaGeneración de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
 
Presentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesPresentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart Cities
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 


An initial analysis of topic-based similarity among scientific documents based on their rhetorical discourse parts

  • 1. An initial Analysis of Topic-based Similarity among Scientific Documents based on their Rhetorical Discourse Parts. Carlos Badenes-Olmedo, Jose Luis Redondo-Garcia, Oscar Corcho. Ontology Engineering Group, Universidad Politécnica de Madrid, Spain. ISWC’17. Contact: ocorcho@fi.upm.es, @ocorcho, oeg-upm.net
  • 2. Motivation: How representative is an abstract? (Scientific Research, Practitioners, Reviewers)
  • 3. Motivation: How representative are summaries based on scientific discourse categories? (Scientific Research, Practitioners, Reviewers; discourse categories: approach, challenge, background, outcome, future work)
  • 4. Representativeness: a summary is compared with the full paper in two ways.
      • Internal: describing the main ideas
      • External: finding related items
  • 5. Probabilistic Topic Models: Latent Dirichlet Allocation (LDA)
      • Each document is a mixture of corpus-wide topics
      • Each topic is a distribution over words
      • Each word is drawn from one of those topics
      Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.
  • 6. Representativeness Measure
      • Internal: JSD-based similarity between the topic distribution of the full paper and that of the summary
      • External: precision / recall / f-measure over the related papers found by JSD-based similarity
      • Feature vectors in topic models are topic distributions expressed as vectors of probabilities.
      • The similarity measure used in our analysis is based on the Jensen-Shannon divergence (JSD).
      [Figure: full-paper topic vector [d1, d2, d3, ..., dn] and summary topic vector [s1, s2, s3, ..., sn], each linked by JSD-based similarity to related papers' vectors [h1, h2, ..., hn], [j1, j2, ..., jn], [k1, k2, ..., kn], [m1, m2, ..., mn]]
  • 7. Evaluation
      • Corpus: 1000 papers (+ abstracts) retrieved through the Elsevier API from Advances in Space Research, Procedia Chemistry, Journal of Pharmaceutical Analysis, and Journal of Web Semantics.
      • The rhetorical parts of each paper are discovered, and a topic model is trained on the full papers only.
      • Topic distributions are then inferred for the 1000 papers (+ abstracts, + discourse parts), producing a network of related papers (+ abstracts, + discourse parts).
  • 8. Evaluation: Corpora
      • Advances in Space Research corpus: http://librairy.linkeddata.es/resources/domains/aisr
      • Procedia Chemistry corpus: http://librairy.linkeddata.es/resources/domains/pc
      • Journal of Pharmaceutical Analysis corpus: http://librairy.linkeddata.es/resources/domains/jopa
      • Journal of Web Semantics corpus: http://librairy.linkeddata.es/resources/domains/jows
      • Test corpus: http://librairy.linkeddata.es/resources/domains/group1
      • Explore a corpus:
          • Topics in a corpus: http://librairy.linkeddata.es/resources/domains/group1/topics?words=10
          • Papers in a corpus: http://librairy.linkeddata.es/resources/domains/group1/items?size=10
  • 9. Evaluation: Internal Representativeness (full paper)
      • Info: http://librairy.linkeddata.es/resources/items/2-s2.0-84924147106?content=true
      • Parts: http://librairy.linkeddata.es/resources/items/2-s2.0-84924147106/parts
          • abstract: http://librairy.linkeddata.es/resources/parts/adfe85d9634654e4cfd7148be7cd2b29?content=true
          • approach: http://librairy.linkeddata.es/resources/parts/83f2b9722953034d7b6b50cbead4ec6b?content=true
          • outcome: http://librairy.linkeddata.es/resources/parts/61452a5ec420c8926160ae748c12a826?content=true
          • challenge: http://librairy.linkeddata.es/resources/parts/8858ef323fc09efbdcd46b9de45f146c?content=true
          • background: http://librairy.linkeddata.es/resources/parts/d118ef60d5e874d69d92c6b07be68b61?content=true
          • future-work: http://librairy.linkeddata.es/resources/parts/92be5400df5bb331e5f7f692e6b05bca?content=true
      • Topic distribution of the full paper: http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/topics?words=15
      • Topic distribution of the abstract: http://librairy.linkeddata.es/resources/domains/group1/parts/adfe85d9634654e4cfd7148be7cd2b29/topics?words=15
      • Similarity between the full paper and its abstract: http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/relations?type=similarity&relatedId=adfe85d9634654e4cfd7148be7cd2b29
      • Similarity between the full paper and its approach content: http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/relations?type=similarity&relatedId=83f2b9722953034d7b6b50cbead4ec6b
  • 10. Evaluation: External Representativeness
      • Similar papers to the full paper: http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/relations?type=similarity&resourceType=item&size=5
      • Similar papers to the abstract: http://librairy.linkeddata.es/resources/domains/group1/parts/adfe85d9634654e4cfd7148be7cd2b29/relations?type=similarity&resourceType=item&size=5
      • Similar papers to the approach content: http://librairy.linkeddata.es/resources/domains/group1/parts/83f2b9722953034d7b6b50cbead4ec6b/relations?type=similarity&resourceType=item&size=5
      • Similar summaries to a full paper: http://librairy.linkeddata.es/resources/domains/group1/items/2-s2.0-84924147106/relations?type=similarity&resourceType=part&size=5
      • Similar summaries to an abstract: http://librairy.linkeddata.es/resources/domains/group1/parts/adfe85d9634654e4cfd7148be7cd2b29/relations?type=similarity&resourceType=part&size=5
      • Similar summaries to the approach content: http://librairy.linkeddata.es/resources/domains/group1/parts/83f2b9722953034d7b6b50cbead4ec6b/relations?type=similarity&resourceType=part&size=5
  • 11. Results: Size of Summaries
      • The approach, background, and outcome content of a paper generate more accurate topic distributions than other summaries, such as the abstract.
      • Since LDA treats documents as bags of words, text length affects the accuracy of the topic distributions the model infers.
      [Figures: relative size of summaries with respect to the full paper; absolute size of summaries (in number of characters)]
  • 12. Results: Internal Representativeness
      • The internal representativeness of a summary measures how similar the summary is to the original full-text research paper.
      • This similarity is based on the JSD between their topic distributions.
      • Results suggest that the topic distribution of the text created from the approach content is the most similar to that of the full paper.
      [Figure: internal representativeness]
  • 13. Results: External Representativeness
      • The external representativeness of a summary measures how much the set of related documents obtained from the summary differs from the set derived from the original text.
      • Similarity thresholds from 0.5 to 0.99 were considered in the experiments.
      • In terms of recall, the upward trend of the approach, outcome, and background content supports the assumption that summaries containing key words allow more similar papers to be discovered than other summaries do.
      [Figures: precision; recall]
  • 14. Results: External Representativeness
      • For higher similarity thresholds, i.e. for strongly related papers, the recommendations discovered from the approach content are more precise than those discovered from the abstract.
      [Figure: f-measure]
  • 15. Conclusions
      • We have studied topic-based similarities among scientific documents, comparing their abstracts with summaries built from their scientific discourse categories.
      • Two novel measures have been proposed: (1) internal representativeness and (2) external representativeness.
      • Results show that summaries created from the approach, outcome, or background content of a paper describe its full content, in terms of both overall ideas and related documents, more accurately than abstracts do.
      • To prevent the size of the summaries from influencing the accuracy of the results, in future work we plan to describe the texts with probabilistic topic models designed for short texts, such as BTM (Biterm Topic Model).
  • 16. Carlos Badenes-Olmedo, Jose Luis Redondo-Garcia, Oscar Corcho. Ontology Engineering Group, Universidad Politécnica de Madrid, Spain. ISWC’17. Contact: ocorcho@fi.upm.es, @ocorcho, oeg-upm.net
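The three LDA assumptions on slide 5 describe a generative process. Below is a minimal, stdlib-only sketch of that process. It is not the authors' training or inference code. The toy topics, their vocabulary, the prior `alpha=[0.5, 0.5]`, and the helper names `sample_dirichlet` and `generate_document` are all hypothetical and exist only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document following LDA's generative story (hypothetical helper).

    topics: list of dicts mapping word -> probability (each topic is a distribution over words)
    alpha:  Dirichlet prior over topic proportions, one value per topic
    Returns the document's topic mixture (theta) and its bag of words.
    """
    theta = sample_dirichlet(alpha, rng)  # this document's mixture of corpus-wide topics
    words = []
    for _ in range(length):
        # Choose the topic that generates this word, according to theta.
        z = rng.choices(range(len(topics)), weights=theta, k=1)[0]
        # Draw the word itself from that topic's word distribution.
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs, k=1)[0])
    return theta, words


# Hypothetical toy topics; the vocabulary is illustrative only.
topics = [
    {"orbit": 0.5, "satellite": 0.3, "plasma": 0.2},
    {"ontology": 0.4, "rdf": 0.4, "query": 0.2},
]
rng = random.Random(42)
theta, words = generate_document(topics, alpha=[0.5, 0.5], length=20, rng=rng)
print(theta, words)
```

Fitting LDA to a real corpus inverts this process: it estimates the topics and each document's `theta` from the observed words. The per-document `theta` vectors are the feature vectors compared in the later slides.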
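Slides 6 and 12 define internal representativeness as a JSD-based similarity between a summary's topic distribution and the full paper's. The sketch below is hedged on two points. First, it assumes base-2 logarithms, which keep the JSD in [0, 1]. Second, it assumes the similarity is taken as `1 - JSD`; the slides only say the measure is "based on" the JSD, so the exact transform used in the study may differ. The topic vectors are hypothetical, not results from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs, so 0 <= JSD <= 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jsd_similarity(p, q):
    """Assumed similarity transform: 1 - JSD (1 means identical topic distributions)."""
    return 1.0 - jsd(p, q)


def internal_representativeness(full_paper, summaries):
    """Rank summaries (name -> topic distribution) by similarity to the full paper."""
    scores = {name: jsd_similarity(full_paper, dist) for name, dist in summaries.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical 3-topic distributions, for illustration only.
full_paper = [0.5, 0.3, 0.2]
summaries = {
    "abstract": [0.2, 0.3, 0.5],
    "approach": [0.45, 0.35, 0.2],
}
ranking = internal_representativeness(full_paper, summaries)
print(ranking)
```

The JSD is symmetric and always finite, which makes it a common choice for comparing topic distributions. KL divergence, by contrast, is asymmetric and can be infinite.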
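Slides 13 and 14 describe external representativeness through precision, recall, and f-measure of the related documents at similarity thresholds from 0.5 to 0.99. One plausible reading is sketched below, under these assumptions:

* The papers related to the full paper form the reference set.
* The papers related to the summary form the retrieved set.
* A paper counts as related when its similarity is at least the threshold.
* Thresholds advance in steps of 0.05 (the slides give only the range).

The similarity scores and paper IDs are hypothetical.

```python
def related_set(similarities, threshold):
    """IDs of corpus papers whose similarity to the query is at least the threshold."""
    return {doc for doc, s in similarities.items() if s >= threshold}


def precision_recall_f(summary_sims, full_sims, threshold):
    """Compare papers related to a summary against papers related to the full paper.

    The full paper's related set is treated as the reference set (an assumption).
    Empty sets yield 0.0 by convention.
    """
    retrieved = related_set(summary_sims, threshold)
    relevant = related_set(full_sims, threshold)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical similarity scores of four corpus papers to the full paper and to a summary.
full_sims = {"p1": 0.95, "p2": 0.80, "p3": 0.60, "p4": 0.40}
summary_sims = {"p1": 0.90, "p2": 0.55, "p3": 0.70, "p4": 0.65}

# Sweep thresholds from 0.5 to 0.95 in assumed steps of 0.05, plus 0.99.
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)] + [0.99]
results = {t: precision_recall_f(summary_sims, full_sims, t) for t in thresholds}
for t, (p, r, f) in results.items():
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f} f={f:.2f}")
```

Raising the threshold shrinks both sets to the most strongly related papers. This is the regime in which slide 14 reports the approach content as more precise than the abstract.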
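Slides 9 and 10 list librAIry endpoints that follow a regular pattern. The sketch below rebuilds those URLs from their parts using only the paths and query parameters shown on the slides. Several things here are hypothetical or unconfirmed:

* The helper names `similarity_url`, `similar_resources_url`, and `fetch` are hypothetical.
* The response format is not assumed.
* The service may no longer be reachable, so `fetch` is defined but never called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the URLs listed on the slides.
BASE = "http://librairy.linkeddata.es/resources"


def similarity_url(domain, item_id, related_id):
    """URL for the similarity between a full paper (item) and one of its parts."""
    query = urlencode({"type": "similarity", "relatedId": related_id})
    return f"{BASE}/domains/{domain}/items/{item_id}/relations?{query}"


def similar_resources_url(domain, kind, resource_id, resource_type, size=5):
    """URL listing resources similar to a full paper or a summary.

    kind: "items" (full paper) or "parts" (summary)
    resource_type: "item" to find similar papers, "part" to find similar summaries
    """
    query = urlencode({"type": "similarity", "resourceType": resource_type, "size": size})
    return f"{BASE}/domains/{domain}/{kind}/{resource_id}/relations?{query}"


def fetch(url, timeout=10):
    """Return the raw response body as text; no particular response schema is assumed."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Rebuild two URLs from the slides (no network access happens here).
print(similarity_url("group1", "2-s2.0-84924147106", "adfe85d9634654e4cfd7148be7cd2b29"))
print(similar_resources_url("group1", "parts", "83f2b9722953034d7b6b50cbead4ec6b", "part"))
```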