Research around and about the scientific paper in the biomedical domain.
Supporting Literature Based Discovery

From the paper to the data back and forth

Alexander Garcia, PhD.
FSU
350 Years and Counting
• Scientific articles have adopted electronic dissemination channels
• Scholarly communication has been complemented by the adoption of blogs, mailing lists, social networks, and other technologies
• Information remains locked up in PDFs
And so we are…
Managing the publication on a postmortem basis…

The paper as an interface to the Web of Data?
The problem remains, so…
To be born semantic… why not?
Heading towards
• A semantic document, one where human-readable knowledge is augmented to enable its interpretation by machines
• A human-interpretable document fully processable by machines
• Human interoperability and machine interoperability
• Literature Based Discovery and the paper as an interface to the Web of Data (WoD)
We all know that
• Information is locked up in discrete documents
– Mostly PDF
• Controlled vocabularies are not always available
• Text mining depends on availability of data
• Poor metadata
Agenda
Biotea
Citagora
Semantic documents as scaffolds for research objects

Human interoperability and machine interoperability
Literature Based Discovery
• The key idea is putting together explicit assertions from different papers to form new implicit assertions

– PTSD and suicide
– Magnesium-migraine
– Fish oil-Raynaud’s or calcium-channel blockers

• Sophisticated access to online information
• Supplement document retrieval with:
– Information extraction
– Automatic summarization
– Question answering
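
The key idea above is often described as Swanson's ABC model: if one paper asserts a link between A and B and a different paper asserts a link between B and C, then A to C is a candidate implicit assertion. The following is a minimal sketch, assuming the explicit assertions have already been extracted as (paper, subject, object) records. The paper identifiers, the intermediate terms, and the `implicit_assertions` helper are illustrative placeholders, not data or code from this talk.

```python
from collections import defaultdict

# Hypothetical input: explicit assertions extracted from individual papers.
# Paper IDs and terms are illustrative placeholders.
assertions = [
    ("paper-1", "fish oil", "blood viscosity"),
    ("paper-2", "blood viscosity", "Raynaud's syndrome"),
    ("paper-3", "magnesium", "vascular tone"),
    ("paper-4", "vascular tone", "migraine"),
    ("paper-5", "fish oil", "platelet aggregation"),
]

def implicit_assertions(assertions):
    """Return {(a, c): [(b, paper_ab, paper_bc), ...]} for A-B and B-C pairs
    that come from different papers and where A-C is not already explicit."""
    explicit = {(a, b) for _, a, b in assertions}
    by_subject = defaultdict(list)
    for paper, subj, obj in assertions:
        by_subject[subj].append((paper, obj))

    candidates = defaultdict(list)
    for paper_ab, a, b in assertions:
        for paper_bc, c in by_subject.get(b, []):
            if paper_ab != paper_bc and a != c and (a, c) not in explicit:
                candidates[(a, c)].append((b, paper_ab, paper_bc))
    return dict(candidates)

candidates = implicit_assertions(assertions)
for (a, c), links in sorted(candidates.items()):
    print(a, "->", c, "via", [b for b, _, _ in links])
```

Each candidate keeps the linking term and both source papers, so a reader can go back to the evidence before treating the new association as a hypothesis worth testing.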
The White Paper Challenge
• Search and Retrieval
How to get relevant documents faster
Info Sources
Query Builders
Notifications
How to “scan” the document in a meaningful manner?

How to repurpose fragments of the documents?
Literature Discovery Process
• Search
– Usually string-based search mechanisms
– Little cognitive support

• Retrieval
– Simple list of DB entries
– Little cognitive support

• Interacting with the document
– Straight into the PDF
– Zero cognitive support
– Data availability
Challenge: Language Complexity
The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis.
Language encodes a lot of information
Words and Phrases
age
approximately
average
cardiovascular
characteristics
comorbid
conditions
disease
example
high

average age of participants
approximately 63 years
predominance of women
high prevalence
comorbid conditions
Semantic Predications
Cardiovascular Diseases CO-OCCURS_WITH Degenerative polyarthritis
Hypertension CO-OCCURS_WITH Degenerative polyarthritis
Suicide Ideation CO-OCCURS_WITH Suicide Risk
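
The predications above are subject-predicate-object triples, so they can be held in a small structure and queried by concept. This is a minimal sketch in plain Python. The `Predication` type and the `co_occurring` helper are hypothetical names introduced for illustration, and treating CO-OCCURS_WITH as symmetric is an assumption of the sketch, not a claim about any extraction tool.

```python
from typing import NamedTuple

class Predication(NamedTuple):
    subject: str
    predicate: str
    obj: str

# The three predications listed on the slide.
predications = [
    Predication("Cardiovascular Diseases", "CO-OCCURS_WITH", "Degenerative polyarthritis"),
    Predication("Hypertension", "CO-OCCURS_WITH", "Degenerative polyarthritis"),
    Predication("Suicide Ideation", "CO-OCCURS_WITH", "Suicide Risk"),
]

def co_occurring(predications, concept):
    """Concepts linked to `concept` by CO-OCCURS_WITH, in either direction
    (symmetry is an assumption made for this sketch)."""
    found = set()
    for p in predications:
        if p.predicate != "CO-OCCURS_WITH":
            continue
        if p.subject == concept:
            found.add(p.obj)
        elif p.obj == concept:
            found.add(p.subject)
    return sorted(found)

print(co_occurring(predications, "Degenerative polyarthritis"))
```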
What is needed
• Disambiguate text and tag/link concepts
• Meta-analyse information at concept level
• Provide meta-analysed information
• Support Information Based Knowledge Discovery (especially new associations)
In order to support Literature Based Discovery
• Ontologies
• Communities
• Annotation
• Machine-readable documents
In a nutshell…
…documents as interfaces to the Web of Data…

Biotea

• Machine-readable and processable documents
• Interactive documents
• Enriched metadata
• Full content management, document centric
• Social hub

Citagora

• Aggregated search
• Single entry point
• Social hub
• Citation centric
Biotea in a nutshell
• It is a knowledge model for biomedical literature
• We are semantically annotating literature with text mining and ontologies
• Delivers a network of interrelated documents
• Delivers a semantic infrastructure for PMC and scientific literature in general
PMC RDFication
[Pipeline diagram. Labels: PMC XML; RDF Generation (RDFReactor); References Enrichment; Metadata + Content + References]
RDF4PMC, some results
Makes possible
• How similar are two articles? Based on authors, keywords, abstracts, ontological terms
• What articles use this reference in a section with title “Results”?
[Diagram labels: Metadata + Content + References; Annotations]
Makes possible
• How similar are two articles? Based on semantic distance
• Which annotation co-occurs more with this “YYY” annotation?
• Which articles include “TERM” but not this other “TERM”?
[Diagram label: Annotations]
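
Once each article is represented as a set of annotation terms, the three questions above reduce to set operations. This is a rough sketch under stated assumptions: the article IDs and terms are invented, the helper names are hypothetical, and Jaccard overlap is used as a simple stand-in for semantic distance, which the slides do not define.

```python
from collections import Counter

# Hypothetical annotation sets keyed by article ID (illustrative data only).
annotations = {
    "article-A": {"catalase", "microarray", "variance"},
    "article-B": {"catalase", "microarray", "hypertension"},
    "article-C": {"hypertension", "migraine"},
}

def jaccard(a, b):
    """Overlap of two annotation sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def co_occurrence_counts(annotations, term):
    """Count how often every other term appears in articles annotated with `term`."""
    counts = Counter()
    for terms in annotations.values():
        if term in terms:
            counts.update(terms - {term})
    return counts

def with_but_without(annotations, include, exclude):
    """IDs of articles annotated with `include` but not with `exclude`."""
    return sorted(
        article for article, terms in annotations.items()
        if include in terms and exclude not in terms
    )

print(jaccard(annotations["article-A"], annotations["article-B"]))
print(co_occurrence_counts(annotations, "catalase").most_common(1))
print(with_but_without(annotations, "hypertension", "catalase"))
```

Against an RDF store the same questions would more likely be asked in SPARQL; the set-based version only shows what each question computes.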
Some numbers, article PMC126253
“Computational method for reducing variance with Affymetrix microarrays”
• NCBO
– Annotations: 407
– Topics: 633
• Whatizit
– Annotations: 14
– Topics: 203
Delivering: the platform that makes it possible to build interactive environments for semantic publications
A dashboard for semantic biopublications
[Diagram labels: semantically enriched publication (Metadata + Content + References); automatically annotated RDF; SPARQL; Catalase]
• Cloud of bioannotations (term + number of bioentities)
• Title and authors
• Links
• Abstract
• Paragraphs containing the annotation selected by the user
• Bio-entities for the annotation selected
• Enriched content: interactive zone for the bio-entity selected by the user
Citagora
• An Agora for Citations
• From Citations to Social Web to an Interactive Document
• Aggregating activity from Social Networks, Reference Management Systems, Blogs, Publishers, etc.
• Aggregating sources from Google Scholar, Microsoft Academics, Zotero, Mendeley, etc.
What is MSRC.CITAGORA?
Corpus of documents for one specific domain

• BibRef centric
• Enrichment mechanism
• Based on heterogeneous data sources, aggregator
– Heterogeneous BibRef data sources
– Heterogeneous PDF layouts
• Value in
– Enriching semantics around the BibRef
– Aggregating social activity around the BibRef
– Social activity as part of the BibRef
– Making use of the content without exposing it

DATA for and compatible with the Web of Data
MSRC.CITAGORA
Data Source
Data sources may be users uploading ENL files that have, for each record, the corresponding PDF, or the result of harvesting Mendeley, Zotero, Elsevier API, Microsoft Academics API, etc.

Extracting Meaningful Information by Processing the Data Source
– List of references (this document cites_to)
– Meaningful bag of words
– Authors, affiliations, emails

Outcome: RDF
– BibRef for the original PDF
– Annotations for the whole document
– Text
– List of cites_to
MSRC.CITAGORA
[Architecture diagram. Labels: Citagora Harvester; Citation Metadata & References; Database; S2T; PDFs; Basic XML; Enhanced XML; Ontology / Vocabulary; Citation References; Document Database; Query/Search Engine; RDF; SPARQL; Interface (Search + Tag Browser)]
Moving Towards OPEN.CITAGORA
Let's build the largest OPEN repository of everything around a standardized interoperable bibliographic reference

[Diagram: BibRef has_part Annotations, References, Content, PDF; living in the Web of Data]
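
The has_part relations around the BibRef can be written down as subject-predicate-object triples, which is the shape data needs in order to live in the Web of Data. This is a minimal sketch using plain tuples and no RDF library. Every URI under `http://example.org/` is a placeholder, and `bibref_triples` and `to_ntriples` are hypothetical helpers, not part of Citagora.

```python
BASE = "http://example.org/citagora/"  # placeholder namespace, not a real endpoint

def bibref_triples(bibref_id, parts=("annotations", "references", "content", "pdf")):
    """Build (subject, predicate, object) triples linking a BibRef to its parts."""
    subject = f"{BASE}bibref/{bibref_id}"
    predicate = f"{BASE}vocab/has_part"  # assumed predicate URI for has_part
    return [(subject, predicate, f"{subject}/{part}") for part in parts]

def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples-style lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

triples = bibref_triples("example-1")
print(to_ntriples(triples))
```

Giving the BibRef and each of its parts one URI is what the later slide calls standardization, "all in one place, one URI".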
Focus for OPEN.CITAGORA
Data
Interoperability
Unlocking valuable information from the PDF
Home of the largest collection of scientific bibliographic references and literature
Semantic Enrichment
Jailbreaking the PDF: content is locked up

Meaningful Text
– Citations (this paper cites_to)
– Authors (this paper has_authors)
– Title, DOI, etc.
– Content as text
– Bag of words describing content

[Diagram: BibRef has_part Annotations, PDF, Content, References]
Semantic Enrichment
Jailbreaking the BibRef and the PDF
• Heterogeneous formats
• Content is locked up
• Diversity in APIs for collecting BibRefs
• Poor in descriptors anchored in the content
• Not just about the PDF
• Lousy metadata

Standardization, all in one place, one URI, etc.
Translational Research
• How is MSRC contributing to Translational Research in Clinical Psychology?
• Data Standards
• Semantic Infrastructure
• Bridging the gap between documents and data repositories
Narrative
Text
Usable by humans and comp

The paper as a Research Object

The RO is a fluid structured grid
[Diagram labels: About data; Data Processing; BibRef Object; Data; Rhetorical structure: Header, Body; Lab Notebook]
SCIENTIFIC PAPER: Head, Body, Tail

HEAD: Bibliographic record (this paper), KeyWords, Author Contacts
– BIBLIOGRAPHIC RECORD: CiTO+FaBIO
– AUTHOR CONTACT: FOAF

BODY: Rhetoric, Information, Evidence
– RHETORIC, INFORMATION + EVIDENCE (external): SWAN-SIOC + CiTO + FaBIO
– INFORMATION + EVIDENCE (internal): METHODS & MATERIALS, EXPERIMENTAL DESIGN, DATA & COMPUTATIONS, INTERPRETATIONS
– METHODS & MATERIALS: REAGENTS, PROTOCOLS, EQUIPMENT, INSTRUMENTATION
– REAGENTS: SemRes Antibodies, SemRes Mouse Models
– EXPERIMENTAL DESIGN: SWAN Data + Experiment, OBI, myExperiment
– DATA & COMPUTATIONS: SWAN Data+Experiment, OBI, SWAN, myExperiment
– INTERPRETATIONS: SWAN-SIOC

TAIL: Bibliographic records (papers cited as external evidence)
– BIBLIOGRAPHIC RECORDS: SWAN Collections, CiTO+FaBIO
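
The mapping above, from parts of the paper to the vocabularies that describe them, can be captured as nested data and then queried. This is only a sketch. The vocabulary names are transcribed from the slide, while the dictionary layout, the key names, and the two helper functions are assumptions made for illustration.

```python
# Section-to-vocabulary mapping transcribed from the slide; the dictionary
# layout and key names are assumptions made for this sketch.
RESEARCH_OBJECT = {
    "head": {
        "bibliographic record": ["CiTO", "FaBIO"],
        "author contact": ["FOAF"],
    },
    "body": {
        "rhetoric, information + evidence (external)": ["SWAN-SIOC", "CiTO", "FaBIO"],
        "reagents": ["SemRes Antibodies", "SemRes Mouse Models"],
        "experimental design": ["SWAN Data + Experiment", "OBI", "myExperiment"],
        "data & computations": ["SWAN Data + Experiment", "OBI", "SWAN", "myExperiment"],
        "interpretations": ["SWAN-SIOC"],
    },
    "tail": {
        "bibliographic records": ["SWAN Collections", "CiTO", "FaBIO"],
    },
}

def vocabularies(part):
    """Sorted, de-duplicated vocabularies used anywhere in one part of the paper."""
    return sorted({v for vocabs in RESEARCH_OBJECT[part].values() for v in vocabs})

def parts_using(vocabulary):
    """Parts of the paper (head, body, tail) that rely on a given vocabulary."""
    return [
        part for part, sections in RESEARCH_OBJECT.items()
        if any(vocabulary in vocabs for vocabs in sections.values())
    ]

print(vocabularies("head"))
print(parts_using("CiTO"))
```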
We have learned so far
• Born-semantic publication makes the semantics useful to the authors, since the semantics are present in the publication process from the start. To add value for readers and for computational consumption, these semantics must then be “preserved” throughout the publication process, so we need to address the publication process itself to achieve this goal.
Acknowledgments
• Special thanks to John Gomez, John Patterson, Dietrich Rebholz-Schuhmann, Robert Morris, Oscar Corcho, Diane Leiva and Greg Riccardi

Paper as a Research Object

  • 1. Research around and about the scientific paper in the biomedical domain. Supporting Literature Based Discovery. From the paper to the data, back and forth. Alexander Garcia, PhD, FSU
  • 2. 350 Years and Counting: Scientific articles have adopted electronic dissemination channels; scholarly communication has been complemented by the adoption of blogs, mailing lists, social networks, and other technologies; information remains locked up in PDFs.
  • 3. And so we are… Managing the publication on a postmortem basis… The paper as an interface to the Web of Data? The problem remains, so… To be born semantic… why not?
  • 4. Heading towards: A semantic document, one where human-readable knowledge is augmented to enable its interpretation by machines; a human-interpretable document fully processable by machines; human interoperability and machine interoperability; Literature Based Discovery and the paper as an interface to the WoD (Web of Data).
  • 5. We all know that: Information is locked up in discrete documents, mostly PDF; controlled vocabularies are not always available; text mining depends on availability of data; metadata is poor.
  • 6. Agenda: Biotea; Citagora; Semantic documents as scaffolds for research objects; Human interoperability and machine interoperability
  • 7. Literature Based Discovery: The key idea is putting together explicit assertions from different papers to form new implicit assertions (PTSD and suicide; magnesium-migraine; fish oil-Raynaud’s or calcium-channel blockers). Sophisticated access to online information. Supplement document retrieval with information extraction, automatic summarization, and question answering.
  • 8. The White Paper Challenge: Search and Retrieval. How to get relevant documents faster (info sources, query builders, notifications)? How to “scan” the document in a meaningful manner? How to repurpose fragments of the documents?
  • 9. Literature Discovery Process: Search (usually string-based search mechanisms; little cognitive support). Retrieval (simple list of DB entries; little cognitive support). Interacting with the document (straight into the PDF; zero cognitive support). Data availability.
  • 15. Challenge: Language Complexity. "The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis." Language encodes a lot of information.
  • 16. Words and Phrases. Words: age, approximately, average, cardiovascular, characteristics, comorbid, conditions, disease, example, high. Phrases: average age of participants; approximately 63 years; predominance of women; high prevalence; comorbid conditions.
  • 17. Semantic Predications The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis.
  • 18. Semantic Predications: Cardiovascular Diseases CO-OCCURS_WITH Degenerative polyarthritis; Hypertension CO-OCCURS_WITH Degenerative polyarthritis; Suicide Ideation CO-OCCURS_WITH Suicide Risk
  • 19. What is needed: Disambiguate text and tag/link concepts; meta-analyse information at concept level; provide meta-analysed information; support Information Based Knowledge Discovery (especially new associations).
  • 20. In order to support Literature Based Discovery: ontologies; communities; annotation; machine-readable documents. In a nutshell… documents as interfaces to the Web of Data. Biotea: machine-readable and processable documents; interactive documents; enriched metadata; full content management, document centric; social hub. Citagora: aggregated search; single entry point; social hub; citation centric.
  • 21. Biotea in a nutshell: It is a knowledge model for biomedical literature; we are semantically annotating literature with text mining and ontologies; it delivers a network of interrelated documents; it delivers a semantic infrastructure for PMC and scientific literature in general.
  • 23. RDF4PMC, some results. Metadata + Content + References make possible: How similar are two articles, based on authors, keywords, abstracts, and ontological terms? What articles use this reference in a section with title “Results”? Annotations make possible: How similar are two articles, based on semantic distance? Which annotation co-occurs more with this “YYY” annotation? Which articles include “TERM” but not this other “TERM”? Some numbers, article PMC126253 “Computational method for reducing variance with Affymetrix microarrays”: NCBO (annotations: 407; topics: 633); Whatizit (annotations: 14; topics: 203). Delivering: the platform that makes it possible to build interactive environments for semantic publications.
  • 24. A dashboard for semantic biopublications. Diagram labels: Semantically enriched publication (Metadata + Content + References); SPARQL; Catalase; Automatically annotated RDF
  • 25. Cloud of Bioannotations (term + # of bioentities); Title & authors; Links; Abstract; Paragraphs containing the annotation selected by the user
  • 26. Bio-entities for the annotation selected; Enriched content: interactive zone for the bio-entity selected by the user
  • 27. Citagora: An Agora for Citations. From citations to the Social Web to an interactive document. Aggregating activity from social networks, reference management systems, blogs, publishers, etc. Aggregating sources from Google Scholar, Microsoft Academics, Zotero, Mendeley, etc.
  • 28. What is MSRC.CITAGORA? Corpus of documents for one specific domain. BibRef centric. Enrichment mechanism. Based on heterogeneous data sources, an aggregator: heterogeneous BibRef data sources; heterogeneous PDF layouts. Value in: enriching semantics around the BibRef; aggregating social activity around the BibRef (social activity as part of the BibRef); making use of the content without exposing it. DATA for and compatible with the Web of Data.
  • 29. MSRC.CITAGORA. Data source: data sources may be users uploading ENL files that have, for each record, the corresponding PDF; or the result of harvesting Mendeley, Zotero, Elsevier API, Microsoft Academics API, etc. Extracting meaningful information by processing the data source: list of references this document cites_to; meaningful bag of words; authors, affiliations, emails. Outcome, RDF: BibRef for the original PDF; annotations for the whole document; text; list of cites_to.
  • 31. Moving Towards OPEN.CITAGORA: Let's build the largest OPEN repository of everything around a standardized interoperable bibliographic reference. Diagram: BibRef linked by has_part to Annotations, References, Content, and PDF; living in the Web of Data.
  • 32. Focus for OPEN.CITAGORA: Data interoperability; unlocking valuable information from the PDF; home of the largest collection of scientific bibliographic references and literature
  • 33. Semantic Enrichment: Jailbreaking PDF. Content is locked up. Meaningful text: citations (this paper cites_to); authors (this paper has_authors); title, DOI, etc.; content as text; bag of words describing content. Diagram: BibRef linked by has_part to Annotations, PDF, Content, and References.
  • 34. Semantic Enrichment: Jailbreaking BibRef. Heterogeneous formats; diversity in APIs for collecting BibRefs; poor in descriptors anchored in the content; not just about the PDF; lousy metadata. Standardization, all in one place, one URI, etc. PDF: content is locked up. Meaningful text: citations (this paper cites_to); authors (this paper has_authors); title, DOI, etc.; content as text; bag of words describing content. Diagram: BibRef linked by has_part to Annotations, PDF, References, and Content.
  • 38. Translational Research: How is MSRC contributing to Translational Research in Clinical Psychology? Data standards; semantic infrastructure; bridging the gap between documents and data repositories.
  • 39. Narrative Text Usable by humans and comp The paper as a Research Object The RO is a fluid structured grid
  • 40. About data. Diagram labels: Data Processing; BibRef Object; Data. The RO is a fluid structured grid
  • 41. Rhetorical structure: Header, Body. Lab Notebook
  • 42. BIBLIOGRAPHIC RECORD: CiTO+FaBIO; HEAD: Bibliographic record (this paper), KeyWords, Author Contacts; AUTHOR CONTACT: FOAF; RHETORIC INFORMATION + EVIDENCE (external): SWAN-SIOC + CiTO + FaBIO; SCIENTIFIC PAPER: Head, Body, Tail; BODY: Rhetoric, Information, Evidence; METHODS & MATERIALS: REAGENTS, PROTOCOLS, EQUIPMENT, INSTRUMENTATION; INFORMATION + EVIDENCE (internal): METHODS & MATERIALS, EXPERIMENTAL DESIGN, DATA & COMPUTATIONS, INTERPRETATIONS; REAGENTS: SemRes Antibodies, SemRes Mouse Models; EXPERIMENTAL DESIGN: SWAN Data + Experiment, OBI, myExperiment; DATA & COMPUTATIONS: SWAN Data + Experiment, OBI, SWAN, myExperiment; INTERPRETATIONS: SWAN-SIOC; TAIL: Bibliographic records (papers cited as external evidence); BIBLIOGRAPHIC RECORDS: SWAN Collections, CiTO+FaBIO
  • 43. We have learned so far: Being born semantic enables the semantics to be of use to the authors, as they are present in the publication process from the start. To add value for readers and for computational consumption, these semantics must then be "preserved" throughout the publication process; so we need to address the publication process to achieve this goal.
  • 44. Acknowledgments: Special thanks to John Gomez, John Patterson, Dietrich Rebholz-Schuhmann, Robert Morris, Oscar Corcho, Diane Leiva and Greg Riccardi
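
The key idea of slide 7 (combining explicit assertions into new implicit ones) can be illustrated with the predications of slide 18. What follows is a minimal sketch, not the method used in Biotea: it assumes predications are stored as plain tuples, treats CO-OCCURS_WITH as symmetric, and adds one hypothetical PTSD triple so that a chain across assertions exists. The names `Predication` and `implicit_assertions` are made up for this example.

```python
from collections import defaultdict, namedtuple

Predication = namedtuple("Predication", "subject predicate object")

PREDICATIONS = [
    # These three triples appear on slide 18.
    Predication("Cardiovascular Diseases", "CO-OCCURS_WITH", "Degenerative polyarthritis"),
    Predication("Hypertension", "CO-OCCURS_WITH", "Degenerative polyarthritis"),
    Predication("Suicide Ideation", "CO-OCCURS_WITH", "Suicide Risk"),
    # Hypothetical triple, added only so that a second chain exists.
    Predication("PTSD", "CO-OCCURS_WITH", "Suicide Ideation"),
]


def implicit_assertions(predications):
    """Return {frozenset({a, c}): sorted bridging concepts b} for concept
    pairs that are never asserted together but are both linked to some b."""
    neighbours = defaultdict(set)
    explicit = set()
    for p in predications:
        # Assumption: the relation is symmetric, so store both directions.
        neighbours[p.subject].add(p.object)
        neighbours[p.object].add(p.subject)
        explicit.add(frozenset((p.subject, p.object)))

    found = defaultdict(set)
    for b, linked in neighbours.items():
        for a in linked:
            for c in linked:
                pair = frozenset((a, c))
                if a != c and pair not in explicit:
                    found[pair].add(b)
    return {pair: sorted(bridges) for pair, bridges in found.items()}


candidates = implicit_assertions(PREDICATIONS)
for pair, bridges in sorted(candidates.items(), key=lambda kv: sorted(kv[0])):
    a, c = sorted(pair)
    print(f"{a} <-> {c} via {', '.join(bridges)}")
```

Each candidate pair is only a hypothesis to be checked against the literature; a real system would also need concept disambiguation and some ranking of the bridging terms.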

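The annotation questions listed on slide 23 can be sketched over a simple in-memory index. This is an illustration under stated assumptions: the article identifiers and annotation sets are hypothetical, the helper names are invented here, and Jaccard overlap stands in for the semantic distance mentioned on the slide (it is a simpler, swapped-in measure). The actual platform exposes RDF through SPARQL, which this sketch does not attempt to reproduce.

```python
from collections import Counter

# Hypothetical index: article id -> set of annotation terms.
ARTICLES = {
    "article-1": {"catalase", "microarray", "variance"},
    "article-2": {"catalase", "oxidative stress"},
    "article-3": {"microarray", "variance", "normalization"},
    "article-4": {"catalase", "microarray"},
}


def jaccard(terms_a, terms_b):
    """Share of annotations two articles have in common (0.0 to 1.0)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def with_but_without(articles, include, exclude):
    """Articles annotated with `include` but not with `exclude`."""
    return sorted(
        article_id
        for article_id, terms in articles.items()
        if include in terms and exclude not in terms
    )


def top_cooccurring(articles, term):
    """Annotations ranked by how many articles they share with `term`."""
    counts = Counter()
    for terms in articles.values():
        if term in terms:
            counts.update(terms - {term})
    return counts.most_common()


similarity = jaccard(ARTICLES["article-1"], ARTICLES["article-3"])
only_catalase = with_but_without(ARTICLES, "catalase", "microarray")
ranking = top_cooccurring(ARTICLES, "catalase")
print(similarity, only_catalase, ranking[0])
```

Set operations keep each question to a few lines; with real annotation volumes (hundreds per article, as in the PMC126253 numbers) a triple store or an inverted index would be the more likely choice.
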
Editor's Notes

  1. From paper-based journals to purely electronic formats.
  2. The next step was to emphasize the importance of adding semantics to the data or annotations made in different types of experimental procedures or laboratory techniques. The notebooks analyzed contained annotations of different experimental procedures, the most recurrent being DNA extraction, PCR (including some of its variants), and electrophoresis in agarose and polyacrylamide gels. The annotations found relate to materials and methods, and others to experimental design; data from some kind of results analysis were also observed. Based on this rhetorical structure of the laboratory notebooks, the construction of two ontologies was planned: one providing the metadata that self-describe the laboratory notebook and an experimental activity, and another containing terms related to laboratory processes commonly used in plant molecular biology. The purpose of these ontologies is to support competency questions such as "On what dates was DNA extracted from the rice materials used in the project titled 'Identification of molecular markers associated with yield QTLs in rice'?" and "In which research projects did OXG participate between 2005 and 2009?"
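
The two competency questions in note 2 can be expressed as queries over structured notebook records. The sketch below assumes a flat record per experimental activity; the `Activity` fields, the project label, the dates, the second researcher's initials, and the cassava entry are all hypothetical and stand in for whatever the two planned ontologies would actually describe.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Activity:
    project: str
    procedure: str
    material: str
    researcher: str
    performed_on: date


# Hypothetical notebook entries, invented for illustration only.
ENTRIES = [
    Activity("Yield QTL markers in rice", "DNA extraction", "rice", "OXG", date(2006, 3, 14)),
    Activity("Yield QTL markers in rice", "PCR", "rice", "OXG", date(2006, 3, 20)),
    Activity("Yield QTL markers in rice", "DNA extraction", "rice", "XYZ", date(2007, 1, 9)),
    Activity("Another project", "Agarose gel electrophoresis", "cassava", "OXG", date(2010, 5, 2)),
]


def procedure_dates(entries, project, procedure, material):
    """Dates on which `procedure` was applied to `material` in `project`."""
    return sorted(
        e.performed_on
        for e in entries
        if e.project == project and e.procedure == procedure and e.material == material
    )


def projects_of(entries, researcher, first_year, last_year):
    """Projects with activity by `researcher` between the two years, inclusive."""
    return sorted(
        {
            e.project
            for e in entries
            if e.researcher == researcher
            and first_year <= e.performed_on.year <= last_year
        }
    )


extraction_dates = procedure_dates(ENTRIES, "Yield QTL markers in rice", "DNA extraction", "rice")
oxg_projects = projects_of(ENTRIES, "OXG", 2005, 2009)
print(extraction_dates, oxg_projects)
```

Both questions reduce to filters once the project, procedure, material, person, and date of each activity are captured as explicit metadata, which is the role the notes assign to the notebook ontology.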