Paper as a Research Object

We need to start understanding documents within an electronic, machine-processable environment. Such a conception goes beyond PDF and HTML; it entails, I argue, understanding the document as a fluid aggregator.

  • From paper-based journals to purely electronic formats.
  • The next step consisted of emphasizing the importance of adding semantics to the data or annotations made in different types of experimental procedures or laboratory techniques. In the notebooks analyzed we found annotations of different experimental procedures, the most recurrent being DNA extraction, PCR (including some of its variants), and agarose and polyacrylamide gel electrophoresis. The annotations found relate to materials and methods and to experimental design, and some data from analyses of results were also observed. Based on this rhetorical structure of the laboratory notebooks, we then planned the construction of two ontologies: one providing the metadata that self-describes the laboratory notebook and an experimental activity, and another containing terms for laboratory processes commonly used in plant molecular biology. The purpose of these ontologies is to support competency questions such as "On what dates was DNA extracted from the rice materials used in the project titled 'Identification of molecular markers associated with yield QTLs in rice'?" and "In which research projects did OXG participate between 2005 and 2009?"
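As a rough illustration of how the DNA-extraction competency question could be answered once notebook entries are expressed as triples, here is a minimal Python sketch over an in-memory list of (subject, predicate, object) tuples. Every identifier, predicate name (`partOfProject`, `onMaterial`, `performedOn`, `species`), date, and material is a hypothetical placeholder, not a term from the planned ontologies; a real deployment would more likely store RDF and answer the question with SPARQL.

```python
from datetime import date

# Hypothetical triples (subject, predicate, object) for lab-notebook entries.
# Identifiers and predicate names are illustrative placeholders only.
TRIPLES = [
    ("proj_yield_qtl", "title",
     "Identification of molecular markers associated with yield QTLs in rice"),
    ("rice_line_A", "species", "Oryza sativa"),
    ("rice_line_B", "species", "Oryza sativa"),
    ("maize_line_C", "species", "Zea mays"),
    ("exp1", "type", "DNAExtraction"),
    ("exp1", "onMaterial", "rice_line_A"),
    ("exp1", "performedOn", date(2006, 3, 14)),
    ("exp1", "partOfProject", "proj_yield_qtl"),
    ("exp2", "type", "PCR"),
    ("exp2", "onMaterial", "rice_line_A"),
    ("exp2", "performedOn", date(2006, 3, 20)),
    ("exp2", "partOfProject", "proj_yield_qtl"),
    ("exp3", "type", "DNAExtraction"),
    ("exp3", "onMaterial", "rice_line_B"),
    ("exp3", "performedOn", date(2007, 5, 2)),
    ("exp3", "partOfProject", "proj_yield_qtl"),
    ("exp4", "type", "DNAExtraction"),
    ("exp4", "onMaterial", "maize_line_C"),
    ("exp4", "performedOn", date(2006, 4, 1)),
    ("exp4", "partOfProject", "proj_yield_qtl"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is asserted."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def dna_extraction_dates(project_title, species="Oryza sativa"):
    """Dates of DNA extractions on materials of `species` within the titled project."""
    projects = set(subjects("title", project_title))
    dates = []
    for exp in subjects("type", "DNAExtraction"):
        in_project = projects.intersection(objects(exp, "partOfProject"))
        on_species = any(species in objects(m, "species")
                         for m in objects(exp, "onMaterial"))
        if in_project and on_species:
            dates.extend(objects(exp, "performedOn"))
    return sorted(dates)


print(dna_extraction_dates(
    "Identification of molecular markers associated with yield QTLs in rice"))
```

The maize extraction (`exp4`) belongs to the same project but is filtered out by the species check, which is the part of the question ("rice materials") that a plain keyword search over notebook text would struggle to express.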

Paper as a Research Object: Presentation Transcript

  • Research around and about the scientific paper in the biomedical domain. Supporting Literature-Based Discovery: from the paper to the data, back and forth. Alexander Garcia, PhD, FSU
  • 350 Years and Counting: scientific articles have adopted electronic dissemination channels; scholarly communication has been complemented by the adoption of blogs, mailing lists, social networks, and other technologies; information remains locked up in PDFs.
  • And so we are managing the publication on a postmortem basis… The paper as an interface to the Web of Data? The problem remains, so why not make papers born semantic?
  • Heading towards a semantic document: one where human-readable knowledge is augmented to enable its interpretation by machines; a human-interpretable document that is fully processable by machines; human interoperability and machine interoperability; Literature-Based Discovery and the paper as an interface to the WoD (Web of Data).
  • We all know that: information is locked up in discrete documents, mostly PDF; controlled vocabularies are not always available; text mining depends on the availability of data; metadata is poor.
  • Agenda: Biotea; Citagora; semantic documents as scaffolds for research objects; human interoperability and machine interoperability.
  • Literature-Based Discovery • The key idea: put together explicit assertions from different papers to form new implicit assertions – PTSD and suicide – magnesium and migraine – fish oil and Raynaud's, or calcium-channel blockers • Sophisticated access to online information • Supplement document retrieval with: – information extraction – automatic summarization – question answering
  • The White Paper Challenge: search and retrieval (how to get relevant documents faster: info sources, query builders, notifications); how to "scan" the document in a meaningful manner; how to repurpose fragments of the documents.
  • Literature Discovery Process: search (usually string-based search mechanisms; little cognitive support); retrieval (a simple list of DB entries; little cognitive support); interacting with the document (straight into the PDF; zero cognitive support); data availability.
  • Challenge: Language Complexity. "The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis." Language encodes a lot of information.
  • Words and Phrases. Words: age, approximately, average, cardiovascular, characteristics, comorbid, conditions, disease, example, high. Phrases: average age of participants; approximately 63 years; predominance of women; high prevalence; comorbid conditions.
  • Semantic Predications: "The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis."
  • Semantic Predications: Cardiovascular Diseases CO-OCCURS_WITH Degenerative polyarthritis; Hypertension CO-OCCURS_WITH Degenerative polyarthritis; Suicide Ideation CO-OCCURS_WITH Suicide Risk.
  • What is needed: disambiguate text and tag/link concepts; meta-analyse information at the concept level; provide meta-analysed information; support information-based knowledge discovery (especially new associations).
  • In order to support Literature-Based Discovery: ontologies, communities, annotation, and machine-readable documents. In a nutshell: documents as interfaces to the Web of Data. Biotea: machine-readable and processable documents; interactive documents; enriched metadata; full, document-centric content management; social hub. Citagora: aggregated search; single entry point; social hub; citation-centric.
  • Biotea in a nutshell: it is a knowledge model for biomedical literature; we are semantically annotating literature with text mining and ontologies; it delivers a network of interrelated documents; it delivers a semantic infrastructure for PMC and for scientific literature in general.
  • PMC RDFication: PMC XML; RDF generation with RDFReactor; Metadata + Content + References; References enrichment.
  • RDF4PMC, some results. Metadata + Content + References make it possible to ask: How similar are two articles, based on authors, keywords, abstracts, and ontological terms? What articles use this reference in a section titled "Results"? Annotations make it possible to ask: How similar are two articles, based on semantic distance? Which annotation co-occurs most with this "YYY" annotation? Which articles include "TERM" but not this other "TERM"? Some numbers for article PMC126253, "Computational method for reducing variance with Affymetrix microarrays": NCBO, 407 annotations and 633 topics; Whatizit, 14 annotations and 203 topics. Delivering: the platform that makes it possible to build interactive environments for semantic publications.
  • A dashboard for semantic biopublications: semantically enriched publication; Metadata + Content + References; SPARQL; automatically annotated RDF; example term: Catalase.
  • Cloud of bioannotations (term + number of bioentities); title and authors; links; abstract; paragraphs containing the annotation selected by the user.
  • Bio-entities for the selected annotation; enriched content: an interactive zone for the bio-entity selected by the user.
  • Citagora: an agora for citations. From citations to the Social Web to an interactive document. Aggregating activity from social networks, reference management systems, blogs, publishers, etc. Aggregating sources from Google Scholar, Microsoft Academic, Zotero, Mendeley, etc.
  • What is MSRC.CITAGORA? A corpus of documents for one specific domain: BibRef-centric; an enrichment mechanism; based on heterogeneous data sources (an aggregator) with heterogeneous BibRef data sources and heterogeneous PDF layouts. Value in: enriching semantics around the BibRef; aggregating social activity around the BibRef (social activity as part of the BibRef); making use of the content without exposing it; data for, and compatible with, the Web of Data.
  • MSRC.CITAGORA Data Sources: data sources may be users uploading ENL files that have the corresponding PDF for each record, or the result of harvesting Mendeley, Zotero, the Elsevier API, the Microsoft Academic API, etc. Extracting meaningful information by processing the data source: the list of references this document cites_to; a meaningful bag of words; authors, affiliations, emails. Outcome (RDF): a BibRef for the original PDF; annotations for the whole document; text; the list of cites_to.
  • MSRC.CITAGORA architecture: Citagora Harvester; Citation Metadata & References Database; PDFs; S2T; Basic XML; Enhanced XML; Ontology / Citation References Vocabulary; Document Database; Query / Search Engine; RDF; SPARQL; Interface (Search + Tag Browser).
  • Moving Towards OPEN.CITAGORA: let's build the largest OPEN repository of everything around a standardized, interoperable bibliographic reference, living in the Web of Data. A BibRef has_part Annotations, References, Content, and PDF.
  • Focus for OPEN.CITAGORA: data interoperability; unlocking valuable information from the PDF; home of the largest collection of scientific bibliographic references and literature.
  • Semantic Enrichment: Jailbreaking the PDF. Content is locked up. Meaningful text: citations (this paper cites_to); authors (this paper has_authors); title, DOI, etc.; content as text; a bag of words describing the content. A BibRef has_part Annotations, PDF, Content, and References.
  • Semantic Enrichment: Jailbreaking the BibRef. Problems: content is locked up in the PDF; heterogeneous formats; diversity in APIs for collecting BibRefs; poor descriptors anchored in the content; lousy PDF metadata. Not just about the PDF: standardization, all in one place, one URI, etc. Meaningful text: citations (this paper cites_to); authors (this paper has_authors); title, DOI, etc.; content as text; a bag of words describing the content. A BibRef has_part Annotations, PDF, References, and Content.
  • Translational Research: How is MSRC contributing to translational research in clinical psychology? Data standards; semantic infrastructure; bridging the gap between documents and data repositories.
  • Narrative text usable by humans and computers. The paper as a Research Object. The RO is a fluid structured grid.
  • About data: data processing; BibRef object; data. The RO is a fluid structured grid.
  • Rhetorical structure: Header, Body. Lab Notebook
  • BIBLIOGRAPHIC RECORD: CiTO + FaBiO. HEAD: bibliographic record (this paper), keywords, author contacts. AUTHOR CONTACT: FOAF. RHETORIC, INFORMATION + EVIDENCE (external): SWAN-SIOC + CiTO + FaBiO. SCIENTIFIC PAPER: Head, Body, Tail. BODY: rhetoric, information, evidence. METHODS & MATERIALS: reagents, protocols, equipment, instrumentation. INFORMATION + EVIDENCE (internal): methods & materials, experimental design, data & computations, interpretations. REAGENTS: SemRes Antibodies, SemRes Mouse Models. EXPERIMENTAL DESIGN: SWAN Data + Experiment, OBI, myExperiment. DATA & COMPUTATIONS: SWAN Data + Experiment, OBI, SWAN, myExperiment. INTERPRETATIONS: SWAN-SIOC. TAIL: bibliographic records (papers cited as external evidence). BIBLIOGRAPHIC RECORDS: SWAN Collections, CiTO + FaBiO.
  • What we have learned so far: being born semantic makes the semantics useful to the authors, because they are present in the publication process from the start. To add value for readers and for computational consumption, these semantics must then be "preserved" throughout the publication process; so we need to address the publication process itself to achieve this goal.
  • Acknowledgments: special thanks to John Gomez, John Patterson, Dietrich Rebholz-Schuhmann, Robert Morris, Oscar Corcho, Diane Leiva and Greg Riccardi.
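The Literature-Based Discovery slide describes putting together explicit assertions from different papers to form new implicit ones. One common way to picture this is the "ABC" open-discovery pattern: if term A is asserted together with B, and B with C, but A and C never appear together, then A–C is a candidate hidden link. The Python sketch below is a minimal illustration under that assumption; the co-occurrence pairs are toy inputs chosen for the example, not output from a real corpus or from the tools discussed in the talk.

```python
from collections import defaultdict

# Toy co-occurrence assertions, one pair per statement found in some paper.
# These pairs are illustrative inputs, not extracted from a real corpus.
ASSERTIONS = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's disease"),
    ("platelet aggregation", "Raynaud's disease"),
    ("magnesium", "vascular tone"),
    ("vascular tone", "migraine"),
]


def build_graph(pairs):
    """Undirected adjacency sets: term -> terms it was asserted together with."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def abc_hypotheses(graph, a):
    """Open discovery: C terms linked to A only through intermediate B terms.

    Returns {C: sorted list of B terms}. C terms already linked directly
    to A are excluded, because those relations are already explicit.
    """
    direct = graph.get(a, set())
    hypotheses = defaultdict(set)
    for b in direct:
        for c in graph[b]:
            if c != a and c not in direct:
                hypotheses[c].add(b)
    return {c: sorted(bs) for c, bs in hypotheses.items()}


graph = build_graph(ASSERTIONS)
print(abc_hypotheses(graph, "fish oil"))
print(abc_hypotheses(graph, "magnesium"))
```

Returning the intermediate B terms alongside each candidate C keeps the hypothesis explainable: a reader can go back to the papers behind each A–B and B–C assertion.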
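The Words and Phrases slide lists the words of the osteoarthritis example sentence in sorted order. A minimal Python sketch of that word-level view is shown below, assuming a simple lowercase tokenizer and a small hand-picked stopword list (both are assumptions; the slide does not say how its list was produced). The output is a flat bag of words: it keeps "hypertension" and "osteoarthritis" but loses the relation between them, which is the gap the semantic-predication slides address.

```python
import re

SENTENCE = ("The average age of participants (approximately 63 years), the predominance "
            "of women, and the high prevalence of comorbid conditions (for example, "
            "hypertension and cardiovascular disease) reflect typical characteristics "
            "of patients with osteoarthritis.")

# A small, hand-picked stopword list (an assumption); real pipelines use larger lists.
STOPWORDS = {"the", "of", "and", "for", "with"}


def bag_of_words(text):
    """Sorted unique lowercase alphabetic tokens, minus stopwords (numbers are dropped)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sorted({t for t in tokens if t not in STOPWORDS})


words = bag_of_words(SENTENCE)
print(words[:10])
```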
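The What is needed slide asks for meta-analysing information at the concept level. A simple reading of that step, assumed here for illustration, is to pool semantic predications such as "Hypertension CO-OCCURS_WITH Degenerative polyarthritis" across papers and rank each one by how many distinct papers assert it. The predications reuse the slide's examples, but the paper IDs and counts are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical predications (subject, predicate, object, paper ID); paper IDs are placeholders.
PREDICATIONS = [
    ("Cardiovascular Diseases", "CO-OCCURS_WITH", "Degenerative polyarthritis", "paper1"),
    ("Hypertension", "CO-OCCURS_WITH", "Degenerative polyarthritis", "paper1"),
    ("Hypertension", "CO-OCCURS_WITH", "Degenerative polyarthritis", "paper2"),
    ("Hypertension", "CO-OCCURS_WITH", "Degenerative polyarthritis", "paper2"),
    ("Suicide Ideation", "CO-OCCURS_WITH", "Suicide Risk", "paper3"),
]


def support_by_predication(predications):
    """Rank (subject, predicate, object) by the number of distinct papers asserting it."""
    papers = defaultdict(set)
    for subj, pred, obj, paper in predications:
        papers[(subj, pred, obj)].add(paper)  # a set ignores repeats within one paper
    # Most-supported first; ties broken alphabetically by the triple.
    return sorted(((triple, len(ids)) for triple, ids in papers.items()),
                  key=lambda item: (-item[1], item[0]))


for triple, n_papers in support_by_predication(PREDICATIONS):
    print(n_papers, " ".join(triple))
```

Counting distinct papers rather than raw mentions avoids letting one paper that repeats a claim outweigh independent sources.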
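The PMC RDFication slide goes from PMC XML to RDF covering metadata, content, and references; the slide names RDFReactor, a Java library. The Python sketch below is not that pipeline. It is a stdlib-only illustration, assuming a minimal JATS-style input, that writes N-Triples by hand for the title, the authors, and cited PubMed IDs. The article URI base `http://example.org/pmc/` and the sample article are hypothetical; the choice of Dublin Core `title`/`creator` and CiTO `cites` as predicates is an assumption, not necessarily what Biotea uses.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/pmc/"  # hypothetical namespace for article URIs
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
CITO_CITES = "http://purl.org/spar/cito/cites"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def article_to_ntriples(xml_text):
    """Turn a JATS-style article into N-Triples lines: title, authors, cited PMIDs."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    pmc_id = meta.findtext("article-id[@pub-id-type='pmc']")
    subject = f"<{BASE}PMC{pmc_id}>"
    lines = []
    title = meta.findtext("title-group/article-title")
    if title:
        lines.append(f"{subject} <{DCTERMS_TITLE}> {literal(title.strip())} .")
    for name in meta.findall("contrib-group/contrib[@contrib-type='author']/name"):
        full = f"{name.findtext('given-names', '')} {name.findtext('surname', '')}".strip()
        lines.append(f"{subject} <{DC_CREATOR}> {literal(full)} .")
    for pmid in root.findall("back/ref-list/ref//pub-id[@pub-id-type='pmid']"):
        lines.append(f"{subject} <{CITO_CITES}> "
                     f"<https://pubmed.ncbi.nlm.nih.gov/{pmid.text.strip()}/> .")
    return lines


# Hypothetical input; the identifiers and names are placeholders.
SAMPLE = """<article>
  <front><article-meta>
    <article-id pub-id-type="pmc">0000001</article-id>
    <title-group><article-title>An example article</article-title></title-group>
    <contrib-group>
      <contrib contrib-type="author">
        <name><surname>Doe</surname><given-names>Jane</given-names></name>
      </contrib>
    </contrib-group>
  </article-meta></front>
  <back><ref-list>
    <ref id="R1"><element-citation><pub-id pub-id-type="pmid">1000001</pub-id></element-citation></ref>
  </ref-list></back>
</article>"""

for line in article_to_ntriples(SAMPLE):
    print(line)
```

A production converter would use an RDF library for serialization and would also cover body sections and reference enrichment; the sketch only shows the shape of the XML-to-triples mapping.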
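The RDF4PMC slide lists questions such as "How similar are two articles?" and "Which articles include TERM but not this other TERM?". The sketch below answers both over per-article annotation sets. Note the swap: the slide mentions similarity based on semantic distance, while this sketch uses plain Jaccard overlap of annotation sets as a simpler stand-in. The articles and their annotations are hypothetical.

```python
# Hypothetical annotation sets: article -> ontology terms found in it.
ANNOTATIONS = {
    "article_A": {"microarray", "gene expression", "variance", "normalization"},
    "article_B": {"microarray", "gene expression", "normalization", "catalase"},
    "article_C": {"catalase", "oxidative stress", "hydrogen peroxide"},
}


def jaccard(a, b):
    """Share of annotations two articles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(article, annotations):
    """Other article with the highest Jaccard score, as (name, score)."""
    candidates = [(other, jaccard(annotations[article], terms))
                  for other, terms in annotations.items() if other != article]
    return max(candidates, key=lambda pair: pair[1])


def with_but_not(term, excluded, annotations):
    """Articles annotated with `term` but not with `excluded`."""
    return sorted(name for name, terms in annotations.items()
                  if term in terms and excluded not in terms)


print(most_similar("article_A", ANNOTATIONS))
print(with_but_not("catalase", "microarray", ANNOTATIONS))
```

Articles A and B share three of five distinct annotations, so their score is 0.6; a semantic-distance measure would additionally credit related but non-identical terms.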
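The MSRC.CITAGORA Data Sources slide describes harvesting bibliographic records from heterogeneous sources (Mendeley, Zotero, publisher APIs) and aggregating social activity around one BibRef. A minimal sketch of that aggregation step, assuming records are keyed by DOI, is shown below: DOIs are case-insensitive, so they are lowercased and stripped of common resolver prefixes before merging. The record layout, the source names as dictionary values, and the metric names are hypothetical; `10.1000` is used here only as a placeholder prefix.

```python
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
                "http://dx.doi.org/", "doi:")


def normalize_doi(raw):
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def merge_records(records):
    """Group records by normalized DOI, keeping the first non-empty title,
    the set of contributing sources, and summed social-activity counters."""
    merged = {}
    for rec in records:
        key = normalize_doi(rec["doi"])
        entry = merged.setdefault(
            key, {"doi": key, "title": "", "sources": set(), "social": {}})
        if not entry["title"] and rec.get("title"):
            entry["title"] = rec["title"]
        entry["sources"].add(rec["source"])
        for metric, count in rec.get("social", {}).items():
            entry["social"][metric] = entry["social"].get(metric, 0) + count
    return merged


# Hypothetical harvested records with a placeholder DOI prefix.
RECORDS = [
    {"source": "Mendeley", "doi": "https://doi.org/10.1000/XYZ123",
     "title": "Example paper", "social": {"mendeley_readers": 12}},
    {"source": "Zotero", "doi": "doi:10.1000/xyz123", "title": ""},
    {"source": "Blog", "doi": " 10.1000/XYZ123 ", "social": {"blog_mentions": 2}},
    {"source": "Zotero", "doi": "10.1000/abc", "title": "Another paper"},
]

bibrefs = merge_records(RECORDS)
print(sorted(bibrefs))
```

Records without a DOI would need a fallback key (for example, normalized title plus year), which this sketch leaves out.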
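The Semantic Enrichment slides model a BibRef that has_part Annotations, PDF, Content, and References, with relations such as this paper cites_to and this paper has_authors. The sketch below turns that model into plain (subject, predicate, object) tuples. It reuses the slide's relation names, but the `ex:` identifiers and the URI scheme for the parts (`<uri>/pdf`, `<uri>/content`, and so on) are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BibRef:
    """A bibliographic reference with the parts named on the slide."""
    uri: str
    title: str
    authors: list[str] = field(default_factory=list)
    cites_to: list[str] = field(default_factory=list)

    def parts(self):
        # Hypothetical URI scheme: one sub-resource per part of the BibRef.
        return {name: f"{self.uri}/{name}"
                for name in ("pdf", "content", "references", "annotations")}

    def triples(self):
        """Emit (subject, predicate, object) tuples using the slide's relation names."""
        out = [(self.uri, "title", self.title)]
        out += [(self.uri, "has_part", part) for part in self.parts().values()]
        out += [(self.uri, "has_authors", author) for author in self.authors]
        out += [(self.uri, "cites_to", cited) for cited in self.cites_to]
        return out


ref = BibRef("ex:paper1", "Example paper",
             authors=["ex:author1"], cites_to=["ex:paper2", "ex:paper3"])
for triple in ref.triples():
    print(triple)
```

Giving each part its own identifier lets annotations, extracted text, and the reference list be linked and queried separately while all of them still resolve back to one BibRef URI.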