BIOLINK 2008: Linking database submissions to primary citations with PubMed Central

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    Favorites, Groups & Events

    BIOLINK 2008: Linking database submissions to primary citations with PubMed Central - Presentation Transcript

    1. Linking Database Submissions to Primary Citations with PubMed Central Heather Piwowar and Wendy Chapman Department of Biomedical Informatics University of Pittsburgh BioLINK 2008
    2.  
    3.  
      • These links are important for several reasons
    4.  
    5.  
    6.  
      • Sometimes the links are easy to discover
    7.  
    8.  
    9.  
    10.  
    11. But the meaning of hyperlinks is ambiguous:
    12. And often no hyperlinks at all:
      • One way to identify links:
      • NLP systems that identify statements of shared data from within full text.
      • BUT this requires developing and maintaining a full-text archive!
      • What about using PubMed Central?
      • Usage?
      • scientists looking for datasets for reuse
      • curators looking for primary citations
      • researchers studying data sharing behaviour
      • Goal:
      • Use the simple, full-text query interface of PubMed Central to identify articles with shared datasets
      • Method:
      • Gene expression microarray data
      • GEO database
      • Method:
      • Open Access articles to train
      • Non-Open access articles to test
      • Gene-expression articles selected by MeSH term query
      • Gold Standard:
      • True positives (N=550) Articles with primary citation links from GEO + screening of full-text
      • True negatives (N=165) The rest
      • Building the query:
      • Used full-text of open-access cohort
      • Removed words <40 occurrences
      • Unigram bag-of-words vectors
      • Tree and Rule algorithms, a variety of parameters
      • (geo OR omnibus) AND microarray AND &quot;gene expression&quot; AND accession NOT (databases OR user OR users OR (public AND accessed) OR (downloaded AND published))
      • (geo OR omnibus) AND microarray AND &quot;gene expression&quot; AND accession NOT (databases OR user OR users OR (public AND accessed) OR (downloaded AND published))
      • (geo OR omnibus) AND microarray AND &quot;gene expression&quot; AND accession NOT (databases OR user OR users OR (public AND accessed) OR (downloaded AND published))
      • (geo OR omnibus) AND microarray AND &quot;gene expression&quot; AND accession NOT (databases OR user OR users OR (public AND accessed) OR (downloaded AND published))
      • (geo OR omnibus) AND microarray AND &quot;gene expression&quot; AND accession NOT (databases OR user OR users OR (public AND accessed) OR (downloaded AND published))
      • Evaluation Results
      • 40% recall
      • 94% precision, 65% for those not yet linked
      • worse than full-NLP results (~ 89%,83%)
      • slightly better than trivial query (34%,90%)
      • Limitations
      • only one datatype
      • database-centric
      • performance so far is rather mediocre…
      • Impact?
      • Today’s performance:
        • would increase GEO links by 2.6%
        • by 5.5% annually when all NIH in PMC
      • Double the recall, to 80%:
        • double the numbers above
      • GEO curators added the 40 links identified by this study
      • We hope this work
        • inspires future enhancements, and
        • highlights the opportunities for simple full-text queries in PubMed Central given the mandated influx of NIH-funded research reports.
    13. Thank you
      • Advisor: Dr. Wendy Chapman
      • Funders: NLM and Pitt DBMI
      • Enablers: Everyone who deposits their publications in PubMed Central!
      • My shared data: www.dbmi.pitt.edu/piwowar
      • Share your research data too!

    + hpiwowarhpiwowar, 2 months ago

    custom

    161 views, 0 favs, 2 embeds more stats

    Abstract: Background: Dataset submissions are growi more

    More info about this document

    CC Attribution License

    Go to text version

    • Total Views 161
      • 155 on SlideShare
      • 6 from embeds
    • Comments 0
    • Favorites 0
    • Downloads 0
    Most viewed embeds
    • 4 views on http://www.rubenroa.com.ar
    • 2 views on http://rubenroa.com.ar

    more

    All embeds
    • 4 views on http://www.rubenroa.com.ar
    • 2 views on http://rubenroa.com.ar

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories