Repurposing authoritative data about faculty to analyze publication output, infer expertise, and recommend grant opportunities
August 19, 2013 presentation to Center for Healthcare Informatics and Policy located at Weill Cornell Medical College in New York City.

Usage Rights

© All Rights Reserved


Presentation Transcript

  • Repurposing authoritative data about faculty to analyze publication output, infer expertise, and recommend grant opportunities Paul Albert, Don Carpenter, and Jie Lin paa2013@med.cornell.edu Weill Cornell Medical College
  • Email cyz2123@med.cornell.edu Phone 646-962-2551 Address 1300 York Avenue New York, NY 10065 Other sites Clinical profile
  • Email cyz2123@med.cornell.edu Phone 646-962-2551 Address 1300 York Avenue New York, NY 10065 Other sites Clinical profile Where is the ongoing motivation to keep these profiles current?
  • “Researchers use profile systems to find collaborators.” A widely invoked “fact” about VIVO (also an old Russian proverb)
  • How can VIVO data address pressing needs in order to strengthen its viability?
  • Pressing needs: 1. Administrators want reports. 2. Both administrators and researchers want to know about funding opportunities.
  • “Invention is 1% inspiration and (due to rounding error) 98% perspiration.” Thomas A. Edison (Source: Yahoo Answers)
  • Pressing needs: 1. Administrators want reports. 2. Both administrators and researchers want to know about funding opportunities.
  • Administrators are avid consumers of institutional data.
  • Proposed question #1: Which publications appear in journals of a given impact factor?
  • Proposed question #2: In any given year, which paper has the most incoming citations?
  • Proposed question #3: Which papers that have received federal funding are not deposited in PubMed Central?
  • Proposed question #4: Which clinical departments tend to publish the most?
  • Proposed question #5: Which articles have faculty published in the last month as first or last author?
  • Institutional publication reporting: choose two* • High-quality disambiguation (>90% accuracy) • Minimal delay between review and inclusion in the reporting system • Tool is simple enough for anyone to use * Or one
  • Sample SPARQL query
    # Articles supported by NIH, published between April 2008 and December 2012,
    # that have a PMID but no PMCID, listed with their authors.
    SELECT DISTINCT ?Person1_firstName ?Person1_lastName ?Person1_primaryEmail
                    ?AcademicArticle1_label ?Journal1_label ?AcademicArticle1_pmid
                    ?DateTimeValue1_dateTime
    WHERE {
      # The article, its PMID, and its publication date
      ?AcademicArticle1 rdf:type bibo:Document .
      ?AcademicArticle1 rdfs:label ?AcademicArticle1_label .
      ?AcademicArticle1 bibo:pmid ?AcademicArticle1_pmid .
      ?AcademicArticle1 vivo:dateTimeValue ?DateTimeValue1 .
      ?DateTimeValue1 rdf:type vivo:DateTimeValue .
      ?DateTimeValue1 vivo:dateTime ?DateTimeValue1_dateTime .
      # The funding organization
      ?AcademicArticle1 vivo:informationResourceSupportedBy ?FundingOrganization1 .
      ?FundingOrganization1 rdf:type vivo:FundingOrganization .
      ?FundingOrganization1 rdfs:label ?FundingOrganization1_label .
      # The journal
      ?AcademicArticle1 vivo:hasPublicationVenue ?Journal1 .
      ?Journal1 rdf:type bibo:Journal .
      ?Journal1 rdfs:label ?Journal1_label .
      # The authors
      ?AcademicArticle1 vivo:informationResourceInAuthorship ?Authorship1 .
      ?Authorship1 rdf:type vivo:Authorship .
      ?Authorship1 vivo:linkedAuthor ?Person1 .
      ?Person1 rdf:type foaf:Person .
      ?Person1 vivo:primaryEmail ?Person1_primaryEmail .
      ?Person1 wcmc:cwid ?Person1_cwid .
      ?Person1 foaf:firstName ?Person1_firstName .
      ?Person1 foaf:lastName ?Person1_lastName .
      FILTER REGEX (str(?FundingOrganization1_label), 'N.I.H.', 'i')
      FILTER NOT EXISTS { ?AcademicArticle1 vivo:pmcid ?AcademicArticle1_pmcid . }
      FILTER (xsd:dateTime(?DateTimeValue1_dateTime) > "2008-04-01T00:00:00"^^xsd:dateTime)
      FILTER (xsd:dateTime(?DateTimeValue1_dateTime) < "2012-12-01T00:00:00"^^xsd:dateTime)
    }
    ORDER BY ?Person1_lastName
  • The sample SPARQL query above + VIVO Dashboard
  • VIVO Dashboard: a tool for easily running sophisticated reports Don Carpenter dwc92@cornell.edu Cornell University
  • Prime directive of VIVO Dashboard: empower untrained users to run sophisticated semantic queries on Weill Cornell faculty publications.* (* Secondary directive: kill Sarah Connor)
  • Sample SPARQL query
    # One row per (article, author) pair, with the author's rank in the byline.
    SELECT DISTINCT ?Article1_pmid ?Person1_cwid ?Authorship1_authorRank
    WHERE {
      ?Article1 rdf:type bibo:Document .
      ?Article1 vivo:informationResourceInAuthorship ?Authorship1 .
      ?Article1 bibo:pmid ?Article1_pmid .
      ?Authorship1 rdf:type vivo:Authorship .
      ?Authorship1 vivo:authorRank ?Authorship1_authorRank .
      ?Authorship1 vivo:linkedAuthor ?Person1 .
      ?Person1 rdf:type foaf:Person .
      ?Person1 wcmc:cwid ?Person1_cwid .
    }
  • Demo
  • Workflow • On a one-time basis, set up the fields in the Drupal admin. • On a weekly basis, execute a set of SPARQL queries against VIVO’s semantic endpoint. • Import the resulting .csv files into Drupal.
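The weekly export step might look something like the following minimal sketch. It assumes the endpoint speaks the standard SPARQL protocol and returns the W3C SPARQL JSON results format. The endpoint URL, the function names, and the sample PMIDs and CWIDs are all hypothetical; the slides do not show the actual export script.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; the slides do not give the real one.
ENDPOINT = "https://vivo.example.edu/sparql"


def run_select(query, endpoint=ENDPOINT):
    """POST a SELECT query and return the parsed SPARQL JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def results_to_csv(results):
    """Flatten SPARQL JSON results into CSV text, one row per solution."""
    columns = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding, so default to "".
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()


# Made-up results in the standard JSON shape, so the sketch runs offline.
sample = {
    "head": {"vars": ["Article1_pmid", "Person1_cwid", "Authorship1_authorRank"]},
    "results": {"bindings": [
        {"Article1_pmid": {"type": "literal", "value": "12345678"},
         "Person1_cwid": {"type": "literal", "value": "abc2001"},
         "Authorship1_authorRank": {"type": "literal", "value": "1"}},
        {"Article1_pmid": {"type": "literal", "value": "23456789"},
         "Person1_cwid": {"type": "literal", "value": "xyz2002"}},
    ]},
}
csv_text = results_to_csv(sample)
```

In a real run, `results_to_csv(run_select(query))` would be written to a .csv file for the Drupal import; `run_select` is not called here because it needs a live endpoint.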
  • Technology Stack • Drupal 7.x • Stores content using the robust indexing application, Apache Solr  • AJAX • Key modules - Apache Solr - Facet API - Facet API graphs - D3.js (visualization library) - Charts and graphs - VIVO Dashboard (custom module)
  • Performance • A previous version using MySQL queries took >10 seconds to load • Completely rewriting the application in Solr allows us to store X publications • Performance is now < 5 seconds
  • Future Work • Enlist the talents of other Drupal developers • Release this project as open source code • Create a visualization for global health expertise
  • [Screenshot: VIVO Dashboard “Publications” page, listing publications by active Weill Cornell Medical College faculty as represented in VIVO. Views: Graph, List, Export. Facets: Publication Type (Research Article (657), In Process (55), Review (45), Clinical Guideline (32), more...), Author Name, Journal ranking (15.4 - 68.3), Date (2009 - Present), Journal Name.]
  • Repurposing authoritative semantic data to infer expertise and recommend grant opportunities Jie Lin jie265@gmail.com Cornell University
  • Pressing needs 1. Researchers, development officers, and funding agencies frequently complain that the process of learning about grant opportunities is inefficient.  2. As a project manager for VIVO, I want to accurately include researchers' fields and expertise.
  • Maybe the needs of grant recommendations and expertise can be addressed... together.
  • Our intended workflow: 1. Gather information about people and grant notices. 2. Algorithmically make personalized recommendations of grant opportunities. (Hard.) 3. In exchange for the promise of higher quality recommendations, we get busy researchers to provide us feedback on our initial inferences about expertise. 4. Use expertise data in VIVO.
  • Sources for people
    Source | Example
    Clinical expertise and board certifications at WeillCornell.org | clinical pathology
    Medical Subject Headings (MeSH) in published papers | anti-bacterial agents
    Personal statement | ... I’ve always enjoyed medical education...
    Keywords for NIH grants | information system analysis
    CFDA labels for NIH grants | 93.821 – Lung diseases research
    Spending categories for NIH grants | neurosciences
    ClinicalTrials.gov keywords and system-inferred MeSH | violence research
    Global health expertise in Researcher Profile System | Egypt
    NCCR category as asserted by CTSC staff | Developmental and Child Psychology
  • Sources for grant opportunities: ScanGrants and Grants.gov. After global pre-filtration, n = ~1,200.
  • Concept ranking • Term Frequency-Inverse Document Frequency (TF-IDF): reward a term for showing up in a person’s list of terms and penalize it for showing up in other people’s lists. • Result: no one is an expert on “humans.” • No algorithm is perfect, so we allow faculty to provide feedback on the controlled terms we have inferred for them.
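As a rough illustration of the concept-ranking idea, here is a minimal TF-IDF sketch that treats each person's list of terms as one "document." The exact weighting the authors used is not given in the slides, so the plain `tf * log(N / df)` form is an assumption, as are the function name and the made-up people and terms.

```python
import math
from collections import Counter


def tfidf_by_person(terms_by_person):
    """Return {person: {term: score}} using a plain TF-IDF weighting."""
    n_people = len(terms_by_person)
    # Document frequency: in how many people's lists does each term appear?
    doc_freq = Counter()
    for terms in terms_by_person.values():
        doc_freq.update(set(terms))

    scores = {}
    for person, terms in terms_by_person.items():
        counts = Counter(terms)
        total = len(terms)
        scores[person] = {
            # Term frequency within this person's list, damped by how many
            # other people share the term.
            term: (count / total) * math.log(n_people / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Made-up term lists for three hypothetical people.
people = {
    "person_a": ["humans", "apoptosis", "apoptosis", "autoimmune diseases"],
    "person_b": ["humans", "anti-bacterial agents"],
    "person_c": ["humans", "neurosciences", "apoptosis"],
}
scores = tfidf_by_person(people)
```

Because "humans" appears in every list, its inverse document frequency is log(1) = 0, so it scores zero for everyone, which is the "no one is an expert on humans" effect.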
  • Mapping concepts to fields • The objective of using a limited number of fields is to increase overlap between people and grants. • 149 (somewhat arbitrarily) defined fields • Fields are drawn from eight different lists of fields (Map of Science, ScanGrants, ABMS specialties...). • Take concepts and fields and do a co-occurrence search in MEDLINE. • For example, after weighting by size of field, how often does “Natural Language Processing” occur in conjunction with immunology; medical informatics; urology...?
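One way to picture the co-occurrence step is the sketch below. It assumes "weighting by size of field" means dividing each co-occurrence count by the number of MEDLINE records in that field and then normalizing across fields; the slides do not spell out the formula, and the counts, field sizes, and function name here are invented for illustration.

```python
def field_affinity(cooccurrence, field_sizes):
    """Map one concept to fields: co-occurrence rate per field, normalized to sum to 1."""
    rates = {
        field: cooccurrence.get(field, 0) / size
        for field, size in field_sizes.items()
        if size > 0
    }
    total = sum(rates.values())
    if total == 0:
        return {field: 0.0 for field in rates}
    return {field: rate / total for field, rate in rates.items()}


# Made-up counts: records mentioning both the concept and the field.
nlp_cooccurrence = {"immunology": 40, "medical informatics": 900, "urology": 10}
# Made-up field sizes: total records per field.
sizes = {"immunology": 400_000, "medical informatics": 30_000, "urology": 100_000}

affinity = field_affinity(nlp_cooccurrence, sizes)
best_field = max(affinity, key=affinity.get)
```

Dividing by field size keeps a very large field from dominating simply because it has more records overall.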
  • The math for mapping people and grants to fields
  • Promise of co-occurrence searching Suppose a researcher is working almost exclusively on autoimmune disease and is highly ranked for the concept, “apoptosis.” Apoptosis also frequently co-occurs in MEDLINE with oncology. Therefore, we can predict her interest in an oncology grant.
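The apoptosis example can be sketched as a weighted sum: each of a person's concepts contributes its field affinities in proportion to how highly the person ranks for that concept. This combination rule is an assumption, since the slide with the actual math did not survive in the transcript, and all weights and names below are made up.

```python
def person_field_scores(concept_weights, concept_to_field):
    """Sum each concept's field affinities, weighted by the person's concept score."""
    scores = {}
    for concept, weight in concept_weights.items():
        for field, affinity in concept_to_field.get(concept, {}).items():
            scores[field] = scores.get(field, 0.0) + weight * affinity
    return scores


# Hypothetical researcher: works on autoimmune disease, ranks highly for apoptosis.
researcher = {"autoimmune diseases": 0.9, "apoptosis": 0.7}
# Hypothetical concept-to-field affinities from a co-occurrence search.
concept_to_field = {
    "autoimmune diseases": {"immunology": 0.8, "rheumatology": 0.2},
    "apoptosis": {"oncology": 0.6, "immunology": 0.4},
}
fields = person_field_scores(researcher, concept_to_field)
```

Even though none of the researcher's concepts is an oncology term as such, oncology receives a nonzero score through apoptosis, which is what would let an oncology grant surface for her.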
  • The downside of co-occurrence searching
  • Match people to grants • Not yet done, but early testing is promising. • The idea is to use cosine similarity to define how similar any person-grant combination is to any other person-grant combination • Then you can rank those connections by people or by grant.
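A minimal sketch of the matching step, assuming each person and each grant has already been reduced to a vector of field weights and that cosine similarity is computed between a person's vector and a grant's vector. The people, the grant, the weights, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_people_for_grant(grant, people):
    """Return (person, similarity) pairs, best match first."""
    scored = [(name, cosine(vector, grant)) for name, vector in people.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up field vectors for three hypothetical people and one grant.
people = {
    "person_a": {"immunology": 1.0, "oncology": 0.4},
    "person_b": {"urology": 1.0},
    "person_c": {"oncology": 0.9, "immunology": 0.1},
}
oncology_grant = {"oncology": 1.0}
ranking = rank_people_for_grant(oncology_grant, people)
```

The same `cosine` function can rank grants for one person by swapping the roles of the two arguments, which gives both orderings mentioned on the slide.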
  • Utility for Development Office • Suppose Dr. Lamon and the Development Office want to identify candidates to apply for a particular grant. • He can get a ranked list of the candidates best suited to this opportunity.
  • Demonstration