Repurposing authoritative data about faculty to analyze publication output, infer expertise, and recommend grant opportunities

Presented August 19, 2013, to the Center for Healthcare Informatics and Policy at Weill Cornell Medical College in New York City.

Published in: Technology, Education

Transcript of "Repurposing authoritative data about faculty to analyze publication output, infer expertise, and recommend grant opportunities"

  1. Repurposing authoritative data about faculty to analyze publication output, infer expertise, and recommend grant opportunities. Paul Albert, Don Carpenter, and Jie Lin; paa2013@med.cornell.edu; Weill Cornell Medical College
  2. Email: cyz2123@med.cornell.edu; Phone: 646-962-2551; Address: 1300 York Avenue, New York, NY 10065; Other sites: Clinical profile
  3. (The profile from slide 2 again) Where is the ongoing motivation to keep these profiles current?
  4. “Researchers use profile systems to find collaborators.” A widely invoked “fact” about VIVO (also an old Russian proverb)
  5. How can VIVO data address pressing needs in order to strengthen its viability?
  6. Pressing needs
      1. Administrators want reports.
      2. Both administrators and researchers want to know about funding opportunities.
  7. “Invention is 1% inspiration and (due to rounding error) 98% perspiration.” Thomas A. Edison (source: Yahoo Answers)
  8. Pressing needs
      1. Administrators want reports.
      2. Both administrators and researchers want to know about funding opportunities.
  9. Administrators are avid consumers of institutional data.
  10. Proposed question #1: Which publications appeared in journals with a given impact factor?
  11. Proposed question #2: In any given year, which paper has the most incoming citations?
  12. Proposed question #3: Which papers that received federal funding are not deposited in PubMed Central?
  13. Proposed question #4: Which clinical departments tend to publish the most?
  14. Proposed question #5: Which articles did faculty publish in the last month in which they were first or last author?
  15. Institutional publication reporting: choose two*
      • High-quality disambiguation (>90% accuracy)
      • Minimal delay between review and inclusion in the reporting system
      • A tool simple enough for anyone to use
      * Or one.
  16. Sample SPARQL query (NIH-funded articles not deposited in PubMed Central, i.e. proposed question #3)
        PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX bibo: <http://purl.org/ontology/bibo/>
        PREFIX vivo: <http://vivoweb.org/ontology/core#>
        # A PREFIX for wcmc: (Weill Cornell's local namespace) is also required;
        # its URI is not shown on the slide.
        SELECT DISTINCT ?Person1_firstName ?Person1_lastName ?Person1_primaryEmail
               ?AcademicArticle1_label ?Journal1_label ?AcademicArticle1_pmid
               ?DateTimeValue1_dateTime
        WHERE {
          ?AcademicArticle1 rdf:type bibo:Document .
          ?AcademicArticle1 bibo:pmid ?AcademicArticle1_pmid .
          ?AcademicArticle1 rdfs:label ?AcademicArticle1_label .
          ?AcademicArticle1 vivo:dateTimeValue ?DateTimeValue1 .
          ?AcademicArticle1 vivo:informationResourceSupportedBy ?FundingOrganization1 .
          ?AcademicArticle1 vivo:hasPublicationVenue ?Journal1 .
          ?AcademicArticle1 vivo:informationResourceInAuthorship ?Authorship1 .
          ?DateTimeValue1 rdf:type vivo:DateTimeValue .
          ?DateTimeValue1 vivo:dateTime ?DateTimeValue1_dateTime .
          ?FundingOrganization1 rdf:type vivo:FundingOrganization .
          ?FundingOrganization1 rdfs:label ?FundingOrganization1_label .
          ?Journal1 rdf:type bibo:Journal .
          ?Journal1 rdfs:label ?Journal1_label .
          ?Authorship1 rdf:type vivo:Authorship .
          ?Authorship1 vivo:linkedAuthor ?Person1 .
          ?Person1 rdf:type foaf:Person .
          ?Person1 vivo:primaryEmail ?Person1_primaryEmail .
          ?Person1 wcmc:cwid ?Person1_cwid .
          ?Person1 foaf:firstName ?Person1_firstName .
          ?Person1 foaf:lastName ?Person1_lastName .
          # Funded by NIH ...
          FILTER REGEX(str(?FundingOrganization1_label), 'N.I.H.', 'i')
          # ... but with no PubMed Central ID
          FILTER NOT EXISTS { ?AcademicArticle1 vivo:pmcid ?AcademicArticle1_pmcid . }
          # Dated between April 1, 2008 and December 1, 2012
          FILTER (xsd:dateTime(?DateTimeValue1_dateTime) > "2008-04-01T00:00:00"^^xsd:dateTime)
          FILTER (xsd:dateTime(?DateTimeValue1_dateTime) < "2012-12-01T00:00:00"^^xsd:dateTime)
        }
        ORDER BY ?Person1_lastName
  17. (The query from slide 16, shown again) + VIVO Dashboard
  18. VIVO Dashboard: a tool for easily running sophisticated reports. Don Carpenter; dwc92@cornell.edu; Cornell University
  19. Prime directive of VIVO Dashboard: empower untrained users to run sophisticated semantic queries on Weill Cornell faculty publications.*
      * Secondary directive: kill Sarah Connor
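  20. Sample SPARQL query
        PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX bibo: <http://purl.org/ontology/bibo/>
        PREFIX vivo: <http://vivoweb.org/ontology/core#>
        # wcmc: (Weill Cornell's local namespace) also needs a PREFIX; its URI is not shown.
        SELECT DISTINCT ?Article1_pmid ?Person1_cwid ?Authorship1_authorRank
        WHERE {
          ?Article1 rdf:type bibo:Document .
          ?Article1 vivo:informationResourceInAuthorship ?Authorship1 .
          ?Article1 bibo:pmid ?Article1_pmid .
          ?Authorship1 rdf:type vivo:Authorship .
          ?Authorship1 vivo:authorRank ?Authorship1_authorRank .
          ?Authorship1 vivo:linkedAuthor ?Person1 .
          ?Person1 rdf:type foaf:Person .
          ?Person1 wcmc:cwid ?Person1_cwid .
        }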
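Slide 20's query returns one row per (article, faculty author, rank), the raw material for proposed question #5. Below is a minimal Python sketch of the post-processing, assuming ranks are 1-based and that a total author count per article is available from some other source (the hypothetical `author_counts` mapping). That count matters because the query only returns authorships linked to a `foaf:Person` with a CWID, so the highest rank it returns is not necessarily the last author. The "last month" restriction would need a date filter like the one in slide 16's query. All PMIDs and CWIDs below are made up.

```python
from collections import defaultdict


def first_or_last_authors(rows, author_counts):
    """Group PMIDs by CWID for papers where that person is first or last author.

    rows: (pmid, cwid, author_rank) tuples, shaped like the query results.
    author_counts: hypothetical {pmid: total number of authors}; the query does
    not return this, so it would have to come from another source.
    Ranks are assumed to be 1-based.
    """
    hits = defaultdict(set)
    for pmid, cwid, rank in rows:
        rank = int(rank)  # SPARQL CSV results arrive as strings
        total = author_counts.get(pmid)
        if rank == 1 or (total is not None and rank == total):
            hits[cwid].add(pmid)
    return {cwid: sorted(pmids) for cwid, pmids in hits.items()}


# Made-up rows: on PMID 222, rank 4 is the highest rank returned, but the
# paper has six authors, so that person is not the last author.
rows = [
    ("111", "abc1001", "1"),
    ("111", "xyz2002", "5"),
    ("222", "abc1001", "3"),
    ("222", "xyz2002", "4"),
]
author_counts = {"111": 5, "222": 6}
print(first_or_last_authors(rows, author_counts))
```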
  21. Demo
  22. Workflow
      • One time: set up the fields in the Drupal admin.
      • Weekly: execute a set of SPARQL queries against VIVO's SPARQL endpoint.
      • Import the resulting .csv files into Drupal.
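The weekly query-and-export step could be scripted roughly as follows. This is a minimal sketch using only the Python standard library and the SPARQL 1.1 Protocol (the query goes in a `query` parameter and CSV results are requested with `Accept: text/csv`). The endpoint URL and the output file name are hypothetical, and whether VIVO's endpoint accepts unauthenticated GET requests is an assumption, not something the slides state.

```python
import csv
import io
import urllib.parse
import urllib.request

# Hypothetical endpoint; the slides do not give VIVO's URL.
ENDPOINT = "https://vivo.example.edu/sparql"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request asking for CSV results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "text/csv"})


def fetch_csv(endpoint, query, timeout=60):
    """Run the query and return the raw CSV text (needs a reachable endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def save_csv(text, path):
    """Write the CSV text to disk so it can be imported into Drupal."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


def parse_csv(text):
    """Turn SPARQL CSV results (first row = variable names) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))
```

A weekly job (for example, a cron entry) would then call something like `save_csv(fetch_csv(ENDPOINT, query), "authorships.csv")` once per query. Asking the endpoint for CSV directly avoids converting JSON or XML results before the Drupal import.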
  23. Technology Stack
      • Drupal 7.x
      • Content is stored and indexed in Apache Solr
      • AJAX
      • Key modules
        - Apache Solr
        - Facet API
        - Facet API graphs
        - D3.js (visualization library)
        - Charts and graphs
        - VIVO Dashboard (custom module)
  24. Performance
      • A previous version built on MySQL queries took more than 10 seconds to load.
      • Completely rewriting the application on Solr allows us to store X publications.
      • Load time is now under 5 seconds.
  25. Future Work
      • Enlist the talents of other Drupal developers.
      • Release this project as open-source code.
      • Create a visualization for global health expertise.
  26. Publications: the following are all publications by active Weill Cornell Medical College faculty as represented in VIVO.
      • Display options: 25, 50, 75, 100; Graph, List, Export
      • Publication Type: Research Article (657), In Process (55), Review (45), Clinical Guideline (32), more...
      • Other facets: Author Name; Journal ranking (15.4 to 68.3); Date (2009 to Present); Journal Name
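A faceted page like this one maps naturally onto a Solr select request. The sketch below shows parameters such a page might send. `q`, `fq`, `facet`, `facet.field`, `facet.mincount`, `rows`, and `wt` are standard Solr parameters. The core URL and every field name (`publication_type`, `author_name`, `journal_name`, `journal_rank`, `pub_date`) are hypothetical, because the names the Drupal modules actually generate are not shown on the slides.

```python
from urllib.parse import urlencode

# Hypothetical Solr core; the slides do not give one.
SOLR_SELECT = "http://localhost:8983/solr/publications/select"
FACET_FIELDS = ("publication_type", "author_name", "journal_name")  # hypothetical names


def publication_search_params(pub_types=(), rank_range=None, since_year=None, rows=25):
    """Build Solr request parameters mirroring the Publications page facets."""
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json"),
              ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", f) for f in FACET_FIELDS]
    if pub_types:
        # Checked publication-type boxes are OR-ed together in one filter query.
        quoted = " OR ".join('"%s"' % t for t in pub_types)
        params.append(("fq", "publication_type:(%s)" % quoted))
    if rank_range is not None:
        low, high = rank_range
        params.append(("fq", "journal_rank:[%s TO %s]" % (low, high)))
    if since_year is not None:
        params.append(("fq", "pub_date:[%d-01-01T00:00:00Z TO NOW]" % since_year))
    return params


def search_url(params):
    """Encode the parameters; a list of pairs lets keys like fq repeat."""
    return SOLR_SELECT + "?" + urlencode(params)


params = publication_search_params(["Research Article", "Review"], (15.4, 68.3), 2009)
print(search_url(params))
```

Each selection becomes its own `fq` rather than being folded into `q`. Solr caches filter queries independently of the main query, which suits facets that users toggle on and off.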
  27. Repurposing authoritative semantic data to infer expertise and recommend grant opportunities. Jie Lin; jie265@gmail.com; Cornell University
  28. Pressing needs
      1. Researchers, development officers, and funding agencies frequently complain that the process of learning about grant opportunities is inefficient.
      2. As a project manager for VIVO, I want to accurately include researchers' fields and expertise.
  29. Maybe the needs for grant recommendations and expertise data can be addressed... together.
  30. Our intended workflow
      1. Gather information about people and grant notices.
      2. Algorithmically make personalized recommendations of grant opportunities. (Hard.)
      3. In exchange for the promise of higher-quality recommendations, get busy researchers to give us feedback on our initial inferences about their expertise.
      4. Use the expertise data in VIVO.
  31. Sources for people (source: example)
      • Clinical expertise and board certifications at WeillCornell.org: clinical pathology
      • Medical Subject Headings (MeSH) in published papers: anti-bacterial agents
      • Personal statement: "...I've always enjoyed medical education..."
      • Keywords for NIH grants: information system analysis
      • CFDA labels for NIH grants: 93.821 – Lung diseases research
      • Spending categories for NIH grants: neurosciences
      • ClinicalTrials.gov keywords and system-inferred MeSH: violence research
      • Global health expertise in Researcher Profile System: Egypt
      • NCCR category as asserted by CTSC staff: Developmental and Child Psychology
  32. Sources for grant opportunities: ScanGrants and Grants.gov. After global pre-filtration, n ≈ 1,200.
  33. Concept ranking
      • Term frequency-inverse document frequency (TF-IDF): reward terms for showing up in a person's list of terms and penalize them for also appearing in other people's lists.
      • Result: no one is an expert on "humans."
      • No algorithm is perfect, so we allow faculty to give feedback on the controlled terms we have inferred for them.
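A minimal sketch of TF-IDF weighting as described above, treating each person's list of inferred concepts as one document. It uses one common variant (term count divided by list length, times the natural log of the number of people over the number whose lists contain the term). The slides do not say which variant was used, and the people and concepts below are made up. A concept that appears in everyone's list, such as "humans", gets an IDF of log(1) = 0 and therefore zero weight.

```python
import math
from collections import Counter


def tfidf(people):
    """Weight each person's concepts by TF-IDF.

    people: {person: list of concept terms, repeats allowed}
    Returns {person: {term: weight}}.
    """
    n = len(people)
    # Document frequency: how many people's lists contain each term.
    doc_freq = Counter()
    for terms in people.values():
        doc_freq.update(set(terms))
    weights = {}
    for person, terms in people.items():
        if not terms:
            weights[person] = {}
            continue
        counts = Counter(terms)
        weights[person] = {
            term: (count / len(terms)) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Made-up concept lists for three hypothetical researchers.
people = {
    "researcher_a": ["humans", "apoptosis", "apoptosis", "lupus"],
    "researcher_b": ["humans", "asthma", "asthma"],
    "researcher_c": ["humans", "apoptosis", "leukemia"],
}
weights = tfidf(people)
print(weights["researcher_a"])
```

For researcher_a, "lupus" outranks "apoptosis" even though it appears less often in her list, because apoptosis is shared with researcher_c.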
  34. Mapping concepts to fields
      • The objective of using a limited number of fields is to increase overlap between people and grants.
      • 149 (somewhat arbitrarily) defined fields
      • The fields come from eight different lists (Map of Science, ScanGrants, ABMS specialties...).
      • Take the concepts and fields and run a co-occurrence search in MEDLINE.
      • For example, after weighting by the size of each field, how often does "Natural Language Processing" co-occur with immunology, medical informatics, urology...?
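One possible reading of "after weighting by size of field": divide the number of MEDLINE citations that match both a concept and a field by the number that match the field alone, then normalize across fields. This is an assumed formula for illustration (the slides do not spell one out in text), and the counts below are invented, not real MEDLINE results.

```python
def concept_field_weights(cooccurrence, field_sizes):
    """Map one concept to fields by size-weighted co-occurrence.

    cooccurrence: {field: citations matching both the concept and the field}
    field_sizes:  {field: citations matching the field}
    Returns {field: weight}, normalized so the weights sum to 1.
    """
    raw = {
        field: cooccurrence.get(field, 0) / size
        for field, size in field_sizes.items()
        if size > 0
    }
    total = sum(raw.values())
    return {field: (value / total if total else 0.0) for field, value in raw.items()}


# Invented counts for "Natural Language Processing" (illustration only).
field_sizes = {"immunology": 500_000, "medical informatics": 20_000, "urology": 100_000}
nlp_cooccurrence = {"immunology": 50, "medical informatics": 400, "urology": 5}
weights = concept_field_weights(nlp_cooccurrence, field_sizes)
print(weights)
```

Dividing by field size keeps large fields such as immunology from dominating merely because they contain more citations overall.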
  35. The math for mapping people and grants to fields
  36. Promise of co-occurrence searching: Suppose a researcher works almost exclusively on autoimmune disease and is highly ranked for the concept "apoptosis." Apoptosis also co-occurs frequently with oncology in MEDLINE, so we can predict her interest in an oncology grant.
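To carry a person's concepts through to fields, one simple approach is a weighted sum: her score for a field is the sum, over her concepts, of her concept weight times that concept's field weight. This is an assumption for illustration rather than the slides' own math, and all numbers are invented. It reproduces the apoptosis example, where a researcher who works on autoimmune disease still receives a substantial oncology score.

```python
def person_field_scores(concept_weights, concept_to_fields):
    """Project a person's concept weights onto fields via concept-field weights.

    concept_weights:   {concept: weight for this person}, e.g. from TF-IDF
    concept_to_fields: {concept: {field: weight}}, e.g. from co-occurrence
    Returns {field: score}.
    """
    scores = {}
    for concept, weight in concept_weights.items():
        for field, assoc in concept_to_fields.get(concept, {}).items():
            scores[field] = scores.get(field, 0.0) + weight * assoc
    return scores


# Invented weights for a hypothetical autoimmune-disease researcher.
researcher = {"apoptosis": 0.6, "autoimmune diseases": 0.4}
concept_to_fields = {
    "apoptosis": {"oncology": 0.7, "immunology": 0.3},
    "autoimmune diseases": {"immunology": 0.9, "rheumatology": 0.1},
}
scores = person_field_scores(researcher, concept_to_fields)
print(scores)
```

Grants can be projected onto the same fields in the same way, which puts people and grants into one shared space for matching.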
  37. The downside of co-occurrence searching
  38. Match people to grants
      • Not yet done, but early testing is promising.
      • The idea is to use cosine similarity to score every person-grant pair, so that any pair can be compared with any other.
      • Those scores can then be ranked by person or by grant.
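A minimal sketch of the matching step, assuming people and grants are both represented as sparse vectors of field weights (Python dicts). Cosine similarity is symmetric, so one ranking function serves both directions: grants for a person, or people for a grant, as in the Development Office scenario. The names and vectors below are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {field: weight} dicts."""
    dot = sum(weight * v.get(field, 0.0) for field, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_similarity(target, candidates, top_n=None):
    """Rank candidates (people or grants) by cosine similarity to target.

    Ties are broken alphabetically so the order is deterministic.
    """
    scored = [(name, cosine(vec, target)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored if top_n is None else scored[:top_n]


people = {
    "dr_a": {"oncology": 0.42, "immunology": 0.54, "rheumatology": 0.04},
    "dr_b": {"medical informatics": 1.0},
}
grants = {
    "oncology_grant": {"oncology": 1.0},
    "informatics_grant": {"medical informatics": 0.8, "oncology": 0.2},
}

print(rank_by_similarity(grants["oncology_grant"], people))  # people for one grant
print(rank_by_similarity(people["dr_b"], grants, top_n=1))   # top grant for one person
```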
  39. Utility for Development Office
      • Suppose Dr. Lamon and the Development Office want to identify candidates to apply for a particular grant.
      • He can get a ranked list of the people best suited to this opportunity.
  40. Demonstration
