Tracking Data Reuse
Motivations, Methods, and Obstacles

Heather Piwowar
DataONE postdoc with NESCent and Dryad
@researchremix

IASSIST 2011 #iassist
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
?
http://www.flickr.com/photos/ryanr/142455033/
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
http://www.flickr.com/photos/archeon/2941655917/
In 2009, 116 articles cited ORNL DAAC data.

Finding these articles took 70-80 hours
across at least 12 resources,
all chosen from a deep understanding
of this specific research domain;

then the full text of all the hits was
manually reviewed.

Valerie Enriquez interview with James Kidder
http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data
How to identify Dataset Reuse in the published literature
(decision flow, starting from a publicly archived dataset)

1. Dataset has an identifier (DOI, URL, accession #)?
   IDs are difficult to identify unambiguously in full text unless they have a unique pattern (DOI) or an unusual prefix or suffix.
   a. With the dataset's unique ID -> search in the reference sections of all papers.
      - This citation pattern (dataset DOI/ID in references section) is used almost exclusively for dataset reuse.
      - Manual disambiguation not required: can be automated, pending API support.
      - Does not require access to full text.
      - This citation pattern is currently rare.
      - This citation pattern is difficult to track with existing tool limitations:
        DOI/ID reference search is possible in full-text portals like PubMed Central and HighWire Press; however, portal coverage is limited and search is not restricted to the references section.
        DOI/ID search works in Google Scholar, but scope is poorly defined and results are messy.
        DOI/ID search is not supported by ISI Web of Science or Scopus.
   b. With the dataset's unique ID -> search in the full text of all papers -> sort hits to disambiguate reuse from submission.

2. Dataset submission record has submitter name or dataset title?
   Names and titles are messy identifiers.
   With (submitter surname AND repository name), and also (dataset title AND repository name) and (first author surname AND repository name) -> search in the full text of all papers -> sort hits to disambiguate reuse from submission.

   For the full-text searches (1b and 2):
      - This citation pattern (accession numbers in full text) is very common in some subdisciplines, so probably finds most reuses.
      - Disambiguation is time consuming.
      - Requires ability to query full text across all literature that may contain reuse.
      - Requires access to full text of search hits for sorting.

3. Dataset submission record mentions data collection article publication?
   Link to data collection paper often missing from dataset submission record, especially when dataset submission predates article publication.
   With the data collection article's journal, volume, page, etc. -> gather papers that cite the data collection paper -> sort hits to disambiguate reuse from other citation contexts.
      - This citation pattern (citation to data creation paper) is very common in some subdisciplines, so probably finds most reuses.
      - Citation history export is time consuming: automation not supported.
      - Disambiguation is time consuming: most citations are not in the context of reuse.
      - Only finds citations indexed by citation databases.
      - Requires access to full text of search hits for sorting.

This flow still misses attributions embedded in supplementary information, reuses attributed through a query description, etc.

Heather Piwowar, v1.0, CC-BY
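The decision flow above can be sketched as a small routine that lists which searches apply to one archived dataset, depending on what its submission record contains. This is a hypothetical illustration, not a tool from the talk: the `DatasetRecord` fields, the `plan_searches` function, and the query strings are all assumptions, and it only plans searches; the manual sorting of hits that the flow still requires is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical view of a repository's dataset submission record."""
    repository: str
    identifier: Optional[str] = None          # DOI, URL, or accession number
    submitter_surname: Optional[str] = None
    title: Optional[str] = None
    collection_article: Optional[str] = None  # citation of the data collection paper


def plan_searches(record: DatasetRecord) -> List[str]:
    """Return the searches the decision flow suggests for one dataset (sketch)."""
    plans: List[str] = []
    repo = record.repository
    if record.identifier:
        # Branch 1: unique ID in reference sections (automatable) and in full text.
        plans.append(f'reference sections: "{record.identifier}"')
        plans.append(f'full text: "{record.identifier}"')
    if record.submitter_surname:
        # Branch 2: name-based full-text search, hits need manual sorting.
        plans.append(f'full text: "{record.submitter_surname}" AND "{repo}"')
    if record.title:
        plans.append(f'full text: "{record.title}" AND "{repo}"')
    if record.collection_article:
        # Branch 3: citation databases, then filter citation contexts for reuse.
        plans.append(f"papers citing: {record.collection_article}")
    return plans
```

For example, a record with only a (hypothetical) DOI `10.5061/dryad.1234` and a submitter surname yields three planned searches, while a record with no identifier, names, or linked article yields none, which is exactly the case the flow cannot track.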
10 * 100 = 1000
publication-based datasets
deposited in 2005
1. following citations to the paper that describes the data collection, then filtering.
2. searching for accession numbers, URLs, and DOIs in full text
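Method 2 depends on spotting identifiers in article text. A minimal sketch, assuming plain-text article bodies are already in hand: the regular expressions below are hypothetical, the DOI pattern is a generic approximation, and the GEO `GSE` pattern merely stands in for the repository-specific accession formats a real search would need. Every match is only a candidate; each hit would still have to be sorted into reuse versus the original submission.

```python
import re
from typing import Dict, List

# Generic DOI shape: "10." + registrant code + "/" + suffix (approximation).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
URL_RE = re.compile(r"https?://[^\s\"<>]+")
# Example accession pattern only (GEO series IDs such as GSE2034);
# real accession formats differ by repository, so extend this dict as needed.
ACCESSION_RES = {"GEO series": re.compile(r"\bGSE\d+\b")}

# Sentence punctuation that often clings to the end of a DOI or URL.
TRAILING = ".,;:)]"


def find_dataset_mentions(text: str) -> Dict[str, List[str]]:
    """Collect candidate DOIs, URLs, and accession numbers from full text."""
    found: Dict[str, List[str]] = {
        "doi": [m.rstrip(TRAILING) for m in DOI_RE.findall(text)],
        "url": [m.rstrip(TRAILING) for m in URL_RE.findall(text)],
    }
    for label, pattern in ACCESSION_RES.items():
        found[label] = pattern.findall(text)
    return found
```

Stripping trailing punctuation is a deliberate trade-off: it fixes the common "(doi:...)." case but would also trim a DOI or URL that legitimately ends in one of those characters.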
http://api.plos.org/2011/05/31/announcing_the_plos_search_api/
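The PLOS link above announces a full-text search API, the kind of interface method 2 needs. The sketch below only builds a query URL and sends nothing. The endpoint `http://api.plos.org/search`, the Solr-style `q`, `fl`, `wt`, and `rows` parameters, the `everything` field name, and the `plos_fulltext_query` helper are assumptions to check against current PLOS documentation, which may also require an API key.

```python
from urllib.parse import urlencode

# Assumed endpoint and Solr-style parameters; verify against PLOS documentation.
PLOS_SEARCH = "http://api.plos.org/search"


def plos_fulltext_query(term: str, rows: int = 100) -> str:
    """Build (but do not send) a hypothetical PLOS search URL for an exact phrase."""
    params = {
        "q": f'everything:"{term}"',  # 'everything' field name is an assumption
        "fl": "id,title",             # fields to return
        "wt": "json",                 # response format
        "rows": rows,                 # maximum number of hits
    }
    return f"{PLOS_SEARCH}?{urlencode(params)}"
```

Usage: `plos_fulltext_query("10.5061/dryad")` would look for a DOI prefix anywhere in article text, which, as the flow notes, is not restricted to reference sections, so hits still need sorting.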
2005: a long time ago

biomedicine: familiar, also very dominant

search interfaces not well designed for this task

helpdesks are very helpful
stay tuned for results
poster at ASIS&T, SIGUSE
I post my data, code, and statistical scripts:
http://researchremix.org
Share yours too!
-> Open Notebook Science


http://www.flickr.com/photos/myklroventine/892446624/
https://notebooks.dataone.org/tracking1000datasets/
Thank you
Todd Vision
Estephanie Sta Maria
Jonathan Carlson
Dryad and DataONE teams
The open science online community and those who release their articles, datasets, and photos openly

