Opportunities for Data Exchange: optimising the conditions for data sharing. Susan Reilly, LERU Doctoral Summer School, 9th Jul, ...
Research Data Sharing LERU
Presentation from LERU Doctoral Summer School 2012, Barcelona

  • I thought I would start where most people normally end, by saying thank you. I work on a lot of projects which focus on research data sharing and curation. I talk with libraries, publishers, funders, and research institutes about the type of infrastructure we need to promote and realise the full potential of data sharing. Regardless of the context, whether it be data preservation, curation, access, reuse, or citation, the key to the success of this infrastructure is buy-in from researchers. Putting the carrot and the stick of incentivisation and mandating aside, researchers need to be convinced that data sharing is something they want to do. So, I’m very happy that LERU has invited me to be here today to discuss the drivers and barriers for data sharing with actual researchers. Thank you to LERU for this opportunity, and thank you for having enough interest in this subject to be here today and, hopefully, to take up the data sharing baton.
  • Before we get into the drivers and barriers for data sharing I would like to ‘share’ 2 things about me with you. First of all, I am a librarian. I work as project officer for LIBER, which is the Association of European Research Libraries. We have 380 member libraries from all over Europe. Our projects focus on developing the role of the library as part of the European Research Infrastructure, and they fall into 3 main categories.
  • So, waves and surfing are analogies that are often used when referring to the data deluge and research data sharing. This report, ‘Riding the Wave’, which was written by the High Level Expert Group on Scientific Data in October 2010, talks about how Europe can gain from the rising tide of scientific data.
  • Doing this since 2008. Involves 160 computer centres around the world
  • Called for a framework for collaborative data infrastructure to outline how different stakeholders interact with the data sharing system
  • Researcher as end user and researcher as data creator
  • Libraries and data centres must support data publishing as a prerequisite for data availability, including persistent identification/citation of datasets, and solutions for data description and retrieval, which together facilitate findability. They must also ensure that data is properly documented as a condition for data interpretability and re-usability and prepare for long-term data archiving including data curation and preservation.

    1. Opportunities for Data Exchange: optimising the conditions for data sharing. Susan Reilly, LERU Doctoral Summer School, 9th Jul, 2012
    2. Thank you!
    3. LIBER & the European Research Infrastructure. LIBER (Association of European Research Libraries) projects:
       • Content: Europeana Libraries, Europeana Newspapers
       • Policy: MEDOANET
       • Infrastructure: APARSEN, AAA Study, ODE
    4. Ready to ride the wave… ?
    5. Rule #11: Don’t Publicize! Unless the break is a well known spot, like for e.g. Lahinch, Bundoran, or Strandhill, taking photo’s and posting them on the Internet is regarded as unacceptable in the surfing community. If you publicize a break in this manner you draw attention to it, which in turns draws more people to it, which means a place gets more crowded and there is more aggro in the water. The more you talk about a break to those who haven’t surfed it the more damage you do to it, and yourself in the long run because the more people there are in the water the less waves there are for you. Think about it.
       http://www.boards.ie/vbulletin/showthread.php?s=fc082712ef1354ecf7cb0e53dc71d519&t=2055828999
    6. Reasons not to share surf info
       • Other people will steal my wave
       • Unethical to share, e.g. inexperienced surfers on dangerous breaks get hurt
       • We won’t get recognition, e.g. local surfers lose out to visiting pros
       • .............
    7. 15 petabytes (15 million gigabytes) of data annually, enough to fill more than 1.7 million dual-layer DVDs a year!
    8. The Vision: “With a proper scientific e-infrastructure, researchers in different domains can collaborate on the same data set, finding new insights. They can share a data set easily across the globe, but also protect its integrity and ownership. They can use, re-use and combine data, increasing productivity. They can more easily solve today’s Grand Challenges, such as climate change and energy supply. Indeed, they can engage in whole new forms of scientific inquiry, made possible by the unimaginable power of the e-infrastructure to find correlations, draw inferences and trade ideas and information at a scale we are only beginning to see.”
    9. Now and Next
       • Authentication & authorisation
       • New skills
    10. The Opportunities for Data Exchange Project
       • Identify, collate, interpret and deliver evidence of emerging best practices in sharing, re-using, preserving and citing data, the drivers for these changes and barriers impeding progress, in forms suited to each audience
       • Audiences: policy makers, funders, infrastructure operators, data centres, data providers and users, libraries and publishers
    11. Steps to creating the conditions for data sharing
       • Understand data sharing today
         • Collection of “success stories”, “near misses” and “honourable failures” in data sharing, re-use and preservation
       • Data & scholarly communications
         • Integrating data and publications
         • Best practice in data citation
         • New roles
       • Identify drivers and barriers
         • Interviews with stakeholders to seek consensus
       Image: Foto "Bell", Noordewierweg 116, Amersfoort.
    12. Tales of Data sharing
       • 21 stories
         • scientific communities
         • infrastructure initiatives
         • management
         • other relevant stakeholders
    13. The Astronomical Importance of Discoverability
       • Galaxy Zoo (Carolin Liefke)
       • Pre-processed data shared with the public to carry out specific tasks (e.g. classifying galaxies)
       • Discoverability a major challenge in data sharing: easier, more sophisticated data mining, more complex automated processing
    14. Hypotheses: “Without the infrastructure that helps scientists manage their data in a convenient and efficient way, no culture of data sharing will evolve.” Stefan Winkler-Nees (German Research Foundation, DFG)
    15. Hypotheses Expected. Category: Infrastructure
       • “An international research community needs an international data infrastructure and international support.”
       • “After decades of reports with data in their titles the community found inadequate services, almost no international support and few solutions.”
    16. Tension between hypotheses. Cat: Legislation, Education, Behaviour
       • “Premature data releases should not be enforced, but the mere possibility of data misinterpretation is no reason for not sharing data.”
       • “To avoid misuse and lack of acknowledgement of very special data, access should be restricted to skilled persons trained by the data creator.”
    17. Hypotheses by Category: 4. Attitudes; 6. Policies; 8. Infrastructure; 10. DMPs, Citability; 11. Dependency on discipline
    18. Barriers & Drivers (diagram labels): accreditation & certification; education; culture & attitude; legislation; quality; funding; cooperation; policies; data sharing; publishing & visibility; data flow improvements; infrastructure; disciplines; career; efficiency
    19. Integrating Data & Publications
       • 3 stakeholder groups
         • Publishers
         • Researchers
         • Libraries & data centres
    20. How stakeholders interact
    21. The Data Publication Pyramid
       (1) Data contained and explained within the article
       (2) Further data explanations in any kind of supplementary files to articles
       (3) Data referenced from the article and held in data centers and repositories
       (4) Data publications, describing available datasets
       (5) Data in drawers and on disks at the institute
    22. Where do you currently store your research data? (multiple answers possible) Source: PARSE.Insight survey 2009, N = 1202
    23. The Pyramid’s likely short term reality:
       (1) Top of the pyramid is stable but small
       (2) Risk that supplements to articles turn into Data Dumping places
       (3) Too many disciplines lack a community endorsed data archive
       (4) Estimates are that at least 75% of research data is never made openly available
    24. The Ideal Pyramid
       (1) More integration of text and data, viewers and seamless links to interactive datasets
       (2) Only if data cannot be integrated in article, and only relevant extra explanations
       (3) Seamless links (bi-directional) between publications and data, interactive viewers within the articles
       (4) More Data Journals that describe datasets, data mgt plans and data methods
    25. A famous paper in Nature: DNA structure, 1953
       • 1 page
       • 2 authors
       • 1 figure
       • no data
       Source: V. Kiermer, Nature Publishing Group, 2011
    26. Nature in 2001: The human genome issue
       • 62 pages, 49 figures, 27 tables
       Source: V. Kiermer, Nature Publishing Group, 2011
    27. A thousand genomes, 2010
       http://www.nature.com/nature/journal/v467/n7319/full/nature09534.html
       Raw data: 12,145 SRA run ids submitted to Short Read Archive
       Source: V. Kiermer, Nature Publishing Group, 2011
    28. Elsevier offers gene and protein viewers from within the article, to data stored elsewhere
    29. Articles: the currency of Science
    30. Issues for researchers
       • Researchers need somewhere to put data and make it safe for reuse
       • Researchers need to control its sharing and access
       • Researchers need the ability to integrate data and publication
       • Researchers need to get credit for data as a first class research object
       • Researchers need someone to pay for the costs of data availability and re-use
    31. Library support for the researcher. Libraries and data centres must support…
       • Availability: data as first class research object (publishing, persistent identification/citation of datasets)
       • Findability: data description, metadata, standards, documentation and retrieval
       • Interpretability: proper documentation of data
       • Re-usability: long-term data archiving including data curation and preservation
    32. 7 Areas of Opportunity
       • Availability
       • Findability
       • Interpretability
       • Reusability
       • Citability
       • Curation
       • Preservation
    33. Researcher Opportunities (by data issue)
       • Availability
         • Researchers demand their data be treated as first class research objects
         • Researchers loosen control over data
         • Define roles of responsibility and control
       • Findability
         • Agree convention to propose to publishers regarding data citation
         • Use of persistent identifiers such as DOIs
         • Ensure common citation practices
       • Interpretability
         • Recognize that data require metadata and work towards community best practice in metadata development
       • Re-usability
         • Be concerned about the long term ability for secondary use and consider or seek out responsible preservation actions
       • Citability
         • Agree a convention for data citation
         • Follow metadata standards for datasets
         • Use of persistent identifiers such as DOIs
       • Curation
         • Develop sustainable and realistic data management plans
         • Collaboration with public data archives
       • Preservation
         • Develop sustainable realistic preservation plans
         • Active engagement with public data archives
    34. Publishers’ Opportunities (by data issue; Chapter 3)
       • Availability
         • Articles with data provide richer content and higher usage
         • Impose stricter editorial policies about availability of underlying data, in line with general funders’ trends
         • Ensure data is stored in a safe place, preferably a public repository
         • Be transparent about curation and preservation of submitted data
       • Findability
         • Ensure bi-directional links between data and publications
         • Ensure common citation practices
       • Interpretability
         • Provide services around data such as viewer apps for underlying data from within the article or interactive graphs, tables and images
         • Data Publications
       • Re-usability
         • Interactive data from within articles
         • Links to the relevant datasets, not just to the database
         • Data Publications
       • Citability
         • Establish uniform data citation standards
         • Follow metadata standards for datasets
         • Use of persistent identifiers such as DOIs
         • Data Publications
       • Curation
         • Transparency about curation of submitted data
         • Collaboration with public data archives
       • Preservation
         • Transparency about preservation of submitted data
         • Collaboration with public data archives
    35. Libraries’ Opportunities (libraries and data centres, by data issue; Chapter 4)
       • Availability
         • Lower barriers to researchers to make their data available.
         • Integrate data sets into retrieval services.
       • Findability
         • Support of persistent identifiers.
         • Engage in developing common metadescription schemas and common citation practices.
         • Promote use of common standards and tools among researchers.
       • Interpretability
         • Support crosslinks between publications and datasets.
         • Provide and help researchers understand metadescriptions of datasets.
         • Establish and maintain knowledge base about data and their context.
       • Re-usability
         • Curate and preserve datasets.
         • Archive software needed for re-analysis of data.
         • Be transparent about conditions under which data sets can be re-used (expert knowledge needed, software needed).
       • Citability
         • Engage in establishing uniform data citation standards.
         • Support and promote persistent identifiers.
       • Curation/Preservation
         • Transparency about curation of submitted data.
         • Promote good data management practice.
         • Collaborate with data creators.
         • Instruct researchers on discipline specific best practices in data creation (preservation formats, documentation of experiment, …)
    36. Q. What exactly should the role of the library be and what are the skills we need?
    37. Data Citation: Getting Credit!
       • Challenges:
         • granularity: which bits inside the dataset are being referred to
         • versioning: in case of dynamic or regularly updated data, which version is cited
         • retrievability: indicate via DOIs or accession numbers where the data are retrievable
       Overview of best practices reported in literature and through interviews with experts
    38. Some Findings
       • Citations with persistent identifiers should be listed in the references/bibliography to enable tracking of citation metrics.
       • Publishers need to provide guidance for authors and referees on citation of data.
       • Researchers need to nurture awareness in their community of the benefits of data citation, and follow citation guidelines given by publishers and data centres.
       • Many researchers do not appear to see the value and benefits of data citation. How different communities can work together to promote this activity and the status of datasets as primary research outputs and publishable works in their own right is an issue that still needs to be addressed.
    39. Our Relationship. Many researchers do not appear to see the value and benefits of data citation. There is a gap, which could be filled by libraries, in advocacy for data sharing, the use of subject specific repositories, and best practice in data citation. These, if filled, would increase the number of researchers sharing and reusing data. The issue still to be addressed is how different communities can work together to promote this activity and the status of datasets as primary research outputs and publishable works in their own right.
    40. Now & Next
       • For ODE:
         • Verify hypotheses as drivers and barriers
         • Translate findings for various target groups
       • For LIBER:
         • Continue to find ways of supporting data sharing
         • Return to the framework for the collaborative data infrastructure
    41. Now and Next
       • Authentication & authorisation
       • New skills
    42. Addressing Trust and Data Curation
       • AAA Study
         • Authentication and authorisation infrastructure for European researchers
         • On the Riding the Wave wish list: “Distributed and collaborative authentication, authorisation and accounting”
         • Safe depositing of data
         • Authenticity and provenance
         • Ensure recognition
         • Safe environments for collaboration
    43. Addressing Trust and Data Curation
       • Alliance for Permanent Access to the Record of Science in Europe Network (APARSEN)
         • Look across the excellent work in digital preservation which is carried out in Europe and try to bring it together under a common vision
         • Trust, Sustainability, Usability, Access
    44. Back to surfing… What was the result of all this sharing?
    45. http://www.brain-cloud.net/wp-content/uploads/2011/05/fergal-smith.jpg
    46. Has enabled surfers to do things they only dreamed about
       • Big wave hunters…
       http://theweek.com/article/index/227955/the-biggest-wave-ever-surfed-the-mind-blowing-video
    47. Further Reading
       • Riding the Wave (2011): http://www.cordis.europa.eu/fp7/ict/e.../hlg-sdi-report.pdf
       • ODE/APARSEN Publications: http://www.alliancepermanentaccess.org/index.php/community/current-projects
       • AAA Study: https://confluence.terena.org/display/aaastudy/AAA+Study+Home+Page
    48. Credits. Slides reused from presentations by: Salvatore Mele (CERN), Eefke Smit (STM), Hans Pfeiffenberger (Helmholtz). Most images sourced through The European Library
    49. Thank you again!
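
To make the three data citation challenges from slide 37 (granularity, versioning, retrievability) more concrete, here is a minimal Python sketch of a citation record that carries a field for each. It is an illustration only, not a format proposed in the talk or by ODE: the class name, the field names, the ordering of the citation parts, the example authors and the DOI `10.1234/example.5678` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DatasetCitation:
    """Hypothetical citation record; every field name is an assumption."""
    creators: Tuple[str, ...]
    year: int
    title: str
    publisher: str
    doi: str                       # retrievability: where the data can be found
    version: Optional[str] = None  # versioning: which release is being cited
    subset: Optional[str] = None   # granularity: which part of the dataset

    def format(self) -> str:
        """Build a reference-list entry; optional parts are skipped if unset."""
        parts = [f"{'; '.join(self.creators)} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        if self.subset:
            parts.append(f"Subset: {self.subset}")
        # Writing the DOI as a resolvable link keeps the citation retrievable.
        parts.append(f"https://doi.org/{self.doi}")
        return ". ".join(parts)


# Made-up example values, for illustration only.
citation = DatasetCitation(
    creators=("Example, A.", "Sample, B."),
    year=2012,
    title="Hypothetical survey dataset",
    publisher="Example Data Centre",
    doi="10.1234/example.5678",
    version="2.1",
    subset="records 100-199",
)
print(citation.format())
```

Keeping version and subset as separate optional fields means a citation to the whole dataset and a citation to one slice of one release share the same identifier, which is one possible way to let citation counts be tracked per dataset, as slide 38 asks.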