In the U.S., Open Data, or data sharing, has advanced mainly through grassroots efforts. Historically, data sharing was the domain of social scientists (e.g., ICPSR), with notable exceptions like the NIH’s National Center for Biotechnology Information and a few federal agencies that dealt in very big data, such as NASA and NOAA. It’s fair to say that every discipline, even sub-discipline, has taken a different approach to the question of whether and how to openly share its data.
Starting around 2005, organizations like Science Commons (part of Creative Commons) and the Open Knowledge Foundation started to formalize the idea of open data, building on the Open Access movement in scholarly communication.
These early efforts focused mainly on health and life sciences, probably because that’s where OA was getting the most traction. They eventually produced tools like the Open Database License and the CC0 waiver to help researchers share their data.
And, for a variety of reasons, U.S. funding agencies have started to get involved – beginning with the NIH (managers of NCBI), then later the NSF, leading up to…
Even more recently, we’re experiencing a sudden surge of scientific misconduct cases, some very serious, others contested. Turns out that scientific reproducibility is a lot harder than it sounds, and data availability is a necessary but insufficient component.
Related to this question of reproducibility, and a big driver for researchers to open up their data, are the growing requirements from publishers that data associated with the articles they publish be deposited and publicly retrievable. PLoS is just the latest of these. Nature has required deposit of sequence data in GenBank before publication for many years, and in some cases an entire discipline adopts this practice, like evolutionary biology with the Dryad data repository.
An OSTP memo calling for ALL funding agencies of a certain size to develop an Open Access strategy for articles AND research data.
The U.S. federal funding agencies are still working out their policies for data, and seem to be aligning them with their historical practices. So NASA, for example, will make data sharing a requirement and already has the infrastructure to support it; NOAA and NIH are similar, while NSF and other funders less oriented toward the physical and life sciences are taking more time.
Now we’re seeing a few private foundations that fund research, and a few State governments, considering similar open access policies for the research they fund.
NIH has often been a bellwether for funder policies in the U.S., and with the recent hire of Phil Bourne as their Chief Data Officer, that trend is continuing. In a fairly short time he’s developed a framework for thinking about digital assets in the context of academic research and is beginning to fund new pieces of infrastructure.
Note here an important development – he mentions software as equal in importance to articles and data. This is a theme of growing importance in the U.S.
Integrated journals may allow authors to embargo one or more datafiles within a data package from release for one year following the date of publication, or they may disallow this option. Editors may also direct Dryad to grant longer custom embargoes upon request. It is of interest to know how often embargoes are used when authors are given the choice, as a measure of the level of comfort researchers have with the idea of publishing data alongside an article. Reassuringly, we find that since 2009, more than 90% of datafiles are being released either immediately or at the time of article publication in those cases where the authors have freedom to choose. Less than 1% of datafiles were placed under specially requested embargoes of greater than one year, and those came from a limited number of journals (Vision T, Scherle R, Mannheimer S (2013) Embargo selections of Dryad data authors. Figshare http://dx.doi.org/10.6084/m9.figshare.805946).
Stanford Repository List
The SHARE initiative from the ARL, AAU, and APLU is very well represented at this meeting so I won’t discuss it in detail. Just to say that it’s a new, and one of the first, national initiatives coming from Higher Education and addressing the problem of Open Access to publications and data at scale.
The first problem SHARE identified is how to know what researchers have done that should be shared. In the U.S., most institutions, including major research universities, have no idea what their researchers have accomplished, much less whether or not they’re complying with funder requirements, local Open Access policies, etc. It’s very difficult to keep up with new publications, much less other research products that aren’t part of the formal publishing ecosystem. Today there exists no single, structured way to report research output releases in a timely and reliable manner.
The Notification System will be a digest of metadata about publicly available research from which institutions, repositories, and funding agencies can receive information about research outputs they’re interested in. The digest will be created mainly through harvest of available streams of data and will not require the direct participation of principal investigators or present additional burdens to them. Notification System Project underway: Beta release fall 2014, Full release fall 2015.
Longer-term, there are big issues to tackle related to rights, and relating SHARE research to the Open Access goals of the federal mandates it was intended to address. As all of you know, there is a tension between researchers’ desire to get credit for their work in the scholarly reward system and their reluctance to give up control of it to get that credit. In the world of research publications that got sorted out a long time ago, and researchers are motivated to publish as quickly as they can. That’s not always true for data or related software, so SHARE is going to work on a rights framework that includes data.
So what are these areas of RDM services? We distinguished the types of service that are most common in libraries’ data management support, and distinguished levels of service for particular libraries by how many of these services they offered, which corresponds roughly to the depth of library resources devoted to them.
Levels of service ranged from website resources on data management planning to staffed consulting and archiving. But most libraries with RDM had offerings in our three categories of service.
The most common RDM services (offered by more than 40 institutions) are:
• Data management planning services, mainly through online resources but also DMP consulting
• Broader data management support, such as training on particular DM topics or research metadata support
• Support for sharing data, such as help with data citation; many libraries are also starting to archive researchers’ data directly
<I will go into just a bit of detail and key findings on these services>
89% provide researchers with what we call Consulting, defined broadly as personal help of some kind, whether by email or office visits. (Researchers rarely if ever visit the library directly for help.) Training sessions for writing data management plans are also a common offering: 61% provide them, most as in-person workshops and some delivered online.
He observes that trying to change behavior in an intensely competitive field like academic research is counter-productive, if it’s even possible. He argues that our top priority in HE should be to make every researcher and student ‘data literate’, in the sense of knowing how to create and manage data efficiently and effectively, and to provide them with simple-to-use tools to publish and cite data and software, so they can reap the credit.
Opening up data – Jisc and CNI conference 10 July 2014
• An IP rights strategy, including the promotion of university-based Open Access policies and favorable licensing terms, will be part of the scaffolding that will enable the layers of SHARE to develop.
• Rights subgroup formed to deal with this
• A broad collective action by AAU and APLU – to be discussed with AAU Presidents in April