Keynote address: "Ways and Needs to Promote Rapid Data Sharing" by Laurie Goodman of GigaScience.
Data is the base upon which all scientific discoveries are built, and data availability speeds the rate at which discoveries are made. Given that the overall goal of research is to improve human health and our environment, waiting to release data until after the first publication (sometimes a matter of years) is unacceptable. Myriad issues impede researchers from sharing data openly and, most importantly, rapidly, including a lack of incentives (no credit, limited funding benefits, and little impact on career advancement) and cultural issues (the fear of being scooped). However, scientific publishers, as the communicators of science and a key mechanism by which a researcher's productivity is measured, can and should play a central role in promoting data sharing. Data citation and data publication are just some of the ways we can support and encourage researchers who share data. Here, I will provide examples that make clear the need for publishers to play an active role in this process, along with potential ways to facilitate our ability to promote open and rapid data sharing. This is not easy, but it is essential.
This document discusses the opportunities of open data sharing in the big data era, including quicker responses to problems, more collaboration, and harnessing crowd-sourced efforts. It provides examples of open data enabling scientific progress, such as genome analysis that helped control an E. coli outbreak. Open data can provide credit to data sharers and incentivize open science. The document advocates for removing barriers to open data like paywalls and silos through initiatives like GigaDB and GigaScience that integrate publishing and data platforms to maximize data utility.
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: the reproducibility crisis, and the need for transparency (GigaScience, BGI Hong Kong)
Scott Edmunds talk at the G3 (Great GigaScience & Galaxy) workshop: "Open Data: the reproducibility crisis, and the need for transparency". Melbourne University, 19th September 2014
Scott Edmunds from GigaScience on "Publishing in the Open Data Era", at the "Open, Crowdsource and Blockchain Science!" hangout at Hackerspace.sg, 23rd March 2015
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in the "Big-Data" Era (GigaScience, BGI Hong Kong)
Scott Edmunds talk at the 7th International Conference on Genomics: "Channeling the Deluge: Reproducibility & Data Dissemination in the "Big-Data" Era". ICG7, Hong Kong, 1st December 2012
Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Obj... (GigaScience, BGI Hong Kong)
The document discusses problems with the current scholarly publishing and incentive systems, including a lack of access to supporting data and computational methods, "gaming" of the peer review system through fake journals and referees, and increasing retractions over time. It proposes that new incentives are needed to reward open data sharing, transparent methods, reproducible research objects, and other practices that improve verification and reuse of findings. The GigaScience journal and associated platforms aim to address these issues through data publishing, open review, and integrated sharing of datasets, software, workflows and results.
1. GigaScience is a new open access journal and database focused on publishing and hosting large-scale genomic and other "big data" sets to promote sharing, reproducibility, and reuse.
2. The journal aims to address incentives for data sharing by providing data producers credit through DOIs for datasets and enabling attribution and impact tracking when data is cited.
3. As an example, genomic data from the 2011 E. coli outbreak in Germany was rapidly shared on the journal's website under an open license and assigned a DOI to allow analysis and citation by researchers worldwide working to understand the epidemic.
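The DOI-based credit mechanism in point 2 can be sketched as a small citation formatter. This is an illustrative sketch, not GigaDB's actual tooling: the function name and the metadata record below are simplified assumptions (10.5524 is GigaDB's DOI prefix, but the author list and title here are placeholders).

```python
# Illustrative sketch (not GigaDB's actual tooling): formatting a
# DataCite-style citation so a dataset can be cited like a paper.
def format_data_citation(authors, year, title, publisher, doi):
    """Return a citation string for a dataset identified by a DOI."""
    author_str = "; ".join(authors)
    return f"{author_str} ({year}): {title}. {publisher}. https://doi.org/{doi}"

# Hypothetical, simplified metadata for a genome dataset release:
citation = format_data_citation(
    authors=["Li, R.", "et al."],
    year=2011,
    title="Genomic data from the 2011 E. coli outbreak strain",
    publisher="GigaScience Database",
    doi="10.5524/100001",
)
print(citation)
```

Because the citation carries a resolvable DOI, downstream tools can count it the same way article citations are counted, which is the attribution-tracking point made above.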
This document discusses the need for open science due to a reproducibility crisis in many scientific disciplines. It notes that many published findings cannot be replicated and estimates that at least two-thirds of published results in psychology and biomedicine may be incorrect. This represents a credibility crisis that undermines public trust in science. The document argues that adopting practices of open science such as preregistration, open data, and detailed documentation can help address this crisis by reducing biases, enabling replication, and increasing transparency and reproducibility. Open science is presented as a means of improving research quality and accelerating discovery for the benefit of both science and society.
This document summarizes a presentation by Nicole Nogoy from GigaScience about their journal, data platform, and database for large-scale data. GigaScience aims to enable more open access, collaboration and data sharing across disciplines by deconstructing research papers and providing credit for data, software and other digital outputs. It utilizes a big data infrastructure to integrate open access publishing with data and software publishing platforms. Examples are provided of data sets and analyses that have been published through GigaScience to maximize reuse and reproducibility.
This document provides an overview and introduction to the concepts and challenges of e-research. It begins by examining competing terms used to describe the transformation in research due to widespread digital technologies and networks. Key terms discussed include e-science, cyberinfrastructure, and e-research. The document then outlines the conceptual framework of the book, which is divided into sections on conceptualization, development, collaboration, visualization, data preservation and reuse, access and intellectual property, and case studies. Each chapter is briefly introduced. The concluding section notes areas for further research around chronicling transformations in scholarship and contextualizing changes within disciplinary cultures.
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ..." (Jonathan Tedds)
This document discusses open access to research data and peer review of data publications. It notes that as a first step, data underpinning journal articles should be made concurrently available in accessible databases. The Royal Society report in 2012 advocated for all science literature and data to be online and interoperable. Key issues in linking data to the scientific record are data persistence, quality, attribution, and credit. The document provides examples from astronomy of data reuse leading to new publications and cites a study finding poor reproducibility of ecological data sets over time as data availability declines. It outlines different levels of research data from raw to processed to published and discusses initiatives for open data publication and peer review.
Reproducibility, argument and data in translational medicine (Tim Clark)
Failures in reproducibility and robustness of scientific findings are explored from statistical, historical, and argumentation theory perspectives. The impact of false positives in the literature is connected to failures in T1 and T2 biomedical translation, and is shown to have a significant impact on the costs of therapeutic development and availability of needed treatments to the public. Technological and social approaches to resolve these issues are presented. "Reproducibility" initiatives are critiqued as unsustainable and non-authoritative; improved requirements and methods for scientific communication of findings including data, methods and material are supported as the best approaches for improved reproducibility.
Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...)
The document discusses the benefits of open science data and argues that open data is important for addressing issues like climate change, disease outbreaks, and environmental problems. It provides an example where open genomic data from an E. coli outbreak in Germany was released under an open license and analyzed by researchers around the world, leading to important findings that helped control the outbreak. The document advocates for more open access and open data policies in Hong Kong to maximize the benefits of research and address issues like a lack of transparency in China.
Scientists used over 1 million home computers linked together to test potential drug compounds for treating anthrax. They were able to test 3.5 billion molecules in just 24 days, far more than any pharmaceutical company could test alone. The researchers identified 12,000 potential drug candidates, including antitoxins that could counter the lethal effects of the anthrax bacteria. This worldwide distributed computing project showed that harnessing vast computing resources in this way could lead to medical breakthroughs much faster than traditional methods.
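The throughput figures above are easy to sanity-check with a back-of-envelope calculation (the inputs are taken from the summary; the per-day and per-machine averages are our own arithmetic):

```python
# Back-of-envelope check of the distributed anthrax screening throughput.
molecules = 3.5e9          # compounds screened (from the summary)
days = 24                  # elapsed time (from the summary)
computers = 1_000_000      # volunteer home computers (from the summary)

per_day = molecules / days             # overall screening rate
per_machine = molecules / computers    # average workload per volunteer PC

print(f"{per_day:,.0f} molecules/day, {per_machine:,.0f} molecules per machine")
```

Roughly 146 million molecules per day overall, yet only a few thousand per individual machine, which is why volunteer computing made the scale feasible.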
This presentation was provided by Alberto Pepe of Authorea, during the NISO hot topic event "Preprints." The virtual conference was held on April 21, 2021.
The document describes the development of the Open Drug Discovery Teams (ODDT) mobile app, which aims to facilitate collaboration in drug discovery. The app aggregates open science data from sources like Twitter on topics related to rare and neglected diseases. It provides a magazine-style interface for browsing recent posts. The app and its backend were developed iteratively, with input from researchers during testing. The app harvests tweets with specific hashtags and allows users to endorse or reject posts. It can visualize chemical structures and tables linked from tweets. The goal is to connect researchers and data to help accelerate open drug discovery.
BGI training lecture: Scott Edmunds - Science 2.0, why new developments on th... (Scott Edmunds)
The document discusses how new developments on the web can help scientists share information more openly and collaboratively. It describes how tools like blogs, wikis, and social networks allow researchers to openly discuss data, findings, and ideas. As an example, it highlights how Chinese scientists rapidly shared genomic data on the 2011 E. coli outbreak online, enabling global crowdsourcing efforts that helped analyze and understand the outbreak more quickly. The document advocates for open science through open access, open source, and open data practices to accelerate discovery and make science fairer and more impactful.
This presentation was provided by Leslie McIntosh of Ripeta, during the NISO hot topic event "Preprints." The virtual conference was held on April 21, 2021.
Thesis Proposal, as presented for dissertation proposal defense (Heather Piwowar)
The slides I presented for my PhD proposal defense for my project, "Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data." Dept of Biomedical Informatics, University of Pittsburgh.
On Dec. 20th 2016, the HRB published their "Health Research In Action" booklet that detailed a small selection of recent success stories from their research funding portfolio which "...really show health research in action".
The corneal-limbal stem cell research work carried out at NICB (by Finbarr O’Sullivan and Prof. Martin Clynes) and which led to the first corneal-limbal stem cell transplant in Ireland (carried out by Mr. William Power of the RVEEH) on June 7th, 2016 got an honorable mention (Page 17)
1. The document discusses three areas of change in scholarly communication: public access to papers, treating papers as data, and dataset archiving. Attendees of iEvoBio are well-positioned to understand and guide these changes.
2. Preliminary results from a study on researcher attitudes towards data archiving show that some researchers are worried about others using their data without proper recognition or collecting their own data.
3. The key messages are that the world of scholarly communication is changing, and that attendees can help shape the future by raising their expectations, their voices, and their glasses to change the status quo.
Public Sharing of Research Datasets: A Pilot Study of Associations (Heather Piwowar)
Presented at the ASIST & ISSI Pre-Conference Symposium on Informetrics and Scientometrics on Nov 7, 2009
http://www.sois.uwm.edu/MetricsPreCon/program.html
This document discusses analyzing data about research data and datasets to better understand their impact. It notes that impact goes beyond just citations and includes many types of engagement like views, saves, discussions, recommendations by different groups. More metrics from different sources need to be exposed about datasets to analyze diverse impacts. The data and metrics also need to be more open through text mining and aggregators. This will help drive more awareness of different types of research products and changes in how they are valued.
No more waiting! Tools that work Today to reveal dataset use (Heather Piwowar)
This document discusses the need to better understand the impact of datasets beyond just citations. It notes that datasets can be engaged with in many ways, such as through views, saves, discussions, and recommendations, by various groups like researchers, teachers, students, and policymakers. It calls for exposing more metrics of engagement, supporting more tools for interacting with datasets at all stages, and making metrics and data more openly available to help reveal how datasets are being used.
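The many kinds of engagement listed above can be sketched as a simple aggregation across sources. The source and event names below are hypothetical illustrations, not any particular tracker's API:

```python
from collections import Counter

# Illustrative sketch: pooling engagement events for one dataset from several
# (hypothetical) sources, so impact is not reduced to citation counts alone.
def summarize_engagement(events):
    """events: iterable of (source, kind) pairs, e.g. ("figshare", "view")."""
    by_kind = Counter(kind for _, kind in events)
    by_source = Counter(source for source, _ in events)
    return {"by_kind": dict(by_kind), "by_source": dict(by_source)}

events = [
    ("figshare", "view"), ("figshare", "view"), ("figshare", "download"),
    ("mendeley", "save"), ("twitter", "discussion"), ("crossref", "citation"),
]
summary = summarize_engagement(events)
print(summary["by_kind"])
```

Exposing per-kind and per-source tallies like this is one way tools could reveal the views, saves, and discussions the summary argues are currently invisible.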
Text Mining Rights from Three Perspectives: Researcher (Heather Piwowar)
Presentation by Heather Piwowar at the Charleston Conference 2012 as part of the "Text Mining Rights from Three Perspectives" session with Teresa Lee and Judson Dunham
http://2012charlestonconference.sched.org/event/fefb0c29aa6bbf91521e35efc2dd151c
See Jud's slides at http://www.slideshare.net/judsondunham/three-perspectives-on-text-mining-publisher
Libraries empowering scholars (and scholarly communication) through #altmetrics (Heather Piwowar)
This document discusses how libraries can empower scholars and scholarly communication through altmetrics. It notes that traditional research evaluation focuses too much on impact factor and that altmetrics provide additional ways to measure impact, including social media mentions, citations in policy documents or Wikipedia. The document recommends that libraries can help by raising expectations of diverse metrics, advocating for their use in evaluation, and supporting altmetrics tools. This would help move evaluations away from a single-dimensional system and capture different types of research impact.
submission summary for #WSSSPE Policy session on Credit, Citation, and Impact
presentation by Heather Piwowar
November 2013
agenda: http://wssspe.researchcomputing.org.uk/
Heather Piwowar and Jason Priem presented on Depsy, a software tool for measuring the impact of research software using metrics like downloads, citations, and authorship. Depsy launched in November 2016 and gathered feedback, with the biggest request being to pull in new data from GitHub. Future plans for Depsy include adding more data sources, improving text mining, and integrating more tightly with libraries.io and Impactstory.
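The idea of scoring research-software impact from metrics like downloads and citations can be sketched as a weighted combination of signals. This is only a sketch of the general idea: Depsy's real scoring is more involved, and the weights and log scaling below are made-up assumptions.

```python
import math

# Illustrative sketch only (not Depsy's actual algorithm): fold several
# usage signals into one comparable number via a weighted log-scale sum.
def impact_score(downloads, citations, dependent_packages,
                 weights=(0.2, 0.5, 0.3)):
    """Weighted sum of log-scaled signals (the weights here are made up)."""
    signals = (downloads, citations, dependent_packages)
    return sum(w * math.log1p(s) for w, s in zip(weights, signals))

print(round(impact_score(downloads=10_000, citations=50, dependent_packages=12), 2))
```

Log scaling keeps a package with millions of downloads from drowning out citation and reuse signals, which is one common design choice when combining heavy-tailed counts.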
Reproducible method and benchmarking publishing for the data (and evidence) driven era (GigaScience, BGI Hong Kong)
Scott Edmunds presentation on "Reproducible method and benchmarking publishing for the data (and evidence) driven era". The Silk Road Forensics Conference, Yantai, 18th September 2018
From Deadly E. coli to Endangered Polar Bear: GigaScience Provides First Cita... (GigaScience, BGI Hong Kong)
Slides from the GigaScience press conference at BGI's Bio-IT APAC meeting on the GigaScience website launch and the release of the first unpublished animal genomes from the database. Genomes include polar bear, penguin, pigeon and macaque. 6th July 2011
Participant-centered research design and “equal access” data sharing practice... (Jason Bobe)
Topics include:
What is "equal access" to data?
How have the roles of human subjects expanded over time?
Where has equal access to data been a success?
What are the barriers to equal access in research?
CCI32 - Citizen Participation in the Biological Sciences: A Literature Review... (Todd Suomela)
This document summarizes a literature review on citizen science projects in biological sciences. It finds that citizen science has grown significantly in the last decade, especially in environmental biology fields where large data collection is needed. Most projects aim to involve the public in research to increase science understanding while collecting reliable data. However, the quality of citizen-collected data is a primary concern that many papers addressed. Further research is needed to understand differences in citizen participation across biological subfields.
- The document discusses how biomedical research is entering a period of disruption due to factors like big data, digitization, and open science.
- Key points discussed include the history and changing nature of computational biomedicine, implications of large initiatives like the Precision Medicine Initiative, and how funders should respond by encouraging global open science and sharing infrastructure and policies.
- The author advocates for creating a "commons" environment to enable finding and reusing shared digital research objects according to FAIR principles in order to advance open collaborative science.
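A minimal sketch of what finding and reusing shared digital research objects "according to FAIR principles" can mean at the metadata level. The field names and the required set below are assumptions for illustration, not the formal FAIR metrics:

```python
# Minimal sketch (assumed field names, not the official FAIR metrics):
# check that a digital research object's record covers the basics of F, A, I, R.
REQUIRED = {
    "identifier": "F: a globally unique, persistent ID (e.g. a DOI)",
    "access_url": "A: a resolvable location for the object",
    "format": "I: a standard, open format others can parse",
    "license": "R: explicit terms so the object can be reused",
}

def missing_fair_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return sorted(k for k in REQUIRED if not record.get(k))

record = {"identifier": "10.1234/example",      # hypothetical DOI
          "access_url": "https://example.org/d1",
          "format": "text/csv",
          "license": ""}                         # missing reuse terms
print(missing_fair_fields(record))
```

A commons environment could run checks like this at deposit time, flagging objects (here, one lacking a license) before they enter the shared pool.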
Stat 1040, Recitation packet 1 (dessiechisomjj4)
This recitation packet poses exercises on study design: recognizing observational studies, proposing confounding factors that could explain an observed association (e.g. nighttime light exposure and childhood myopia), assessing whether studies were randomized or blind, and interpreting a randomized, double-blind study of the placebo effect.
Talk presented at CONFOA 2013 (Universidade de São Paulo, São Paulo, Brazil, 6-8 October 2013) in Panel III - Open science and research data management - by Prof. Dr. Peter Elias (UNITED KINGDOM, The Royal Society of UK).
Museum collections as research data - October 2019 (Dag Endresen)
This document discusses how natural history museums can embrace open science principles by making their collections openly available as research data. It provides context on initiatives like GBIF and DiSSCo that aim to publish biodiversity data according to common standards. While only around 5-10% of specimen records are currently digitized globally, the push for open access to publicly funded research means that museums need to develop new approaches to remain relevant providers of scientific resources. Open science practices like data sharing, citation and reuse can help address reproducibility issues and enable new discovery.
CINECA webinar slides: Ethics/ELSI considerations - From FAIR to fair data sh...CINECAProject
The FAIR principles – standing for Findability, Accessibility, Interoperability, and Reusability – have become the guiding principles for the wider sharing of research data in the life sciences. While FAIR provides guidance for the management of data as well as tools and workflows, the institutional conditions and organizational challenges associated with data sharing need to be taken into account to ensure responsible and fair data practices. This requires considering the context of legal requirements, for instance the principle of fairness and transparency in GDPR, expectations of research participants/data subjects, societal aspects and the “ethics work” that is an integral part of data flows, as well as fairness, equity and benefit sharing within transnational collaborations, which is of utmost importance. This webinar will, from the perspective of ethical, legal and societal implications (ELSI), discuss this broader context of responsible and fair data sharing associated with FAIR.
The “How FAIR are you” webinar series and hackathon aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 15th April 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
Doing more with fewer resources used to be a challenge mainly for academic scientists. This is unfortunately still true for academics, but we are now seeing others face many of the same challenges. With the squeeze on budgets and the cost cutting that followed recent worldwide economic challenges, the failure of many drugs to make it through the pipeline to market, and the increasing costs of the drug development process, the pharmaceutical industry is now, perhaps belatedly, having to accommodate the same challenge of doing more with less.
Opening up to Diversity talk by @phylogenomics at #UCDPHSAJonathan Eisen
This document summarizes the key points of an article on the diversity and composition of bacteria in indoor environments. It finds that the bacterial communities found indoors are less diverse than outdoors, and that mechanically ventilated rooms contain less diverse communities than window ventilated rooms. Certain building attributes like ventilation source, airflow rates, humidity and temperature are correlated with the diversity and types of bacteria present. Rooms with lower airflow and humidity have higher abundances of potential human pathogens. The study suggests that building design and operation can manage the indoor microbiome and species that may colonize the human microbiome.
Published on Jul 10, 2015 by PMR
Scholarly Publishing wastes huge amounts of valuable science. This presentation to the Public Library of Science suggests how we can work together to put this right
Similar to 2014 CrossRef Annual Meeting Keynote: Ways and Needs to Promote Rapid Data Sharing
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref
In November 2020, Crossref formally adopted the “Principles of Open Scholarly Infrastructure” (POSI). POSI is a list of sixteen commitments that will now guide the board, staff, and Crossref’s development as an organisation into the future.
This webinar took place on the 29th October at 03:00 PM AEST (UTC+10) and covered:
- What are the Principles of Open Scholarly Infrastructure (POSI) and why are they needed?
- Why POSI is important for Crossref and how it will help realise the Research Nexus
- Open metadata and infrastructure services from Crossref
Presented in English by Cameron Neylon, Professor of Research Communications, Centre for Culture and Technology, at Curtin University, Amanda Bartell, Head of Member Experience at Crossref, and Vanessa Fairhurst, Community Engagement Manager at Crossref.
Crossref LIVE Chinese webinar: An Introduction to Crossref – 14 Oct 2021 Crossref
Crossref makes research outputs easy to find, cite, link, assess, and reuse. We are a not-for-profit membership organization that exists to make scholarly communications better.
Presented on the 14th October 2021, Ran Dang, Editorial Director of Atlantis Press Books, Springer Nature and Crossref Ambassador, together with Guo Xiaofeng of WanFang Data, provide an overview of Crossref including:
A brief history of Crossref
Our membership
Persistent identifiers (DOI) and the importance of metadata
The benefits of joining Crossref
How to join and get started
This webinar is relevant for new members, publishers, researchers, librarians, editors, and anyone who would like to know more about how to work with Crossref.
The webinar is presented in Chinese and lasts 60 minutes including time for questions.
In this webinar we give an overview of our Crossmark service, including:
What Crossmark is
How to use the service
The importance of keeping content up to date
How to find further help and support
Working with ROR as a Crossref member: what you need to knowCrossref
Webinar focusing on the importance of ROR and how to implement that as a Crossref member.
Covers:
What is ROR?
Why is Crossref supporting ROR?
Publisher use cases for ROR (from Hindawi)
How to become a ROR adopter
Discussion/Q&A
A recording of the presentation is available on the Crossref YouTube channel: https://www.youtube.com/watch?v=D9Mtqb64OEk
Преимущества и варианты использования метаданных в Crossref / The Value and ...Crossref
The webinar was held on September 17, 2021 at 10.00 (Moscow time UTC+3).
This online event was organized in collaboration with NEICON and takes place within the framework of the wider conference “Scientific information and scientific resources in the conditions of the lockdown 2020-2021”.
During the webinar we cover:
- Content Registration at Crossref
- The importance of Crossref metadata: Quality and Quantity
- How to improve your metadata
- Where to find further help and support
The webinar lasts approximately 60 minutes including time for questions. Presented in Russian.
'Similarity Check' webinar, in Spanish Crossref
Similarity Check is a Turnitin tool that helps editors detect plagiarism by comparing documents against a large database of more than 70 billion web pages and 135 million articles. Editors can upload documents to iThenticate to obtain a similarity report that analyses the matches and helps them determine whether plagiarism exists. Any publisher can participate by paying an administrative fee plus document-checking fees.
Crossref LIVE Indonesia: One Search Platform (Drs. Muhammad Syarif Bando pres...Crossref
Indonesia One Search is a single-entry portal for searching the public collections held by libraries and other providing institutions across Indonesia. More than 10 million titles have been gathered so far, including books, theses, journals, videos, images, and full-text documents from the participating libraries. Indonesia One Search continues to be developed with new features such as full-text extraction, content analysis, de…
Crossref LIVE Indonesia: The Future of Indonesian Journal Policy (with Dr. Lu...Crossref
Dr. Lukman provides an overview of journal publishing in Indonesia. Presented in Indonesian.
This webinar was presented as part of the Crossref LIVE Indonesia webinar series from the 13th - 15th July 2021.
Crossref LIVE Indonesia: The Value and Use of Crossref Metadata, CRLIVE-ID 15...Crossref
This webinar was presented in English by Crossref staff Vanessa Fairhurst and Ginny Hendricks on the 15th July 2021 as part of a series of Crossref LIVE Indonesia webinars.
This webinar covers:
- A quick re-cap of content registration
- What metadata you can send to Crossref
- How your metadata is used in Crossref tools and services and in the wider academic community
- How you can use our Participation Reports tool to assess and improve your metadata records at Crossref
The content is relevant for Crossref members, particularly new members, and anyone who would like to know more about how to work with Crossref and how we fit into the wider scholarly community.
Crossref LIVE Indonesia: Content Registration at Crossref, CRLIVE-ID 14 July ...Crossref
This webinar was presented in English by Crossref staff Vanessa Fairhurst and Amanda Bartell on the 14th July 2021 as part of a series of Crossref LIVE Indonesia webinars.
This webinar covers:
- What is a DOI
- What do we mean by metadata
- Different content types you can register at Crossref
- Different ways for you to register your content at Crossref (including a demo of the web deposit form and OJS Crossref plug-in)
- How to make corrections or additions to your metadata
- What happens if content moves to a different publisher
The content is relevant for Crossref members, particularly new members, and anyone who would like to know more about how to work with Crossref and how we fit into the wider scholarly community.
Crossref LIVE Indonesia: An Introduction to Crossref, CRLIVE-ID 13 July 2021Crossref
This webinar was presented by Crossref staff Vanessa Fairhurst and Rachael Lammey on the 13th July 2021 as part of a series of Crossref LIVE Indonesia webinars.
This webinar covers:
- A brief history of Crossref
- Who are our members
- How to join Crossref
- Persistent identifiers (DOI) and related metadata
- What are the benefits of joining Crossref?
- Why publishers (and other organizations) around the world join Crossref
The content is relevant for Crossref members, particularly new members, and anyone who would like to know more about how to work with Crossref and how we fit into the wider scholarly community.
Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref, ...Crossref
Content Registration at Crossref. Webinar held on Tuesday, June 8th at 14:00 Turkey (UTC+3).
Presented by Crossref Turkish Ambassador Haydar Oruç, the webinar included an overview of how to register content at Crossref and the importance and use of scholarly metadata.
Agenda:
- Content registration tools
- Importance of accurate, comprehensive and up-to-date metadata
- How to update and fix metadata records
- Ways to get further help and support
Webinar held on 8 June 2021
The webinar is open to anyone who wants to learn how to work with Crossref and to share Crossref content with the wider academic community; it is particularly relevant for Crossref members, especially new members.
Metadata for the Research Community Crossref
Members of the Crossref community team present a workshop to discuss:
• Introduction to Crossref
• DOIs and content registration
• Metadata for the research community
Content Registration with Crossref – a webinar in Arabic | Content Registr...Crossref
This webinar was held on Wednesday 17 March 2021 at 14.00 UAE (UTC+4).
Mohamad Mostafa, Publishing Editor at Knowledge E and Crossref Ambassador, provided an overview of how to register content with Crossref including:
- Tools for registering content
- The importance of accurate, comprehensive and up-to-date metadata
- How to make updates and corrections to metadata records
- The importance of conflict and resolution reports
- Ways to get further help and support
This webinar content is relevant for Crossref members, publishing service providers, researchers, librarians, editors, and anyone who would like to know more about how to work with Crossref.
Presented by Vanessa Fairhurst, Paul Davis and Rachael Lammey on March 3rd 2021.
The webinar covers how to create and correctly display a DOI, the importance of metadata and the various tools for content registration including the web deposit form, Metadata Manager and OJS plug-ins.
This document provides an overview of CrossMark, a CrossRef initiative to help readers determine if a scholarly work has been updated. CrossMark uses a logo to identify publisher-maintained versions of content. Clicking the logo tells the reader if there have been any updates and directs them to the publisher's version. It can also display additional publication record information like funding sources, conflicts of interest, or peer review details. The CrossMark pilot launched in summer 2011 and is being implemented more widely, with marketing support and training webinars for publishers.
Participation reports webinar December 2020Crossref
During this webinar we’ll take you on a tour of our Participation Reports, which give Crossref members and the wider scholarly community a clear, visual snapshot of the metadata that each one of our members is registering with Crossref.
Registering richer metadata makes your content more useful and more discoverable to researchers and the wider scholarly community. This webinar was held on 8th December 2020.
Participation reports webinar November 2020Crossref
During this webinar we’ll take you on a tour of our Participation Reports, which give Crossref members and the wider scholarly community a clear, visual snapshot of the metadata that each one of our members is registering with Crossref.
Registering richer metadata makes your content more useful and more discoverable to researchers and the wider scholarly community. This webinar was held on 18 November 2020.
Introduction to Crossmark/Crossmark: O que é e como usarCrossref
"Crossmark: What it is and how to use it" - The webinar will be presented in Brazilian Portuguese - October 14, 2020.
Crossmark provides publishers with a standardized way of communicating important updates to content and ensuring that the information in the published article is current and secure.
The presentation will show what is needed to implement the Crossmark, technical requirements and the opportunity to answer questions.
The content is interesting for those who are members of Crossref, publishing services companies, researchers, librarians, funding agencies and members of editorial committees of scientific journals.
The presentation is provided by the Crossref ambassadors in Brazil, Bruna Erlandsson and Edilson Damasio.
Webinar held 6 October 2020.
The webinar is relevant for new and existing Crossref members, publishers, editors, researchers, service providers, hosting platforms, funders, librarians; really anyone interested in finding out a bit more about what Crossref is and does.
This webinar covers:
• How to register content with Crossref
• How to make updates to your metadata in order to make changes, corrections, or to add more detail
• Participation reports
• Additional services and where to find help.
Sessions presented in English by Crossref staff.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect their personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
2014 CrossRef Annual Meeting Keynote: Ways and Needs to Promote Rapid Data Sharing
1. Ways and Needs to Promote
Rapid Data Sharing
Laurie Goodman, PhD
Editor-in-Chief GigaScience
ORCID ID: 0000-0001-9724-5976
2. Scientific Communication
Via Publication
• Scholarly articles are merely advertisements of scholarship. The actual scholarly artefacts, i.e. the data and computational methods that support the scholarship, remain largely inaccessible (Jon B. Buckheit and David L. Donoho, WaveLab and Reproducible Research, 1995)
• Core scientific statements or assertions are intertwined and hidden in conventional scholarly narratives
• Lack of transparency; lack of credit for anything other than "regular" dead-tree publication
3. A Tale of Two Bacteria
1. On May 2, 2011, German doctors reported the first case of an E. coli infection accompanied by hemolytic-uremic syndrome
2. On May 21, 2011, the first death occurred from this bacterium (denoted E. coli O104:H4)
3. On June 3, 2011, BGI completed a draft sequence of E. coli O104:H4 from a sample provided by doctors at the University Medical Centre Hamburg-Eppendorf
4. At this point, the leaders at BGI discussed whether to release the sequence data immediately, and what the potential repercussions of doing so might be
The question arose:
If the data were released now, would it affect their ability to publish later?
4. A Tale of Two Bacteria
• In one world, the researchers, concerned about their ability to publish because publication is the way to obtain recognition and grants (which are essential for them to work), waited.
The first publication appeared on July 29th.
• In another world, the researchers, who decided public health was more important than obtaining a publication, released the data immediately.
The first publication appeared on July 29th, but it was not from the group that released the data (though information on that data was included).
5. Whether the concern about the ability to publish if data are released early is real or imagined,
researchers act on that concern
6. Whether the concern about the ability to publish if data are released early is real or imagined,
researchers act on that concern
7. These data were put on an FTP server under a CC0 waiver and also given a DOI to make access 'permanent'
To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
8.
9.
10. Downstream consequences:
1. Citations (~180)
2. Therapeutics (primers, antimicrobials)
3. Platform comparisons
4. An example for faster & more open science
"Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew it might take days for the lawyers at his company, Pacific Biosciences, to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling."
11. 1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastro-intestinal
infection in Hamburg in Germany in May 2011. This
spread through several European countries and the US,
affecting about 4000 people and resulting in over 50 deaths. All
tested positive for an unusual and little-known Shiga-toxin–
producing E. coli bacterium. The strain was initially analysed by
scientists at BGI-Shenzhen in China, working together with
those in Hamburg, and three days later a draft genome was
released under an open data licence. This generated interest
from bioinformaticians on four continents. Within 24 hours of the
release of the genome, it had been assembled. Within a week
two dozen reports had been filed on an open-source site
dedicated to the analysis of the strain. These analyses
provided crucial information about the strain’s virulence and
resistance genes – how it spreads and which antibiotics are
effective against it. They produced results in time to help
contain the outbreak. By July 2011, scientists published papers
based on this work. By opening up their early sequencing
results to international collaboration, researchers in Hamburg
produced results that were quickly tested by a wide range of
experts, used to produce new knowledge and ultimately to
control a public health emergency.
12. All that aside
Can we all agree that releasing the E. coli data
ahead of publication was ‘good’?
At least from a public health perspective.
Here are the numbers for the E. coli 2011 outbreak:
In total, ~4,000 people were infected and 53 died.
13. From a Public Health perspective…Deaths
Worldwide*
Infectious Disease
Measles: 122,000 per year
Hepatitis C-related liver disease: 350,000-500,000 per year
Malaria: 627,000 per year
HIV/AIDS: 1.4-1.7 million per year
Non-communicable, with genetic predisposition
Prostate cancer: 307,000 per year
Breast cancer: 522,000 per year
Suicide: 800,000 per year
Diabetes: 1.5 million per year
Cancer: 8.2 million per year
Cardiovascular Disease: 17.5 million per year
Non-genetic/Non-infectious
Pesticide Poisoning: 250,000 per year
Malnutrition: 2.8 million children (under 5) per year
*World Health Organization Fact Sheets http://www.who.int/en/
15. Sharing aids fields…
Rice v Wheat: consequences of publicly available genome data
[Bar chart comparing rice and wheat publication counts; y-axis 0–700]
Every 10 datasets collected contribute to at least 4 papers in the
following 3 years.
Piwowar, HA, Vision, TJ, & Whitlock, MC (2011). Data archiving is a good investment. Nature 473(7347), 285. DOI: 10.1038/473285a
16. Sharing aids authors…
Sharing Detailed Research
Data Is Associated with
Increased Citation Rate.
Piwowar HA, Day RS, Fridsma DB (2007)
PLoS ONE 2(3): e308.
doi:10.1371/journal.pone.0000308
17. Lack of Sharing Impacts Reproducibility
Out of 18 microarray papers, results
from 10 could not be reproduced
1. Ioannidis et al., (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8)
18. Sharing can reduce retractions
>15× increase in retractions in the last decade
Strong correlation of “retraction index” with
higher impact factor
At the current rate of increase, by 2045 as
many papers will be retracted as
are published!
1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract?
19. Data Sharing Hurdles
?
If only it were easy…
There are numerous reasons why researchers
do not share data,
the majority of which are good reasons.
20. Wiley Researcher Data Insights Survey
Our objective was to establish a baseline view of data sharing
practices, attitudes, and motivations globally, with participation
from researchers in every scholarly field.
In March 2014, more than 90,000 researchers around the world
were invited to participate in Wiley’s Researcher Data Insights
Survey. Participants were researchers who had published at least
one journal article in the past year with any publisher.
We received an overwhelming 2,886 responses from around the
world.
Slide from Catherine Giffi, Director, Strategic Market Analysis, Global Research, Wiley
21. Wiley Researcher Data Insights Survey
Key Findings
• Most researchers are sharing their data.
• Those not sharing have a variety of reasons.
• Data that’s being shared typically is <10 GB.
• The most common type of data that is being
shared is flat, tabular data (.csv, .txt, .xl)
• Data is usually saved on hard drives.
Slide from Catherine Giffi, Director, Strategic Market Analysis, Global Research, Wiley
22. Wiley Researcher Data Insights Survey
Why Researchers Do Not Share
• Intellectual property or confidentiality issues (59%)
• Concerned research might be “scooped” (39%)
• Concerns about misinterpretation or misuse (32%)
• Concerns about attribution/citation credit (31%)
• Ethical concerns (24%)
• Insufficient time/resources (19%)
• Funder/institution does not require sharing (13%)
• Lack of funding (13%)
• Not sure where to share (5%)
• Not sure how to share (3%)
Slide from Catherine Giffi, Director, Strategic Market Analysis, Global Research, Wiley
See also:
http://exchanges.wiley.com/blog/2014/11/03/how-and-why-researchers-share-data-and-why-they-dont/
http://scholarlykitchen.sspnet.org/2014/11/11/to-share-or-not-to-share-that-is-the-research-data-question/
23. How Can Publishers Promote Data Sharing
Researchers are never so captive as when they are publishing
But we need to help — not just harass.
Carrots and Sticks
And- why us?
– Create Journal Data Release Policies
– Check Data Release Policy is followed
– Find Ways to Aid Researchers in Releasing Data
– Consider ways to support/protect researchers
who do share ahead of publications
– Promote Data Citation
25. Incentives/credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to
public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a
particular data set would enable appropriate attribution for those
who share. “
Nature Biotechnology 27, 579 (2009)
Prepublication data sharing
(Toronto International Data Release Workshop)
“Data producers benefit from creating
a citable reference, as it can
later be used to reflect impact of the data sets.”
Nature 461, 168-170 (2009)
26. Genomics Data Sharing Policies…
Bermuda Accords 1996/1997/1998:
1. Automatic release of sequence assemblies within 24 hours.
2. Immediate publication of finished annotated sequences.
3. Aim to make the entire sequence freely available in the public domain for
both research and development in order to maximise benefits to society.
Fort Lauderdale Agreement, 2003:
1. Sequence traces from whole genome shotgun projects are to be
deposited in a trace archive within one week of production.
2. Whole genome assemblies are to be deposited in a public nucleotide
sequence database as soon as possible after the assembled sequence
has met a set of quality evaluation criteria.
Toronto International data release workshop, 2009:
The goal was to reaffirm and refine, where needed, the policies related to
the early release of genomic data, and to extend, if possible, similar data
release policies to other types of large biological datasets – whether from
proteomics, biobanking or metabolite research.
27. Sharing Data from Large-scale Biological Research Projects: A System of
Tripartite Responsibility (From the Fort Lauderdale Meeting 2003)
http://www.genome.gov/pages/research/wellcomereport0303.pdf
28. Citing Data Isn’t New
The Physical Sciences have been doing this for a while
DataCite and DOIs
Aim to “increase acceptance of research data as
legitimate, citable contributions to the
scholarly record”,
on the premise that
“data generated in the course of research
are just as valuable to the ongoing
academic discourse as papers and
monographs”.
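DataCite's model can be made concrete. The sketch below is illustrative only — the function and the truncated author list are my own; just the DOI, title, publisher, and year come from the dataset discussed earlier — showing the kind of mandatory metadata kernel that accompanies a dataset DOI:

```python
# Sketch of a minimal DataCite-style metadata kernel for a dataset DOI:
# identifier, creators, title, publisher, and publication year. This builds
# a plain dict for illustration; it is not a DataCite API call.

def datacite_kernel(doi, creators, title, publisher, year):
    """Assemble the core mandatory DataCite metadata fields as a dict."""
    return {
        "identifier": {"identifierType": "DOI", "identifier": doi},
        "creators": creators,
        "title": title,
        "publisher": publisher,
        "publicationYear": year,
    }

record = datacite_kernel(
    doi="10.5524/100001",
    creators=["Li, D", "Xi, F"],  # author list truncated for illustration
    title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    publisher="BGI Shenzhen",
    year=2011,
)
```

It is exactly this small, structured record that lets citation indexes treat a dataset like any other citable object.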
29. How We Envision Research Publication
(Communicating Science)
Open-access journal Data Publishing Platform
Data Sets in
GigaDB
Analyses in
GigaGalaxy
Paper in
GigaScience
Data Analysis Platform
30. Other Journals are now doing similar
This is most commonly done in the form of a Data Paper
rather than a release of data that is citable in itself.
• A Data Paper is effectively a Description of the Data
• Other journals that do Data Publishing as a formal
paper type
• F1000 Research (launched in 2012)
• Has Data papers as one of several types of papers
• Scientific Data (launched in 2014)
• Solely publishes Data Descriptors
• There are more…
31. Making the Data Itself Citable
We provide a linked database
The data are then directly linked to the paper, but can also be cited
separately through a Data DOI.
We can do this because we have a collaboration between BMC
(which handles the standard paper publication) and BGI (which has
enormous data storage capacity).
However: there are many community-available databases, so in
principle any journal can do this by taking advantage of such
available resources.
These include the usual suspects: EBI, NCBI, DDBJ, etc.
Databases that take all data types and provide Data DOIs: Dryad,
FigShare, etc.
There are also numerous smaller community databases specific to
different fields or data types.
32. For data citation to work, needs:
• Acceptance by journals.
• Data+Citation: inclusion in the references.
• Tracking by citation indexes.
• Usage of the metrics by the community…
35. Back to E. coli O104:H4
• As noted: articles on these early released and
citable data were published
• Also, the early releasers were not the first to
publish
• Nor were the data cited
37. The journal did not approve of inclusion of the data citation.
Nor was any indication given of where the genome information could be found.
39. This report was the first to be published, and it
included and used information from the
crowd-sourced release as well as the other early release.
Nowhere in the paper is there any indication of
where to obtain these data.
Nor is there an indication of where to obtain the
sequence data they generated.
40. This group made their O104:H4 sequence available
in the NCBI database at the time of completion, prior to publication.
Though no link to the Accession Number is easily found in the paper.
41. This report DID include a reference for the data
(even though they did not use it in their analysis)
43. For data citation to work, needs:
• Acceptance by journals.
• Data+Citation: inclusion in the references.
• Tracking by citation indexes.
• Usage of the metrics by the community…
45. • Data submitted to NCBI databases:
- Raw data SRA:SRA046843
- Assemblies of 3 strains Genbank:AHAO00000000-AHAQ00000000
- SNPs dbSNP:1056306
- CNVs, InDels, and SVs dbVar:nstd63
• Submission to public databases complemented by
its citable form in GigaDB (doi:10.5524/100012).
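Accessions like those above are what make the public-database copies programmatically reachable. A hedged sketch — URL construction only, no network call is made, and the database-name mapping is my reading of the slide's accessions — using NCBI's public E-utilities endpoint:

```python
from urllib.parse import urlencode

# Sketch (offline): building NCBI E-utilities esearch URLs for the
# accessions listed on the slide. EUTILS is NCBI's public eutils endpoint;
# the db/accession pairs mirror the slide's SRA and dbVar entries.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(db, term):
    """Return an esearch query URL for a given database and accession."""
    return f"{EUTILS}?{urlencode({'db': db, 'term': term})}"

for db, acc in [("sra", "SRA046843"), ("dbvar", "nstd63")]:
    print(esearch_url(db, acc))
```

The GigaDB DOI then complements these: one citable landing page that gathers the data types the standard archives split across databases.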
52. The polar bear DATA were released, prepublication, in 2011.
They were used and cited in the following studies before the main paper on
the sequencing was published:
Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old
and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7.
doi:10.1126/science.1216424.
Cahill, JA et al., Genomic evidence for island population conversion resolves
conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345.
doi:10.1371/journal.pgen.1003345.
Morgan, CC et al., Heterogeneous models place the root of the placental
mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56.
doi:10.1093/molbev/mst117.
Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears
(Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus)
Derived from Genome Sequences. J Hered. 2014; 105(3):312-23.
doi:10.1093/jhered/est133.
Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive
Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4.
doi:10.1093/molbev/msu109
56. Removing data citations from the
references
One journal informed the authors that non-reviewed material could
not be cited in the references of the paper.
Another journal stripped the data citation from the references, and
went an extra step: it changed the citation in the Data Availability
section to the URL that the DOI resolved to at that time.
We happened to know about this one, and were able to create a forward to the
DOI'd page when the URL broke after we moved our database platform.
Note: much of this was due to a standard operating procedure in the
production department.
Lesson: if you decide to include Data Citations, tell your entire team.
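The "forward to the DOI'd page" mentioned above is, mechanically, just a redirect table maintained on the publisher side so the DOI keeps resolving after a platform move. A minimal sketch, with hypothetical paths (the real GigaDB URLs may differ):

```python
# Sketch of the forwarding that keeps a DOI 'permanent' after a platform
# move: retired database URLs are mapped to the pages the DOI should now
# resolve to. Both URLs below are illustrative, not real GigaDB paths.
LEGACY_REDIRECTS = {
    "/old-platform/dataset/100012": "https://gigadb.org/dataset/100012",
}

def forward(path):
    """Return the new location for a retired URL, or None if unknown."""
    return LEGACY_REDIRECTS.get(path)
```

The fragility the slide describes is exactly what happens when a journal cites the raw URL instead of the DOI: the DOI's redirect can be repaired centrally; a hard-coded URL in a published PDF cannot.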
57. For data citation to work, needs:
• Acceptance by journals.
• Data+Citation: inclusion in the references.
• Tracking by citation indexes.
• Usage of the metrics by the community…
59. For data citation to work, needs:
• Acceptance by journals.
• Data+Citation: inclusion in the references.
• Tracking by citation indexes.
• Usage of the metrics by the community…
This is a work in progress…
60. Data Citation Really is a Major Incentive
On Wednesday this week, we released the genome sequences
of 3,000 rice strains (13.4 TB of data)
• These data were also deposited in the NIH SRA repository
• So why did we do it too?
1. It is linked directly to the Data Paper that provides
details of data production, quality, and basic analysis
2. Authors were hesitant to release these data (a HUGE
community resource) prior to the analysis paper
publication (which, for 3000 strains… would take
years…). The opportunity to have these data citable
(and trackable) encouraged the authors and led to
their releasing these data and doing so in
collaboration with GigaScience’s Biocurator
The 3,000 Rice Genomes Project. (2014) GigaScience 3:7 http://dx.doi.org/10.1186/2047-217X-3-7;
The 3000 Rice Genomes Project (2014) GigaScience Database. http://dx.doi.org/10.5524/200001
61. No: your data is not too large to share
Rice 3K project: 3,000 rice genomes, 13.4 TB public data
IRRI GALAXY
62. Beyond Data Citation
Reviewing Data
Data Release policies include the need to
help authors
Data availability without metadata is
practically useless
63. Beyond Data Citation
Reviewing Data
It’s too hard- we can’t ask our reviewers
to do that!
Use Data Reviewers
64. Example in Neuroscience
1. Neuroscience Data
are not typically
shared
2. For most papers: Data
AND Tools are not
typically made
available to the
reviewers
3. Journal Editors think
Reviewers will not
want to review data
GigaScience 2014, 3:3 doi:10.1186/2047-217X-3-3
65. Example in Neuroscience
• Neuroscience Data are not typically shared
• Author Dr. Stephen Eglen said: “One way of encouraging neuroscientists to
share their data is to provide some form of academic credit.”
• We hosted with a DOI: 366 recordings from 12 electrophysiology datasets
• GigaDB is included in Thompson Reuters Data Citation Index
• Data AND Tools are not typically made available to the reviewers
• We made manuscript, data and tools all available to the reviewers.
• We make sure to include reviewers who are able to properly assess the data
itself and rerun the tools
• To reduce burdens- we sometimes select a reviewer who ONLY looks at the
data.
• Journal Editors think Reviewers will not want to review data
• What Reviewer Dr. Thomas Wachtler said: “The paper by Eglen and
colleagues is a shining example of openness in that it enables replicating the
results almost as easily as by pressing a button.”
• What Reviewer Dr. Christophe Pouzat said: “In addition to making the
presented research trustworthy, the reproducible research paradigm
definitely makes the reviewer’s job more fun!”
66. Beyond Data Citation
Data Release policies include the need to
help authors
Collaborations
With data repositories
With other journals
67. Consider Cross Journal Support
Competition is good…
….but sometimes we should collaborate
for the community good
• PLoS recent data deposition policies have led to
community concerns about feasibility.
• We support (and applaud) this …we have an even stricter
data deposition policy
• But PLoS ONE received a submission that was a
comparative study of earthworm morphology and
anatomy using a 3D non-invasive imaging technique
called micro-computed tomography (or microCT)… and
there was no good place to put this
• These data are extremely complex: videos and multiple files, with
several folders of ~10 GB
68. Consider Cross Journal Support
• GigaScience and PLOS ONE collaborated. They published
the main article; we published a Data Note describing the
data itself and hosted all the data on GigaDB under
separate citation.
• With our Aspera connection, reviewers could download
even the ~10 GB folders in ~1/2 hour
• Reviewer Dr. Sarah Faulwetter noted the usefulness of
having these data available, saying: “Instead of having to
go through the lengthy process of obtaining the physical
specimen from a museum, I can now download a fairly
accurate representation from the web.”
Lenihan et al (2014). GigaScience, 3:6 http://dx.doi.org/10.1186/2047-217X-3-6; Lenihan, et al (2014): GigaScience Database.
http://dx.doi.org/10.5524/100092; Fernández et al (2014) PLOS ONE 9 (5) e96617 http://dx.doi.org/10.1371/journal.pone.0096617
69. Beyond Data Citation
Data availability without metadata is
practically useless
Engage/Employ/Interact with Curators
70. Challenges for the future…
1. Lack of interoperability/sufficient metadata
2. Long tail of curation (“Democratization” of “big-data”)
?
71. Think about what you do… and what you can do…
• Promote- rather than inhibit- prepublication data sharing
• Promote Data Citation in the reference section
– incentivizes data release
– Makes it easier for readers to find
• Promote Data Sharing upon publication
– Consider your data release policies
• Form collaborations with repositories to aid authors in depositing
their work
– Identify community organizations with metadata standards
• Make data available for reviewers (author website, community
repositories, Dryad and similar; your publisher?)
– at least do a sanity check
– Use “data reviewers”
No- this isn’t easy, but do what you can now
And work toward the rest
Evolve
72. It’s Time to Move Beyond
Dead Trees
[Images: journals from 1665, 1812, and 1869]
73. Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager
Contact us:
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog
www.gigasciencejournal.com
www.gigadb.org
Editor's Notes
Thank you very much to the Meeting Organizers for Inviting me to Speak.
Happily we live in the 2nd world, but the fact that it even gave them pause is telling.
Isn’t hyperbole fun?
And a paper by the group was published in a high impact journal even though the data were released early in a citable format
The data were released on an FTP server, and were given a data DOI should the data need to be cited in a more permanent fashion.
Raw data has been submitted to the SRA, the assembly submitted to GenBank (no number), SV data to dbVar (it’s the first plant data they’ve received). Complements the traditional public databases by having all these “extra” data types, it’s all in one place, and it’s citable.
(A) Cumulative base pairs in INSDC over time, excluding the Trace Archive (raw data from capillary sequencing platforms). (B) Base pairs in INSDC over time since 1980, broken down into selected data components. Cumulative data volume in base pairs broken down into assembled sequence (whole genome shotgun methods and others) and raw next-generation-sequence data.