Open Access Week - Oxford, 20-24 Oct 2014
1. Consultant,
Honorary Academic Editor
Associate Director,
Principal Investigator
Open access and open data at
Nature Publishing Group:
better data = better science
Susanna-Assunta Sansone, PhD
@biosharing
@isatools
@scientificdata
Open Access Week at Oxford, 20-24 October, 2014
http://www.slideshare.net/SusannaSansone
3. A community mobilization
http://discovery.urlibraries.org/
image by Greg Emmerich
http://www.theguardian.com/higher-education-network/blog/2014/jun/26
https://okfn.org
4. Open access is not enough on its own
http://www.theguardian.com/higher-education-network/blog/2014/jun/26
If your research has been funded by
the taxpayer, there's a good chance
you'll be encouraged to publish your
results on an open access basis…
This final article makes publicly
available the hypotheses,
interpretations and conclusions of your
research.
But what about the data that led you
to those results and conclusions?
5. Also open data is not always enough
http://www.theguardian.com/higher-education-network/blog/2014/jun/26
So data that is in theory open and free to access
• may still be hard to get hold of
• it may not have been stored or cited in the appropriate manner
• it may not be interoperable with related data because it is not formatted appropriately; or
• it may not be reusable because it may not contain enough information for others to understand it
6. Benefits and barriers to data sharing
Credit to:
Iain Hrynaszkiewicz
Benefits
• Reduction of error and fraud
• Increased return on investment in research
• Compliance with funder and journal mandates
• Reduce duplication and bias
• Reproduction/validation of research
• Testing additional hypotheses
• Use for teaching
• Integration with other data sets
• Increased citations
Barriers
• Concerns over inappropriate reuse
• Limited time/resources
• Costs associated with data sharing
• Human privacy concerns
• Unclear ownership of data/authority to release data
• Lack of academic incentives/recognition
• Lack of repositories or lack of awareness of repositories
• Protecting commercially sensitive information
7. Movement for FAIR data in life and medical sciences
http://bd2k.nih.gov/workshops.html#ADDS
8. The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta
Sansone www.ebi.ac.uk/net-project
§ To make any dataset ‘FAIR’, one must have standards, tools and best practices to:
• report sufficient details
• capture all salient features of the experimental workflow
• make annotation explicit and discoverable
• structure the descriptions for consistency
• ensure/regulate access
• deposit and publish
• etc.
9. …breadth and depth
of the experimental context
…is pivotal
10. sample characteristic(s)
experimental design
experimental variable(s)
technology(s)
measurement(s)
protocol(s)
data file(s)
...
11. Doing my fair share of work
Working with and for:
Increase the level of annotation at the source, tracking provenance and using community standards
Notes and narrative: notes in lab books (information for humans)
Spreadsheets and tables: the compromise
Linked data and nanopublications: facts as RDF statements (information for machines)
14. Role of publishers as “agents of change”
• Data has to become an integral part of scholarly communication
• Responsibilities lie across several stakeholder groups: researchers, data centers, librarians, funding agencies and publishers
• Publishers occupy a leverage point in this process
15. Publishers and data/reproducibility
• Policies on access (to data, code, reagents etc.)
o Supporting funder & community needs
• Format and amount of content
o Methodological details, supplementary info, data integration and links to repositories
• Licensing for reuse
• Incentives to share
o Data citations
o Data journals and articles
• Quality assurance through peer review
Credit to:
Iain Hrynaszkiewicz
16. Data/reproducibility at NPG
Some important events
• 1996: Bermuda Principles
o prepublication of DNA sequence data
• 1998: Structural data
o accession codes required by Nature & Science
• 2002: MIAME community standards
o microarray data deposition in public repositories required
• 2007: Methods sections
o Limitations for the online version removed
• 2009: Ioannidis et al. Nat Gen 41, 2, 149
Credit to:
Veronique Kiermer
18. Data/reproducibility at NPG
Some important recent events 2013-2014
• Figure source data
o putting data behind figures/graphs
o rolled out at Nature and progressively across all other Nature branded
titles
Wang et al, Nature, 2013
doi:10.1038/nature12730
20. Data/reproducibility at NPG
Some important recent events 2013-2014
• Figure source data
o putting data behind figures/graphs
o rolled out at Nature and progressively across all other Nature branded
titles
• Extended data
o expandable text and extra figures; rolled out at Nature
• Data citation
o tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
o to be rolled out across all Nature branded titles and Scientific Data
• Code reproducibility
o peer review, availability and reuse
• NPG’s Linked Data release – CC0
• A new data publication platform:
22. The role of data journals/articles
• Credit
• Unpublished data
• Peer review focus
• Value of data vs. analysis
• Discoverability
• Reusability
• Narrative/context
• “Intelligently open data”
Credit to:
Iain Hrynaszkiewicz
24. market research (2011)
• Scope of survey
o How much data researchers produce, in what format and what they do with it
o Perceived availability of public repositories
o Perceptions of the Scientific Data concept
o Level/nature of data journal peer review
• Respondent characteristics
o 387 respondents (329 active researchers)
o Physics (24%), Earth and environmental science (21%), Biology (20%), Chemistry (19%), Others (16%)
Credit to:
Iain Hrynaszkiewicz
25. market research (2011)
• Key survey data
o 60% share their data with their colleagues
o 50% look at other researchers’ datasets at least once a month
o very few respondents produce more than 1 TB of data per year; the majority produce less than 1 GB
o 45% unaware of a repository for some of their data
o 90% reacted positively to the concept of Scientific Data
o 80% believed Scientific Data would increase data deposition
o what do researchers want from a data publication?
96% - increased visibility and discovery
95% - increased usability of their research data
93% - credit mechanism for deposit of data
80% - peer review of content/datasets
Credit to:
Iain Hrynaszkiewicz
26. • Get Credit for Sharing Your Data
o Publications will be listed in the major indexes and will be citeable
• Focused on Data Reuse
o All the information others need to reuse the data; no interpretative analysis or hypothesis testing
• Open-access
o Authors select from three Creative Commons licences for the main Data Descriptor. Each publication supported by curated CC0 metadata
• Peer-reviewed
o Rigorous peer-review managed by our Editorial Board of academic researchers ensures data quality and standards
• Promoting Community Data Repositories
o Data stored in community data repositories
27. Supported by:
Advisory Panel including senior researchers, funders, librarians and curators
Michael Huerta ● National Institutes of Health, USA ● Mark Thorley ● Natural Environment Research
Council, UK ● Patricia Cruse ● University of California, USA ● Susan Gregurick ● Office of
Biological and Environmental Research, Department of Energy, USA ● Ioannis Xenarios ● Swiss
Institute of Bioinformatics, Switzerland ● Chris Bowler ● IBENS, France ● Mark Forster ● Syngenta,
UK ● Anthony Rowe ● Johnson & Johnson, USA ● Stephen Chanock ● National Cancer Institute,
USA ● Weida Tong ● National Center for Toxicological Research, FDA, USA ● Albert J. R. Heck ●
Utrecht University, The Netherlands ● Johanna McEntyre ● EMBL-EBI, European Bioinformatics
Institute, UK ● Simon Hodson ● CODATA, France ● Joseph R. Ecker ● Howard Hughes Medical
Institute & Salk Institute, USA ● Stephen Friend ● Sage Bionetworks, USA ● Jessica Tenenbaum ●
Duke Translational Medicine Institute, USA ● Anne-Claude Gavin ● EMBL, Germany ● David Carr ●
Wellcome Trust, UK ● Wolfram Horstmann ● University of Oxford, UK ● Piero Carninci ● RIKEN
Omics Science Center, Japan ● Pascale Gaudet ● Swiss Institute of Bioinformatics, Switzerland ●
Judith A. Blake ● The Jackson Laboratory, USA ● Richard H. Scheuermann ● J. Craig Venter
Institute, USA ● Caroline Shamu ● Harvard Medical School, USA
Susanna-Assunta Sansone
Honorary Academic Editor
(University of Oxford, UK)
Andrew L Hufton
Managing Editor
Varsha Khodiyar
Editorial Curator
Iain Hrynaszkiewicz
Publisher
An open access, peer-reviewed publication for descriptions of scientifically valuable datasets
Launched May 2014
28. Introducing a new content type: the Data Descriptor
• The data descriptor is only concerned with the facts behind the methodology of data generation/collection and processing
• A data descriptor complements a journal article
Summary of Data Descriptor (diagram): the journal article holds the narrative (synthesis, analysis, conclusions, interpretation); the Data Descriptor holds the facts (What is the sample? What did I do to generate the data? How was the data processed? Where is the data? Who did what when?)
29. Data Descriptor: narrative and structure
Article or narrative component (PDF and HTML)
Experimental metadata or structured component (in-house curated, machine-readable formats)
31. Data Descriptor: narrative
Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses
In traditional publications this information is not provided in a sufficiently detailed manner. However, this information is essential for understanding, reusing, and reproducing datasets.
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
33. Data Descriptor: narrative
Joint Declaration of Data Citation Principles by the
Data Citation Synthesis Group
34. Data Descriptor: structure - content
Includes fields describing:
• each study, linking to relevant sections of the Data Descriptor article
• authors’ details, including ORCID
• publications
• funding sources and funders’ name, via FundRef
• experimental factors
• study design
• assays
• protocols
Diagram labels: data file or record in a database; analysis; method; script
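As a rough illustration of what a structured record covering the field groups above could look like, the sketch below models them as a Python dataclass and reports which groups are still empty. The class and field names are hypothetical, invented for this example; they are not the schema used by Scientific Data.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Author:
    name: str
    orcid: str = ""  # ORCID identifier, left empty when unknown


@dataclass
class StudyMetadata:
    """Hypothetical record mirroring the field groups listed on the slide."""
    study_title: str = ""
    authors: List[Author] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)
    experimental_factors: List[str] = field(default_factory=list)
    study_design: str = ""
    assays: List[str] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)


def missing_fields(record: StudyMetadata) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Made-up example values, for illustration only.
record = StudyMetadata(
    study_title="Example study",
    authors=[Author("A. Researcher")],
    study_design="time course",
    assays=["transcription profiling"],
)
print(missing_fields(record))
```

A check like this is the kind of thing a template or authoring tool could run before submission, so that gaps are caught while the author still remembers the details.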
35. Data Descriptor: structure - content
In-house editorial curator:
• assists users to submit the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct submission, we will release a technical specification:
Diagram labels: data file or record in a database; analysis; method; script
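For readers unfamiliar with tab-delimited metadata, the following sketch writes a small sample table as tab-separated text using only the Python standard library. The column headers follow the general style of an ISA-Tab study table as an assumption, and the row values are made up; this is not the technical specification mentioned above, and the ISA-Tab documentation remains the authoritative source for required columns.

```python
import csv
import io
from typing import Dict, List

# Column headers in the general style of an ISA-Tab study table (assumed here;
# consult the ISA-Tab specification for the authoritative list).
COLUMNS = ["Source Name", "Characteristics[organism]", "Protocol REF", "Sample Name"]


def to_tab_delimited(rows: List[Dict[str, str]]) -> str:
    """Write rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example row, for illustration only.
rows = [
    {
        "Source Name": "donor1",
        "Characteristics[organism]": "Homo sapiens",
        "Protocol REF": "sample collection",
        "Sample Name": "sample1",
    },
]
table = to_tab_delimited(rows)
print(table)
```

Keeping the export as plain tab-separated text means it can be opened in a spreadsheet by a researcher and parsed by a submission tool without conversion.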
36. Relation with traditional articles - content
Scientific hypotheses:
Synthesis
Analysis
Conclusions
Methods and technical analyses supporting the quality of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what when?
37. Relation with traditional articles - time
BEFORE: get your data to the community as soon as possible (see NPG pre-publication policy)
AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
AFTER: expand on your research articles, adding further information for reuse of the data
39. Value added component integrated in a growing ecosystem
We currently recognize over 60 public data repositories
Diagram labels: research papers; Data Descriptors; data records
40. Over 500 Over 600
A web-based, curated and searchable portal that works to ensure that standards and databases are registered, informative, discoverable and accessible, monitoring the development and evolution of standards, their use in databases and the adoption of both in data policies.
41. Over 500 Over 600
Including minimum information reporting requirements, or checklists to report the same core, essential information
Including controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’
Including conceptual model, conceptual schema from which an exchange format is derived to allow data to flow from one system to another
42. Over 500 Over 600
Mapping the landscape of community-developed standards, databases and data policies in the life sciences, broadly covering biological, natural and biomedical sciences
43. Researchers, developers and curators lack support and guidance on how to best navigate and select content standards, understand their maturity, or find databases that implement them;
Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement
44. Advisory Board and Working Group - core members and adopters
Operational Team
45. Helping authors find the right place for the data
• We currently recognize over 60 public data repositories, and provide advice on the best place for authors to archive their data
• We have integrated systems with both:
“Omics” is emphasized among basic life-sciences repositories
Chart categories: DNA and protein sequence; Functional genomics; Genetic association and genome variation; Metagenomics; Molecular interactions; Organism- or disease-specific; Proteomics; Taxonomy and species diversity; Traces and sequencing reads
Chart values (as extracted, category order not confirmed): 2, 4, 3, 10, 4, 1, 4, 3, 4
46. Repositories criteria
1. Broad support and recognition within their scientific community
2. Ensure long-term persistence and preservation of datasets
3. Provide expert curation
4. Implement relevant, community-endorsed reporting requirements
Progressively monitor this via
5. Provide for confidential review of submitted datasets
6. Provide stable identifiers for submitted datasets
7. Allow public access to data without unnecessary restrictions
Big data | CSE 2014
47. Open Access – APC supported
Data: the primary datasets reside in public repositories. Partnering with FigShare and Dryad, which are both CC0
Data Descriptor - structured component (ISA-Tab): as NPG has already done with its existing Linked Data Portal, the metadata about data descriptors in Scientific Data is CC0
Data Descriptor - narrative component: describing the methodology of data generation/collection and processing is licensed under either of the following, by author choice:
OA Article processing charges: $1,000 USD / £650 GBP / €750 for each accepted article
48. Peer review process focused on quality and reuse
Evaluation is not based on the perceived impact or novelty of the findings or size of the data
• Experimental rigour and technical data quality
o Methodologically sound
o Technical validation experiments and statistical analyses
o Depth, coverage, size, and/or completeness of data sufficient for the types of applications
• Completeness of the description
o Sufficient details to allow others to reproduce the results, reuse or integrate it with other data
o Compliance with relevant minimum information or reporting standards
• Integrity of the data files and repository record
o Data files match the descriptions in the Data Descriptor
o Deposited in the most appropriate available data repository
49. Current content is diverse - bimonthly releases
• Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology etc.
• New and previously published individual datasets, curated aggregations and citizen science:
o a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
o additional tutorial-like information for scientists interested in reusing or integrating the data with their own
• Datasets in figshare, Dryad and domain specific databases
• Code deposited in figshare and GitHub
• First collection:
50. Hanke: Neuroscience
New Dataset
Data in OpenfMRI
Source code in GitHub
Big Data
Code in GitHub
51. Stefano: Stem Cells
Associated Nature Article
Data
- figshare
- NCBI GEO
Integrated figshare data viewer
52. Hao: Environmental
New Dataset
Data in figshare
Code in figshare
Integrated figshare data viewer
Cited in Science
54. http://www.flickr.com/photos/12308429@N03/4957994485/
• Make sure your research outputs make an impact!
• Open your research outputs, via the right channels to get cited and credited
• Contribute to the reproducible research movement and to FAIR data
55. • Uniquely identify yourself via ORCID
• Share identified generic research outputs, e.g. FigShare
• Share and deposit code, e.g. GitHub, Bitbucket
http://www.flickr.com/photos/idiolector/289490834/
56. • Learn about open standards in your area, via e.g. BioSharing
• Select tools that implement relevant standards, e.g. ISA
• Publish not just in traditional journals, but think Scientific Data
http://www.flickr.com/photos/webhamster/2582189977/
57. Acknowledgements
Advisory Boards and Collaborators
Philippe Rocca-Serra, PhD
Alejandra Gonzalez-Beltran, PhD
Milo Thurston, PhD
Eamonn Maguire, DPhil candidate
And we are hiring a software developer!
Honorary Academic Editor: Susanna-Assunta Sansone, PhD
Managing Editor: Andrew L Hufton, PhD
Editorial Curator: Varsha Khodiyar
Publisher: Iain Hrynaszkiewicz
Visit nature.com/scientificdata
Email scientificdata@nature.com
Tweet @ScientificData