Scott Edmunds at the China National GeneBank Youth Biodiversity MegaData Forum: Democratising biodiversity and genomics research: open and citizen science to build trust and fill the data gaps. 18th December 2018
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...GigaScience, BGI Hong Kong
Scott Edmunds talk at IARC, Lyon. How can we make science more trustworthy and FAIR? Principled publishing for more evidence based research. 8th July 2019
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...GigaScience, BGI Hong Kong
Scott Edmunds talk at IARC, Lyon. How can we make science more trustworthy and FAIR? Principled publishing for more evidence based research. 8th July 2019
Scott Edmunds talk on GigaScience Big-Data, Data Citation and future data handling at the International Conference of Genomics on the 15th November 2011.
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...GigaScience, BGI Hong Kong
Scott Edmunds talk at the 7th Internation Conference on Genomics: "Channeling the Deluge: Reproducibility & Data Dissemination in the “Big-Data” Era. ICG7, Hong Kong 1st December 2012
"
Massive-Scale Analytics Applied to Real-World Problemsinside-BigData.com
In this deck from PASC18, David Bader from Georgia Tech presents: Massive-Scale Analytics Applied to Real-World Problems.
"Emerging real-world graph problems include: detecting and preventing disease in human populations; revealing community structure in large social networks; and improving the resilience of the electric power grid. Unlike traditional applications in computational science and engineering, solving these social problems at scale often raises new challenges because of the sparsity and lack of locality in the data, the need for research on scalable algorithms and development of frameworks for solving these real-world problems on high performance computers, and for improved models that capture the noise and bias inherent in the torrential data streams. In this talk, Bader will discuss the opportunities and challenges in massive data-intensive computing for applications in social sciences, physical sciences, and engineering."
Watch the video: https://wp.me/p3RLHQ-iPk
Learn more: https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: th...GigaScience, BGI Hong Kong
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: the reproducibility crisis, and the need for transparency. Melbourne University 19th September 2014
The ChemSpider database is a resource hosted by the Royal Society of Chemistry. With over 28 million unique chemicals on the database linked out to over 400 data sources the platform provides access to experimental and predicted data (properties, spectra etc.), links to publications, patents and a myriad of other resources. The ChemSpider database has been used as the foundation of a number of other resources for chemists including ChemSpider SyntheticPages, the Learn Chemistry Wiki and the Spectral Game. This presentation will provide an overview of ChemSpider and discuss how chemists can both derive value from and contribute to the content available from the database and its related resources. We will also discuss our view of future platform for managing personal, institutional and public chemistry in a shared environment.
Lecture for a course at NTNU, 27th January 2021
CC-BY 4.0 Dag Endresen https://orcid.org/0000-0002-2352-5497
See also http://bit.ly/biodiversityinformatics
https://www.gbif.no/events/2021/lecture-ntnu-gbif.html
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
Scott Edmunds talk at the HUPO congress in Geneva, September 6th 2011 on GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami.
Research Data Management Services at UWA (November 2015)Katina Toufexis
Research Data Management Services at the University of Western Australia (November 2015).
Created by Katina Toufexis of the eResearch Support Unit (University Library).
CC-BY
GBIF and reuse of research data, Bergen (2016-12-14)Dag Endresen
Biodiversity informatics seminar at the Department of Biology, University of Bergen on data publication and reuse of GBIF-mediated biodiversity data on 14th December 2016. Organized by the Norwegian GBIF Node and the Norwegian Biodiversity Information Center (NBIC, Artsdatabanken).
See also: http://www.gbif.no/events/2016/data-publishing-seminar-in-bergen.html
See also: http://doi.org/10.13140/RG.2.2.24290.32969
The 4th paradigm of research is manifest in the rising popularity of data science. Data science developments relevant to human genetics are discussed with particular reference to cloud computing and data accessibility.
American Society for Human Genetics, October 16, 2018, San Diego
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
Scott Edmunds talk on GigaScience Big-Data, Data Citation and future data handling at the International Conference of Genomics on the 15th November 2011.
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...GigaScience, BGI Hong Kong
Scott Edmunds talk at the 7th Internation Conference on Genomics: "Channeling the Deluge: Reproducibility & Data Dissemination in the “Big-Data” Era. ICG7, Hong Kong 1st December 2012
"
Massive-Scale Analytics Applied to Real-World Problemsinside-BigData.com
In this deck from PASC18, David Bader from Georgia Tech presents: Massive-Scale Analytics Applied to Real-World Problems.
"Emerging real-world graph problems include: detecting and preventing disease in human populations; revealing community structure in large social networks; and improving the resilience of the electric power grid. Unlike traditional applications in computational science and engineering, solving these social problems at scale often raises new challenges because of the sparsity and lack of locality in the data, the need for research on scalable algorithms and development of frameworks for solving these real-world problems on high performance computers, and for improved models that capture the noise and bias inherent in the torrential data streams. In this talk, Bader will discuss the opportunities and challenges in massive data-intensive computing for applications in social sciences, physical sciences, and engineering."
Watch the video: https://wp.me/p3RLHQ-iPk
Learn more: https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: th...GigaScience, BGI Hong Kong
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: the reproducibility crisis, and the need for transparency. Melbourne University 19th September 2014
The ChemSpider database is a resource hosted by the Royal Society of Chemistry. With over 28 million unique chemicals on the database linked out to over 400 data sources the platform provides access to experimental and predicted data (properties, spectra etc.), links to publications, patents and a myriad of other resources. The ChemSpider database has been used as the foundation of a number of other resources for chemists including ChemSpider SyntheticPages, the Learn Chemistry Wiki and the Spectral Game. This presentation will provide an overview of ChemSpider and discuss how chemists can both derive value from and contribute to the content available from the database and its related resources. We will also discuss our view of future platform for managing personal, institutional and public chemistry in a shared environment.
Lecture for a course at NTNU, 27th January 2021
CC-BY 4.0 Dag Endresen https://orcid.org/0000-0002-2352-5497
See also http://bit.ly/biodiversityinformatics
https://www.gbif.no/events/2021/lecture-ntnu-gbif.html
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
Scott Edmunds talk at the HUPO congress in Geneva, September 6th 2011 on GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami.
Research Data Management Services at UWA (November 2015)Katina Toufexis
Research Data Management Services at the University of Western Australia (November 2015).
Created by Katina Toufexis of the eResearch Support Unit (University Library).
CC-BY
GBIF and reuse of research data, Bergen (2016-12-14)Dag Endresen
Biodiversity informatics seminar at the Department of Biology, University of Bergen on data publication and reuse of GBIF-mediated biodiversity data on 14th December 2016. Organized by the Norwegian GBIF Node and the Norwegian Biodiversity Information Center (NBIC, Artsdatabanken).
See also: http://www.gbif.no/events/2016/data-publishing-seminar-in-bergen.html
See also: http://doi.org/10.13140/RG.2.2.24290.32969
The 4th paradigm of research is manifest in the rising popularity of data science. Data science developments relevant to human genetics are discussed with particular reference to cloud computing and data accessibility.
American Society for Human Genetics, October 16, 2018, San Diego
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
Scott Edmunds from GigaScience on 'Publishing in the Open Data Era", at the "Open, Crowdsource and Blockchain Science!" hangout at Hackerspace.sg, 23rd March 2015
This is an overview of the Data Biosphere Project, its goals, its architecture, and the three core projects that form its foundation. We also discuss data commons.
The slides that will accompany my live webcast for OpenCon 2014 attendees, all about open data in research. The benefits, the how to (both legally & technically), examples, pitfalls, and the future of open research data.
Scott Edmunds talk on Big Data Publishing at the "What Bioinformaticians need to know about digital publishing beyond the PDF" workshop at ISMB 2013, July 22nd 2013
Reproducible method and benchmarking publishing for the data (and evidence) d...GigaScience, BGI Hong Kong
Scott Edmunds presentation on: Reproducible method and benchmarking publishing for the data (and evidence) driven era. The Silk Road Forensics Conference, Yantai, 18th September 2018
International perspective for sharing publicly funded medical research dataARDC
Presentation by Olivier Salvado, CSIRO, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
IDW2022: A decades experiences in transparent and interactive publication of ...GigaScience, BGI Hong Kong
Scott Edmunds at International Data Week 2022: A decades experiences in transparent and interactive publication of FAIR data and software via an end-to-end XML publishing platform. 21st June 2022
GigaByte Chief Editor Scott Edmunds presents on how to prepare a data paper for the TDR and WHO sponsored call for data papers describing datasets on vectors of human diseases launched in Nov 2021. Presented at the GBIF webinar on 25th January 2022 and aimed at authors interested in submitting a manuscript submitted to the series.
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...GigaScience, BGI Hong Kong
Scott Edmunds at the STM Week 2020 Digital Publishing seminar on Demonstrating bringing publications to life via an End-to-end XML publishing platform. 2nd December 2020
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...GigaScience, BGI Hong Kong
Scott Edmunds on a new publishing workflow for rapid dissemination of genomes using GigaByte & GigaDB. Presented at Biodiversity 2020 in the Annotation & Databases track, 9th October 2020.
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
Scot Edmunds talk at CODATA2019 on Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment. 19th September 2019 in Beijing
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...GigaScience, BGI Hong Kong
A 3 part talk presented at PAG Asia 2019 in Shenzhen- The Digitalization of Ruili Botanical Garden Project: Production, Curation and Re-Use. Presented by Huan Liu (CNGB), Scott Edmunds (GigaScience) & Stephen Tsui (CUHK). 8th June 2019
Ricardo Wurmus at #ICG13: Reproducible genomics analysis pipelines with GNU Guix. Presented at the GigaScience Prize Track at the International Conference on Genomics, Shezhen 26th October 2018
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...GigaScience, BGI Hong Kong
Paul Pavlidis talk at the #ICG13 GigaScience Prize Track: Monitoring changes in the Gene Ontology and their impact on genomic data analysis (GOtrack). Shenzhen, 26th October 2018
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...GigaScience, BGI Hong Kong
Stefan Prost presentation for the #ICG13 GigaScience Prize Track: Genome analyses show strong selection on coloration, morphological and behavioral phenotypes in birds-of-paradise. Shenzhen, 26th October, 2018
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...GigaScience, BGI Hong Kong
Lisa Johnson's talk at the #ICG13 GigaScience Prize Track: Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes. Shenzhen, 26th October 2018
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...GigaScience, BGI Hong Kong
Mary Ann Tuli's talk at the International Society of Biocuration meeting : What MODs can learn from Journals – a GigaDB curator’s perspective. Shanghai 9th April 2018
Laurie Goodman: Sharing and Reusing Cell Image Data, ASCB/EMBO 2017 Subgroup ...GigaScience, BGI Hong Kong
Laurie Goodman's pre-prepared slides for the Subgroup S Sharing and Reusing Cell Image Data session at the 2017 ASCB│EMBO meeting in Philadelphia. December 2017
Susanna Sansone's talk at the "Beyond Open" Knowledge Dialogues/Open Data Hong Kong event on research data, hosted at the Hong Kong Innocentre on Monday 20 November 2017.
Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a mult...GigaScience, BGI Hong Kong
Jie Zheng at the #ICG12 GigaScience Prize Track: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome. ICG12, Shenzhen, 26th October 2017
Valerie de Anda at #ICG12: A new multi-genomic approach for the study of biog...GigaScience, BGI Hong Kong
Valerie de Anda Torres at the #ICG12 GigaScience Prize Track: A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle. Shenzhen, 26th October 2017
A brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Multi-source connectivity as the driver of solar wind variability in the heli...Sérgio Sacani
The ambient solar wind that flls the heliosphere originates from multiple
sources in the solar corona and is highly structured. It is often described
as high-speed, relatively homogeneous, plasma streams from coronal
holes and slow-speed, highly variable, streams whose source regions are
under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify
solar wind sources and understand what drives the complexity seen in the
heliosphere. By combining magnetic feld modelling and spectroscopic
techniques with high-resolution observations and measurements, we show
that the solar wind variability detected in situ by Solar Orbiter in March
2022 is driven by spatio-temporal changes in the magnetic connectivity to
multiple sources in the solar atmosphere. The magnetic feld footpoints
connected to the spacecraft moved from the boundaries of a coronal hole
to one active region (12961) and then across to another region (12957). This
is refected in the in situ measurements, which show the transition from fast
to highly Alfvénic then to slow solar wind that is disrupted by the arrival of
a coronal mass ejection. Our results describe solar wind variability at 0.5 au
but are applicable to near-Earth observatories.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Comparative structure of adrenal gland in vertebrates
Democratising biodiversity and genomics research: open and citizen science to build trust and fill the data gaps.
1. Democratising biodiversity and genomics
research: open and citizen science to
build trust and fill the data gaps.
Scott Edmunds
CNGB
18th December 2018
4. Scientists: need to convince public + politicians
科学家:取信于官民
https://www.nature.com/articles/s41538-018-0018-4
“China’s Ministry of Agriculture and the science community generally expressed a positive attitude
toward GM food, but the percentage of respondents that trusted the government and scientists was
only 11.7 and 23.2%, respectively.”
7. How to regain trust?
如何重获信任?
Areas we need to tackle to allow citizens to trust us
Citizen Science - Involve the public
in the scientific process
Open Science - Increase
transparency & fill the data gaps
Open Access - Change incentive
systems away from dead tree
advertising to reproducibility
8. How to build a community genome project using local pride
How to regain trust?
如何重获信任?
9. We need genetic literacy to make decisions on
Health Starting a family Shopping
What we need to know: 21st Century Edition
Context:
12. HK Botanical &
Afforestation Dept.
"The mysterious origin
of the tree & its
magnificent flowers at
once arrest the interest.
Solve the Bauhinia Mystery?
1903
So far, all efforts to identify them with
any foreign species have failed"
23. Need to fill biodiversity gaps
Expert predictions
of species richness
https://www.nature.com/articles/ncomms9221
Completeness of
biodiversity records
24.
25. HK Citizens far outpacing academic research grade GBIF observations
https://www.gbif.org/country/HK/summary
…
• Much higher eBird (146,113) & iNaturalist (39,152) research grade observations than HKU
Herbarium (1,061)
• Korean International School made 10,792 iNaturalist observations during Inter-schools
Challenge, and CFSS saw 931 species
27. Into an information vacuum fills rumour
How not to regain trust?
失信的深渊?
https://www.independent.co.uk/news/world/asia/japan-cracks-down-on-leaks-after-scandal-of-fukushima-nuclear-power-plant-8965296.html
38. Buckheit & Donoho: Scholarly articles are merely advertisement of
scholarship. The actual scholarly artifacts, i.e. the data and
computational methods, which support the scholarship, remain largely
inaccessible.
How not to regain trust?
失信的深渊?
39. Provide evidence not advertising
Transparency or bust
Show me the peer reviews
Give me the data/ code/protocols
Let me publish replication studies
Buckheit & Donoho: Scholarly articles are merely advertisement of
scholarship. The actual scholarly artifacts, i.e. the data and
computational methods, which support the scholarship, remain largely
inaccessible.
How to regain trust?
如何重获信任?
用证据说话
40. GigaScience Ethos/Policies: ‘Impact' is subjective. Data is quantitive.
Reward evidence (data), not advertising
鼓励证据(数据)而非包装
• Data
• Software
• Models
• Pipelines
• Reviews
• Re-use…
= Credit
41. Data Publishing: nothing new…
Data & Metadata Collection/Experiments
Analysis/Hypothesis/Analysis
Conclusions
+ Area of Interest/Question
1839
1859
20 Yrs.
42. Rewarding open data & code
鼓励开放数据和代码
http://gigasciencejournal.com/
Since July 2012. Publishes “Data Notes” for CC0 data, “Tech Notes” for OSI software.
43. Integrated GigaDB repository. DataCite DOIs. No size limits, APC covers storage.
http://gigadb.org/
Rewarding open data & code
鼓励开放数据和代码
46. Visualisations
& DOIs for workflows
http://www.gigasciencejournal.com/series/Galaxy 46
Rewarding & enabling interaction
鼓励并实现互动
47. Workflows/Virtual Machines/containers
• Downloadable as virtual harddisk/available as Amazon Machine Image
• Now publishing container (docker) submissions
• CodeOcean widgets for code, “compute capsule” run on AWS
48. First journal with deep integration with
Launched 2nd June 2016
Reward better handling of “wet” protocols…
• Create, share, modify forkeable protocols in repo.
• Download & run on smartphone app.
• Widgets embedded in GigaDB
• Get discoverability, credit, DOIs for sharing methods.
• Create your own, or let us set up & you claim.
https://www.protocols.io/groups/gigascience-journal
49. Rewarding & enabling interaction
鼓励并实现互动
Building tools (inc Jbrowse for genomes, sketchfab for 3D images) on top of datasets…
[Insert Widget Here]
50. Democratising Data at GigaScience
• From Big Data to usable Data
• Example: WebTools for easy browsing and visualisation
• Pan-and-zoom map browser as a visual aid to allow the end user to
find datasets
51. • 3D viewer allows users to interact and explore image data prior to data
download
• 3D models are CC0, can be downloaded, and are printable
Democratising Data at GigaScience
• From Big Data to usable Data
• Example: WebTools for easy browsing and visualisation
https://sketchfab.com/GigaDB
52. Democratising Data at GigaScience
• Widening the target audience
• Bioinformaticians and ‘Big Data’ scientists are a
primary target audience
• Plugins and visualisations make access easier for
the less technically inclined
• Democratises access
through education
potential and ease of use
https://www.thingiverse.com/GigaScience/designs
55. To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang,
J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J;
Peng, Y; Pu, F; Sun, Y; Chen,Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X;
Chen, F; Yin, X; Song,Y ; Rohde, H; Li, Y; Wang, J; Wang, J and the
Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium
(2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
Our first DOI:
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
Open Data to the rescue…
56.
57.
58.
59. Downstream consequences:
“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew it that might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
1. Many Citations 2. Therapeutics (primers, antimicrobials) 3. Platform Comparisons
4. Example for faster & more open science
60. 1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastro-
intestinal infection in Hamburg in Germany in May 2011. This
spread through several European countries and the US,
affecting about 4000 people and resulting in over 50 deaths. All
tested positive for an unusual and little-known Shiga-toxin–
producing E. coli bacterium. The strain was initially analysed by
scientists at BGI-Shenzhen in China, working together with
those in Hamburg, and three days later a draft genome was
released under an open data licence. This generated interest
from bioinformaticians on four continents. 24 hours after the
release of the genome it had been assembled. Within a week
two dozen reports had been filed on an open-source site
dedicated to the analysis of the strain. These analyses
provided crucial information about the strain’s virulence and
resistance genes – how it spreads and which antibiotics are
effective against it. They produced results in time to help
contain the outbreak. By July 2011, scientists published papers
based on this work. By opening up their early sequencing
results to international collaboration, researchers in Hamburg
produced results that were quickly tested by a wide range of
experts, used to produce new knowledge and ultimately to
control a public health emergency.
62. Oxford Nanopore in the spotlight, Sept 2014. Does it work?
https://doi.org/10.1111/1755-0998.12324
http://omicsomics.blogspot.com/2014/09/oxford-takes-some-flak-fires-back.html
2014年9月面世的Oxford Nanopore,好用吗?
63. Nanopore MinION E. Coli genome
released via GigaDB 10-Sep-2014
Curated & converted to ISA-tab, &
worked with EBI to get raw data there
Data Note submitted & preprint version
out 26-Sept-2014
Peer reviewed & published 20-Oct-2014
http://dx.doi.org/10.5524/100102
66. Try before you buy: inspect ALL the data yourselves
https://doi.org/10.1093/gigascience/gix024
• Comparisons with Illumina for
PE50, 100 & 150
• Raw sequencing data in NCBI SRA
• FASTQ files in GigaDB
• Raw image files also shared
Would you trust a BGI sequencer?
华大测序仪可信吗?
先尝后买:亲自检查所有数据
67. Open, transparent and peer reviewed benchmarking
https://doi.org/10.1093/gigascience/gix024
http://dx.doi.org/10.5524/review.100698
http://dx.doi.org/10.5524/review.100699Open
Review
Would you trust a BGI sequencer?
华大测序仪可信吗?
70. Transparency saves wildlife
User-friendly pipeline for the rapid identification of CITES-listed
species in forensic samples using Illumina data.
• International validation trial by 16 laboratories.
• All input sequence data + results available in GigaDB.
• SOPs available in protocols.io.
https://doi.org/10.1093/gigascience/gix080
72. Democratising Data at GigaScience
• Challenges of Food security
• Rice, Oryza sativa L., is the
staple food for half the world’s
population
• By 2030, rice production must
increase by at least 25% to keep
pace with population growth
• 80% of countries face a serious
burden of malnutrition,
especially in Africa and SE Asia
73. Democratising Data at GigaScience
Rice 3K project
• 3,000 rice genomes
• 13.4TB public data
• 6 months to copy
data to Sequence
Read Archive (SRA)
• Data published 4
years before
analysis published
74. Democratising Data at GigaScience
• Orphan Crops
• The African Orphan Crop
Consortium (AOCC) is
developing genomic resources
for 101 crops that represent a
significant part of African/Asian
diets.
• To-date, the AOCC working on
69 genomes, first 5 of which
just published in GigaScience.
Hyacinth bean
https://doi.org/10.1093/gigascience/giy152
75. Democratising Data at GigaScience
• Each AOCC genome is a single GigaDB dataset (with DOI)
76. From Big Data to usable(ish) Data
• Although 13TB data in GigaDB was open (CC0), after analysing in
Tianhe supercomputer processed rice3K data = 100TB
• AWS hosted for free, but expensive to process
https://aws.amazon.com/public-data-sets/3000-rice-genome/
77. Processed data finally published 1st May 2018, Nature v557, p43–49
https://www.nature.com/articles/s41586-018-0063-9
78. Democratising Data at GigaScience
• From Big Data to usable Data
• Example: Easy-to-use plug and play RiceGalaxy
• GUI means plant breeders can utilise genetic data without coding skills
• Funded to run at low cost (<100 USD/month) via AWS Singapore & local
servers (2 vCPUs, 8GB RAM, 2 mounted volumes, 200GB total storage)
• CGIAR Excellence in Plant Breeding Platform/model will roll out to other
crops
80. Other beneficiaries: you!
Piwowar HA, Day RS, Fridsma DB (2007)
PLoS ONE 2(3): e308.
doi:10.1371/journal.pone.0000308
Sharing Detailed Research
Data Is Associated with
Increased Citation Rate.
Every 10 datasets collected contributes to at least 4 papers in the
following 3-years.
Piwowar, HA, Vision, TJ, & Whitlock, MC (2011). Data archiving is a good investment Nature, 473
(7347), 285-285 DOI: 10.1038/473285a
81. Open Science = Science
• Science needed more than ever to tackle grave
environmental challenges and fight disease
• Stand on the shoulders of giants, and allow others
to stand on yours
• Choose evidence not branding
• Being closed provokes distrust, prevents
downstream use, and ultimately harms science
• Being open helps science, your immediate
community, and ultimately your career
• Preempt new EU Open Science and MOST rules on
“strengthening research integrity”…
http://most.gov.cn/mostinfo/xinxifenlei/fgzc/gfxwj/gfxwj2018/201805/t20180531_139731.htm
82. Help GigaScience make it happen
www.gigasciencejournal.com
Give us your data,
pipelines & papers
scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Contact us:
助力GigaScience实现科研过程全公开
83. Thanks to:
Laurie Goodman, Editor in Chief
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Hongling Zhao, Assistant Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Chris Armit, Data Scientist
Mary Ann Tulli, Data Ediitor
Xiao (Jesse) Si Zhe, Database Developer
Chen Qi, Shenzhen Office.
@GigaScience
facebook.com/GigaScience
http://gigasciencejournal.com/blog/
Follow us:
www.gigasciencejournal.com
www.gigadb.org
+
Weibo
& WeChat