Genome sharing projects around the world – and how you find data for your research
Cambridge, April 26 2016
Slides will be made available online 
Tweets welcome #CamFindData
We are on twitter:
@glyn_dk
@repositiveio
@DNAdigest
@CamOpenData
Workshop outline
1. What data are you looking for? And why?
2. Data resources from around the world
3. Tips on how to find and access data
4. Hands-on using Repositive
5. Summary and feedback
1. What data are you looking for?
This workshop will focus on finding
and accessing human genomic data.
… And why would you be looking for
genomic data for your research?
Are you researching cancer or
genetic diseases?
How much data do you need to publish a paper?
2001: 1 human genome
2012: 1000 Genomes (1,092 genomes, since increased to ~2,500)
2015:
UK10K; Icelandic population (2,636 + 100k imputed);
The Cancer Genome Atlas (TCGA), ~11,000 cases;
ExAC consortium, 65,000 exomes
?
Statistically speaking, you still need tens of thousands of samples for validation.
The more severe the phenotype and the more complete the penetrance, the easier it will be to find your variant, but:
“As the genetic complexity of the disease increases (for example, reduced penetrance and increased locus heterogeneity), issues of statistical power quickly become paramount.”
http://www.nature.com/nrg/journal/v15/n5/full/nrg3706.html
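To see why tens of thousands of samples keep coming up, here is a rough sample-size calculation. It is an illustrative sketch, not a method from the workshop or the cited review: it assumes a simple allelic case-control test (a two-sided two-proportion z-test on allele counts), Hardy-Weinberg equilibrium, a genome-wide significance threshold of 5 × 10^-8 and 80% power. The function names are hypothetical. For a variant at 1% frequency in controls and 1.5% in cases, it asks for roughly 20,000 cases and as many controls.

```python
from math import ceil, sqrt
from statistics import NormalDist


def alleles_per_group(p_control: float, p_case: float,
                      alpha: float = 5e-8, power: float = 0.8) -> int:
    """Allele counts needed per group for a two-sided two-proportion z-test.

    Hypothetical helper: assumes an allelic case-control test with equal
    group sizes and no continuity correction.
    """
    if p_case == p_control:
        raise ValueError("allele frequencies must differ")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_case) / 2    # pooled frequency under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_case * (1 - p_case)))
    return ceil(numerator ** 2 / (p_case - p_control) ** 2)


def individuals_per_group(p_control: float, p_case: float,
                          alpha: float = 5e-8, power: float = 0.8) -> int:
    # Each diploid individual contributes two alleles (assumes HWE).
    return ceil(alleles_per_group(p_control, p_case, alpha, power) / 2)


if __name__ == "__main__":
    for p_case in (0.015, 0.02, 0.03):
        n = individuals_per_group(0.01, p_case)
        print(f"control freq 0.01, case freq {p_case}: ~{n:,} cases and ~{n:,} controls")
```

Small effects at genome-wide significance push the numbers up fast; real studies need dedicated power software, which is exactly where the PRO TIP about involving a statistician early applies.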
But I am just looking at this one disease…
What can I do?
PRO TIP: involve a statistician early on in your study design!
How can I determine significance?
“One potentially powerful approach is to assess conservation across and within
multiple species as whole-genome sequence data become more abundant.”
Look at extreme phenotypes: “Sampling cases or controls from the extremes of an appropriate quantitative distribution can often increase power”
Look at non-SNP variants: they are more likely to have functional effects
- “How to account for the technical features of sequencing, such as incomplete sequencing and biased coverage over the genome?”
Think about how you can provide evidence that your result is not just a local technical variation or sampling bias,
e.g. data from the same cell type, same sequencing technology, same alignment…
How to account for bias?
PRO TIP: include more reference data in your analysis
How can I access more data for my research?
How can I find collaborators?
• Know what data is available in your lab, your department, your organisation
• A survey from Qiagen showed that one of the main reasons researchers collaborate is to get access to data!
PRO TIP: Search for collaborators who have the data you need
PRO TIP: Tell your colleagues and peers what type of data you have in your lab
2. Data resources from around the world
Public repositories
• some require you to apply for access, especially if the data contain clinical information or whole-genome personally identifiable data (PID)
• some are open access: GEO, SRA, PGP, OpenSNP, GigaDB, …
• some are consented for general research use, some have specific consent
Large amounts of data, but not accessible
[Figure: sequence data generated vs. WGS data available in public repositories, 2006–2015]
• 80+ PB sequenced every year, with an exponential growth rate
• ≈0.5 PB of WGS data available in public repositories
• Under-utilised data has huge potential for medical research
DATA is fragmented
It may be confusing
Hundreds of data sources
…but they aren’t easy to find!
[Figure: number of data sources found, Jan 2015 to Mar 2016; data labels 10, 25, 33, 35, 102, 163]
The first 30 data sources are listed here: http://dx.doi.org/10.1371/journal.pbio.1002418
Data source content
[Figures: assay types; what data sources are dedicated to…; number of samples in data sources (log10 scale)]
Top 5 by number of samples:
GEO (1.8M)
PMI Cohort Program (1M)
Auria Biopankki (1M)
EGA (~0.6M)
SRA (~0.5M)
Data accessibility
• Open: can download the data straight away or after logging in.
• Restricted: need to apply for access to the data.
• Open and restricted: has both open and restricted access data within one repository.
Online data source ‘types’
• University – affiliated to a university. Often only members of that university can upload/download to/from it.
• Catalogue – doesn’t have raw data but lists studies/datasets.
• Initiative/Consortium – has a specific purpose/aim. Often focussed on a question or disease.
• Repository – you can download from it; has data from multiple institutions. Often you can also upload your own data there.
• Company – for-profit organisation. Listing data is not their main purpose.
• Biobank – many have sequence data of their biological samples.
Sequenced ethnicities
Aboriginals
African Americans
Africans
Australians
Chinese
Malays
Indians
Danish
Dutch
Estonian
Russian
European Ancestry
Finnish
Icelandic
Japanese
Korean
Latin Americans
Saudi
Swedish
Machines & Data sources
[Figure: world map of sequencing machines and data sources by region; 23 data sources are international]
Interesting site to look at:
http://omicsmaps.com/stats
Main repository funders
BGI = 4
EBI = 9
NIH = 10
NCBI = 9
The Broad = 8
Wellcome = 4
EBI: 104 services in total, including 19 repositories: http://www.ebi.ac.uk/services/all
NCBI: 67 databases in total: http://www.ncbi.nlm.nih.gov/guide/all/#databases_
Biobanks as data sources
- Biobanks are potential sources of genomic data
- Most biobanks contain large collections of samples (thousands)
- Some biobanks also contain data related to these samples
- A fraction of this data is genomic data (usually genotyping)
- Several biobanks (e.g. ToMMo biobank in Japan, UK biobank) have sequencing programs
- Many biobanks do not consider sequencing as their priority but are willing to give their samples to
researchers who would like to sequence them
- Most biobanks are supposed to share their samples with bona fide researchers (exception –
commercial biobanks, e.g. Abcodia)
- In most cases, the best thing is to ask them directly whether they have samples/data that you
need!
Name: UK Biobank
Type of data: genotyping
URL: http://biobank.ctsu.ox.ac.uk/crystal/gsearch.cgi
Name: ToMMo Biobank
Type of data: genotyping, WGS
URL: https://ijgvd.megabank.tohoku.ac.jp/
Name: Diabetes Biobank Brussels
Type of data: data (including genomic; not specified) and clinical samples on >20,000 diabetic patients and their first-degree relatives.
URL: http://www.diabetesbiobank.org/
Name: Dutch biobanks (dozens of them!)
Type of data: multiple
URL: http://bit.ly/1XxPA6W
Name: Auria Biobank Finland
Type of data: There are roughly one million human biological samples
stored in Auria Biobank, a considerable proportion of which are cancer
samples. At the moment, there is only the catalogue of samples, no
catalogue of data. In case a researcher needs to know what kind of data
we have, he/she needs to contact us.
URL: https://www.auriabiopankki.fi/?lang=en
More information about data sources
… in our recent paper:
http://tinyurl.com/plos-biology-repositive
3. Tips to find and access data
• Case study: DNA data on Cancer
Case Study – DNA data on Cancer
Repositories you have heard of:
Repository     Data type                       Access
ArrayExpress   Expression                      Open
GEO            Expression                      Open
EGA            Mixed                           Restricted
dbGaP          Mixed                           Restricted
ENCODE         Healthy reference               Open
1000 Genomes   Healthy reference               Open

Ask around (word of mouth):
Repository     Data type                       Access
COSMIC         Somatic mutations & WGS         Open
ClinVar        Variant information             Open
ExAC           Allele freq. but not raw data   Open
SRA            Individual sequences            Open
TCGA           Clinical & high-level data      Open
CGHub          Low-level data (DNA data)       Restricted
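The two tables can be read as a tiny catalogue. As a hypothetical sketch (the `Source` record and `shortlist` helper are illustrative names, not part of Repositive or any repository API), the snippet below filters those rows by access level or by a data-type keyword, e.g. to separate sources you can download today from those that need an application. The rows are copied from the tables; nothing else is assumed about the repositories.

```python
from typing import NamedTuple, Optional


class Source(NamedTuple):
    repository: str
    data_type: str
    access: str  # "Open" or "Restricted", as in the tables


# Rows transcribed from the two case-study tables.
SOURCES = [
    Source("ArrayExpress", "Expression", "Open"),
    Source("GEO", "Expression", "Open"),
    Source("EGA", "Mixed", "Restricted"),
    Source("dbGaP", "Mixed", "Restricted"),
    Source("ENCODE", "Healthy reference", "Open"),
    Source("1000 Genomes", "Healthy reference", "Open"),
    Source("COSMIC", "Somatic mutations & WGS", "Open"),
    Source("ClinVar", "Variant information", "Open"),
    Source("ExAC", "Allele freq. but not raw data", "Open"),
    Source("SRA", "Individual sequences", "Open"),
    Source("TCGA", "Clinical & high-level data", "Open"),
    Source("CGHub", "Low-level data (DNA data)", "Restricted"),
]


def shortlist(sources: list[Source], access: Optional[str] = None,
              keyword: Optional[str] = None) -> list[str]:
    """Return repository names matching an access level and/or a data-type keyword."""
    hits = []
    for s in sources:
        if access is not None and s.access != access:
            continue
        if keyword is not None and keyword.lower() not in s.data_type.lower():
            continue
        hits.append(s.repository)
    return hits


if __name__ == "__main__":
    # Open sources: no application needed, so you can start analysing now.
    print(shortlist(SOURCES, access="Open"))
    # Restricted sources: start these applications early.
    print(shortlist(SOURCES, access="Restricted"))
```

Splitting by access first mirrors the time-saving advice in this workshop: use open data while applications for restricted data are under review.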
We have identified the first 27 cancer-specific data sources, and many more that contain cancer data alongside other data types:
Abcodia
AmbryShare
BRCA Exchange
Breast Cancer Now Tissue Bank
Broad Cancer programme datasets
Cancer Moonshot 2020
CanGEM
CGCI
CGHub
Chinese cancer genome consortium
Chinese national human genome centre
Follicular Lymphoma Genome Data
G-DOC
GenoMel
ICGC
National Mesothelioma Virtual Bank
NCIP Hub
Project GENIE
TARGET
TCGA
Texas Cancer Research Biobank
NCI-60
CCLE
COSMIC
FANTOM
Cancer Methylome System
Cancer Therapeutics Response Portal
Registering for CGHub
https://cghub.ucsc.edu/keyfile/newuser.html
1. Register for an eRA account
2. Request access to the specific dataset of interest
3. Download the data
Registration workflow:
• The ‘principal signing official’ registers
• Email to verify
• Email to confirm/deny access to the website
• Email with temporary password
• Change password; electronic signature
• Log in, fill in contact info and complete the ‘424’ form (research application form)
• Request reviewed by the Data Access Committee (DAC)
• Email to confirm/deny access to the data
• Log in
• Retrieve personal access token
• Download!
Often a long process
Bottlenecks:
• Finding relevant and usable data
• Getting authorisation to access data
• Formatting data
• Storing and moving data
We studied the problem through qualitative interviews followed by a survey of researchers in human genetics:
T. A. van Schaik et al., The need to redefine genomic data sharing: a focus on data accessibility, Applied & Translational Genomics, 2014, doi:10.1016/j.atg.2014.09.013
Researchers spend months finding and accessing genomic data, and often choose not to access data at all
Why the barrier?
• Benefits: strict governance, review of consent, applicant signs for full
responsibility for governance
• Disadvantages: No control of data once access is given, high barrier for
access – too high?
How can I save time?
• Start planning your data needs early in your project
• When you find the data you need, start the application straight away
• Use Open Access data
PRO TIP: If you use human genomic data, apply for the General Research Use (GRU) datasets in dbGaP: one application gives access to all the GRU datasets
Not all data is restricted
• Some data is Open Access, which requires specific consent
• OpenSNP.org (Bastian)
• Personal Genome Projects
• Individuals who put their genomes online, e.g. Manuel Corpas and his family, “the Corpasome”
• http://manuelcorpas.com/about/
Personal Genome Project
Projects compared: PGP Harvard | PGP Canada | PGP UK | Genom Austria
Host institution: Harvard Medical School, Boston | SickKids, Toronto | University College London | CeMM Research Center for Molecular Medicine
Principal investigator: George Church | Steven Scherer | Stephan Beck | Christoph Bock & Giulio Superti-Furga
Launch year: 2005 | 2012 | 2013 | 2014
Geographic scope: USA, mainly Boston | Canada | United Kingdom | Mainly Austria
Enrollment eligibility (all projects): At least 18 years old, able to make an informed decision, perfect score in the PGP enrollment exam; certain vulnerable groups excluded
Data generated: Whole genome sequencing, upload of additional data possible | Mainly whole genome sequencing | Whole genome sequencing, DNA methylome sequencing, RNA transcriptome sequencing | Mainly whole genome sequencing
Number of genomes: 100s | 10s | 10s | 10s
Data access: PGP Harvard: http://personalgenomes.org/harvard/data; Genom Austria: http://genomaustria.at/unser-genom/#genome-der-pionierinnen
Project funding: Discretionary funds and corporate sponsoring | Institutional startup funds | Discretionary funds and corporate sponsoring | Institutional startup funds
Areas of emphasis: Integration with phenotypic data, collaboration with other personal omics initiatives; Genome donations, synergy with massive-scale clinical genome sequencing projects; Genomes and society, genetic literacy, school projects, education
Website: http://personalgenomes.org/harvard/ | http://personalgenomes.org/canada/ | http://personalgenomes.org/uk/ | http://genomaustria.at/
Summary of data access barriers
Data is uploaded to repository → Data is discovered by potential user → Data is accessed by potential user
The barrier of making data available
• “even when researchers are authorised to share data they report reluctance to do so because of the amount of effort required” http://www.sciencedirect.com/science/article/pii/S2212066114000386
• “Clinical geneticists cited a lack of time because their main priority is diagnosing patients. Industrial researchers cited a lack of time because of the pressure to meet the deadlines in their job. Researchers in academia cited both a concern about the potential loss of future publications once unpublished data is shared, and the lack of time and incentive to share data as this does not contribute to their publication record. Researchers from all categories felt that they lacked sufficient resources to make their data available.”
But I do not want to share my data
• If you expect data to be available to you, you have to make your data available too!
• Encourage collaborations: power in numbers
Best practices
1. Get credit – publish and make your data available
2. Give credit – cite data sources
3. Understand consent – for all uses of clinical data
Best practices: use the tools
• Use all available tools to make your life easier:
• Data publications give visibility and citations for your data, e.g. GigaScience and Scientific Data
• Figshare, Zenodo, Dryad for sharing open access data
• PhenomeCentral, Matchmaker Exchange for rare disease research
• Repositive for finding data across repositories and making your own data discoverable
Best practices: Plan into your grant proposals
Does data sharing matter in grant proposal evaluation?
Based on: Winning Horizon 2020 with Open Science, http://dx.doi.org/10.5281/zenodo.12247
Evaluation criteria: Excellence, Impact, Implementation
• “Weakness: Involvement of non-academic beneficiaries is limited”
• “Weakness: highly focused on academic activities, and lacks an advanced communication strategy”
• “Weakness: limited exposure to non-academic partners & infrastructures”
• “data accessibility is unclear!”
• “data storage & access not considered”
Impact:
• “Strengths: extensive dissemination of data to the scientific community (open access, databases)”
• “outreach activities to a broad audience”
• “research software is freely available”
Best practices: Share in return!
Make the (research) world a better place by sharing in return
What the future holds
• Digital consent: towards automatic processing of applications
• Dynamic consent and power to the patient, e.g. PatientsKnowBest
• Privacy-preserving access to datasets: preserving control and governance with the data custodian, with a lower barrier for access
4. Hands-on session using Repositive
What if finding data was as easy as finding a book on Amazon or booking a hotel on Expedia?
Repositive promotes best practices
• Discover new data sources: EASY SEARCH
• Make your data visible: SHARE KNOWLEDGE
• Build a data community: BUILD TRUST
Benefit for both sides of data collaboration
Data consumers:
• Find relevant data faster
• Feedback from other users through ratings and comments to evaluate data quality
• Find collaborators with data
Data producers:
• Make your data visible
• Build credibility as a trusted provider of quality data
• Find collaborators to analyse your data
Live demo
http://discover.repositive.io
Use activation code: CamFindData
5. Summary and feedback
• Get credit – publish data
• Give credit – cite data
• Understand consent
Tell us your thoughts:
@repositiveio
@glyn_dk
And read more on http://repositive.io
Bugs and feedback to: Charlotte at Repositive.io
Thank you!

Why should researchers care about data curation?Varsha Khodiyar
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Biosample exchanges – the past, the current and the future – how do we make i...
Biosample exchanges – the past, the current and the future – how do we make i...Biosample exchanges – the past, the current and the future – how do we make i...
Biosample exchanges – the past, the current and the future – how do we make i...Pistoia Alliance
 

Similar to Find Genome Data for Your Research (20)

Workshop - finding and accessing data - Cambridge August 22 2016
Workshop - finding and accessing data - Cambridge August 22 2016Workshop - finding and accessing data - Cambridge August 22 2016
Workshop - finding and accessing data - Cambridge August 22 2016
 
SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...
 
Data dialogue - Human Genomic Data Discovery
Data dialogue - Human Genomic Data DiscoveryData dialogue - Human Genomic Data Discovery
Data dialogue - Human Genomic Data Discovery
 
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...
Laurie Goodman at #aibsdata: Beyond Data Release Mandates - Helping Authors M...
 
Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014
 
International perspective for sharing publicly funded medical research data
International perspective for sharing publicly funded medical research dataInternational perspective for sharing publicly funded medical research data
International perspective for sharing publicly funded medical research data
 
Public Databases for Radiomics Research: Current Status and Future Directions
Public Databases for Radiomics Research: Current Status and Future DirectionsPublic Databases for Radiomics Research: Current Status and Future Directions
Public Databases for Radiomics Research: Current Status and Future Directions
 
IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiao
IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiaoIRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiao
IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiao
 
PhRMA Some Early Thoughts
PhRMA Some Early ThoughtsPhRMA Some Early Thoughts
PhRMA Some Early Thoughts
 
IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology IRIDA: Canada’s federated platform for genomic epidemiology
IRIDA: Canada’s federated platform for genomic epidemiology
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?Will Biomedical Research Fundamentally Change in the Era of Big Data?
Will Biomedical Research Fundamentally Change in the Era of Big Data?
 
Biosb2017_Repositive
Biosb2017_RepositiveBiosb2017_Repositive
Biosb2017_Repositive
 
2015 12 ebi_ganley_final
2015 12 ebi_ganley_final2015 12 ebi_ganley_final
2015 12 ebi_ganley_final
 
Data management profiles workshop
Data management profiles workshopData management profiles workshop
Data management profiles workshop
 
Introduction to research data management
Introduction to research data managementIntroduction to research data management
Introduction to research data management
 
Why should researchers care about data curation?
Why should researchers care about data curation?Why should researchers care about data curation?
Why should researchers care about data curation?
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Biosample exchanges – the past, the current and the future – how do we make i...
Biosample exchanges – the past, the current and the future – how do we make i...Biosample exchanges – the past, the current and the future – how do we make i...
Biosample exchanges – the past, the current and the future – how do we make i...
 

More from Fiona Nielsen

EICT Summer School August 2023 - Things I never knew I never knew - about bu...
EICT Summer School August 2023 - Things I never knew  I never knew - about bu...EICT Summer School August 2023 - Things I never knew  I never knew - about bu...
EICT Summer School August 2023 - Things I never knew I never knew - about bu...Fiona Nielsen
 
Challenges with pre-clinical studies in immuno oncology - by Fiona Nielsen
Challenges with pre-clinical studies in immuno oncology - by Fiona NielsenChallenges with pre-clinical studies in immuno oncology - by Fiona Nielsen
Challenges with pre-clinical studies in immuno oncology - by Fiona NielsenFiona Nielsen
 
AIDR2019 - standards - tools - incentives - what does it take to enable data ...
AIDR2019 - standards - tools - incentives - what does it take to enable data ...AIDR2019 - standards - tools - incentives - what does it take to enable data ...
AIDR2019 - standards - tools - incentives - what does it take to enable data ...Fiona Nielsen
 
Genomics for the public is coming - are you ready or not?
Genomics for the public is coming - are you ready or not?Genomics for the public is coming - are you ready or not?
Genomics for the public is coming - are you ready or not?Fiona Nielsen
 
Investing in innovation for genomic medicine - sept 5 2017
Investing in innovation for genomic medicine - sept 5 2017Investing in innovation for genomic medicine - sept 5 2017
Investing in innovation for genomic medicine - sept 5 2017Fiona Nielsen
 
Investing in innovation for genomic medicine - the journey of Repositive
Investing in innovation for genomic medicine - the journey of RepositiveInvesting in innovation for genomic medicine - the journey of Repositive
Investing in innovation for genomic medicine - the journey of RepositiveFiona Nielsen
 
From bioinformatics scientist to entrepreneur - Women in Omics - ICG11 - 2016
From bioinformatics scientist to entrepreneur - Women in Omics - ICG11 - 2016From bioinformatics scientist to entrepreneur - Women in Omics - ICG11 - 2016
From bioinformatics scientist to entrepreneur - Women in Omics - ICG11 - 2016Fiona Nielsen
 
ICG-11 - genomic data projects around the world - nov 5 2016
ICG-11 - genomic data projects around the world - nov 5 2016ICG-11 - genomic data projects around the world - nov 5 2016
ICG-11 - genomic data projects around the world - nov 5 2016Fiona Nielsen
 
Genome sharing projects around the world - Open Access is not enough
Genome sharing projects around the world - Open Access is not enough Genome sharing projects around the world - Open Access is not enough
Genome sharing projects around the world - Open Access is not enough Fiona Nielsen
 
From Bioinformatics Scientist to Entrepreneur
From Bioinformatics Scientist to EntrepreneurFrom Bioinformatics Scientist to Entrepreneur
From Bioinformatics Scientist to EntrepreneurFiona Nielsen
 
Session 3 - big (biomedical) data
Session 3 - big (biomedical) dataSession 3 - big (biomedical) data
Session 3 - big (biomedical) dataFiona Nielsen
 
The need to redefine genomic data sharing - moving towards Open Science Oct ...
The need to redefine genomic data sharing - moving towards Open Science  Oct ...The need to redefine genomic data sharing - moving towards Open Science  Oct ...
The need to redefine genomic data sharing - moving towards Open Science Oct ...Fiona Nielsen
 
DNAdigest Eagle Genomics Symposium March 27, 2014
DNAdigest Eagle Genomics Symposium March 27, 2014DNAdigest Eagle Genomics Symposium March 27, 2014
DNAdigest Eagle Genomics Symposium March 27, 2014Fiona Nielsen
 

More from Fiona Nielsen (13)

EICT Summer School August 2023 - Things I never knew I never knew - about bu...
EICT Summer School August 2023 - Things I never knew  I never knew - about bu...EICT Summer School August 2023 - Things I never knew  I never knew - about bu...
EICT Summer School August 2023 - Things I never knew I never knew - about bu...
 
Challenges with pre-clinical studies in immuno oncology - by Fiona Nielsen
Challenges with pre-clinical studies in immuno oncology - by Fiona NielsenChallenges with pre-clinical studies in immuno oncology - by Fiona Nielsen
Challenges with pre-clinical studies in immuno oncology - by Fiona Nielsen
 
AIDR2019 - standards - tools - incentives - what does it take to enable data ...
AIDR2019 - standards - tools - incentives - what does it take to enable data ...AIDR2019 - standards - tools - incentives - what does it take to enable data ...
AIDR2019 - standards - tools - incentives - what does it take to enable data ...
 
Genomics for the public is coming - are you ready or not?
Genomics for the public is coming - are you ready or not?Genomics for the public is coming - are you ready or not?
Genomics for the public is coming - are you ready or not?
 
Investing in innovation for genomic medicine - sept 5 2017
Investing in innovation for genomic medicine - sept 5 2017Investing in innovation for genomic medicine - sept 5 2017
Investing in innovation for genomic medicine - sept 5 2017
 
Investing in innovation for genomic medicine - the journey of Repositive
Investing in innovation for genomic medicine - the journey of RepositiveInvesting in innovation for genomic medicine - the journey of Repositive
Investing in innovation for genomic medicine - the journey of Repositive
 
From bioinformatics scientist to entrepreneur - Women in Omics - ICG11 - 2016
From bioinformatics scientist to entrepreneur - Women in Omics - ICG11 - 2016From bioinformatics scientist to entrepreneur - Women in Omics - ICG11 - 2016
From bioinformatics scientist to entrepreneur - Women in Omics - ICG11 - 2016
 
ICG-11 - genomic data projects around the world - nov 5 2016
ICG-11 - genomic data projects around the world - nov 5 2016ICG-11 - genomic data projects around the world - nov 5 2016
ICG-11 - genomic data projects around the world - nov 5 2016
 
Genome sharing projects around the world - Open Access is not enough
Genome sharing projects around the world - Open Access is not enough Genome sharing projects around the world - Open Access is not enough
Genome sharing projects around the world - Open Access is not enough
 
From Bioinformatics Scientist to Entrepreneur
From Bioinformatics Scientist to EntrepreneurFrom Bioinformatics Scientist to Entrepreneur
From Bioinformatics Scientist to Entrepreneur
 
Session 3 - big (biomedical) data
Session 3 - big (biomedical) dataSession 3 - big (biomedical) data
Session 3 - big (biomedical) data
 
The need to redefine genomic data sharing - moving towards Open Science Oct ...
The need to redefine genomic data sharing - moving towards Open Science  Oct ...The need to redefine genomic data sharing - moving towards Open Science  Oct ...
The need to redefine genomic data sharing - moving towards Open Science Oct ...
 
DNAdigest Eagle Genomics Symposium March 27, 2014
DNAdigest Eagle Genomics Symposium March 27, 2014DNAdigest Eagle Genomics Symposium March 27, 2014
DNAdigest Eagle Genomics Symposium March 27, 2014
 

Recently uploaded

Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfAtiaGohar1
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and AnnovaMansi Rastogi
 
Explainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosExplainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosZachary Labe
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfGABYFIORELAMALPARTID1
 
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...HafsaHussainp
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxzaydmeerab121
 
Loudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxLoudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxpriyankatabhane
 
complex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfcomplex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfSubhamKumar3239
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxPayal Shrivastava
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxGiDMOh
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxpriyankatabhane
 

Recently uploaded (20)

Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annova
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
Explainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosExplainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenarios
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
 
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptx
 
Loudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxLoudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptx
 
complex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfcomplex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdf
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptx
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptx
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptx
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 

Find Genome Data for Your Research

  • 1. Genome sharing projects around the world – and how you find data for your research. Cambridge, April 26 2016. Slides will be made available online. Tweets welcome #CamFindData
  • 2. We are on Twitter: @glyn_dk @repositiveio @DNAdigest @CamOpenData
  • 3. Workshop outline: 1. What data are you looking for, and why? 2. Data resources from around the world. 3. Tips on how to find and access data. 4. Hands-on using Repositive. 5. Summary and feedback.
  • 4. 1. What data are you looking for? This workshop will focus on finding and accessing human genomic data. … And why would you be looking for genomic data for your research? Are you researching cancer or genetic diseases?
  • 5. How much data do you need to publish a paper? 2001: 1 human genome. 2012: 1000 Genomes (1,092 genomes, since increased to ~2,500). 2015: UK10K; Icelandic population (2,636 + 100k imputed); The Cancer Genome Atlas (~11,000 genomes); ExAC consortium (65,000 exomes). Next: ?
  • 6. Statistically speaking, you still need tens of thousands of samples for validation. The more severe the phenotype and the more complete the penetrance, the easier it will be to find your variant, but “As the genetic complexity of the disease increases (for example, reduced penetrance and increased locus heterogeneity), issues of statistical power quickly become paramount.” (http://www.nature.com/nrg/journal/v15/n5/full/nrg3706.html) But I am just looking at this one disease…
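To see why “tens of thousands” is not an exaggeration, the standard normal-approximation sample-size formula for comparing two proportions can be applied to allele counts. This is a rough sketch under stated assumptions, not a replacement for a proper power analysis: it assumes a simple allelic case-control test, two independent alleles per person (Hardy-Weinberg equilibrium), a genome-wide significance threshold of 5e-8 and 80% power. The function name and the example allele frequencies (10% in controls, 12% in cases) are hypothetical and not taken from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def individuals_per_group(p_controls: float, p_cases: float,
                          alpha: float = 5e-8, power: float = 0.8) -> int:
    """Rough number of cases (and of controls) for an allelic association test.

    Applies the normal-approximation sample-size formula for comparing two
    proportions to allele counts, then converts alleles to people assuming
    two independent alleles per person (Hardy-Weinberg equilibrium).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    p_bar = (p_controls + p_cases) / 2
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p_controls * (1 - p_controls)
                              + p_cases * (1 - p_cases)))
    alleles = spread ** 2 / (p_controls - p_cases) ** 2
    return ceil(alleles / 2)  # two alleles per individual


# Hypothetical modest effect: risk allele at 10% in controls, 12% in cases.
n = individuals_per_group(0.10, 0.12)
print(f"about {n} cases and {n} controls")
```

Under these assumptions the result is just under ten thousand cases plus the same number of controls, while a much stronger effect (10% vs 20%) needs only around five hundred per group. A statistician, as the next slide suggests, can replace this crude approximation with a design-specific calculation.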
  • 7. What can I do? PRO TIP: involve a statistician early on in your study design!
  • 8. How can I determine significance? • Assess conservation: “One potentially powerful approach is to assess conservation across and within multiple species as whole-genome sequence data become more abundant.” • Look at extreme phenotypes: “Sampling cases or controls from the extremes of an appropriate quantitative distribution can often increase power.” • Look at non-SNP variants, as they are more likely to have functional effects. • But consider “how to account for the technical features of sequencing, such as incomplete sequencing and biased coverage over the genome?”
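The extreme-phenotype advice can be illustrated with a small simulation. This is a sketch, not a power analysis: it assumes a single variant with a hypothetical additive effect on a normally distributed trait and a fixed sequencing budget of 10% of the cohort per arm, and it uses the allele-frequency difference between the two sequenced groups as a rough proxy for power. All parameter values and function names are illustrative.

```python
import random


def simulate_cohort(n: int, freq: float, beta: float, seed: int = 1):
    """Simulate (trait, genotype) pairs under a simple additive model.

    genotype = number of risk alleles (0, 1 or 2), each present with
    probability `freq`; trait = beta * genotype + standard normal noise.
    Returns the cohort sorted by trait value, lowest first.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        genotype = int(rng.random() < freq) + int(rng.random() < freq)
        trait = beta * genotype + rng.gauss(0.0, 1.0)
        cohort.append((trait, genotype))
    cohort.sort()
    return cohort


def allele_freq(group) -> float:
    """Risk-allele frequency in a list of (trait, genotype) pairs."""
    return sum(g for _, g in group) / (2 * len(group))


cohort = simulate_cohort(n=20_000, freq=0.3, beta=0.2)
k = len(cohort) // 10  # budget: sequence 10% of the cohort in each arm

# Design 1: sequence the extremes (lowest and highest 10% of the trait).
extreme_diff = allele_freq(cohort[-k:]) - allele_freq(cohort[:k])

# Design 2: sequence the same number of randomly chosen people and
# split them at their median trait value.
subset = sorted(random.Random(2).sample(cohort, 2 * k))
random_diff = allele_freq(subset[k:]) - allele_freq(subset[:k])

print(f"extremes: {extreme_diff:.3f}   random: {random_diff:.3f}")
```

With the same number of sequenced people, the tails should show a clearly larger frequency difference than a random sample, which is what “can often increase power” means in practice. The flip side is that effect sizes estimated from tail-sampled data tend to be inflated relative to the general population.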
  • 9. How to account for bias? Think of how you can provide evidence that your result is not just a local technical variation or sampling bias, e.g. by comparing with data from the same cell type, the same sequencing technology, the same alignment… PRO TIP: include more reference data in your analysis.
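One way to act on this advice is to select reference data by its metadata before running the analysis. The sketch below assumes each sample comes with a record of cell type, sequencing technology and aligner; the `Sample` class, its field names and the example values are hypothetical, and real repositories describe these attributes in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Minimal, hypothetical metadata record for one sequenced sample."""
    sample_id: str
    cell_type: str
    platform: str  # sequencing technology
    aligner: str   # alignment software/pipeline


def matching_references(own: Sample, candidates: list[Sample]) -> list[Sample]:
    """Return reference samples produced the same way as `own`.

    Matching on cell type, sequencing technology and aligner makes it less
    likely that a difference between `own` and the references is purely a
    technical artefact.
    """
    key = (own.cell_type, own.platform, own.aligner)
    return [c for c in candidates
            if (c.cell_type, c.platform, c.aligner) == key]


ours = Sample("ours-1", "whole blood", "Illumina HiSeq", "BWA-MEM")
references = [
    Sample("ref-A", "whole blood", "Illumina HiSeq", "BWA-MEM"),
    Sample("ref-B", "whole blood", "Illumina HiSeq", "Bowtie2"),
    Sample("ref-C", "liver", "Illumina HiSeq", "BWA-MEM"),
]
usable = matching_references(ours, references)
print([s.sample_id for s in usable])  # ['ref-A']
```

An exact match on every field is the strictest option. In practice you may relax it (for example, accept different aligner versions) and then check whether your result survives.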
  • 10. How can I access more data for my research? • Know what data is available in your lab, your department, your organisation. • A survey from Qiagen showed that one of the main reasons researchers collaborate is to get access to data!
  • 11. How can I find collaborators? PRO TIP: Search for collaborators who have the data you need. PRO TIP: Tell your colleagues and peers what type of data you have in your lab.
  • 12. 2. Data resources from around the world. Public repositories: • for some you must apply for access, especially if the data contain clinical information or whole genome PID • some are open access: GEO, SRA, PGP, OpenSNP, GigaDB, … • some are consented for general research use, others have specific consent
  • 13. Large amounts of data, but not accessible: ≈ 0.5 PB of sequence available, while 80+ PB are sequenced every year. [Figure: WGS data available in public repositories, 2006 to 2015, growing at an exponential rate] Under-utilised data has huge potential for medical research.
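Taking the two figures on this slide at face value, the publicly available WGS data amount to well under 1% of what is sequenced in a single year. Because 80 PB is a lower bound, the true share is smaller still:

```latex
\frac{0.5\ \text{PB available}}{80\ \text{PB sequenced per year}} = 0.00625 \approx 0.6\%
```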
  • 15. It may be confusing
  • 16. Hundreds of data sources …but they aren’t easy to find! [Figure: number of data sources listed: Jan-15: 10, Mar-15: 25, Jun-15: 33, Sep-15: 35, Dec-15: 102, Mar-16: 163] First 30 data sources listed here: http://dx.doi.org/10.1371/journal.pbio.1002418
  • 17. Data source content [Figure panels: Assay types; Dedicated to…]
  • 18. Number of samples in data sources [Figure: number of samples per data source, log10 scale] Top 5: GEO (1.8M), PMI Cohort Program (1M), Auria Biopankki (1M), EGA (~0.6M), SRA (~0.5M)
  • 19. Data accessibility: • Open: can download the data straight away or after logging in. • Restricted: need to apply for access to the data. • Open and restricted: has both open and restricted access data within one repository.
  • 20. Online data source ‘types’: • University – affiliated to a university; often only members of that university can upload/download to/from it. • Catalogue – doesn’t have raw data but lists studies/datasets. • Initiative/Consortium – has a specific purpose/aim, often focussed on a question or disease. • Repository – you can download from it, and it has data from multiple institutions; often you can also upload your own data there. • Company – for-profit organisation; listing data is not their main purpose. • Biobank – many have sequence data of their biological samples.
  • 21. Sequenced ethnicities: Aboriginals, African Americans, Africans, Australians, Chinese, Malays, Indians, Danish, Dutch, Estonian, Russian, European Ancestry, Finnish, Icelandic, Japanese, Korean, Latin Americans, Saudi, Swedish
  • 22. Machines & data sources [Figure: map of sequencing machines and data sources by region, including international sources] Interesting site to look at: http://omicsmaps.com/stats
  • 23. Main repository funders: BGI = 4, EBI = 9, NIH = 10, NCBI = 9, The Broad = 8, Wellcome = 4. EBI in total: 104 services, 19 repositories (http://www.ebi.ac.uk/services/all). NCBI in total: 67 databases (http://www.ncbi.nlm.nih.gov/guide/all/#databases_)
  • 24. Biobanks as data sources - Biobanks are potential sources of genomic data - Most biobanks contain large collections of samples (thousands) - Some biobanks also contain data related to these samples - A fraction of this data is genomic data (usually genotyping) - Several biobanks (e.g. ToMMo Biobank in Japan, UK Biobank) have sequencing programs - Many biobanks do not consider sequencing a priority but are willing to give their samples to researchers who would like to sequence them - Most biobanks are supposed to share their samples with bona fide researchers (exception: commercial biobanks, e.g. Abcodia) - In most cases, the best thing is to ask them directly whether they have the samples/data that you need!
  • 25. • Name: UK Biobank; Type of data: genotyping; URL: http://biobank.ctsu.ox.ac.uk/crystal/gsearch.cgi • Name: ToMMo Biobank; Type of data: genotyping, WGS; URL: https://ijgvd.megabank.tohoku.ac.jp/ • Name: Diabetes Biobank Brussels; Type of data: data (including genomic; not specified) and clinical samples on >20,000 diabetic patients and their first-degree relatives; URL: http://www.diabetesbiobank.org/ • Name: Dutch biobanks (dozens of them!); Type of data: multiple; URL: http://bit.ly/1XxPA6W • Name: Auria Biobank Finland; Type of data: roughly one million human biological samples, a considerable proportion of which are cancer samples. At the moment there is only a catalogue of samples, no catalogue of data; researchers who need to know what kind of data is available should contact the biobank. URL: https://www.auriabiopankki.fi/?lang=en
  • 26. More information about data sources … in our recent paper: http://tinyurl.com/plos-biology-repositive
  • 27. 3. Tips to find and access data • Case study: DNA data on cancer
  • 28. Case Study – DNA data on Cancer
Repositories you have heard of:

| Repository | Data type | Access |
|---|---|---|
| ArrayExpress | Expression | Open |
| GEO | Expression | Open |
| EGA | Mixed | Restricted |
| dbGaP | Mixed | Restricted |
| ENCODE | Healthy reference | Open |
| 1000 Genomes | Healthy reference | Open |

Ask around (word of mouth):

| Repository | Data type | Access |
|---|---|---|
| COSMIC | Somatic mutations & WGS | Open |
| ClinVar | Variant information | Open |
| ExAC | Allele frequencies, but not raw data | Open |
| SRA | Individual sequences | Open |
| TCGA | Clinical & high-level data | Open |
| CGHub | Low-level data (DNA data) | Restricted |
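Sorting candidate sources by access level shows where you can start analysing today and where you should submit an access application first. A minimal sketch using only the repositories, data types and access levels listed in this case study; the variable and function names are hypothetical, and access conditions change over time, so check each repository before relying on the grouping.

```python
from collections import defaultdict

# Repository, data type and access level, as listed in the cancer case study.
CASE_STUDY = [
    ("ArrayExpress", "Expression", "Open"),
    ("GEO", "Expression", "Open"),
    ("EGA", "Mixed", "Restricted"),
    ("dbGaP", "Mixed", "Restricted"),
    ("ENCODE", "Healthy reference", "Open"),
    ("1000 Genomes", "Healthy reference", "Open"),
    ("COSMIC", "Somatic mutations & WGS", "Open"),
    ("ClinVar", "Variant information", "Open"),
    ("ExAC", "Allele frequencies, but not raw data", "Open"),
    ("SRA", "Individual sequences", "Open"),
    ("TCGA", "Clinical & high-level data", "Open"),
    ("CGHub", "Low-level data (DNA data)", "Restricted"),
]


def by_access(rows):
    """Group repository names by access level, keeping the listed order."""
    groups = defaultdict(list)
    for name, _data_type, access in rows:
        groups[access].append(name)
    return dict(groups)


groups = by_access(CASE_STUDY)
print("Start now:", ", ".join(groups["Open"]))
print("Apply early:", ", ".join(groups["Restricted"]))
```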
  • 29. Case Study – DNA data on Cancer. We have identified the first 27 cancer-specific data sources: Abcodia, AmbryShare, BRCA Exchange, Breast Cancer Now Tissue Bank, Broad Cancer programme datasets, Cancer Moonshot 2020, CanGEM, CGCI, CGHub, Chinese cancer genome consortium, Chinese national human genome centre, Follicular Lymphoma Genome Data, G-DOC, GenoMel, ICGC, National Mesothelioma Virtual Bank, NCIP Hub, Project GENIE, Target, TCGA, Texas cancer research biobank, NCI-60, CCLE, COSMIC, Fantom cancer methylome system, Cancer therapeutics response portal. And many more contain cancer data alongside other data types.
  • 30. Registering for CGHub (https://cghub.ucsc.edu/keyfile/newuser.html): 1. Register for an eRA account. 2. Request access to a specific dataset of interest. 3. Download data. Step by step: ‘Principal signing official’ registers → email to verify → email to confirm/deny access to website → email with temporary password → change password → electronic signature → log in → fill in contact info → complete ‘424’ form (research application form) → request reviewed by DAC → email to confirm/deny access to data → log in → retrieve personal access token → download!
  • 31. Often a long process. Bottlenecks: • Finding relevant and usable data • Getting authorisation to access data • Formatting data • Storing and moving data. We studied the problem through qualitative interviews, followed by a survey of researchers in human genetics.
  • 32. Often a long process: researchers spend months finding and accessing genomic data, and often choose not to access data at all. Source: T. A. van Schaik et al., “The need to redefine genomic data sharing: a focus on data accessibility”, Applied & Translational Genomics, 2014, doi:10.1016/j.atg.2014.09.013
  • 34. Why the barrier? • Benefits: strict governance, review of consent, applicant signs to take full responsibility for governance • Disadvantages: no control over data once access is given, high barrier to access (too high?)
  • 35. How can I save time? • Start planning your data needs early in your project • When you find the data you need, start the application straight away • Use Open Access data. PRO TIP: If you use human genomic data, apply for the GRU (General Research Use) datasets in dbGaP: one application gives access to all the GRU datasets.
  • 36. Not all data is restricted • Some data is Open Access, which requires specific consent • OpenSNP.org (Bastian) • Personal Genome Projects • Individuals who put their genomes online, e.g. Manuel Corpas and his family, “the Corpasome”: http://manuelcorpas.com/about/
  • 38. Personal Genome Project
    Projects: PGP Harvard | PGP Canada | PGP UK | Genom Austria
    Host institution: Harvard Medical School, Boston | SickKids, Toronto | University College London | CeMM Research Center for Molecular Medicine
    Principal investigator: George Church | Steven Scherer | Stephan Beck | Christoph Bock & Giulio Superti-Furga
    Launch year: 2005 | 2012 | 2013 | 2014
    Geographic scope: USA, mainly Boston | Canada | United Kingdom | mainly Austria
    Enrollment eligibility: at least 18 years old, able to make an informed decision, perfect score in the PGP enrollment exam, certain vulnerable groups excluded
    Data generated: whole genome sequencing, upload of additional data possible | mainly whole genome sequencing | whole genome sequencing, DNA methylome sequencing, RNA transcriptome sequencing | mainly whole genome sequencing
    Number of genomes: 100s | 10s | 10s | 10s
    Data access: http://personalgenomes.org/harvard/data (PGP Harvard); http://genomaustria.at/unser-genom/#genome-der-pionierinnen (Genom Austria)
    Project funding: discretionary funds and corporate sponsoring | institutional startup funds | discretionary funds and corporate sponsoring | institutional startup funds
    Areas of emphasis: integration with phenotypic data, collaboration with other personal omics initiatives; genome donations, synergy with massive-scale clinical genome sequencing projects; genomes and society, genetic literacy, school projects, education
    Website: http://personalgenomes.org/harvard/ | http://personalgenomes.org/canada/ | http://personalgenomes.org/uk/ | http://genomaustria.at/
  • 39. Summary of data access barriers: data is uploaded to a repository → data is discovered by a potential user → data is accessed by a potential user
  • 40. The barrier of making data available • “even when researchers are authorised to share data they report reluctance to do so because of the amount of effort required” http://www.sciencedirect.com/science/article/pii/S2212066114000386 • “Clinical geneticists cited a lack of time because their main priority is diagnosing patients. Industrial researchers cited a lack of time because of the pressure to meet the deadlines in their job. Researchers in academia cited both a concern about the potential loss of future publications once unpublished data is shared, and the lack of time and incentive to share data as this does not contribute to their publication record. Researchers from all categories felt that they lacked sufficient resources to make their data available.” • But I do not want to share my data
  • 41. Best practices • If you expect data to be available to you, you have to make your data available too! • Encourage collaborations: power by numbers 1. Get credit: publish and make your data available 2. Give credit: cite data sources 3. Understand consent for all uses of clinical data
  • 42. Best practices: use the tools • Use all available tools to make your life easier: • Data publications give visibility and citations for your data, e.g. GigaScience and Scientific Data • Figshare, Zenodo and Dryad for sharing open access data • PhenomeCentral and Matchmaker Exchange for rare disease research • Repositive for finding data across repositories and making your own data discoverable
  • 43. Best practices: Plan into your grant proposals. Does data sharing matter in grant proposal evaluation? Based on: Winning Horizon 2020 with Open Science, http://dx.doi.org/10.5281/zenodo.12247
  • 44. Best practices: Plan into your grant proposals. “Weakness: Involvement of non-academic beneficiaries is limited” “Weakness: highly focused on academic activities, and lacks an advanced communication strategy” “Weakness: limited exposure to non-academic partners & infrastructures” Excellence | Impact | Implementation “data accessibility is unclear!” “data storage & access not considered”
  • 45. Best practices: Plan into your grant proposals. Impact: “Strengths: extensive dissemination of data to the scientific community (open access, databases)” “outreach activities to a broad audience” “research software is freely available”
  • 46. Best practices: Plan into your grant proposals
  • 47. Best practices: Share in return! Make the (research) world a better place by sharing in return
  • 48. What the future holds • Digital consent: towards automatic processing of applications • Dynamic consent and power to the patient, e.g. Patients Know Best • Privacy-preserving access to datasets: preserving control and governance with the data custodian, while lowering the barrier to access
  • 49. 4. Hands-on session using Repositive. What if finding data were as easy as finding a book on Amazon or booking a hotel on Expedia?
  • 50. Repositive promotes best practices Discover new data sources EASY SEARCH
  • 51. Repositive promotes best practices Make your data visible SHARE KNOWLEDGE
  • 52. Repositive promotes best practices Build a data community BUILD TRUST
  • 53. Benefit for both sides of data collaboration
    Data consumers: find relevant data faster • feedback from other users through ratings and comments to evaluate data quality • find collaborators with data
    Data producers: make your data visible • build credibility as a trusted provider of quality data • find collaborators to analyse your data
  • 55. 5. Summary and feedback • Get credit – publish data • Give credit – cite data • Understand consent
  • 56. Tell us your thoughts: @repositiveio @glyn_dk And read more on http://repositive.io Bugs and feedback to: Charlotte at Repositive.io

Editor's Notes

  1. Because interpretation requires LOTS of data. And although data exists around the world, it is siloed, and even when available, it is not accessible. This is Jenn, a genetic researcher (our target customer) seeking to interpret data from genetic diseases and cancer. She needs data from other patients to compare with and interpret Mabel's DNA. She also has data available in her own lab, but she cannot share it because of concerns about how to handle secure access to sensitive data and data governance, e.g. vetting of users.
  2. It has been shown that the combination of summary single-variant statistics from multiple data sets, rather than the joint analysis of a combined data set, does not result in an appreciable loss of information [85], and that taking into account heterogeneity in effect size across studies can improve statistical power
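To make the idea of combining per-study summary statistics concrete, here is a minimal Python sketch of the standard inverse-variance weighted fixed-effect meta-analysis. It assumes each study reports an effect size (beta) and its standard error for the same variant and the same effect allele; the names `StudySummary` and `fixed_effect_meta` are hypothetical, chosen for illustration only. It also computes Cochran's Q and I² as heterogeneity diagnostics; the heterogeneity-aware methods the quoted review refers to are more elaborate and are not shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class StudySummary:
    """Summary statistics one study reports for a single variant (hypothetical type)."""
    beta: float  # effect size estimate, e.g. a log odds ratio
    se: float    # standard error of beta (must be > 0)


def fixed_effect_meta(studies):
    """Inverse-variance weighted fixed-effect meta-analysis of one variant.

    Returns the pooled effect, its standard error, z-score, two-sided
    p-value, Cochran's Q and the I^2 heterogeneity statistic.
    """
    if not studies:
        raise ValueError("need at least one study")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (s.se ** 2) for s in studies]
    total_w = sum(weights)
    beta = sum(w * s.beta for w, s in zip(weights, studies)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (s.beta - beta) ** 2 for w, s in zip(weights, studies))
    df = len(studies) - 1
    # I^2: share of between-study variation beyond what chance would explain.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "Q": q, "I2": i2}


if __name__ == "__main__":
    # Three hypothetical studies reporting the same effect for one variant.
    studies = [StudySummary(beta=0.2, se=0.1) for _ in range(3)]
    result = fixed_effect_meta(studies)
    print(f"pooled beta={result['beta']:.3f} se={result['se']:.4f} z={result['z']:.2f}")
```

With three studies that each report beta = 0.2 and SE = 0.1, each study alone gives z = 2, while the pooled estimate keeps beta = 0.2 with SE = 0.1/√3 ≈ 0.058, so z ≈ 3.46: a small illustration of "power by numbers" using only shared summary statistics.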
  3. “Although they are harder to call and annotate, insertion or deletions, multinucleotide variants and structural variants (including copy-number variants, translocations and inversions) constitute a smaller set of variation (in terms of the number of discrete events an individual is expected to carry) relative to all SNVs and are more likely to have functional effects.”
  6. Population-scale genome sequencing projects have been launched all over the world. More than 80 PB of human genomic data is being sequenced every year, BUT to date only around 0.5 PB of data is available in public repositories, less than 1% of a single year's output.
  7. This is further compounded by the data being highly fragmented: siloed in repositories and institutions around the world.
  8. There are many public repositories, but it can be hugely confusing to know where to look for the right kind of data.
  9. Public repositories: the default is to apply for access, which then grants full access. Benefits: strict governance, review of consent, applicant signs to take full responsibility for governance. Disadvantages: no control over data once access is given, high barrier to access, perhaps too high? (Researchers give up; even patients can't get access to their own data.)
  10. ODP trained, EURO-BASIN manager: a boring title for a diverse job in an exciting research domain. I DIP into EACH step of the research cycle, from proposal formulation to providing the best return on investment to the funders. So I'd like to share with you some experiences from the last few years of Open Science (OS) advocacy in the Marine Science Community.
  11. Excellence in your research subject is … excellent, but is it ENOUGH? To be successful, a candidate will be judged on how complete the proposal is. MESSAGE: FOCUSING only on IF could expose you to risk.
  13. Our mission is to speed up research and diagnostics for genetic diseases by enabling efficient and ethical access to genomic research data