The server of the Spanish Population Variability

DNA Day
Hospital Universitario La Paz, Madrid, Spain, April 28th, 2014

The first server of the Spanish Population Variability.
Freely available: http://ciberer.es/bier/exome-server/
See also related tools:
BiERapp: http://bierapp.babelomics.org (to help in the prioritization of disease genes)
TEAM: http://team.babelomics.org (to manage panels of genes for targeted resequencing-based diagnostics)

Published in: Health & Medicine

  1. CIBERER Exome Server (CES): the server of the Spanish population variability. Joaquín Dopazo, PhD, Department of Computational Genomics, CIPF, Valencia. Hospital Universitario La Paz, Madrid, April 28, 2014.
  2. Why it is interesting to have a Spanish exome variant repository. Rationale: local variability is more important than previously thought. The existence of numerous local rare variants, many of them (apparently) deleterious, hampers the prioritization of disease variants. Data recycling: CIBERER has accumulated a large number of samples that can be used as (pseudo)controls of the normal population.
  3. Pipeline of data analysis. Primary processing / primary analysis: initial QC (FASTQ file), mapping (BAM file), variant calling (VCF file). Secondary analysis (successive filtering): variant annotation, filtering by effect, filtering by MAF (1000 genomes, EVS, local variants), filtering by family segregation. Gene prioritization (knowledge-based prioritization): proximity to other known disease genes, functional proximity, network proximity, burden tests, other prioritization methods.
  4. Use known variants and their population frequencies as filters. •Typically dbSNP, 1000 genomes and the 6515 exomes from the ESP are used as sources of population frequencies. •We selected 75 local controls to add an extra filtering step to the analysis pipeline. Figures: Novembre et al., 2008, Genes mirror geography within Europe, Nature; comparison of Spanish controls to 1000g. How important do you think local information is for detecting disease genes?
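The extra filtering step described in this slide can be sketched as follows. This is only an illustration under stated assumptions: the 1% threshold, the panel names, and the three toy variants are invented for the example and do not come from the talk or from the CES.

```python
from typing import Dict, List, Optional

# Hypothetical cutoff: a variant seen above this frequency in any panel
# is treated as a polymorphism and dropped from the candidate list.
MAX_FREQ = 0.01


def passes_frequency_filter(
    freqs: Dict[str, Optional[float]],
    panels: List[str],
    max_freq: float = MAX_FREQ,
) -> bool:
    """Return True when the variant is rare (or unobserved) in every listed panel."""
    for panel in panels:
        freq = freqs.get(panel)
        # None means the variant was not observed in that panel.
        if freq is not None and freq > max_freq:
            return False
    return True


# Toy candidates (invented values): var_a is rare worldwide but common locally.
candidates = [
    {"id": "var_a", "freqs": {"1000g": 0.0, "esp": 0.001, "local": 0.12}},
    {"id": "var_b", "freqs": {"1000g": None, "esp": None, "local": None}},
    {"id": "var_c", "freqs": {"1000g": 0.05, "esp": 0.04, "local": 0.06}},
]

global_only = [
    c["id"] for c in candidates
    if passes_frequency_filter(c["freqs"], ["1000g", "esp"])
]
with_local = [
    c["id"] for c in candidates
    if passes_frequency_filter(c["freqs"], ["1000g", "esp", "local"])
]

print("kept without local controls:", global_only)
print("kept with local controls:", with_local)
```

In this toy case the local panel removes a variant that the global panels alone would have kept, which is the effect the slide argues for.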
  5. Filtering with or without local variants. Number of genes as a function of individuals in the study of a dominant disease (autosomal dominant retinitis pigmentosa). The use of local variants makes an enormous difference.
  6. What do we know about the Spanish population variability?
  7. Using CIBERER families to create a first version of the database of local variability of the Spanish population. •In each family we select two unrelated members (preferably the parents). •If there are no parents, one of the children (unaffected, if possible) is selected. •A total of 75, out of the 136 samples available among the families analyzed in the BiER, were initially selected. •Variant files (VCF) were obtained following the same pipeline (with missing values included) and merged. •Genotype proportions and MAFs were obtained for all the variable positions. ONLY this information is used in the web server.
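The last two steps above (merging calls and deriving genotype proportions and MAFs) can be sketched for a single biallelic position. This is a minimal sketch, not the BiER pipeline: the function name, the handling of phased calls, and the choice to skip missing calls ("./.") when computing frequencies are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

GENOTYPES = ("0/0", "0/1", "1/1")


def genotype_summary(calls: Iterable[str]) -> Tuple[Dict[str, int], Optional[float]]:
    """Count genotypes at one biallelic position and return (counts, MAF).

    Missing calls such as './.' are ignored (an assumption of this sketch).
    MAF is None when no sample has a called genotype.
    """
    normalized = []
    for call in calls:
        call = call.replace("|", "/")  # treat phased calls like unphased ones
        if call == "1/0":
            call = "0/1"
        if call in GENOTYPES:
            normalized.append(call)

    tally = Counter(normalized)
    counts = {gt: tally.get(gt, 0) for gt in GENOTYPES}

    called = sum(counts.values())
    if called == 0:
        return counts, None

    # Each sample carries two alleles; heterozygotes contribute one alternate allele.
    alt_freq = (counts["0/1"] + 2 * counts["1/1"]) / (2 * called)
    return counts, min(alt_freq, 1.0 - alt_freq)


# Toy input: five samples, one of them with a missing call.
counts, maf = genotype_summary(["0/0", "0/1", "./.", "1/1", "0|0"])
print(counts, maf)
```

Only the counts and the derived frequencies would need to be stored, which matches the slide's point that no individual genotypes are used by the web server.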
  8. Samples used. By unit (n, %): U723: 12 (16%); U737: 11 (14.7%); U759: 2 (2.7%); U705: 10 (13.3%); U720: 12 (16%); U732: 1 (1.3%); U755: 3 (4%); U746: 9 (12%); U728: 2 (2.7%); U729: 3 (4%); U703: 7 (9.3%); U718: 1 (1.3%); U730: 2 (2.7%); total: 75 (100%). By disease (n, %): 3-Methylglutaconic aciduria: 11 (14.7%); Atypical fracture: 4 (5.3%); Autosomal dominant non-syndromic hearing loss: 1 (1.3%); Autosomal recessive non-syndromic hearing loss: 1 (1.3%); BCKDK-deficiency disease: 2 (2.7%); CMT: 1 (1.3%); Congenital disorder of glycosylation types I and II: 8 (10.7%); CoQ disease: 3 (4.0%); CoQ10 deficiency and DNA depletion: 3 (4.0%); CoQ10 deficiency: 2 (2.7%); Inherited metabolic disease: 2 (2.7%); MMD (multiple deletion of mitochondrial DNA): 4 (5.3%); MSUD (maple syrup urine disease): 1 (1.3%); Opitz: 8 (10.7%); Pelizaeus-like: 2 (2.7%); RCD (respiratory complexes deficiency): 8 (10.7%); Retinitis pigmentosa: 11 (14.7%); Usher: 3 (4.0%); total: 75 (100.0%). Charts: gender (man, woman); phenotype (affected, healthy).
  9. Variability spectrum of the Spanish population. A total of 131,897 variant positions, unique to the Spanish population, were detected across the 75 samples. Approximately 90,000 were singletons. 51,295 variants are non-synonymous changes and 18,450 are synonymous changes (a singleton-driven pattern, opposite to that of variants shared with 1000g and EVS, which come from polymorphic positions).
  10. The CIBERER Exome Server (CES): the first repository of variability of the Spanish population. Only one other similar initiative exists: the GoNL (http://www.nlgenome.nl/). CES: http://ciberer.es/bier/exome-server/
  11. Information provided. Genotypes in the different reference populations. Genomic coordinates, variation, and gene. SNP id, if any.
  12. Information provided. PolyPhen and SIFT pathogenicity indexes. Phenotype, if available.
  13. Variants can also be seen in their genomic context. GenomeMaps viewer (Medina et al., 2013, NAR) embedded in the application. GenomeMaps is the official genome viewer of the ICGC (http://dcc.icgc.org/).
  14. Occurrence of pathological variants in “normal” population. Reference genome is mutated. Nine carriers in 1000 genomes. One affected and 73 carriers in EVS.
  15. Current usage options. Query. Configuration of the display. Genomic context.
  16. Spanish variability database. FAQ. What is stored in the database? ONLY the frequencies of the genotypes observed at positions in which variants have been found in at least one individual. This information is obtained from unrelated Spanish individuals. What information is provided by the database? Aggregated information on the genotype frequencies of the variable positions in the gene(s) requested. Is it possible to know whether a particular individual is stored in the database? No, unless you sequence the individual and check whether the genotype frequencies are compatible with the database, but that seems stupid because you would already have the information you are after. Let's imagine that I am stupid and managed to find out that the individual is in the database: can I retrieve her/his genome? No, it is impossible from the aggregated information.
  17. Spanish variability database. FAQ. Who can contribute? Anyone (especially if you are sequencing with public resources). What do you need to submit? Anonymized files of variants (VCF: Variant Call Format). Why VCFs? Because we need to check that your contribution contains no relatives of the individuals in the database.
  18. What’s next? •Strategic steps: –Populating the database with contributions from CIBERER and external groups. Future project SPANEx. –Opening the database. •Technical steps: –Automatic access to the local variability data via webservices. –Use in gene discovery pipelines. –Use for the interpretation of incidental findings in diagnostic panels.
  19. Table of Spanish Frequencies (TSF) and DB of Spanish variants (DBSV). TSF columns: Chr, Position, Ref, Alt, 0/0, 0/1, 1/1. Rows: 1, 1365313, A, T, 75, 0, 0; 1, 1484884, G, A, 70, 4, 1; 2, 326252, T, C, 25, 35, 15. Diagram labels: CES input (internal, regional, external, other countries); VCFs; Unrelated? (YES/NO) (DBSV); Spanish? (YES/NO) (TSF); counts; CES use.
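As an illustration of how such a table of counts can answer queries with aggregated information only, here is a minimal sketch that uses the three rows shown on the slide. The in-memory dictionary, the key layout, and the function name are hypothetical; they do not describe how the CES stores or serves its data.

```python
from typing import Dict, Optional, Tuple

# (chr, position, ref, alt) -> counts of 0/0, 0/1 and 1/1 genotypes.
# The three rows are the ones shown on the slide.
VariantKey = Tuple[str, int, str, str]
TSF: Dict[VariantKey, Tuple[int, int, int]] = {
    ("1", 1365313, "A", "T"): (75, 0, 0),
    ("1", 1484884, "G", "A"): (70, 4, 1),
    ("2", 326252, "T", "C"): (25, 35, 15),
}


def genotype_frequencies(key: VariantKey) -> Optional[Dict[str, float]]:
    """Return genotype proportions for a position, or None if it is not in the table."""
    counts = TSF.get(key)
    if counts is None:
        return None
    total = sum(counts)
    if total == 0:
        return None
    return {
        genotype: count / total
        for genotype, count in zip(("0/0", "0/1", "1/1"), counts)
    }


result = genotype_frequencies(("1", 1484884, "G", "A"))
print(result)
print(genotype_frequencies(("3", 100, "C", "G")))  # position absent from the table
```

The query returns proportions over all individuals at a position, so nothing in the response identifies which individual carries which genotype.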
  20. Future of the database of variation in the Spanish population. CIBERER contributions. SPANEx contributions.
  21. CIBERER exome server roadmap (2014, June 2014, 2015). Phase I, Phase II, Phase III. CIBERER: 76 samples, unaffected. MGP: 269 samples, healthy controls. CES II: 76+269+X, mixed. More CIBERER samples. SPANEX: 1000 exomes. CES II: 1000+76+269+X, mixed.
  22. Future utilization. Access via webservices. Access to aggregated data of variation and genotype frequencies. Therefore, no confidentiality or privacy issues are associated. Spanish variation database. CellBase (Bleda et al., 2012, NAR): our data server system, now at the EBI.
  23. BiERapp: the interactive filtering tool for easy candidate prioritization. http://bierapp.babelomics.org (samples shown: NA19660, NA19661, NA19600, NA19685)
  24. Panel (real or virtual) manager. Tool for defining panels. New filter based on local population variant frequencies. If no diagnostic variants appear, then secondary findings can be studied. Diagnostic mutations. http://team.babelomics.org
  25. Take home message. •Local variability is critical for distinguishing real pathologic variants from local polymorphisms. •CES will be populated with the SPANEX project (M.A. Moreno talk). •CES is the starting point of a more ambitious crowdsourcing project that aims at constructing a high-resolution map of the Spanish population variation. •Contributions to CES are compliant with confidentiality requirements. No patient information is shared, only statistical information.
  26. The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and the INB, National Institute of Bioinformatics (Functional Genomics Node), and the CIBERER Network of Centers for Rare Diseases, and the Medical Genome Project (Sevilla). @xdopazo @bioinfocipf