
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to human population genetics: from pathways to genotype networks

18,196 views

This is the presentation of my PhD thesis defence. It describes two applications of network theory to improve the methods to understand genetic adaptation in the human genome.

Published in: Health & Medicine, Technology

  1. 1. Applications of network theory to human population genetics: from pathways to genotype networks. Giovanni Marco Dall'Olio, Pompeu Fabra University, Barcelona. Advisors: Jaume Bertranpetit and Hafid Laayouni
  2. 2. Acknowledgments ● I would like to thank: – My PhD supervisors, Jaume Bertranpetit and Hafid Laayouni – My committee: Dr. Mauro Santos, Dr. Ricard Solé, Prof. Guido Barbujani, Dr. Ferran Casals, Dra. Yolanda Espinosa – The Evolutionary Systems Biology group at UPF – The Institut de Biologia Evolutiva
  3. 3. Topics ● Context and motivations ● My research: – Annotating the N-Glycosylation pathway – Pathway approach on the N-Glycosylation pathway – The Genotype Network Approach – The Human Selection Browser and Biostar ● Conclusions
  4. 4. Context of the thesis ● The first anatomically modern humans appeared about 200,000 years ago ● How can we understand the signals of genetic adaptation left in our genome since then?
  5. 5. Factors that influenced recent human evolution ● New climates ● Diseases ● Agriculture
  6. 6. The opportunity ● We have access to large datasets of human sequences ● Better annotations on gene function and role
  7. 7. Contributions ● Find applications of network theory to understand genetic adaptation in the human species
  8. 8. Applications of network theory ● The Pathway approach ● The Genotype Network approach
  9. 9. Topics ● Context and motivations ● My research: – Annotating the N-Glycosylation pathway – Pathway approach on the N-Glycosylation pathway – The Genotype Network Approach – The Human Selection Browser and Biostar ● Conclusions
  10. 10. The Pathway approach ● Genes are organized in pathways ● Any selective constraint will be distributed among all the genes of a pathway
  11. 11. Distribution of Selection forces in a pathway ● Some positions of the pathway will be more likely to have stronger signals of selection
  12. 12. Pathway Approach - outline ● Build a Network representation of a pathway ● Execute a test for positive selection on each gene ● Determine how the signals of selection are distributed on the network
  13. 13. Pathway approach on the N-Glycosylation pathway ● Asparagine N-Glycosylation is a metabolic pathway for a type of protein modification ● The structure of this pathway is easy to represent as a network
  14. 14. N-glycosylation - upstream part ● Produces a single sugar called “N-Glycan precursor” ● This sugar is required for the proper folding of most membrane proteins. Adapted from Stanley, P., Schachter, H., & Taniguchi, N. (2009). N-Glycans. Essentials of Glycobiology.
  15. 15. N-Glycosylation and protein folding ● The product of the upstream part of N-glycosylation is used as a signal to distinguish folded and unfolded proteins [Figure labels: Folded protein; Un-Folded protein]
  16. 16. N-glycosylation - downstream part ● Complex pathway composed of thousands of reactions ● Produces multiple glycans, important for cell-to-cell interactions. Hossler, P., Mulukutla, B. C., & Hu, W.-S. (2007). Systems analysis of N-glycan processing in mammalian cells. PloS one, 2(1), e713. doi:10.1371/journal.pone.0000713
  17. 17. Glycans on the cell surface ● The surface of a cell is similar to a forest of glycosylated proteins ● Each organism and cell has a specific repertoire of glycans. A. Doerr, Glycoproteomics. Nature Methods, 2011. doi:10.1038/nmeth.1821
  18. 18. Annotating the N-Glycosylation pathway ● In order to build a correct network model for the N-Glycosylation pathway, we first annotated it in the Reactome database
  19. 19. The N-Glycosylation pathway in Reactome
  20. 20. The KEGG entry for N-Glycosylation is incomplete [Figure labels: Downstream N-Glycosylation in KEGG; Real representation of downstream N-Glycosylation]
  21. 21. Another error for N-Glycosylation in KEGG
  22. 22. Erroneous annotation in String ● There are two genes with the symbol ALG2: – ALG2 (Asparagine Linked Glycosylation 2) – ALG-2 (Apoptosis Linked Gene – 2) ● In String, these two were confused
  23. 23. Ambiguous interpretation of the term N-Glycosylation in GO [Figure labels: N-Glycosylated pathway; Merged; N-Glycosylated protein]
  24. 24. Annotating the N-Glycosylation pathway ● Annotated ~100 reactions in Reactome ● Fixed ~50 Gene Ontology terms ● Fixed key errors in String and KEGG
  25. 25. Network structure of the N-Glycosylation pathway
  26. 26. Dataset used ● The CEPH-HGDP 650,000 Illumina chip dataset ● 940 individuals, from 50 human populations
  27. 27. Methods used ● The FST index → measure of population differentiation ● The iHS test → identification of signals of recent positive selection
  28. 28. FST – Population differentiation ● FST is a measure of population differentiation ● If the FST between two populations is 1, it means that the two populations are fixed for different alleles
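The statement on slide 28 can be checked numerically. The following is only a minimal sketch, assuming the simple heterozygosity-based definition FST = (HT - HS) / HT for a single biallelic SNP and two equally weighted populations; the slides do not say which FST estimator was actually used, and the function name `fst_two_populations` is hypothetical.

```python
def fst_two_populations(p1, p2):
    """FST for one biallelic SNP, given the frequency of the same allele
    in two populations (assumption: both populations weigh equally)."""
    # HS: mean expected heterozygosity within each population
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # HT: expected heterozygosity of the pooled population
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        return 0.0  # the SNP is monomorphic everywhere: no differentiation
    return (ht - hs) / ht

# Populations fixed for different alleles, as described on the slide
fixed_for_different_alleles = fst_two_populations(1.0, 0.0)
# Populations with identical allele frequencies
identical_frequencies = fst_two_populations(0.3, 0.3)
```

With these inputs the first call returns 1 and the second returns 0, the two extremes of the index.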
  29. 29. Signatures of population differentiation in the N-Glycosylation pathway ● FST signals are concentrated in the downstream part, and in substrate biosynthesis
  30. 30. Population Differentiation and network position ● Node degree correlates with the distribution of FST signals ● Genes with high FST are generally more connected
  31. 31. iHS and Long range haplotypes ● A selective sweep may cause the appearance of long homozygous haplotypes at a high frequency ● Example: a long homozygous haplotype present in the LCT gene in North-European populations. Vitti et al, Trends in genetics, 2012
  32. 32. iHS and Long range haplotypes ● iHS compares the Extended Haplotype Homozygosity decay (EHH decay) between the ancestral and the derived allele. Voight et al., PLoS Genetics 2006
  33. 33. Signatures of selection in the N-Glycosylation pathway ● No difference in the distribution of iHS signals between upstream and downstream
  34. 34. Signatures of selection in the N-Glycosylation pathway ● GCS1: redirects to protein folding quality control ● MGAT3: redirects to Hybrid Glycans ● MAN2A1: redirects to Complex Glycans
  35. 35. Pathway approach on N-Glycosylation ● There is a difference in the patterns of population differentiation between the two parts of the N-Glycosylation pathway ● Signals of positive selection are more likely on key genes ● One of the few works applying the pathway approach to human genetics
  36. 36. Topics ● Context and motivations ● My research: – Annotating the N-Glycosylation pathway – Pathway approach on the N-Glycosylation pathway – The Genotype Network Approach – The Human Selection Browser and Biostar ● Conclusions
  37. 37. The Genotype Network approach ● Genotype Networks have been used to study the “innovability” and evolvability of a genetic system
  38. 38. The Genotype Network approach ● Genotype Networks have been used to study the “innovability” and evolvability of a genetic system ● Never applied to population genetics data, because they require too much data!
  39. 39. Genotype Networks - theory ● John Maynard Smith: the concept of a Protein Space, which is explored by populations
  40. 40. Genotype Networks - theory ● John Maynard Smith: the concept of a Protein Space, which is explored by populations ● “if evolution by natural selection is to occur, functional proteins [or DNA sequences] must form a continuous network which can be traversed by unit mutational steps without passing through nonfunctional intermediates”
  41. 41. Neutralism and Selectionism ● Neutralism: most mutations are neutral or deleterious ● Selectionism: positive mutations drive evolution
  42. 42. Genotype Networks help reconcile Neutralism and Selectionism ● Cycles of Neutral evolution, alternating with cycles of Selection ● Even neutral or negative mutations can be beneficial in the long run, because they allow populations to explore the genotype space
  43. 43. The Genotype Network - definitions ● The Genotype Space of a region of 5 SNPs can be represented as a network ● Each node is a possible genotype, and edges connect nodes with only one difference
  44. 44. The Genotype Network - definitions ● Green nodes are sequences observed in a population ● This is the Genotype Network of a population
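The definitions on slides 43 and 44 can be sketched in a few lines of Python. This is an illustrative sketch only, not the VCF2Space code: it assumes genotypes are encoded as strings of 0s and 1s, and the names `hamming` and `genotype_network` are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equally long genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def genotype_network(observed):
    """Adjacency sets for the genotypes observed in a population:
    two genotypes are connected when they differ at exactly one SNP."""
    nodes = sorted(set(observed))
    neighbours = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) == 1:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

# A region of 5 SNPs: the full Genotype Space has 2**5 = 32 possible nodes,
# but only the genotypes observed in the population enter its network.
observed = ["00000", "00001", "00011", "11111", "00001"]
network = genotype_network(observed)
```

Here `00000`, `00001` and `00011` form a chain, while `11111` is observed but has no observed neighbour, so it stays isolated.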
  45. 45. Average Path Length of a Genotype Network ● This figure represents two populations ● The yellow one has a higher Average Path Length than the blue one
  46. 46. Average Degree ● This population has a high Average Degree ● It is more robust to mutations ● This population has a low Average Degree ● Mutations are more likely to fall outside the Genotype Network
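The two network properties from slides 45 and 46 can be computed from an adjacency mapping as follows. This is a sketch under stated assumptions: the path length is averaged over connected pairs of nodes only (how the thesis treats disconnected components is not stated in the slides), and the function names and the two toy networks are made up for illustration.

```python
from collections import deque

def average_degree(neighbours):
    """Mean number of neighbours per node."""
    return sum(len(n) for n in neighbours.values()) / len(neighbours)

def average_path_length(neighbours):
    """Mean shortest-path length over all connected ordered pairs of nodes,
    found with a breadth-first search from every node."""
    total, pairs = 0, 0
    for source in neighbours:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in distance:
                    distance[other] = distance[node] + 1
                    queue.append(other)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0

# Two toy populations with 4 observed genotypes each (hypothetical data)
chain = {"000": {"001"}, "001": {"000", "011"},
         "011": {"001", "111"}, "111": {"011"}}
star = {"000": {"001", "010", "100"},
        "001": {"000"}, "010": {"000"}, "100": {"000"}}
```

Both toy networks have the same Average Degree, but the chain is more spread out in the Genotype Space and so has the higher Average Path Length.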
  47. 47. Dataset analyzed ● 1000 Genomes data, phase 1 ● 850 individuals genotyped, grouped into three continental groups (AFR, EUR and ASN)
  48. 48. The VCF2Space library ● Suite of Python scripts to calculate Genotype Networks from a VCF file ● ~400,000 lines of code ● ~350 unit tests
  49. 49. Splitting the genome into windows of 11 SNPs ● Fewer than 11 SNPs -> networks are too small and condensed ● More than 11 SNPs -> networks are too large and sparse [Figure labels: Small network; Large network]
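A possible way to cut a haplotype into windows of 11 SNPs is sketched below. It assumes consecutive, non-overlapping windows and drops a trailing incomplete window; the slides do not say whether the real analysis used overlapping windows, and `split_into_windows` is a hypothetical name.

```python
def split_into_windows(haplotype, size=11):
    """Cut a 0/1 haplotype string into consecutive, non-overlapping
    windows of `size` SNPs; an incomplete last window is discarded."""
    last_start = len(haplotype) - size
    return [haplotype[i:i + size] for i in range(0, last_start + 1, size)]

haplotype = "01" * 12 + "0"          # a made-up haplotype of 25 SNPs
windows = split_into_windows(haplotype)

# With 11 biallelic SNPs, each window has 2**11 possible genotypes,
# which bounds the size of the network built on it.
genotype_space_size = 2 ** 11
```

The window size directly sets the size of the Genotype Space, which is why the choice of 11 SNPs trades off small, condensed networks against large, sparse ones.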
  50. 50. Why windows of 11 SNPs?
  51. 51. Genotype Network properties of the human genome http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://bioevo.upf.edu/~gdallolio/genotype_space/hub.txt
  52. 52. Coding & Non-Coding regions ● Coding regions have higher average path length and degree than non-coding regions
  53. 53. Genotype Networks and Selection (simulated data) [Figure panels: Selection; Neutral]
  54. 54. ● Coding networks: high average path length and degree ● Non-coding networks: low average path length and degree ● Recent selection: lower average path length and degree
  55. 55. Genotype Network: currently under review
  56. 56. Topics ● Context and motivations ● My research: – Annotating the N-Glycosylation pathway – Pathway approach on the N-Glycosylation pathway – The Genotype Network Approach – The Human Selection Browser and Biostar ● Conclusions
  57. 57. Other works: The Human Selection Browser ● We applied 21 tests for positive selection to the 1,000 Genomes dataset – FST, CLR, iHS, etc... ● This dataset will be published and made freely available as a genome browser
  58. 58. Other works: Biostar ● An online forum for bioinformatics ● About 150,000 visits per month ● Helped thousands of bioinformaticians!
  59. 59. Topics ● Context and motivations ● My research: – Annotating the N-Glycosylation pathway – Pathway approach on the N-Glycosylation pathway – The Genotype Network Approach – The Human Selection Browser and Biostar ● Conclusions
  60. 60. Conclusions (I) ● We developed two applications of network theory to the study of human population genetics. ● We produced a network model of the N-Glycosylation pathway, contributing it to the Reactome database and improving the annotations in other databases. ● We showed that the downstream part of the N-Glycosylation pathway shows more signatures of genetic differentiation than the upstream part. This is compatible with the role and structure of this part of the pathway. ● We showed that key genes of the N-Glycosylation pathway, such as GCS1, MGAT3 and MAN2A1, show signatures of recent positive selection in human populations.
  61. 61. Conclusions (II) ● We produced a suite of Python scripts, called VCF2Space, to apply the concept of Genotype Networks to Single Nucleotide Polymorphism data ● Our genome-wide application of Genotype Networks showed that coding regions tend to have networks with higher average degree and path length than non-coding regions ● We contributed positively to the bioinformatics community, providing resources such as the 1000 Genomes Selection Browser and Biostar
  63. 63. Figures credits ● Slide 5: humans: http://blogs.ancestry.com/ancestry/ star trek: http://en.wikipedia.org/wiki/Star_Trek:_The_Original_Series ● Slide 6: Malaria: http://science.psu.edu/news-and-events/2012-news/Read7-2012 Climates: http://www.ancienteco.com/2012/03/climate-change-drives-human-evolution.html Agriculture: http://en.wikipedia.org/wiki/History_of_agriculture ● Slide 7: – 1000 Genomes, CEPH-HGDP panel, UK10K, Hapmap websites ● Slide 14: – Cover of Science, 23 March 2001 ● Slide 15: – Adapted from Stanley, P., Schachter, H., & Taniguchi, N. (2009). N-Glycans. Essentials of Glycobiology. ● Slide 17: – Glycosylation, downstream: Hossler, P., Mulukutla, B. C., & Hu, W.-S. (2007). Systems analysis of N-glycan processing in mammalian cells. PloS one, 2(1), e713. doi:10.1371/journal.pone.0000713
  64. 64. Figures credits ● Slide 27: http://www.cephb.fr/en/hgdp/diversity.php/ ● Slide 29: http://www.rationalskepticism.org ● Slide 32: Adapted from Vitti et al, 2012 ● Slide 42: – wikipedia
  65. 65. The Pathway approach ● Stronger Selection on Genes with high connectivity or upstream of a pathway
  66. 66. N-glycosylation – how does it work ● All the N-glycans are generated from a single sugar with a very conserved structure, called N-glycan precursor [Figure labels: N-glycan precursor; Signal for folded proteins; Millions of different glycans]
  67. 67. The FST test ● Almost all the highest signals of FST are in genes of the downstream part
  68. 68. The iHS test ● GCS1 in EUR ● MAN2A1 in SSAFR and EASIA ● MGAT3 in EASIA
  69. 69. Combining p-values ● Fisher's combination test, from Peng et al, Eur J Hum Genet. 2010 ● ZF follows a χ2(2K) distribution ● SNPs from the same gene may violate the assumption of independence, but the method is still robust to errors
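Fisher's combination test on slide 69 can be sketched with the standard library alone. The statistic is the textbook one, ZF = -2 Σ ln(p_i) over the K p-values of a gene; because the degrees of freedom (2K) are always even, the χ2 tail probability has a closed form and no statistics package is needed. The function name and the example p-values are hypothetical, and this is not claimed to be the code used in the thesis.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Fisher's method: ZF = -2 * sum(ln p_i) follows a chi-square
    distribution with 2K degrees of freedom under the null hypothesis,
    assuming the K p-values are independent."""
    k = len(pvalues)
    z_f = -2.0 * sum(math.log(p) for p in pvalues)
    half = z_f / 2.0
    # Chi-square survival function for even degrees of freedom 2K:
    # exp(-x/2) * sum_{i=0}^{K-1} (x/2)**i / i!
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return z_f, math.exp(-half) * series

# Two made-up SNP-level p-values from the same gene
z_f, combined = fisher_combined_pvalue([0.05, 0.05])
```

Two borderline p-values combine into a smaller gene-level p-value, which is the point of the method; as the slide notes, SNPs within one gene are not truly independent, so the result should be read with that caveat.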
  70. 70. Comparing upstream and downstream N-Glycosylation ● χ2 test comparing the number of events observed in each part of the pathway against the number expected if there were no pathway structure
  71. 71. How to convert genotypes to networks ● Two haplotypes per individual ● Reference allele → 0; Alternative allele → 1 ● Example, Individual 1: AC AC AA GG TT TG CA TG; Ancestral alleles: A A A G T T C T; haplotype a: 00000000; haplotype b: 11000111
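The conversion on slide 71 can be reproduced with a short sketch. It assumes phased genotypes written as two-letter strings (first letter on haplotype a, second on haplotype b) and one reference allele per SNP; in the slide's example the reference row is labelled as the ancestral alleles. The function name `encode_haplotypes` is hypothetical and is not the VCF2Space API.

```python
def encode_haplotypes(genotypes, reference):
    """Turn the phased genotypes of one individual into two 0/1 strings:
    0 where the allele matches the reference, 1 where it does not."""
    hap_a = "".join("0" if g[0] == ref else "1"
                    for g, ref in zip(genotypes, reference))
    hap_b = "".join("0" if g[1] == ref else "1"
                    for g, ref in zip(genotypes, reference))
    return hap_a, hap_b

# Individual 1 from the slide
genotypes = ["AC", "AC", "AA", "GG", "TT", "TG", "CA", "TG"]
reference = "AAAGTTCT"
hap_a, hap_b = encode_haplotypes(genotypes, reference)
```

For the individual on the slide this gives haplotype a = 00000000 and haplotype b = 11000111; each such string is one node of the Genotype Network.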
