10 Liu, Dajiang

Statistical Genetics Using Sequence Data
Dajiang J. Liu
Department of Statistics

Why We Study Statistical Genetics
Statistics originated in genetics. R.A. Fisher introduced the concept of variance in “The Correlation Between Relatives on the Supposition of Mendelian Inheritance”. Francis Galton studied the regression of human height toward the mean and introduced correlation and regression. Karl Pearson wrote “Mendelism and the Problem of Mental Defect” and “Tuberculosis, Heredity and Environment”. Why don't we seek our roots? Statistics is a must for finding disease genes in the genome.

Statistical Genetics
Disease gene mapping is the determination of the sequence (order) of genes and their relative distances from one another on a specific chromosome. It is a technology-driven field. In Mendel's era, segregation analysis required patience and careful experimental design in organisms such as peas and fruit flies, and inbreeding was necessary.
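Segregation analysis in its simplest form asks whether observed offspring counts fit a Mendelian ratio. The sketch below is a Pearson chi-square goodness-of-fit test against a 3:1 ratio. It uses Mendel's reported F2 seed-shape counts (5,474 round vs. 1,850 wrinkled). The function names are illustrative, and the p-value relies on the identity that the upper tail of a 1-df chi-square equals erfc(sqrt(x/2)).

```python
import math

def chi_square_gof(observed, expected_ratios):
    """Pearson chi-square goodness-of-fit statistic for counts vs. expected proportions."""
    total = sum(observed)
    ratio_sum = sum(expected_ratios)
    stat = 0.0
    for obs, r in zip(observed, expected_ratios):
        exp = total * r / ratio_sum  # expected count under the Mendelian ratio
        stat += (obs - exp) ** 2 / exp
    return stat

def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-square with 1 degree of freedom: erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Mendel's F2 seed shape: round vs. wrinkled, tested against a 3:1 ratio
stat = chi_square_gof([5474, 1850], [3, 1])
p = chi2_sf_1df(stat)  # 2 categories -> 1 degree of freedom
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

A large p-value here means the counts are consistent with 3:1 segregation.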

Statistical Genetics
Modern era: microsatellite markers enabled genetic linkage analysis, which has been extremely successful for mapping and identifying genes for Mendelian traits. Single nucleotide polymorphism (SNP) markers enabled case-control studies, including genome-wide association studies (GWAS), which aim to identify common variants involved in complex traits. Computational techniques for evaluating likelihoods in pedigrees were also developed. Statistics plays a major role.
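The two workhorse statistics of this era can be sketched in a few lines. The first is a LOD score for phase-known meioses, log10 of L(θ)/L(0.5). The second is a 1-df allelic chi-square test on a 2×2 table of case and control allele counts. This is a minimal illustration only. The pedigree and SNP counts are hypothetical, and real linkage software handles unknown phase and full pedigree likelihoods, which this sketch does not.

```python
import math

def lod_score(theta, recombinants, nonrecombinants):
    """LOD for phase-known meioses: log10 of L(theta) / L(0.5)."""
    n = recombinants + nonrecombinants
    log_l = recombinants * math.log10(theta) + nonrecombinants * math.log10(1 - theta)
    return log_l - n * math.log10(0.5)

def allelic_chi_square(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square (1 df) on the 2x2 table of allele counts, cases vs. controls."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of a chi-square with 1 df
    return stat, p

# Hypothetical pedigree data: 2 recombinants in 20 informative meioses (theta_hat = 0.1)
lod = lod_score(0.1, 2, 18)
# Hypothetical SNP: risk allele on 300/1000 case vs. 200/1000 control chromosomes
stat, p = allelic_chi_square(300, 700, 200, 800)
print(f"LOD = {lod:.2f}; chi2 = {stat:.2f}, p = {p:.2e}")
```

By the conventional criterion, a LOD above 3 is taken as evidence of linkage.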

Statistical Genetics
Sequencing era: the study of diseases caused by rare variants is emerging, with platforms such as the ABI SOLiD sequencer. Statistics is essential for analyzing sequencing data.

Statistical Genetics
Data we work with: the Human Genome Project, the HapMap Project, and the 1000 Genomes Project.

Multifactorial Disease Etiology Hypotheses
Common Disease/Common Variant (CD/CV) hypothesis: common diseases are caused by a few common variants with moderate effects, e.g. age-related macular degeneration. Common variants are likely to have lower odds ratios than rare variants.
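An odds ratio comparison makes the contrast concrete. The sketch below computes an odds ratio with a Woolf (log-scale) 95% confidence interval from a 2×2 table. All counts are hypothetical. The common-variant table uses allele counts, and the rare-variant table uses carrier counts.

```python
import math

def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table."""
    a, b, c, d = case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

# Hypothetical common variant: risk allele on 300/1000 case vs. 200/1000 control chromosomes
common = odds_ratio_ci(300, 700, 200, 800)
# Hypothetical rare variant: carried by 20/1000 cases vs. 4/1000 controls
rare = odds_ratio_ci(20, 980, 4, 996)
print("common: OR = %.2f (%.2f, %.2f)" % common)
print("rare:   OR = %.2f (%.2f, %.2f)" % rare)
```

In this toy example the rare variant has the larger odds ratio but a much wider interval. A single rare site carries little information on its own.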

Multifactorial Disease Etiology Hypotheses
Common Disease/Rare Variant hypothesis: common diseases are caused by multiple rare variants with large effect sizes. The discovery of rare variants will have a high impact on public health, since they will aid risk prediction and treatment. Examples: multiple rare alleles contribute to low plasma levels of HDL cholesterol; colorectal adenomas.
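Because each rare variant is seen in only a few people, a common first step is to pool rare sites within a gene. The sketch below shows a generic collapsing test. Each individual is scored as a carrier if they have a minor allele at any rare site. Carrier counts in cases and controls are then compared with a one-sided Fisher's exact test. This is a hypothetical illustration on made-up genotypes, not the method presented in these slides.

```python
from math import comb

def collapse_carriers(genotypes):
    """1 if the individual carries a minor allele at any rare site, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]

def fisher_one_sided(a, b, c, d):
    """P(case carriers >= a) under the hypergeometric null with fixed margins.

    a/b = case carriers/non-carriers, c/d = control carriers/non-carriers.
    """
    n_cases = a + b
    n_carriers = a + c
    n = a + b + c + d
    numer = sum(comb(n_cases, x) * comb(n - n_cases, n_carriers - x)
                for x in range(a, min(n_cases, n_carriers) + 1))
    return numer / comb(n, n_carriers)

# Hypothetical minor-allele counts at 3 rare sites for 10 cases and 10 controls
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0],
         [1, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
controls = [[0, 0, 1]] + [[0, 0, 0] for _ in range(9)]

a = sum(collapse_carriers(cases)); b = len(cases) - a
c = sum(collapse_carriers(controls)); d = len(controls) - c
p = fisher_one_sided(a, b, c, d)
print(f"case carriers = {a}, control carriers = {c}, one-sided p = {p:.4f}")
```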

Challenges for Statistical Methodologies
Variant misclassification. Non-causal variants are included: the genome carries a huge number of mutations, and most of them do not cause the disease under study. Causal variants are excluded: for example, intronic mutations and variants in intergenic regions.
Unknown patterns of interaction. Within-gene interactions, e.g. Hirschsprung's disease (RET gene). Gene × gene interactions, e.g. breast cancer genes (BRCA1, BRCA2 × CHEK2).
Adaptive methods are needed.
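The cost of including non-causal variants can be shown with simple arithmetic on a collapsed carrier indicator. In this hypothetical example, neutral sites add the same number of carriers to cases and controls. The carrier odds ratio shrinks toward 1. The counts are made up, and the sketch assumes that neutral-variant carriers do not overlap with causal-variant carriers.

```python
def carrier_odds_ratio(case_carriers, n_cases, ctrl_carriers, n_controls):
    """Odds ratio for carrier status (any qualifying variant) in cases vs. controls."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = ctrl_carriers, n_controls - ctrl_carriers
    return (a * d) / (b * c)

# Hypothetical counts in 500 cases and 500 controls.
# Causal sites only: 30 case carriers vs. 10 control carriers.
causal_only = carrier_odds_ratio(30, 500, 10, 500)
# Add neutral sites carried by 40 extra people in each group
# (assumes neutral carriers do not overlap with causal carriers).
with_neutral = carrier_odds_ratio(30 + 40, 500, 10 + 40, 500)
print(f"causal only OR = {causal_only:.2f}; with neutral sites OR = {with_neutral:.2f}")
```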

Kernel-Based Adaptive Clustering
Combines variant classification and association testing in a coherent framework. Applicable to population-based case-control studies of unrelated individuals. Robust against variant misclassification. Can handle gene × gene and gene × environment interactions.
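The general idea of adaptively weighting multi-site genotypes can be sketched as follows. Individuals are grouped by their multi-site rare-variant genotype. Each non-reference genotype gets a weight equal to its empirical case fraction. The statistic sums the weighted case-minus-control genotype frequencies. This is a deliberately simplified, hypothetical illustration and NOT the published kernel-based statistic, which uses kernel-estimated weights. Because the weights are estimated from the same data, significance is assessed by permuting case/control labels.

```python
import random
from collections import Counter

def adaptive_statistic(genotypes, status):
    """Weighted difference in multi-site genotype frequencies (hypothetical weighting)."""
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    case_counts = Counter(g for g, s in zip(genotypes, status) if s == 1)
    ctrl_counts = Counter(g for g, s in zip(genotypes, status) if s == 0)
    stat = 0.0
    for g in set(case_counts) | set(ctrl_counts):
        if not any(g):  # skip the all-reference genotype
            continue
        n_a, n_u = case_counts[g], ctrl_counts[g]
        weight = n_a / (n_a + n_u)  # empirical case fraction of this genotype
        stat += (n_a / n_cases - n_u / n_controls) * weight
    return stat

def permutation_pvalue(genotypes, status, n_perm=2000, seed=1):
    """Permutation p-value: shuffle labels, recompute, count statistics >= observed."""
    rng = random.Random(seed)
    observed = adaptive_statistic(genotypes, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if adaptive_statistic(genotypes, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical genotypes at 4 rare sites; genotypes seen equally often
# in cases and controls (e.g. (0, 0, 0, 1)) contribute nothing to the statistic.
cases = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (1, 0, 0, 0), (0, 1, 0, 0),
         (1, 1, 0, 0), (0, 0, 0, 1), (0, 0, 0, 1), (0, 0, 0, 0), (0, 0, 0, 0)]
controls = [(0, 0, 1, 0), (0, 0, 0, 1), (0, 0, 0, 1)] + [(0, 0, 0, 0)] * 7
genotypes = cases + controls
status = [1] * len(cases) + [0] * len(controls)
observed, p_value = permutation_pvalue(genotypes, status)
print(f"statistic = {observed:.3f}, permutation p = {p_value:.4f}")
```

This illustrates why adaptive weighting helps with misclassification. Neutral genotypes have near-zero frequency differences, so including them adds little to the statistic.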
