Experimental design and statistical analysis

Lecture of Dr Jiankang Wang about statistical analysis and QTL mapping

Transcript

  • 1. Lecture 12: Genetic Linkage Analysis and Map Construction
  • 3. Experiments with Plant Hybrids (Mendel, 1866); rediscovered in 1900
    Seed shape: 5474 round vs 1850 wrinkled
    Cotyledon color: 6022 yellow vs 2001 green
    Seed coat color: 705 grey-brown vs 224 white
    Pod shape: 882 inflated vs 299 constricted
    Unripe pod color: 428 green vs 152 yellow
    Flower position: 651 axial vs 207 terminal
    Stem length: 787 long (185-230 cm) vs 277 short (20-50 cm)
  • 5. Ear length of maize (East 1911). P1: 7 cm; P2: 17 cm
    One locus: $a = (17-7)/2 = 5$; F2: 1/4 aa (7) + 2/4 Aa (12) + 1/4 AA (17)
    Two loci: $a = (17-7)/4 = 2.5$; F2: 1/16 (7) + 4/16 (9.5) + 6/16 (12) + 4/16 (14.5) + 1/16 (17)
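  • 9. $V_{F_2} = \frac{1}{2} k a^2$ (F2 genetic variance from $k$ unlinked additive loci of equal effect $a$, ignoring environmental variance)
  • 10. With $P_1 = ka$ and $P_2 = -ka$ (deviations from the mid-parent value): $k = \dfrac{(P_1 - P_2)^2}{8\,[V_{F_2} - \frac{1}{2}(V_{P_1} + V_{P_2})]}$, where $\frac{1}{2}(V_{P_1} + V_{P_2})$ estimates the environmental variance
  • 11. $V_A = \frac{1}{2}\sum a^2 = \frac{1}{2} k a^2 = \dfrac{(P_1 - P_2)^2}{8k}$, so $k = \dfrac{(P_1 - P_2)^2}{8 V_A}$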
  • 13. Mendel and Fisher (Annals of Science 1:115-): Fisher concluded that Mendel's data were so close to the values Mendel expected under his theory that there must have been some manipulation, or omission, of data.
    Dominant trait in F2: 1/3 AA + 2/3 Aa; family size: 10
    Mendel's expectation: non-segregating (AA) : segregating (Aa) = 1 : 2
    Fisher: Pr{an Aa family shows no recessive among 10 progeny and is classified as AA} = $0.75^{10} = 0.0563$
    Pr{classified as segregating (Aa)} = $\frac{2}{3}(1 - 0.0563) = 0.6291$
    Expected non-segregating (AA) : segregating (Aa) = 0.3709 : 0.6291 = 1 : 1.6961
  • 15. Genetic populations and pairwise linkage analysis
  • 16. Populations handled in QTL IciMapping (diagram starting from parents P1 and P2 and their F1; legend: hybridization, selfing, repeated selfing, doubled haploids, marker-assisted selection)
    1. P1BC1F1; 2. P2BC1F1; 3. F1DH; 4. F1RIL; 5. P1BC1RIL; 6. P2BC1RIL; 7. F2; 8. F3; 9. P1BC2F1; 10. P2BC2F1
    11. P1BC2RIL; 12. P2BC2RIL; 13. P1BC1F2; 14. P2BC1F2; 15. P1BC2F2; 16. P2BC2F2; 17. P1BC1DH; 18. P2BC1DH; 19. P1BC2DH; 20. P2BC2DH
    Further backcrosses: BC3F1, BC4F1, etc.; with marker-assisted selection: CSS lines or introgression lines
    NAM: P1 × CP, P2 × CP, P3 × CP, ..., Pn × CP (CP = common parent) give RIL families 1, 2, 3, ..., i, ..., n, which together form one NAM population
  • 17. Example: 10 RILs in a rice population (linkage map of Chr. 5; 0 = P1 genotype, 2 = P2 genotype)
    Marker         C263 R830 R3166 XNpb387 R569 R1553 C128 C1402 XNpb81 C246 R2953 C1447 | Grain width (mm)
    Position (cM)  0.0  3.5  8.5   19.5    32.0 66.6  74.1 78.6  81.8   91.9 92.7  96.8
    RIL1           0    0    0     0       0    0     0    0     0      0    0     0     | 2.33
    RIL2           2    2    2     2       2    0     0    0     0      2    2     2     | 1.99
    RIL3           0    2    2     2       2    2     2    2     2      2    2     2     | 2.24
    RIL4           0    0    0     0       0    0     2    2     2      2    2     2     | 1.94
    RIL5           0    0    0     0       0    2     2    0     0      0    0     0     | 2.76
    RIL6           0    0    0     2       2    2     2    2     2      2    2     2     | 2.32
    RIL7           0    0    0     0       0    0     0    0     0      0    0     0     | 2.32
    RIL8           2    2    0     2       2    0     0    0     0      2    2     2     | 2.08
    RIL9           0    0    0     0       2    2     0    0     0      0    0     0     | 2.24
    RIL10          0    0    0     0       2    2     0    0     0      0    0     0     | 2.45
  • 18. Genetic markers in linkage analysis: morphological traits (hybridization experiments); cytogenetic and biochemical markers (e.g. isozymes); DNA molecular markers (RFLP, SSR, SNP, etc.)
  • 19. The four gametes (haplotypes) of an F1. P1: AABB (A B / A B) × P2: aabb (a b / a b) gives F1: AaBb (A B / a b). Meiosis produces AB with frequency $(1-r)/2$ (parental type), Ab with $r/2$ (recombinant type), aB with $r/2$ (recombinant type) and ab with $(1-r)/2$ (parental type).
  • 20. Expected genotypic frequency in backcross and DH populations (P1: AABB; P2: aabb)
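  • 21. MLE of recombination frequency
    Likelihood function: $L = \dfrac{n!}{n_1!\,n_2!\,n_3!\,n_4!}\left[\tfrac{1}{2}(1-r)\right]^{n_1}\left(\tfrac{1}{2}r\right)^{n_2}\left(\tfrac{1}{2}r\right)^{n_3}\left[\tfrac{1}{2}(1-r)\right]^{n_4} = C\,(1-r)^{n_1+n_4}\,r^{n_2+n_3}$
    Logarithm of likelihood: $\ln L = \ln C + (n_1+n_4)\ln(1-r) + (n_2+n_3)\ln r$
    MLE of r: $\hat r = \dfrac{n_2+n_3}{n_1+n_2+n_3+n_4} = \dfrac{n_2+n_3}{n}$
    Fisher information: $I = E\left(-\dfrac{d^2\ln L}{dr^2}\right) = E\left[\dfrac{n_1+n_4}{(1-r)^2} + \dfrac{n_2+n_3}{r^2}\right] = \dfrac{n}{r(1-r)}$
    Variance of estimated r: $V_{\hat r} = \dfrac{1}{I} = \dfrac{r(1-r)}{n}$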
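  • 22. Significance test of linkage
    Null hypothesis H0: $r = 0.5$ (no genetic linkage, i.e. loci A-a and B-b are independent)
    Alternative hypothesis HA: $r < 0.5$ (linkage)
    Likelihood ratio test (LRT): $\mathrm{LRT} = -2\ln\left[\dfrac{L(r=0.5)}{L(\hat r)}\right] \sim \chi^2(\mathrm{df}=1)$; LOD score: $\mathrm{LOD} = \log_{10}\left[\dfrac{L(\hat r)}{L(r=0.5)}\right]$, so that $\mathrm{LRT} = 2\ln(10)\cdot\mathrm{LOD} \approx 4.605\,\mathrm{LOD}$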
  • 23. An example P1BC1 population
    Genotypes of the two inbred parents P1 and P2 are AABB and aabb.
    Observed counts of the four genotypes in P1BC1: AABB 162, AABb 40, AaBB 41, AaBb 158
    $\hat r = \dfrac{40+41}{162+40+41+158} = \dfrac{81}{401} = 20.20\%$; $V_{\hat r} = \dfrac{\hat r(1-\hat r)}{n} = 4.02\times10^{-4}$
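  • 24. Test of linkage
    Null hypothesis H0: $r = 0.5$; alternative hypothesis HA: $r < 0.5$
    $L(\hat r) = \left[\tfrac{1}{2}(1-\hat r)\right]^{n_1+n_4}\left(\tfrac{1}{2}\hat r\right)^{n_2+n_3}$ and $L(r=0.5) = \left(\tfrac{1}{4}\right)^{n_1+n_2+n_3+n_4}$, so $\dfrac{L(\hat r)}{L(r=0.5)} = [2(1-\hat r)]^{n_1+n_4}(2\hat r)^{n_2+n_3} \approx 1.2\times10^{33}$
    Likelihood ratio test: $\mathrm{LRT} = 2\ln\left[\dfrac{L(\hat r)}{L(r=0.5)}\right] \approx 152.4$ (P < 0.0001); LOD score: $\mathrm{LOD} = \log_{10}\left[\dfrac{L(\hat r)}{L(r=0.5)}\right] \approx 33.09$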
  • 25. Genotypic frequencies in RIL populations, compared with DH
    Genotype | DH theoretical frequency | RIL theoretical frequency
    AABB | $f_1 = (1-r)/2$ | $f_1 = (1-R)/2$
    AAbb | $f_2 = r/2$ | $f_2 = R/2$
    aaBB | $f_3 = r/2$ | $f_3 = R/2$
    aabb | $f_4 = (1-r)/2$ | $f_4 = (1-R)/2$
    where $R = 2r/(1+2r)$
  • 26. Two-point analysis in the RIL example: Marker 1 = C263, Marker 2 = XNpb387 (0 or A = P1 allele, 2 or B = P2 allele)
    RIL1: 0, 0 (P1 type); RIL2: 2, 2 (P2 type); RIL3: 0, 2 (recombinant); RIL4: 0, 0 (P1 type); RIL5: 0, 0 (P1 type); RIL6: 0, 2 (recombinant); RIL7: 0, 0 (P1 type); RIL8: 2, 2 (P2 type); RIL9: 0, 0 (P1 type); RIL10: 0, 0 (P1 type)
    $n_1 = 6$ (P1 type), $n_2 = 2$ and $n_3 = 0$ (recombinant), $n_4 = 2$ (P2 type)
    $R = 2/10 = 0.2$, so $r = R/[2(1-R)] = 0.125$
    LRT ≈ 3.85 (P ≈ 0.05); LOD ≈ 0.84
  • 27. Expected genotypic frequencies in F2 populations
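  • 28. MLE of r in F2: dominant markers (coupling phase; $n_1$, $n_3$, $n_7$ and $n_9$ count the A_B_, A_bb, aaB_ and aabb classes; $n = n_1 + n_3 + n_7 + n_9$)
    Logarithm of the likelihood, with $k = (1-r)^2$: $\ln L = C + n_1\ln(3 - 2r + r^2) + (n_3+n_7)\ln(2r - r^2) + n_9\ln(1 - 2r + r^2) = C + n_1\ln(2+k) + (n_3+n_7)\ln(1-k) + n_9\ln k$
    MLE of r: $\hat k = \dfrac{-(2n - 3n_1 - n_9) + \sqrt{(2n - 3n_1 - n_9)^2 + 8n\,n_9}}{2n}$, $\hat r = 1 - \sqrt{\hat k}$
    Variance of the estimated r: $V_{\hat r} = \dfrac{(1-k)(2+k)}{2n(1+2k)} = \dfrac{(2r - r^2)(3 - 2r + r^2)}{2n(3 - 4r + 2r^2)}$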
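  • 29. MLE of r in F2: co-dominant markers (Newton-Raphson algorithm; $n_1, \dots, n_9$ count AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb)
    Log-likelihood function: $\ln L = \ln C + (2n_1 + 2n_9 + n_2 + n_4 + n_6 + n_8)\ln(1-r) + (n_2 + n_4 + n_6 + n_8 + 2n_3 + 2n_7)\ln r + n_5\ln(1 - 2r + 2r^2)$
    The first-order derivative of $\ln L$: $f(r) = \dfrac{d\ln L}{dr} = -\dfrac{2n_1 + 2n_9 + n_2 + n_4 + n_6 + n_8}{1-r} + \dfrac{n_2 + n_4 + n_6 + n_8 + 2n_3 + 2n_7}{r} + \dfrac{n_5(4r - 2)}{1 - 2r + 2r^2}$
    The second-order derivative of $\ln L$: $f'(r) = \dfrac{d^2\ln L}{dr^2} = -\dfrac{2n_1 + 2n_9 + n_2 + n_4 + n_6 + n_8}{(1-r)^2} - \dfrac{n_2 + n_4 + n_6 + n_8 + 2n_3 + 2n_7}{r^2} + \dfrac{2n_5(4r - 4r^2)}{(1 - 2r + 2r^2)^2}$
    The iteration algorithm: $r_{i+1} = r_i - f(r_i)/f'(r_i)$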
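  • 30. MLE of r in F2: co-dominant markers (EM algorithm)
    EM stands for expectation and maximization.
    E-step: for an initial $r_0$, calculate the probability of crossover in each marker type.
    M-step: update r, and repeat from the E-step: $r = \dfrac{1}{n}\sum_k n_k P_k(R \mid G)$, where $P_k(R \mid G)$ is the expected proportion of recombinant gametes in marker type $k$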
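  • 31. Expected probability of crossover: $r = [n_1\cdot 0 + n_2\cdot 0.5 + n_3\cdot 1 + n_4\cdot 0.5 + n_5 P_5 + n_6\cdot 0.5 + n_7\cdot 1 + n_8\cdot 0.5 + n_9\cdot 0]/n$, where $P_5 = r^2/[r^2 + (1-r)^2]$ is the probability that a double heterozygote AaBb carries two recombinant gametes (Ab/aB rather than AB/ab)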
  • 32. Estimated r after 3 EM iterations (r0=0.5)
  • 33. Estimated r after 3 EM iterations (r0=0.25)
  • 34. Estimated r after 3 EM iterations (r0=0.0)
  • 35. Co-dominant markers in other populations R=2r/(1+2r)
  • 36. More populations (e.g. BC1F2, F3 etc): Generation transition matrix of
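  • 37. Distortion has little effect on linkage analysis!
    DH genotype | Theoretical frequency | Frequency with distortion | Frequency with distortion (rescaled)
    AABB | $f_1 = (1-r)/2$ | $(1-r)/2$ | $(1-r)/(1+s)$
    AAbb | $f_2 = r/2$ | $r/2$ | $r/(1+s)$
    aaBB | $f_3 = r/2$ | $s\,r/2$ | $r s/(1+s)$
    aabb | $f_4 = (1-r)/2$ | $s(1-r)/2$ | $(1-r)s/(1+s)$
    Sum | 1 | $(1+s)/2$ | 1
    Recombination frequency under distortion: $\dfrac{r}{1+s} + \dfrac{r s}{1+s} = \dfrac{r(1+s)}{1+s} = r$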
  • 38. Three-point analysis and linkage map construction
  • 39. Linkage analysis of three markers: $r_{13} = r_{12} + r_{23} - 2(1-I)\,r_{12}r_{23}$, where $I$ is the interference
    When $I = 0$ (no interference): $(1 - r_{13}) = (1 - r_{12})(1 - r_{23}) + r_{12}r_{23}$, so $r_{13} = r_{12}(1 - r_{23}) + (1 - r_{12})r_{23} = r_{12} + r_{23} - 2r_{12}r_{23}$
    When $I = 1$ (complete interference): $r_{13} = r_{12} + r_{23}$
    The order of the three loci can be determined after linkage analysis ($3!/2 = 3$ potential orders: 1-2-3, 1-3-2, or 2-1-3).
  • 40. Mapping distance and recombination frequency
    Mapping distance: $m_{13} = m_{12} + m_{23}$
    Unit of mapping distance: M (Morgan) or cM (centiMorgan), 1 M = 100 cM
    The function of mapping distance on recombination frequency (mapping function): $m = f(r)$
  • 41. Common mapping functions
    Morgan function (complete interference): in M, $m = r$; in cM, $m = 100r$
    Haldane function (no interference): in M, $m = f(r) = -\tfrac{1}{2}\ln(1-2r)$ and $r = \tfrac{1}{2}(1 - e^{-2m})$; in cM, $m = -50\ln(1-2r)$ and $r = \tfrac{1}{2}(1 - e^{-m/50})$
    Kosambi function (interference depends on the length of the interval): in M, $m = \tfrac{1}{4}\ln\dfrac{1+2r}{1-2r}$ and $r = \tfrac{1}{2}\cdot\dfrac{e^{4m}-1}{e^{4m}+1}$; in cM, $m = 25\ln\dfrac{1+2r}{1-2r}$ and $r = \tfrac{1}{2}\cdot\dfrac{e^{m/25}-1}{e^{m/25}+1}$
  • 42. Comparison of the three functions (figure: mapping distance, in cM, plotted against recombination frequency)
  • 43. Three steps in linkage map construction
    Step 1: Grouping. Grouping can be based on (i) a threshold of LOD score, (ii) a threshold of marker distance (cM), or (iii) anchor information.
    Step 2: Ordering. Three ordering algorithms are (i) SER: SERiation (Buetow and Chakravarti, 1987. Am J Hum Genet 41: 180-188); (ii) RECORD: REcombination Counting and ORDering (Van Os et al., 2005. Theor Appl Genet 112: 30-40); (iii) nnTwoOpt: nearest neighbor for tour construction and two-opt for tour improvement, similar to the Travelling Salesman Problem (TSP) (Lin and Kernighan, 1973. Oper. Res. 21: 498-516).
  • 44. Three steps in linkage map construction
    Because of the large number of markers (n), it is impossible to compare all possible orders (for n = 50 there are $n!/2 = 1.52\times10^{64}$ possible orders). Orders from the above algorithms are therefore regional (local) optima.
    Step 3: Rippling. Five rippling criteria are (i) SARF (Sum of Adjacent Recombination Frequencies), (ii) SAD (Sum of Adjacent Distances), (iii) SALOD (Sum of Adjacent LOD scores), (iv) COUNT (number of recombination events)
  • 45. The MAP functionality in QTL IciMapping
  • 46. Interface of the MAP functionality
  • 47. Map outputs: linkage map for each chromosome (A. Map of one chromosome) or for all chromosomes (B. Map of all chromosomes)
  • 48. An example map of seven chromosomes or groups
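  • 49. Linkage map and physical map
    Species | Size of haploid genome (kb) | Size of linkage map (cM) | kb/cM
    Yeast | $2.2\times10^{4}$ | 3700 | 6
    Neurospora | $4.2\times10^{4}$ | 500 | 80
    Arabidopsis | $7.0\times10^{4}$ | 500 | 140
    Drosophila | $2.0\times10^{5}$ | 290 | 700
    Tomato | $7.2\times10^{5}$ | 1400 | 510
    Human | $3.0\times10^{6}$ | 2710 | 1110
    Wheat | $1.6\times10^{7}$ | 2575 | 6214
    Rice | $4.4\times10^{5}$ | 1575 | 279
    Corn | $3.0\times10^{6}$ | 1400 | 2140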
  • 50. What is QTL mapping? The procedure of mapping individual genetic factors with small effects on quantitative traits to specific chromosomal segments in the genome.
    The key questions in QTL mapping studies are: How many QTL are there? Where are they located on the marker map? How large an influence does each of them have on the trait of interest?
  • 52. Bi-parental mapping populations (linkage mapping): temporary populations (F2 and BC), permanent populations (RIL, DH, CSSL), and secondary populations. Association mapping: natural populations (e.g. humans and animals).
  • 53. Single marker analysis (Sax 1923; Soller et al. 1976): identifies QTLs from the difference between the mean phenotypes of different marker groups, but cannot separate the estimates of recombination fraction and QTL effect.
    Interval mapping (IM) (Lander and Botstein 1989): based on maximum likelihood parameter estimation; provides a likelihood ratio test for QTL position and effect. The major disadvantage of IM is that the estimated locations and effects of QTLs may be biased when QTLs are linked.
    Regression interval mapping (RIM) (Haley and Knott 1992; Martinez and Curnow 1992): proposed as an approximation to maximum likelihood interval mapping that saves computation time at one or multiple genomic positions.
  • 54. Composite interval mapping (CIM) (Zeng 1994): combines IM with multiple marker regression analysis, which controls the effects of QTLs in other intervals or on other chromosomes on the QTL being tested, and thus increases the precision of QTL detection.
    Multiple interval mapping (MIM) (Kao et al. 1999): a state-of-the-art gene mapping procedure, but implementing the multiple-QTL model is difficult because the number of QTL defines the dimension of the model and is itself an unknown parameter of interest.
    Bayesian model (Sillanpää and Corander 2002): any Bayesian model requires a prior distribution. From the prior, Bayesian statistics derives the posterior and then conducts inference based on the posterior distribution. However, Bayesian models have not been widely used in practice, partly because of computational complexity and the lack of user-friendly software.
  • 55. (Figure, panels A and B: marker classes mm, Mm and MM; QTL)
  • 56. Backcrosses (P1BC1 and P2BC1) of P1: MMQQ and P2: mmqq
    P1BC1 genotype | Frequency | Genotypic value || P2BC1 genotype | Frequency | Genotypic value
    MMQQ | $\frac{1}{2}(1-r)$ | m+a || MmQq | $\frac{1}{2}(1-r)$ | m+d
    MMQq | $\frac{1}{2}r$ | m+d || Mmqq | $\frac{1}{2}r$ | m-a
    MmQQ | $\frac{1}{2}r$ | m+a || mmQq | $\frac{1}{2}r$ | m+d
    MmQq | $\frac{1}{2}(1-r)$ | m+d || mmqq | $\frac{1}{2}(1-r)$ | m-a
  • 57. Two marker types in P1BC1:
    MM: $(1-r)$ MMQQ + $r$ MMQq, mean $(1-r)(m+a) + r(m+d) = m + (1-r)a + rd$
    Mm: $r$ MmQQ + $(1-r)$ MmQq, mean $r(m+a) + (1-r)(m+d) = m + ra + (1-r)d$
    Difference in phenotype between the two types: $\overline{MM} - \overline{Mm} = (1-2r)(a-d)$
  • 58. Linear model: $y_j = b_0 + b^* x_j^* + e_j$ ($j = 1, 2, \dots, n$), where $b^*$ represents the QTL effect and $x_j^*$ is the indicator variable (0 or 1) for QTL genotype
    Likelihood profile
    Support interval: one-LOD interval
  • 59. (Diagram: P1 $M_i\,Q\,M_{i+1}$ × P2 $m_i\,q\,m_{i+1}$ gives F1 $M_i\,Q\,M_{i+1} / m_i\,q\,m_{i+1}$, which is backcrossed to P1 $M_i\,Q\,M_{i+1}$; the F1 gametes shown are $M_i\,Q\,M_{i+1}$, $M_i\,Q\,m_{i+1}$, $m_i\,q\,M_{i+1}$, $m_i\,q\,m_{i+1}$, $M_i\,q\,m_{i+1}$ and $m_i\,Q\,M_{i+1}$)
  • 60. Limitations of IM: the assumption of no more than one QTL per chromosome or linkage group leads to large confidence intervals and biased effect estimation. Remedy: composite interval mapping (CIM) (Zeng 1994).
  • 61. In the CIM algorithm, the QTL effect at the current testing position and the regression coefficients of the marker variables used to control the genetic background are estimated simultaneously in an expectation-maximization (EM) algorithm. Thus, the algorithm cannot completely ensure that the effect of the QTL in the current testing interval is not absorbed by the background marker variables, which may result in biased estimates of the QTL effect.
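  • 62. Theoretical basis of ICIM
    Genetic model: $G = \sum_{j=1}^{m} a_j g_j + \sum_{j<k} aa_{jk}\, g_j g_k$
    $E(g_j \mid X) = \lambda_j x_j + \lambda'_j x_{j+1}$ ($\lambda_j$ and $\lambda'_j$ depend on the recombination frequencies between QTL $j$ and its two flanking markers)
    $E(g_j g_k \mid X) = \lambda_j\lambda_k x_j x_k + \lambda_j\lambda'_k x_j x_{k+1} + \lambda'_j\lambda_k x_{j+1} x_k + \lambda'_j\lambda'_k x_{j+1} x_{k+1}$
    Linear model: $y_i = b_0 + \sum_{j=1}^{m+1} b_j x_{ij} + \sum_{j<k} b_{jk} x_{ij} x_{ik} + e_i$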
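  • 63. One-dimensional scanning (interval mapping): $\Delta y_i = y_i - \sum_{j \ne k,\,k+1} \hat b_j x_{ij}$
    Two-dimensional scanning (interval mapping): $\Delta y_i = y_i - \sum_{r \ne j,\,j+1,\,k,\,k+1} \hat b_r x_{ir} - \sum_{r \ne j,\,j+1;\ s \ne k,\,k+1} \hat b_{rs} x_{ir} x_{is}$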
  • 64. (Figure: three pairs of panels showing LOD score and estimated effect plotted against scanning position along the genome, chromosomes 1 to 4)
  • 65. Detecting epistasis where the interacting significant additive effects
  • 66. One-locus model in F2: $G = \mu + aw + dv$, where $\mu$ is the mean of the two homozygous genotypes QQ and qq, $a$ is the additive effect and $d$ is the dominance effect; $w$ and $v$ are the indicators for genotypes at the QTL, valued at 1 and 0 for QQ, 0 and 1 for Qq, and -1 and 0 for qq, respectively.
  • 67. The expected genotypic value of an individual with known marker types: $E(G \mid x_1, x_2, y_1, y_2) = \mu + a\,E(w \mid x_1, x_2, y_1, y_2) + d\,E(v \mid x_1, x_2, y_1, y_2)$
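  • 68. Probability of the three QTL genotypes under given marker types ($r_1$, $r_2$: recombination frequencies between the QTL and the left and right markers)
    Left marker | Right marker | QQ (w=1, v=0; m+a) | Qq (w=0, v=1; m+d) | qq (w=-1, v=0; m-a)
    AA | BB | $\frac{1}{4}(1-r_1)^2(1-r_2)^2$ | $\frac{1}{2}r_1(1-r_1)r_2(1-r_2)$ | $\frac{1}{4}r_1^2 r_2^2$
    AA | Bb | $\frac{1}{2}(1-r_1)^2 r_2(1-r_2)$ | $\frac{1}{2}r_1(1-r_1)(1-r_2)^2 + \frac{1}{2}r_1(1-r_1)r_2^2$ | $\frac{1}{2}r_1^2 r_2(1-r_2)$
    AA | bb | $\frac{1}{4}(1-r_1)^2 r_2^2$ | $\frac{1}{2}r_1(1-r_1)r_2(1-r_2)$ | $\frac{1}{4}r_1^2(1-r_2)^2$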
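  • 69. Estimation of marker class mean ($r$: recombination frequency between the two markers)
    Marker class | n | Frequency | $x_1$ | $x_2$ | $y_1$ | $y_2$ | $E(w \mid x_1,x_2,y_1,y_2)$ | $E(v \mid x_1,x_2,y_1,y_2)$ | Genetic mean of the class (relative to $\mu$)
    AABB | $n_1$ | $\frac{1}{4}(1-r)^2$ | 1 | 1 | 0 | 0 | $f_1$ | $g_1$ | $f_1a + g_1d$
    AABb | $n_2$ | $\frac{1}{2}r(1-r)$ | 1 | 0 | 0 | 1 | $f_2$ | $g_2$ | $f_2a + g_2d$
    AAbb | $n_3$ | $\frac{1}{4}r^2$ | 1 | -1 | 0 | 0 | $f_3$ | $g_3$ | $f_3a + g_3d$
    where $f_1 = 1 - \dfrac{2r_1r_2}{1-r}$, $g_1 = \dfrac{2r_1(1-r_1)r_2(1-r_2)}{(1-r)^2}$, $f_2 = \dfrac{(1-2r_1)r_2(1-r_2)}{r - r^2}$, $g_2 = \dfrac{r_1(1-r_1)(1-2r_2+2r_2^2)}{r - r^2}$, $f_3 = \dfrac{r_2 - r_1}{r}$, $g_3 = \dfrac{2r_1(1-r_1)r_2(1-r_2)}{r^2}$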
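  • 70. Relationship between marker class means and marker effects (including marker interactions), with intercept $b_0(d)$ and marker effects $A_1(a)$, $A_2(a)$, $D_1(d)$, $D_2(d)$, $AA_{12}(d)$, $AD_{12}$, $DA_{12}$, $DD_{12}(d)$:
    AABB: $f_1a + g_1d = b_0 + A_1 + A_2 + AA_{12}$
    AABb: $f_2a + g_2d = b_0 + A_1 + D_2 + AD_{12}$
    AAbb: $f_3a + g_3d = b_0 + A_1 - A_2 - AA_{12}$
    AaBB: $f_4a + g_4d = b_0 + A_2 + D_1 + DA_{12}$
    AaBb: $g_5d = b_0 + D_1 + D_2 + DD_{12}$
    Aabb: $-f_4a + g_4d = b_0 - A_2 + D_1 - DA_{12}$
    aaBB: $-f_3a + g_3d = b_0 - A_1 + A_2 - AA_{12}$
    aaBb: $-f_2a + g_2d = b_0 - A_1 + D_2 - AD_{12}$
    aabb: $-f_1a + g_1d = b_0 - A_1 - A_2 + AA_{12}$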
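  • 71. Relationship between marker effects and QTL effects
    Intercept: $b_0(d) = \frac{1}{2}(g_1 + g_3)d$
    $A_1(a) = f_2a$
    $A_2(a) = \frac{1}{2}(f_1 - f_3)a$
    $D_1(d) = (g_4 - \frac{1}{2}g_1 - \frac{1}{2}g_3)d$
    $D_2(d) = (g_2 - \frac{1}{2}g_1 - \frac{1}{2}g_3)d$
    $AA_{12}(d) = \frac{1}{2}(g_1 - g_3)d$
    $AD_{12} = 0$, $DA_{12} = 0$
    $DD_{12}(d) = (\frac{1}{2}g_1 - g_2 + \frac{1}{2}g_3 - g_4 + g_5)d$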
  • 72. The linear model of genotypic values on markers in F2:
    $E(w \mid x_1, x_2, y_1, y_2) = \lambda_1 x_1 + \lambda_2 x_2$
    $E(v \mid x_1, x_2, y_1, y_2) = \delta_0 + \delta_1 y_1 + \delta_2 y_2 + \delta^{xx}_{12} x_1 x_2 + \delta^{yy}_{12} y_1 y_2$
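  • 73. The linear model of genotypic values on markers in F2: $E(G \mid x_1, x_2, y_1, y_2) = \mu + b_0(d) + A_1(a)\,x_1 + D_1(d)\,y_1 + A_2(a)\,x_2 + D_2(d)\,y_2 + AA_{12}(d)\,x_1x_2 + DD_{12}(d)\,y_1y_2$, where $b_0(d)$ is the intercept term contributed by the dominance effect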
  • 74. Properties of the linear model in F2
    The additive and dominance effects of the flanked QTL are completely absorbed by the six variables in the model above.
    Interactions between marker variables may be mistakenly declared as interactions between QTL when ANOVA is used. But our analysis shows that interactions between marker variables can be caused simply by the dominance effects of a QTL.
  • 75. Multiple QTL model in F2: for multiple QTL, assume there are m QTL located on m intervals defined by m+1 markers on one chromosome; then the genotypic value of an F2 individual is defined as $G = \mu + \sum_{j=1}^{m}(a_j w_j + d_j v_j)$
  • 76. The linear model in F2 under multiple QTL: the genotypic value of an F2 individual with known marker types can be re-organized as
    $E(G) = b_0 + \sum_{j=1}^{m+1}\alpha_j x_j + \sum_{j=1}^{m+1}\beta_j y_j + \sum_{j=1}^{m}\alpha_{j,j+1} x_j x_{j+1} + \sum_{j=1}^{m}\beta_{j,j+1} y_j y_{j+1}$
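  • 77. The linear model for QTL mapping in F2: $P = E(G) + \varepsilon = b_0 + \sum_{j=1}^{m+1}\alpha_j x_j + \sum_{j=1}^{m+1}\beta_j y_j + \sum_{j=1}^{m}\alpha_{j,j+1} x_j x_{j+1} + \sum_{j=1}^{m}\beta_{j,j+1} y_j y_{j+1} + \varepsilon$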
  • 78. Property of the linear model for QTL mapping in F2
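  • 79. ICIM (Inclusive Composite Interval Mapping) in F2: $\Delta P_i = P_i - \sum_{j \ne k,\,k+1}\left[\hat\alpha_j x_{ij} + \hat\beta_j y_{ij}\right] - \sum_{j \ne k}\left[\hat\alpha_{j,j+1} x_{ij} x_{i,j+1} + \hat\beta_{j,j+1} y_{ij} y_{i,j+1}\right]$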
  • 80. Hypothesis test of QTL mapping in F2. The two hypotheses used to test the existence of a QTL at the scanning position are $H_0: \mu_1 = \mu_2 = \mu_3$ vs. $H_A$: at least two of $\mu_1$, $\mu_2$ and $\mu_3$ are not equal.
    The log-likelihood under $H_A$ is $L_A = \sum_{j=1}^{9}\sum_{i \in S_j}\log\left[\sum_{k=1}^{3}\pi_{jk}\, f(\Delta P_i; \mu_k, \sigma^2)\right]$, where $S_j$ denotes the individuals belonging to the $j$th marker class ($j = 1, \dots, 9$), $\pi_{jk}$ ($k = 1, 2, 3$) is the proportion of the $k$th QTL genotype in the $j$th marker class, and $f(\cdot\,; \mu_k, \sigma^2)$ is the density function of the normal distribution $N(\mu_k, \sigma^2)$.
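  • 81. EM algorithm of QTL mapping in F2: the EM algorithm is used to estimate $\mu_1$, $\mu_2$ and $\mu_3$. The genetic effects in $G = \mu + aw + dv$ are then estimated by $\hat a = \frac{1}{2}(\hat\mu_1 - \hat\mu_3)$ and $\hat d = \hat\mu_2 - \frac{1}{2}(\hat\mu_1 + \hat\mu_3)$.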
  • 82. EM algorithm of QTL mapping in F2: parameters under H0 are calculated as $\hat\mu_0 = \frac{1}{n}\sum_{i=1}^{n}\Delta P_i$ and $\hat\sigma_0^2 = \frac{1}{n}\sum_{i=1}^{n}(\Delta P_i - \hat\mu_0)^2$, from which the maximum likelihood under H0, and the LOD score between HA and H0, can be calculated.
  • 83. QTL distribution models in simulation
  • 84. QTL distribution models in simulation
  • 85. QTL distribution models in simulation. F2 populations were simulated with the genetics and breeding simulation tool QuLine. QTL mapping using ICIM was implemented in the software QTL IciMapping.
  • 86. Theoretical marker effects in the genetic model used in simulation. The expected additive, dominance, additive by additive, and dominance by dominance effects of the two flanking markers associated with each QTL are shown in the following table. They indicate that the dominance of a QTL can complicate the coefficients of the two markers flanking it and cause interactions between markers.
  • 87. The expected marker effects in simulation
    QTL | Intercept (d) | $A_1(a)$ | $A_2(a)$ | $D_1(d)$ | $D_2(d)$ | $AA_{12}(d)$ | $DD_{12}(d)$ | Interaction variation (%)
    QTL1 | 0.000 | 0.498 | 0.498 | 0.000 | 0.000 | 0.000 | 0.000 | 0.0
    QTL2 | 0.253 | 0.000 | 0.000 | 0.248 | 0.248 | -0.248 | 0.243 | 21.8
    QTL3 | 0.253 | 0.498 | 0.498 | 0.248 | 0.248 | -0.248 | 0.243 | 5.7
    QTL4 | -0.253 | 0.498 | 0.498 | -0.248 | -0.248 | 0.248 | -0.243 | 5.7
    QTL5 | 0.379 | 0.498 | 0.499 | 0.371 | 0.371 | -0.371 | 0.364 | 9.6
    QTL6 | -0.379 | 0.498 | 0.498 | -0.371 | -0.371 | 0.371 | -0.364 | 9.6
  • 88. QTL mapping in simulated F2 populations
  • 89. QTL distribution model I
    QTL | LOD score | PVE (%) | True position (cM) | Est. position (cM) | True add. effect | Est. add. effect | True dom. effect | Est. dom. effect
    QTL1 | 16.52 | 6.67 | 25 | 28 | 1 | 0.88 | 0 | -0.11
    QTL2 | 7.67 | 3.27 | 55 | 53 | 0 | 0.03 | 1 | 0.85
    QTL3 | 25.11 | 11.28 | 25 | 24 | 1 | 0.86 | 1 | 1.08
    QTL4 | 35.46 | 16.43 | 55 | 57 | 1 | 0.74 | -1 | -1.58
    QTL5 | 37.12 | 16.74 | 25 | 26 | 1 | 1.05 | 1.5 | 1.38
    QTL6 | 28.44 | 13.16 | 55 | 55 | 1 | 0.84 | -1.5 | -1.22
  • 90. 180 individuals. The cross was made in Chengdu, China, in July 2002 between an indica rice variety and Nipponbare. 137 SSR markers; the whole genome covered 2046.2 cM, with an average marker distance of 17.1 cM. A number of agronomic traits were investigated in the field.
  • 91. QTL mapping in the actual F2 population
  • 92. QTL distribution (number of QTL by absolute degree of dominance, |d/a|)
    Trait | R² of additive (%) | R² of additive and dominance (%) | ≤0.25 | (0.25, 0.75] | (0.75, 1.25] | >1.25 | Total
    PH | 25.84 | 51.56 | 2 | 1 | 1 | 5 | 9
    HD | 16.12 | 41.37 | 1 | 1 | 1 | 3 | 6
    PL | 25.58 | 61.26 | 5 | 3 | 1 | 8 | 17
    FL | 20.86 | 40.00 | 0 | 2 | 0 | 3 | 5
    SPK | 25.64 | 27.09 | 1 | 1 | 1 | 1 | 4
    TKW | 20.11 | 20.11 | 2 | 0 | 2 | 1 | 5
    DP | 19.45 | 24.87 | 1 | 1 | 0 | 1 | 3
    GL | 30.69 | 41.96 | 1 | 1 | 0 | 0 | 2
    GW | 26.63 | 26.63 | 2 | 2 | 0 | 0 | 4
    RLW | 37.63 | 45.70 | 1 | 3 | 1 | 1 | 6
    Total | | | 16 | 15 | 7 | 23 | 61
  • 93. PVE distribution (figure: frequency across traits plotted against phenotypic variation explained, %)
  • 94. QTL detected for plant height (Ph) and heading date (Hd)
    Trait | QTL | Chr | Distance to left marker | Add | Dom | LOD | PVE (%)
    Ph | QPh1-1 | 1 | 12 | -0.57 | -7.98 | 8.04 | 12.03
    Ph | QPh1-2 | 1 | 19.5 | -8.59 | 0.59 | 15.54 | 25.57
    Ph | QPh3-1 | 3 | 16.9 | 4.35 | -4.86 | 6.51 | 13.30
    Ph | QPh3-2 | 3 | 11.4 | -4.69 | -1.00 | 5.04 | 6.84
    Ph | QPh4 | 4 | 13.7 | -3.56 | -2.09 | 4.61 | 5.53
    Ph | QPh5 | 5 | 13 | -0.44 | -4.48 | 3.13 | 3.86
    Ph | QPh6 | 6 | 6.2 | -0.79 | -5.05 | 3.17 | 4.96
    Ph | QPh7 | 7 | 7 | 0.26 | 6.48 | 5.27 | 7.56
    Ph | QPh12 | 12 | 2.4 | -1.66 | 3.93 | 3.98 | 5.44
    Hd | QHd1 | 1 | 22.1 | 1.74 | -0.30 | 3.65 | 7.27
    Hd | QHd3 | 3 | 19.9 | 0.88 | -3.70 | 6.04 | 21.09
    Hd | QHd4 | 4 | 0.2 | -0.77 | 1.85 | 3.58 | 5.24
    Hd | QHd8 | 8 | 5.7 | -1.41 | -1.46 | 4.79 | 8.20
    Hd | QHd10 | 10 | 0.3 | -1.78 | -0.80 | 4.85 | 7.21
    Hd | QHd11 | 11 | 6.2 | 0.15 | -3.03 | 5.71 | 11.70
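  • 95. Conclusions: $P = E(G) + \varepsilon = b_0 + \sum_{j=1}^{m+1}\alpha_j x_j + \sum_{j=1}^{m+1}\beta_j y_j + \sum_{j=1}^{m}\alpha_{j,j+1} x_j x_{j+1} + \sum_{j=1}^{m}\beta_{j,j+1} y_j y_{j+1} + \varepsilon$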
  • 96. Six methods in BIP:
    SMA: single marker analysis (Soller et al., 1976. Theor. Appl. Genet. 47: 35-39)
    IM-ADD: the conventional simple interval mapping (Lander and Botstein, 1989. Genetics 121: 185-199)
    ICIM-ADD: inclusive composite interval mapping of additive (and dominant) QTL (Li et al., 2007. Genetics 175: 361-374; Zhang et al., 2008. Genetics 180: 1177-1190)
    IM-EPI: interval mapping of digenic epistatic QTL
    ICIM-EPI: inclusive composite interval mapping of digenic epistatic QTL (Li et al., 2008. Theor. Appl. Genet. 116: 243-260)
    SGM: selective genotyping mapping (Lebowitz et al., 1987. Theor. Appl. Genet. 73: 556-562)
  • 97. Interface of the BIP functionality
  • 98. LOD profile of ICIM additive mapping (ICIM-ADD)
  • 99. Figures of interacting QTL from ICIM epistatic mapping (ICIM-EPI)
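The F2 classes of East's maize example (slide 5) follow a binomial pattern. Below is a minimal Python sketch that assumes k unlinked loci of equal additive effect, no dominance, and all increasing alleles in one parent. Under these assumptions the number of increasing alleles an F2 plant carries is binomial with 2k trials and probability 1/2. The function name `f2_distribution` is illustrative, not from the lecture.

```python
from fractions import Fraction
from math import comb


def f2_distribution(p1, p2, k):
    """F2 genotypic classes for k unlinked, equal-effect additive loci.

    Assumes no dominance or epistasis and that one parent carries all
    increasing alleles. An F2 plant carries i = 0..2k increasing alleles
    with probability C(2k, i) / 4**k.
    """
    low, high = min(p1, p2), max(p1, p2)
    a = (high - low) / (2 * k)  # effect of one allele substitution
    return [(low + i * a, Fraction(comb(2 * k, i), 4 ** k)) for i in range(2 * k + 1)]


for k in (1, 2):
    print(k, [(value, str(p)) for value, p in f2_distribution(7, 17, k)])
```

For k = 2 the classes are 7, 9.5, 12, 14.5 and 17 cm with probabilities 1/16, 1/4, 3/8, 1/4 and 1/16. Their variance is 6.25, which equals $\frac{1}{2}ka^2$ with $a = 2.5$.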
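Slides 9-11 lead to the Castle-Wright estimate of the number of loci. The sketch below keeps the slides' model: unlinked additive loci of equal effect, with all increasing alleles in one parent. It also makes one assumption of its own: the mean of the two parental variances estimates the environmental part of the F2 variance. The function name and the input numbers are hypothetical.

```python
def castle_wright_k(p1_mean, p2_mean, v_p1, v_p2, v_f2):
    """Castle-Wright estimate of the number of equal-effect additive loci.

    Assumptions: unlinked loci of equal effect, no dominance or epistasis,
    all increasing alleles in one parent, and environmental variance in the
    F2 estimated by the mean of the two parental variances.
    """
    v_genetic = v_f2 - 0.5 * (v_p1 + v_p2)  # equals k * a**2 / 2 under the model
    if v_genetic <= 0:
        raise ValueError("F2 variance must exceed the mean parental variance")
    return (p1_mean - p2_mean) ** 2 / (8.0 * v_genetic)


# Hypothetical inputs: parental means 17 and 7, parental variances 1.0,
# and an F2 variance of 7.25 (genetic part 6.25).
print(castle_wright_k(17, 7, 1.0, 1.0, 7.25))
```

With these inputs the genetic variance is 6.25 and the estimate is 100 / 50 = 2.0, matching the two-locus model of slide 5. When effects are unequal or increasing alleles are dispersed between the parents, the estimate is usually regarded as a minimum.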
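The arithmetic behind Fisher's argument on slide 13 fits in a few lines. This sketch takes the slide's assumptions: dominant F2 plants are 1/3 AA and 2/3 Aa, and each plant is progeny-tested with 10 offspring. It then computes the misclassification probability and the expected ratio. The function name is illustrative.

```python
def fisher_mendel_ratio(family_size=10, p_dominant_phenotype=0.75):
    """Expected non-segregating : segregating proportions under misclassification."""
    # An Aa plant's progeny family shows no recessive with probability
    # 0.75**family_size, so the family is misclassified as non-segregating (AA).
    p_missed = p_dominant_phenotype ** family_size
    p_segregating = (2 / 3) * (1 - p_missed)
    p_non_segregating = 1 - p_segregating
    return p_missed, p_non_segregating, p_segregating


missed, non_seg, seg = fisher_mendel_ratio()
print(f"{missed:.4f} {non_seg:.4f} {seg:.4f} 1:{seg / non_seg:.4f}")
```

The results are about 0.0563, 0.3709 and 0.6291, so the expected ratio is close to 1 : 1.696 rather than Mendel's 1 : 2.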
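The following sketch implements the backcross/DH two-point analysis of slides 21-24 and applies it to the P1BC1 counts of slide 23. Both likelihoods keep the 1/2 factors of the genotype frequencies $(1-r)/2$ and $r/2$, so that $L(r=0.5) = (1/4)^n$. The helper names are hypothetical.

```python
import math


def _weighted_log10(count, p):
    # count * log10(p), using the convention 0 * log10(0) = 0
    return 0.0 if count == 0 else count * math.log10(p)


def bc_two_point(n1, n2, n3, n4):
    """Two-point linkage analysis in a backcross or DH population.

    n1, n4: parental-type counts; n2, n3: recombinant-type counts.
    Genotype frequencies are (1-r)/2, r/2, r/2 and (1-r)/2.
    """
    n = n1 + n2 + n3 + n4
    r_hat = (n2 + n3) / n
    var_r = r_hat * (1 - r_hat) / n  # inverse Fisher information

    def log10_lik(r):
        return _weighted_log10(n1 + n4, (1 - r) / 2) + _weighted_log10(n2 + n3, r / 2)

    lod = log10_lik(r_hat) - log10_lik(0.5)
    lrt = 2 * math.log(10) * lod  # LRT = 2 ln(L1 / L0) = 2 ln(10) * LOD
    return r_hat, var_r, lod, lrt


r_hat, var_r, lod, lrt = bc_two_point(162, 40, 41, 158)
print(f"r = {r_hat:.4f}, V = {var_r:.3e}, LOD = {lod:.2f}, LRT = {lrt:.1f}")
```

For these counts the function gives $\hat r \approx 0.2020$, $V \approx 4.02\times10^{-4}$, LOD ≈ 33.09 and LRT ≈ 152.4.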
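For RILs, slides 25-26 work with $R = 2r/(1+2r)$ in place of r. The sketch below does four things for markers C263 and XNpb387, using the genotypes from the slide 17 table:
- counts the four two-locus genotypes;
- estimates R;
- converts it back with $r = R/[2(1-R)]$;
- computes the LOD against $R = 0.5$, using RIL genotype frequencies $(1-R)/2$ and $R/2$.

The function and variable names are illustrative.

```python
import math

# C263 and XNpb387 genotypes of RIL1..RIL10 (0 = P1 allele "A", 2 = P2 allele "B")
C263 = [0, 2, 0, 0, 0, 0, 0, 2, 0, 0]
XNPB387 = [0, 2, 2, 0, 0, 2, 0, 2, 0, 0]


def _weighted_log10(count, p):
    return 0.0 if count == 0 else count * math.log10(p)


def ril_two_point(m1, m2):
    """Two-point linkage from selfed RILs, using R = 2r / (1 + 2r); requires R < 1."""
    pairs = list(zip(m1, m2))
    n1 = pairs.count((0, 0))  # P1 type (AABB)
    n2 = pairs.count((0, 2))  # recombinant (AAbb)
    n3 = pairs.count((2, 0))  # recombinant (aaBB)
    n4 = pairs.count((2, 2))  # P2 type (aabb)
    n = n1 + n2 + n3 + n4
    big_r = (n2 + n3) / n                 # recombinant fraction among RILs
    r = big_r / (2 * (1 - big_r))         # inverse of R = 2r / (1 + 2r)
    # log10[L(R) / L(R = 0.5)], with L(R = 0.5) = (1/4)**n
    lod = _weighted_log10(n1 + n4, 2 * (1 - big_r)) + _weighted_log10(n2 + n3, 2 * big_r)
    return (n1, n2, n3, n4), big_r, r, lod


counts, big_r, r, lod = ril_two_point(C263, XNPB387)
print(counts, big_r, r, round(lod, 3))
```

The function returns counts (6, 2, 0, 2), $R = 0.2$, $r = 0.125$ and a LOD of about 0.84.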
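Slide 28's closed form comes from setting the score to zero. This gives the quadratic $n k^2 + (2n - 3n_1 - n_9)k - 2n_9 = 0$ in $k = (1-r)^2$, which has exactly one non-negative root. The sketch below assumes dominant markers in coupling phase, with counts for the A_B_, A_bb, aaB_ and aabb classes. The usage counts are the expected counts for r = 0.2 and n = 1000, not lecture data.

```python
import math


def f2_dominant_mle(n1, n3, n7, n9):
    """Closed-form MLE of r and its variance for two dominant markers in coupling phase.

    n1: A_B_, n3: A_bb, n7: aaB_, n9: aabb phenotype counts.
    """
    n = n1 + n3 + n7 + n9
    b = 2 * n - 3 * n1 - n9
    # k = (1 - r)^2 solves n k^2 + b k - 2 n9 = 0; keep the non-negative root
    k = (-b + math.sqrt(b * b + 8 * n * n9)) / (2 * n)
    r = 1 - math.sqrt(k)
    var_r = (1 - k) * (2 + k) / (2 * n * (1 + 2 * k))
    return r, var_r


# Expected counts for r = 0.2, n = 1000: frequencies (2+k)/4, (1-k)/4, (1-k)/4, k/4 with k = 0.64
r, v = f2_dominant_mle(660, 90, 90, 160)
print(r, v)
```

With exactly expected counts, the estimate recovers r = 0.2. With small samples, $\hat r$ can exceed 0.5, because the formula does not constrain it.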
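Below is a direct transcription of the Newton-Raphson iteration on slide 29 for two co-dominant F2 markers. It has no step-size safeguard, so it assumes a starting value reasonably close to the estimate. From starts near 0.5, a plain Newton step can overshoot below zero. The default `r0 = 0.25` is an arbitrary choice, and the usage counts are the expected counts for r = 0.2 and n = 1000.

```python
def f2_codominant_newton(counts, r0=0.25, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of r for two co-dominant markers in an F2.

    counts: (n1, ..., n9) for AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb.
    """
    n1, n2, n3, n4, n5, n6, n7, n8, n9 = counts
    a = 2 * n1 + 2 * n9 + n2 + n4 + n6 + n8  # coefficient of ln(1 - r)
    b = n2 + n4 + n6 + n8 + 2 * n3 + 2 * n7  # coefficient of ln(r)
    r = r0
    for _ in range(max_iter):
        q = 1 - 2 * r + 2 * r * r  # AaBb term: n5 * ln(1 - 2r + 2r^2)
        f = -a / (1 - r) + b / r + n5 * (4 * r - 2) / q
        f_prime = -a / (1 - r) ** 2 - b / r ** 2 + n5 * 8 * r * (1 - r) / q ** 2
        step = f / f_prime
        r -= step
        if abs(step) < tol:
            break
    return r


counts = (160, 80, 10, 80, 340, 80, 10, 80, 160)  # expected counts for r = 0.2, n = 1000
print(f2_codominant_newton(counts))
```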
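The EM iteration of slides 30-31 is sketched below for the same nine F2 marker classes. The only class whose expected recombinant proportion depends on r is the double heterozygote AaBb. It is AB/ab with weight $(1-r)^2$ or Ab/aB with weight $r^2$. The function returns the iterates, so runs from different starting values (as on slides 32-34) can be compared. The counts are again the expected counts for r = 0.2, not lecture data.

```python
def f2_codominant_em(counts, r0=0.5, tol=1e-12, max_iter=1000):
    """EM estimate of r for two co-dominant markers in an F2.

    counts: (n1, ..., n9) for AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb.
    Returns the final estimate and the list of iterates.
    """
    # Expected proportion of recombinant gametes per marker class;
    # None marks AaBb, whose proportion depends on the current r.
    weights = (0.0, 0.5, 1.0, 0.5, None, 0.5, 1.0, 0.5, 0.0)
    n = sum(counts)
    r, history = r0, [r0]
    for _ in range(max_iter):
        # E-step: probability that an AaBb individual is Ab/aB (two recombinant gametes)
        p5 = r * r / (r * r + (1 - r) ** 2)
        # M-step: new r = average proportion of recombinant gametes
        new_r = sum(c * (p5 if w is None else w) for c, w in zip(counts, weights)) / n
        history.append(new_r)
        if abs(new_r - r) < tol:
            return new_r, history
        r = new_r
    return r, history


counts = (160, 80, 10, 80, 340, 80, 10, 80, 160)
for start in (0.5, 0.25, 0.0):
    r_em, hist = f2_codominant_em(counts, r0=start)
    print(start, [round(x, 4) for x in hist[:4]], round(r_em, 6))
```

Unlike the Newton iteration, each EM update stays inside [0, 1], so even r0 = 0 is a valid start. The trade-off is linear rather than quadratic convergence.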
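The mapping functions of slide 41 can be combined with the additivity of map distance (slide 40) to predict $r_{13}$ from $r_{12}$ and $r_{23}$. This reproduces the interference cases of slide 39. The sketch works in Morgans, and the function names are illustrative.

```python
import math


def haldane_m(r):
    """Haldane map distance (Morgans) for recombination frequency r < 0.5."""
    return -0.5 * math.log(1 - 2 * r)


def haldane_r(m):
    return 0.5 * (1 - math.exp(-2 * m))


def kosambi_m(r):
    """Kosambi map distance (Morgans) for recombination frequency r < 0.5."""
    return 0.25 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(m):
    # 0.5 * (e^{4m} - 1) / (e^{4m} + 1) is the same as 0.5 * tanh(2m)
    return 0.5 * math.tanh(2 * m)


def r13_from(r12, r23, to_m, to_r):
    """Recombination between loci 1 and 3, assuming m13 = m12 + m23."""
    return to_r(to_m(r12) + to_m(r23))


r12, r23 = 0.10, 0.20
print("Morgan :", r12 + r23)                                 # complete interference
print("Haldane:", r13_from(r12, r23, haldane_m, haldane_r))  # r12 + r23 - 2 r12 r23
print("Kosambi:", r13_from(r12, r23, kosambi_m, kosambi_r))  # (r12 + r23) / (1 + 4 r12 r23)
print("cM for r = 0.2:", 100 * haldane_m(0.2), 100 * kosambi_m(0.2))
```

Haldane addition reproduces the no-interference formula $r_{12} + r_{23} - 2r_{12}r_{23}$ (0.26 here). Kosambi gives $(r_{12} + r_{23})/(1 + 4r_{12}r_{23})$, which falls between the Haldane value and the complete-interference value of 0.3.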
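The nnTwoOpt idea on slide 43 treats marker ordering like a travelling-salesman path. A nearest-neighbour path is built from each possible start, improved by 2-opt segment reversals, and the order with the smallest SARF (slide 44) is kept. The sketch below is a simple illustration of that strategy, not the QTL IciMapping implementation. As test data it uses seven chromosome 5 markers, with positions from the slide 17 table, listed in shuffled order. Their recombination frequencies are derived from the Haldane function, which is an assumption of this sketch.

```python
import math


def haldane_rf(d_cm):
    """Recombination frequency for a distance in cM (Haldane; used to build test data)."""
    return 0.5 * (1 - math.exp(-2 * abs(d_cm) / 100))


def sarf(order, rf):
    """Sum of Adjacent Recombination Frequencies for a marker order."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def nearest_neighbor(start, rf):
    order, remaining = [start], set(range(len(rf))) - {start}
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: rf[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def two_opt(order, rf):
    """Reverse segments of the open path while doing so lowers SARF."""
    best, improved = list(order), True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if sarf(cand, rf) < sarf(best, rf) - 1e-12:
                    best, improved = cand, True
    return best


def nn_two_opt(rf):
    """Lowest-SARF order over all nearest-neighbour starts, each refined by 2-opt."""
    tours = (two_opt(nearest_neighbor(s, rf), rf) for s in range(len(rf)))
    return min(tours, key=lambda o: sarf(o, rf))


# Chromosome 5 markers and positions (cM) from the RIL example, deliberately shuffled.
markers = {"R569": 32.0, "C263": 0.0, "C128": 74.1, "R3166": 8.5,
           "R1553": 66.6, "R830": 3.5, "XNpb387": 19.5}
names = list(markers)
rf = [[haldane_rf(markers[a] - markers[b]) for b in names] for a in names]
print([names[i] for i in nn_two_opt(rf)])
```

With error-free recombination frequencies, the map order (or its reverse) is the unique SARF minimum. With real data, the rippling step of slide 44 is what checks such locally optimal orders.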
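Single marker analysis (slide 53) compares mean phenotypes between marker groups. Applied to the ten RILs and grain widths of slide 17, the sketch below computes, at each marker, the mean difference between the P1 (0) and P2 (2) classes and a pooled-variance two-sample t statistic. The choice of a pooled t test is an assumption of this sketch.

```python
import math
from statistics import mean, variance

MARKERS = ["C263", "R830", "R3166", "XNpb387", "R569", "R1553",
           "C128", "C1402", "XNpb81", "C246", "R2953", "C1447"]
# (genotypes at the 12 markers, grain width in mm) for RIL1..RIL10; 0 = P1, 2 = P2
RILS = [
    ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 2.33),
    ([2, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2], 1.99),
    ([0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], 2.24),
    ([0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2], 1.94),
    ([0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0], 2.76),
    ([0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2], 2.32),
    ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 2.32),
    ([2, 2, 0, 2, 2, 0, 0, 0, 0, 2, 2, 2], 2.08),
    ([0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0], 2.24),
    ([0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0], 2.45),
]


def single_marker_scan(rils, markers):
    """Mean difference (P1 class minus P2 class) and pooled two-sample t at each marker."""
    results = []
    for k, name in enumerate(markers):
        g0 = [y for geno, y in rils if geno[k] == 0]
        g2 = [y for geno, y in rils if geno[k] == 2]
        diff = mean(g0) - mean(g2)
        df = len(g0) + len(g2) - 2
        pooled = ((len(g0) - 1) * variance(g0) + (len(g2) - 1) * variance(g2)) / df
        t = diff / math.sqrt(pooled * (1 / len(g0) + 1 / len(g2)))
        results.append((name, diff, t))
    return results


for name, diff, t in single_marker_scan(RILS, MARKERS):
    print(f"{name:8s} diff = {diff:+.3f} mm  t = {t:+.2f}")
```

At C263, for example, the P1 class averages 2.325 mm and the P2 class 2.035 mm, a difference of 0.29 mm. As slide 53 notes, such a difference mixes the QTL effect with its recombination fraction from the marker.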
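The linear model and LOD profile of slide 58 can be illustrated with regression interval mapping (the RIM approach of slide 53). At each scanning position, phenotypes are regressed on the expected QTL genotype given the flanking markers. Everything in this sketch is an assumption chosen for illustration:
- a simulated DH population with genotypes coded +1/-1;
- markers every 10 cM;
- one QTL at 45 cM with effect 1.5 and residual SD 1.0;
- n = 500 and a fixed random seed;
- the Haldane map function with no interference.

```python
import bisect
import math
import random


def haldane_r(d_cm):
    return 0.5 * (1 - math.exp(-2 * d_cm / 100))


def simulate_dh(n, markers, qtl, effect, sd, seed=12345):
    """Simulate DH lines (genotypes +1 / -1) with one additive QTL.

    Assumes independent crossovers between adjacent loci; `qtl` must not
    coincide with a marker position.
    """
    rng = random.Random(seed)
    loci = sorted(markers + [qtl])
    geno = {pos: [] for pos in loci}
    for _ in range(n):
        g = rng.choice((-1, 1))
        geno[loci[0]].append(g)
        for prev, pos in zip(loci, loci[1:]):
            if rng.random() < haldane_r(pos - prev):
                g = -g  # crossover between adjacent loci
            geno[pos].append(g)
    y = [effect * q + rng.gauss(0, sd) for q in geno[qtl]]
    return {m: geno[m] for m in markers}, y


def expected_qtl(x_left, x_right, r1, r2):
    """E(g | flanking markers) for a DH line with g = +1 or -1."""
    w_plus = (1 - r1 if x_left == 1 else r1) * (1 - r2 if x_right == 1 else r2)
    w_minus = (1 - r1 if x_left == -1 else r1) * (1 - r2 if x_right == -1 else r2)
    return (w_plus - w_minus) / (w_plus + w_minus)


def lod_scan(markers, geno, y, step=1):
    """Regress y on E(g | flanking markers) at each position; LOD = n/2 log10(RSS0/RSS1)."""
    n = len(y)
    y_bar = sum(y) / n
    syy = sum((v - y_bar) ** 2 for v in y)
    profile = []
    for pos in range(markers[0], markers[-1] + 1, step):
        i = min(bisect.bisect_right(markers, pos) - 1, len(markers) - 2)
        left, right = markers[i], markers[i + 1]
        r1, r2 = haldane_r(pos - left), haldane_r(right - pos)
        z = [expected_qtl(a, b, r1, r2) for a, b in zip(geno[left], geno[right])]
        z_bar = sum(z) / n
        sxx = sum((v - z_bar) ** 2 for v in z)
        sxy = sum((u - z_bar) * (v - y_bar) for u, v in zip(z, y))
        rss = syy - sxy * sxy / sxx if sxx > 0 else syy
        profile.append((pos, n / 2 * math.log10(syy / rss)))
    return profile


markers = list(range(0, 101, 10))  # 11 hypothetical markers, 10 cM apart
geno, y = simulate_dh(500, markers, qtl=45, effect=1.5, sd=1.0)
peak = max(lod_scan(markers, geno, y), key=lambda t: t[1])
print("peak position (cM) and LOD:", peak)
```

The scan replaces the mixture likelihood of full interval mapping with a single least-squares fit per position, which is the computational saving slide 53 attributes to RIM. A one-LOD support interval could be read off the same profile.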
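The conditional expectations $f_1, g_1, f_2, g_2, f_3, g_3$ of slides 68-69 can be checked by brute force. The approach is to enumerate the gametes of the F1 A Q B / a q b and pair them into F2 genotypes. The sketch assumes no interference between the two sub-intervals. Marker classes are keyed by the number of P1 alleles at each marker, a labelling chosen for this sketch.

```python
from itertools import product


def gamete_probs(r1, r2):
    """Gametes (left marker, QTL, right marker) of the F1 A Q B / a q b.

    Alleles are coded 1 (from P1) and 0 (from P2); crossovers in the two
    sub-intervals are assumed independent (no interference).
    """
    return {
        (m, q, b): 0.5 * ((1 - r1) if m == q else r1) * ((1 - r2) if q == b else r2)
        for m, q, b in product((1, 0), repeat=3)
    }


def f2_qtl_expectations(r1, r2):
    """E(w | markers) and E(v | markers) for each F2 flanking-marker class.

    Classes are keyed by the number of P1 alleles at each marker:
    (2, 2) = AABB, (2, 1) = AABb, (2, 0) = AAbb, (1, 1) = AaBb, and so on.
    """
    gametes = gamete_probs(r1, r2)
    joint = {}  # marker class -> [P(QQ), P(Qq), P(qq)]
    for (m1, q1, b1), p1 in gametes.items():
        for (m2, q2, b2), p2 in gametes.items():
            cell = joint.setdefault((m1 + m2, b1 + b2), [0.0, 0.0, 0.0])
            cell[2 - (q1 + q2)] += p1 * p2
    expectations = {}
    for cls, (p_QQ, p_Qq, p_qq) in joint.items():
        total = p_QQ + p_Qq + p_qq
        # w = 1, 0, -1 and v = 0, 1, 0 for QQ, Qq, qq
        expectations[cls] = ((p_QQ - p_qq) / total, p_Qq / total)
    return expectations


res = f2_qtl_expectations(0.10, 0.15)
for cls in [(2, 2), (2, 1), (2, 0)]:
    print(cls, res[cls])
```

For any $r_1$ and $r_2$, the enumerated values agree with the closed forms on slide 69, with $r = r_1 + r_2 - 2r_1r_2$.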
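Slides 80-82 describe an EM fit of a three-component normal mixture. In it, each individual's genotype probabilities $\pi_{jk}$ come from its marker class, followed by a LOD against the single-normal model under H0. The sketch below takes those priors as given and estimates $\mu_1, \mu_2, \mu_3$ and a common $\sigma^2$. It then derives $\hat a = \frac{1}{2}(\hat\mu_1 - \hat\mu_3)$ and $\hat d = \hat\mu_2 - \frac{1}{2}(\hat\mu_1 + \hat\mu_3)$. The three prior patterns, the true means, the sample size and the seed are all hypothetical simulation choices, as is the crude starting point.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_log_lik(y, priors, mu, var):
    """Sum over individuals of log sum_k pi_ik f(y_i; mu_k, var)."""
    return sum(math.log(sum(p * normal_pdf(v, m, var) for p, m in zip(pi, mu)))
               for v, pi in zip(y, priors))


def em_interval_mapping(y, priors, tol=1e-8, max_iter=500):
    """EM for QTL genotype means (QQ, Qq, qq) given per-individual genotype priors."""
    n = len(y)
    mean0 = sum(y) / n
    var0 = sum((v - mean0) ** 2 for v in y) / n  # MLEs under H0 (no QTL)
    log_lik0 = sum(math.log(normal_pdf(v, mean0, var0)) for v in y)

    sd0 = math.sqrt(var0)
    mu, var = [mean0 + 0.5 * sd0, mean0, mean0 - 0.5 * sd0], var0  # crude start
    log_lik = mixture_log_lik(y, priors, mu, var)
    for _ in range(max_iter):
        # E-step: posterior probability of QQ, Qq, qq for each individual
        post = []
        for v, pi in zip(y, priors):
            w = [p * normal_pdf(v, m, var) for p, m in zip(pi, mu)]
            total = sum(w)
            post.append([x / total for x in w])
        # M-step: weighted genotype means and pooled residual variance
        mu = [sum(p[k] * v for p, v in zip(post, y)) / sum(p[k] for p in post)
              for k in range(3)]
        var = sum(p[k] * (v - mu[k]) ** 2
                  for p, v in zip(post, y) for k in range(3)) / n
        new_log_lik = mixture_log_lik(y, priors, mu, var)
        converged = new_log_lik - log_lik < tol
        log_lik = new_log_lik
        if converged:
            break
    a = (mu[0] - mu[2]) / 2
    d = mu[1] - (mu[0] + mu[2]) / 2
    lod = (log_lik - log_lik0) / math.log(10)
    return mu, var, a, d, lod


rng = random.Random(7)
patterns = [(0.9, 0.1, 0.0), (0.05, 0.9, 0.05), (0.0, 0.1, 0.9)]  # hypothetical priors
true_mu = (1.0, 0.5, -1.0)  # so a = 1.0 and d = 0.5
priors, y = [], []
for _ in range(2000):
    pi = rng.choices(patterns, weights=(1, 2, 1))[0]
    k = rng.choices(range(3), weights=pi)[0]
    priors.append(pi)
    y.append(true_mu[k] + rng.gauss(0.0, 1.0))

mu, var, a, d, lod = em_interval_mapping(y, priors)
print(f"a = {a:.2f}, d = {d:.2f}, LOD = {lod:.1f}")
```

In ICIM, `y` would be the background-adjusted phenotypes $\Delta P_i$ of slide 79, and the priors would be the $\pi_{jk}$ of the individual's marker class at the current scanning position.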