Narelle Kruger PhD thesis

SIMULATING THE IMPACT OF
MARKER-ASSISTED SELECTION
IN A WHEAT BREEDING
PROGRAM
Narelle Lee Kruger
B.Agr.Sc (Hons I)
The University of Queensland
A thesis submitted for the degree of Doctor of Philosophy
The University of Queensland
Australia
School of Land and Food Sciences
February 2005

Declaration of Originality
This thesis is the original work of the author, except as otherwise indicated.
It has not been submitted previously for a degree at any University.
Narelle Lee Kruger

Acknowledgements
I would like to thank my supervisors Mark Cooper, Kaye Basford and Dean
Podlich. They have provided countless hours of direction, guidance, assistance and
support to me throughout this research and I appreciate the time they have given up to
see this work through to the end. Thank you also to Mark and Dean’s families who let
me into their homes while I was visiting them in the USA.
I thank Chris Winkler at Pioneer Hi-bred International and Pioneer Hi-bred
International for accommodating me on my visits to Des Moines, USA.
I would like to thank all the QTL detection analysis software programmers who
helped me via email, and especially Friedrich Utz, who helped to ensure PLABQTL
would run on our computer systems.
Thank you to the Australian Grains Research and Development Corporation for
financial support as a Grains Research Scholar. The Graduate School Research Travel
Award from The University of Queensland was invaluable as a mechanism for visiting
Mark and Dean in the USA to ensure this work was completed.
Thanks to my good friends and colleagues Nicole Jensen, Jo Stringer, Kevin
Micallef, Hunter Laidlaw, Ky Mathews and Allan Rattey, who made studying at UQ
immensely enjoyable. You have all provided me with invaluable advice in your areas of
expertise, and have either been through, or are presently immersed in the PhD process.
To Chris, who I truly love, for without this thesis we would never have met.
Thank you for everything.
Finally, thanks to Mum, Dad, Shane, Karen and Debra and the rest of my family
who supported me through the whole process, even when the light seemed to be moving
away faster than I was travelling. I love and miss you all.

Abstract
The wheat Germplasm Enhancement Program, managed from the University of
Queensland, was developed to provide a source of high yielding and high quality wheat
germplasm to the pedigree breeding programs run by the Leslie Research Centre at
Toowoomba and the Plant Breeding Institute of the University of Sydney at Narrabri.
Investigating the feasibility of introducing marker-assisted selection into the
Germplasm Enhancement Program was considered an important step in an attempt to
increase genetic gains for this breeding program. Implementing and testing
marker-assisted selection in the Germplasm Enhancement Program as an empirical experiment
would be costly and time consuming. By examining through simulation the impact of
marker-assisted selection in combination with S1 family (the current approach) and
doubled haploid line selection strategies, it was feasible to determine their ability to
contribute towards accelerated rates of response to selection.
The aim of most wheat breeding programs is to develop commercially viable
cultivars that are superior in performance (quality and yield stability) to those presently
being grown in the target production system. Until recently, producing a superior
cultivar has been based on a combination of experiences, quantitative genetic theory
predictions and the outcomes of the laborious work involved in empirical studies.
Empirical experimentation will always be essential; however, simulation provides a
methodology to extend the basic quantitative genetics theoretical prediction equations
by relaxing some key assumptions applied to make the mathematical equations
tractable. The simulation work in this thesis was conducted using the QU-GENE
(QUantitative-GENEtics) simulation platform developed at the University of
Queensland (Podlich and Cooper 1998). To ensure that the simulation model was an accurate
extension of the theory, it was important to test the consistency and convergence of the
different strategies for deriving expectations of selection. It was found that under simple
additive models, the simulation accurately modelled multi-genic recombination and
produced the same results as the prediction equations. It was also observed that
departure from the simple additive model frequently invalidated the normality assumption
held by theory and caused the expectations from the prediction equations to
overestimate the response compared to the simulations.
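
For reference, two standard textbook prediction equations of the kind compared against simulation here can be written as below. The notation is generic and is an assumption made for this sketch; it is not necessarily the notation adopted in Chapter 4.

```latex
% Response to one cycle of truncation selection
% (the intensity i is derived assuming normally distributed phenotypes)
R = i \, h^{2} \, \sigma_{P}

% Decay of linkage disequilibrium between two loci under random mating
D_{t} = (1 - r)^{t} \, D_{0}
```

Here $R$ is the response to selection, $i$ the standardised selection differential, $h^{2}$ the narrow-sense heritability, $\sigma_{P}$ the phenotypic standard deviation, $D_{t}$ the linkage disequilibrium after $t$ generations of random mating, and $r$ the recombination fraction between the two loci. Because $i$ is taken from the normal distribution, a departure from normality is one way in which a predicted response can diverge from a simulated one.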
Reliable detection of quantitative trait loci (QTL) is a critical step in conducting
marker-assisted selection in a breeding program. After comparing a number of
programs, PLABQTL (Utz and Melchinger 1996) was selected as the QTL detection
analysis program to be used throughout this thesis. The modelling of multiple QTL
scenarios for a simulated wheat genome was examined to determine the extent to which
the wheat genome needed to be represented in the simulation experiment to examine the
reliability of the detection of QTL. Representing the full wheat genome did not change
the conclusions compared to simulations based on a reduced genome model. For
example, it was found that a model based on 12 chromosomes, 12 QTL and two flanking
markers per QTL could be used in place of a 21 chromosome, 12 QTL, and eight
flanking markers per QTL model. The reduced genome size shortened the time taken
for the QTL analysis to complete. As approximately 45 million simulation
experiments were analysed in this thesis, this amounted to a substantial saving in time.
Mapping population size, heritability and per meiosis recombination fraction
between a marker and a quantitative trait locus each influenced the detection of QTL.
The number of QTL detected in this study generally increased as the heritability
increased, as the per meiosis recombination fraction became smaller, as the mapping
population size increased, or when two or more of these changes were combined.
This work has reinforced that the recommended threshold mapping population size of
500 to 1000 individuals is required for confidence in the power of the mapping study for
QTL detection (Beavis 1998, Ober and Cox 1998, Holland 2004).
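
The dependence of detection power on mapping population size, recombination fraction and the variance explained by a QTL can be illustrated with a deliberately small sketch. It is not the QU-GENE or PLABQTL procedure used in this thesis: it assumes a single QTL and a single linked marker in homozygous (doubled haploid style) lines, a simple marker-trait correlation test, and a fixed t threshold. The function names, parameter names and default values are all hypothetical and chosen only for illustration.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def detection_power(n_lines, qtl_var_fraction, recomb_fraction,
                    t_threshold=3.0, n_reps=200, seed=1):
    """Fraction of replicates in which a linked marker flags the QTL.

    Assumptions (illustrative only): the phenotypic variance is scaled
    to 1, the QTL explains `qtl_var_fraction` of it, and everything else
    (other QTL, environment) is lumped into a normal residual.
    """
    rng = random.Random(seed)
    # QTL genotype is coded -1/+1 at equal frequency (variance 1),
    # so the squared effect equals the variance explained by the QTL.
    effect = math.sqrt(qtl_var_fraction)
    resid_sd = math.sqrt(1.0 - qtl_var_fraction)
    detected = 0
    for _ in range(n_reps):
        markers, phenos = [], []
        for _ in range(n_lines):
            qtl = rng.choice((-1, 1))
            # The marker allele matches the QTL allele unless a
            # recombination event separated them.
            marker = qtl if rng.random() >= recomb_fraction else -qtl
            markers.append(marker)
            phenos.append(effect * qtl + rng.gauss(0.0, resid_sd))
        r = correlation(markers, phenos)
        # t statistic for a correlation coefficient with n - 2 d.f.
        t = r * math.sqrt((n_lines - 2) / (1.0 - r * r))
        if abs(t) > t_threshold:
            detected += 1
    return detected / n_reps


if __name__ == "__main__":
    for n in (50, 100, 200, 500, 1000):
        print(n, detection_power(n, qtl_var_fraction=0.05,
                                 recomb_fraction=0.1))
```

Under these assumptions, the estimated detection frequency for a QTL explaining a small share of the phenotypic variance rises with the number of lines and falls as the marker-QTL recombination fraction grows, which is the qualitative pattern described above. The sketch ignores multiple testing, interval mapping and cofactor selection, so its absolute values should not be compared with the results reported in this thesis.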
Complexities were simulated through the addition of epistasis and
genotype-by-environment (G×E) interaction into the genetic models to determine their impact on the
detection of QTL and on response to selection. These interactions have been shown
experimentally to be important factors influencing grain yield variation in the reference
population of the Germplasm Enhancement Program. Digenic epistatic networks were
found to have no effect on the detection of QTL under the models tested, while more
complex epistatic networks involving a large number of genes did have an effect.
Genotype-by-environment interactions were found to influence the detection of QTL in a
mapping population due to the complications they can cause in the phenotyping of
individuals, and were particularly influential where QTL had different effects on trait
phenotypes in different environmental conditions. Epistasis and G×E interactions were
also found to cause a decrease in the response to selection for the breeding strategies
when they were included in the genetic models.
For the range of quantitative trait genetic models considered, marker-assisted
selection produced a greater response to selection than phenotypic selection and
marker selection. The results of this simulation study indicated that a breeding strategy
based on a combination of doubled haploid lines and marker-assisted selection was
likely to produce the greatest response to selection for quantitative traits across a wide
range of simple to complex genetic models.

List of Publications
Principal Author
Kruger NL, Cooper M and Podlich DW (2002) Comparison of phenotypic, marker and
marker-assisted selection strategies in an S1 family recurrent selection strategy.
In: JA McComb (ed.) 'Plant Breeding for the 11th Millennium'. Proceedings of
the 12th Australasian Plant Breeding Conference, 15-20 September 2002. Perth,
W. Australia: Australasian Plant Breeding Association Inc. pp. 696-701.
Kruger NL, Cooper M, Podlich DW, Jensen NM and Basford KE (2001) The effect of
population size on QTL detection in recombinant inbred lines. In: G Hollamby,
T Rathjen, R Eastwood and N Gororo (eds). Wheat Breeding Society of Australia
Inc. 10th Assembly Proceedings. Mildura, Australia. pp. 194-196.
Kruger NL (1999) Simulation analysis of doubled haploids in a wheat breeding
program. The University of Queensland, School of Land and Food Sciences,
Plant Improvement Group Research Report No. 5.
Kruger NL, Podlich DW and Cooper M (1999) Comparison of S1 and doubled haploid
recurrent selection strategies by computer simulation with applications for the
Germplasm Enhancement Program of the Northern Wheat Improvement Program.
In: P Williamson, P Banks, I Haak, J Thompson and AW Campbell (eds).
Proceedings of the Ninth Assembly Wheat Breeding Society of Australia - Vision
2020. Toowoomba: The University of Southern Queensland. pp. 216-219.
Co-author
Cooper M, Podlich DW, Micallef KP, Smith OS, Jensen NM, Chapman SC and Kruger
NL (2001) Complexity, quantitative traits and plant breeding: a role for simulation
modeling in the genetic improvement of crops. In: MS Kang (ed.) Quantitative
Genetics, Genomics and Plant Breeding. CAB International: Wallingford,
UK. pp. 143-166.

Table of Contents
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF PUBLICATIONS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
PART I BACKGROUND
CHAPTER 1 INTRODUCTION
CHAPTER 2 REVIEW OF LITERATURE
2.1 INTRODUCTION
2.2 PLANT BREEDING PROGRAMS: A REVIEW OF TRADITIONAL AND MOLECULAR SELECTION TECHNIQUES
2.2.1 Traditional selection
2.2.2 Indirect selection
2.2.2.1 Recombination and linkage
2.2.2.2 Generating genetic maps
2.2.2.3 Detecting QTL
2.2.2.4 Statistical methods used to detect QTL
2.2.2.5 Statistical issues to consider when detecting QTL
2.2.2.6 Marker-assisted selection
2.3 THE GERMPLASM ENHANCEMENT PROGRAM
2.4 GENOTYPE-ENVIRONMENT FACTORS INFLUENCING RESPONSE TO SELECTION
2.4.1 Introduction
2.4.2 Epistasis
2.4.3 G×E interactions
2.5 A ROLE FOR COMPUTER SIMULATION IN THE ANALYSIS OF GENETIC SYSTEMS
2.5.1 Background
2.5.2 The QU-GENE simulation platform
2.6 SYNOPSIS FROM LITERATURE
CHAPTER 3 MODELLING METHODOLOGY
3.1 INTRODUCTION
3.2 ITERATIVE MODELLING PROCESS
3.2.1 Propose the relevant questions
3.2.2 Define the proposed simulation experiment or module
3.2.3 Develop and test the QU-GENE software
3.2.4 Finalise the design of the simulation experiment
3.2.5 Implementation of the simulation experiment
3.2.6 Compilation of results of the simulation experiment
3.2.7 Analysis and interpretation of the simulation experiment
3.2.8 Evaluate the results of the simulation experiment in relation to the questions posed
3.3 QUESTIONS PROPOSED FOR THE THESIS
PART II SIMULATION AS A MODELLING APPROACH
CHAPTER 4 EXAMINING THE CONSISTENCY BETWEEN PREDICTIONS FROM QUANTITATIVE GENETIC EQUATIONS AND QU-GENE SIMULATIONS OF KEY GENETIC PROCESSES REQUIRED FOR MODELLING SELECTION RESPONSE
4.1 INTRODUCTION
4.2 RECOMBINATION PREDICTION EQUATIONS
4.2.1 Materials and Methods
4.2.1.1 Recombination and linkage disequilibrium
4.2.1.2 Theory underlying the breaking of linkage
4.2.1.3 QU-GENE simulation of recombination
4.2.2 Results
4.2.2.1 Recombination and linkage disequilibrium
4.3 RESPONSE TO SELECTION PREDICTION EQUATIONS
4.3.1 Materials and Methods
4.3.1.1 Theoretical prediction equations for mass, S1 family, and DH line selection methods
4.3.1.1.1 Basic response to selection prediction equation
4.3.1.1.2 Comstock’s response to selection prediction equations
4.3.1.2 Simulating mass, S1 family and DH line selection methods
4.3.1.2.1 Investigating convergence of expectation from prediction theory and simulation
4.3.1.2.2 Verifying the number of generations of random mating required to reach linkage equilibrium
4.3.2 Results
4.3.2.1 Response to selection prediction equations
4.3.2.1.1 Investigating convergence of expectation from prediction theory and simulation
4.3.2.2 Verifying the number of generations of random mating required to reach linkage equilibrium
4.4 DISCUSSION
4.5 CONCLUSION
CHAPTER 5 COMPARING QTL DETECTION ANALYSIS PROGRAMS AND SIMULATING THE WHEAT GENOME IN QU-GENE
5.1 INTRODUCTION
5.2 SELECTING A QTL DETECTION PROGRAM TO BE USED IN THIS THESIS
5.2.1 Materials and Methods
5.2.1.1 Genetic models
5.2.1.2 Creating the mapping population and generating the linkage groups
5.2.1.3 Conducting the QTL detection analysis
5.2.2 Results
5.2.3 Discussion
5.2.4 Conclusion
5.3 MODELLING THE WHEAT GENOME FOR QTL DETECTION ANALYSIS USING PLABQTL
5.3.1 Materials and Methods
5.3.1.1 Genetic models
5.3.1.2 Creating the mapping population and generating the linkage groups
5.3.1.3 Conducting the QTL detection analysis
5.3.2 Results
5.3.3 Discussion
5.3.4 Conclusion
PART III FACTORS AFFECTING THE POWER OF QTL DETECTION
CHAPTER 6 EFFECT OF MAPPING POPULATION SIZE, PER MEIOSIS RECOMBINATION FRACTION AND HERITABILITY ON QTL DETECTION
6.1 INTRODUCTION
6.2 MATERIALS AND METHODS
6.2.1 Genetic models
6.2.2 Creating the mapping population and generating the linkage groups
6.2.3 Conducting the QTL detection analysis
6.2.4 Conducting the statistical analyses
6.3 RESULTS
6.4 DISCUSSION
6.5 CONCLUSION
CHAPTER 7 THE EFFECT OF GENOTYPE-BY-ENVIRONMENT INTERACTIONS AND DIGENIC EPISTATIC NETWORKS ON QTL DETECTION
7.1 INTRODUCTION
7.2.1 Genetic models
7.2.1.1 Core model
7.2.1.2 Digenic epistatic models; E(NK) = 1(10:1)
7.2.1.3 G×E interaction models; E(NK) = 1(10:0), 2(10:0), 5(10:0), 10(10:0)
7.2.2 Creating the mapping population and generating the linkage groups
7.3 RESULTS
7.3.1 Genetic Models: Additive and Epistatic
7.3.2 Genetic Models: Additive and G×E interaction
7.4 DISCUSSION
7.5 CONCLUSION
PART IV SIMULATION OF PHENOTYPIC, MARKER AND MARKER-ASSISTED SELECTION IN THE WHEAT GERMPLASM ENHANCEMENT PROGRAM
CHAPTER 8 SELECTION RESPONSE IN THE GERMPLASM ENHANCEMENT PROGRAM FOR ADDITIVE GENETIC MODELS
8.1 INTRODUCTION
8.2 MATERIALS AND METHODS
8.2.1 Genetic models
8.2.2 Creating the mapping population and generating linkage groups
8.2.3 Assigning marker profiles
8.2.5 Simulating phenotypic selection, marker selection and marker-assisted selection for S1 families in the Germplasm Enhancement Program
8.2.6 Conducting the statistical analysis
8.3 RESULTS
8.3.1 Number of QTL detected
8.3.2 Response to selection: phenotypic selection, marker selection, and marker-assisted selection
8.4 DISCUSSION
8.5 CONCLUSION
CHAPTER 9 SELECTION RESPONSE IN THE GERMPLASM ENHANCEMENT PROGRAM FOR COMPLEX GENETIC MODELS
9.1 INTRODUCTION
9.2.1 Genetic models
9.2.2 Creating the mapping population and generating linkage groups
9.2.3 Assigning marker profiles
9.2.5 Simulating phenotypic selection, marker selection, and marker-assisted selection for S1 families and DH lines in the Germplasm Enhancement Program
9.2.6.1 QTL detection analysis
9.2.6.2 Response to selection
9.3 RESULTS
9.3.1 Analysis of the QTL detection results over all genetic models
9.3.1.1 Percent of QTL segregating
9.3.1.2 Percent of QTL detected
9.3.1.3 Percent of QTL detected of those segregating
9.3.1.4 Percent of QTL detected with incorrect marker-QTL allele associations
9.3.2 Analysis of the trait mean value (response to selection)
9.3.2.1 Analysis over 10 cycles of selection of the Germplasm Enhancement Program
9.3.2.2 Analysis conducted at cycle five of the Germplasm Enhancement Program
9.3.3 Detailed analysis of the trait mean value for specific genetic models
9.3.3.1 Case 1: No G×E interaction, no epistasis; E(NK) = 1(12:0)
9.3.3.2 Case 2: G×E interaction present, no epistasis; E(NK) = 10(12:0)
9.3.3.3 Case 3: No G×E interaction, epistasis present; E(NK) = 1(12:5)
9.3.3.4 Case 4: G×E interactions and epistasis present; E(NK) = 10(12:5)
9.3.4 General trends across E(NK) models
9.4 DISCUSSION
9.4.1 QTL detection analysis
9.4.2 Response to selection: S1 and DH with phenotypic selection, marker selection and marker-assisted selection strategies
9.5 CONCLUSION
PART V GENERAL DISCUSSION AND CONCLUSIONS
CHAPTER 10 GENERAL DISCUSSION
BIBLIOGRAPHY
APPENDICES
APPENDIX 1 ADDITIONAL INFORMATION ASSOCIATED WITH CHAPTER 4
A1.1 ADDITIONAL INFORMATION FOR THE RESPONSE TO SELECTION PREDICTION EQUATIONS
A1.1.1 Gene action definitions for different prediction equations
A1.1.2 Alternate S1 family prediction equations
A1.1.3 Effect of inbreeding on the variance components coefficient
A1.2 QUANTITATIVE GENETICS THEORY ASSUMPTIONS
A1.3 ASSUMPTION OF NORMALITY IN THE BASE POPULATION DOES NOT HOLD WHEN DOMINANCE IS INCLUDED

A2.1 GENERATING A LINKAGE MAP AND ITS ASSOCIATION WITH MAPPING POPULATION SIZE ..............299
A2.1.1 Model 1 - one chromosome, one QTL, two flanking markers..............................................300
A2.1.2 Model 2 - two chromosomes, three QTL per chromosome, two flanking markers per QTL....301
A2.1.3 Model 3 - 10 chromosomes, one QTL per chromosome, two flanking markers per QTL....302
A2.1.4 Model 4 - 10 chromosomes, two QTL per chromosome, four flanking markers per QTL...303
A2.1.5 Conclusion...........................................................................................................................304
A2.2 QU-GENE INPUT FILES FOR QTL DETECTION ANALYSIS PROGRAMS...........................................305
A2.2.1 Model 1 - one chromosome, one QTL, two flanking markers..............................................305
A2.2.2 Model 2 - two chromosomes, three QTL per chromosome, two flanking markers per QTL....305
A2.2.3 Model 3 - 10 chromosomes, one QTL per chromosome, two flanking markers per QTL....306
A2.2.4 Model 4 - 10 chromosomes, two QTL per chromosome, four flanking markers per QTL...307
A3.1 NUMBER OF QTL DETECTED........................................................................................................311
A3.2 RESPONSE TO SELECTION: PHENOTYPIC SELECTION, MARKER SELECTION, AND MARKER-ASSISTED
SELECTION............................................................................................................................................312
APPENDIX 4 ANALYSES OF VARIANCE FOR FACTORS AFFECTING THE DETECTION
OF QTL AND RESPONSE TO SELECTION ....................................................................................317
A4.1 FACTORS AFFECTING QTL SEGREGATION AND DETECTION..........................................................317
A4.2 ANALYSIS OF RESPONSE TO SELECTION .......................................................................................323
A4.3 RESPONSE TO SELECTION RESULTS ..............................................................................................331

List of Tables
Table 2.1 Estimated variance components (±s.e.) relative to F2 for grain yield (t ha⁻¹) of
recombinant inbred lines derived from 11IBSWN50/Vasco and Hartog/Vasco
crosses tested in Queensland in 1989. Extract of Table 3 (Fabrizius et al.
1997) ..............................................................................................................................43
Table 2.2 Estimates of genetic parameters for grain yield (t ha⁻¹) of 49 wheat lines
tested in six environments in Queensland. Extract of Table 10.1 (Cooper et al.
1996b) ............................................................................................................................46
Table 2.3 Estimated variance components (±s.e.) for grain yield (t ha⁻¹) of recombinant
inbred lines derived from two crosses, 11IBSWN50/Vasco and Hartog/Vasco,
tested at three sites in Queensland in 1989. Extract of Table 2 (Fabrizius et al.
1997) ..............................................................................................................................46
Table 2.4 Characterisation of the genetic architecture of a trait according to heritability
level and some of the factors affecting complexity. Adapted from (Cooper and
Hammer 1996) ...............................................................................................................54
Table 4.1 Experimental variable levels defined in the PEQ module to compare the
response to selection from simulation and expectations from prediction
equations................................................................................................................................84
Table 4.2 Experimental variable levels used in the PEQ module to verify linkage
equilibrium results from Section 4.2 ..............................................................................85
Table 4.3 Average number of generations of random mating (RM) required to reach
linkage equilibrium (observed recombination fraction, R = 0.5) for three per
meiosis recombination fractions (based on linkage in coupling over 500 runs).
Results from Figure 4.3..................................................................................................85
Table 5.1 Experimental variables used to define each genetic model for the QUGENE
input file. Chr = chromosome, c = per meiosis recombination fraction and h² =
heritability of trait on an observational unit, MP-LG = mapping population
size used to determine the linkage groups and MP-QTL = QTL detection
mapping population size...............................................................................................102
Table 5.2 QTL detection analysis results for a QTL mapping population size of 100
individuals: if QTL detected, if QTL not detected, IM = interval mapping
and CIM = composite interval mapping. NC = not conducted.....................................106
Table 5.6 Experimental variables used to define each genetic model for the QUGENE
input file. chr = chromosome, c = per meiosis recombination fraction and h² =
heritability of trait on an observational unit, MP-QTL = QTL detection mapping
population size .....................................................................................................112

Table 6.1 Analysis of variance for the number of QTL detected. Degrees of freedom
(DF) and F values are shown for per meiosis recombination fraction (c),
heritability (h²), and mapping population size (MP) and first-order interactions.
σ² = error mean square........................................................................................123
Table 6.2 Number of QTL detected (averaged over 100 runs) for a simulated Germplasm
Enhancement Program mapping study for four mapping population sizes
(MP), two heritability levels (h²) and three per meiosis recombination fractions
(c) between a marker and QTL. Percentage of QTL detected out of the
total number of polymorphic QTL is also shown in parentheses.....................................126
Table 7.1 Experimental variable levels used to specify the core genetic models studied ............134
Table 7.2 The percentage of additive (σ²_A), dominance (σ²_D) and epistatic (σ²_K)
variance of the total genotypic (σ²_G) variance for each of the models........................135
Table 7.3 The matrix of gene codes in each environment-type. A 0 indicates no G×E
interaction as the gene has no effect, a 1 indicates the gene follows m = mid-
point, a = additive, d = dominance values, a -1 indicates a crossover effect.
This table is set out so that as the number of environment-types increases the
level of complexity in the system increases as more genes are interacting with
the environment-type....................................................................................................138
Table 7.4 Degrees of freedom (DF) and F values shown for per meiosis recombination
fraction (c), heritability (h²), mapping population size (MP), epistatic model
(B), and first-order interactions affecting the number of QTL detected. σ² =
error mean square............................................................................................................141
), mapping population size (MP), number of environment-types (E), and first-order
interactions affecting the number of QTL detected. σ² = error mean .............................................................................................143
Table 8.1 Experimental variable levels used to specify the core genetic models studied ............162
Table 8.2 Experimental variable levels utilised in the GEPMAS module. METs =
multi-environment trials, GEP = Germplasm Enhancement Program. ..................................166
Table 8.3 Number of polymorphic QTL for each bi-parental mapping population
replication and the number of QTL detected for each of the 36 genetic models.
Average across replications is also presented. c = per meiosis recombination
fraction between QTL and marker, h² = heritability, MP = mapping population
size ...............................................................................................................................172
), mapping population size (MP), gene frequency
(GF), and first-order interactions affecting the number of QTL detected. σ² =
error mean square.........................................................................................................173
), mapping population size (MP), gene frequency
(GF), selection strategy (SS), cycles (cyc) and first-order interactions affecting
the response to selection. σ² = error mean square ..................................................175
Table 9.1 Experimental variable levels defined in the QU-GENE engine to create the
genotype-environment genetic models.........................................................................196

Table 9.2 Experimental variable levels utilised in the QTL detection analysis............................197
Table 9.3 Experimental variable levels utilised in the GEPMAS module....................................198

List of Figures
Figure 1.1 Outline of the structure of investigations conducted to simulate the different
breeding strategies considered for the Germplasm Enhancement Program in
this thesis. Blue indicates the definition of genetic models and construction of
reference and base populations for the Germplasm Enhancement Program.
Yellow indicates the simulation of mapping and QTL experiments and the
green indicates the simulation of the breeding strategies of interest. The part
numbers indicate within which Parts of the thesis these phases are addressed ................8
Figure 2.1 Genetic map of the group 1 chromosomes of Triticeae (Vandeynze et al.
1995). The centromere of the chromosome is indicated by the bold letter C.................16
Figure 2.2 QTL detection analysis for a single chromosome with six markers (equally
spaced 0.2 Morgans apart) and three segregating QTL. The mapping population
size was 200. All six markers were significant for QTL effects using single
marker analysis (single marker). Interval mapping (IM) detected four
significant QTL peaks. Composite interval mapping (CIM) detected three
significant QTL peaks and multiple interval mapping (MIM) detected four
significant QTL peaks. Detection of false QTL may be a result of low population
size. The likelihood ratio threshold was set at 11.5. These simulated data were
generated using QU-GENE; the analyses were conducted in QTL CARTOGRAPHER
(Basten et al. 1994, 2001)..............................................................................22
Figure 2.3 Outline of the wheat growing areas in Australia and the northern grains region.
Adapted from Montana Wheat & Barley Committee (2002) .........................................30
Figure 2.4 Components and pathways of germplasm transfer for yield improvement in the
Australian Northern Wheat Improvement Program: LRC-QDPI represents the
Queensland Department of Primary Industries pedigree breeding programs
located in Toowoomba at the Leslie Research Centre; PBI-US represents the
University of Sydney pedigree breeding programs located in Narrabri; and the
Germplasm Enhancement Program is conducted by the University of
Queensland (Cooper et al. 1999a) ............................................................................................31
Figure 2.5 Outline of the activities involved in the S1 family and doubled haploid (DH)
line breeding strategies over one cycle of the Germplasm Enhancement
Program. The S1 activities are adapted from (Fabrizius et al. 1996). MET =
multi-environment trial ............................................................................................................34
Figure 2.6 Example of additive×additive interaction. Shows favourable allelic
combinations aabb and AABB give the highest genotypic value ................................................40
Figure 2.7 Classification of genotype-by-environment (G×E) interactions, A and B are
two genotypes and lines represent the responses of the genotypes in two
environments; type 1 parallel response (no G×E interaction), type 2 non-crossover
response, type 3 crossover response...............................................................................45
Figure 2.8 Number of articles published in the last 34 years with “simulation” and either
“genetic*” or “plant breeding” as words anywhere in the AGRICOLA
(1970-12/2003), CAB (1984-1/2004), and Biological Abstracts (1984-12/2003)
databases. Note: some article duplication may have occurred. * represents all
extensions of genetic. Each category contains five years, except the last, which
contains four years..............................................................................................................49

Figure 2.9 Number of articles published in the last 34 years with “marker assisted” or
“marker assisted and simulation” as words anywhere in the AGRICOLA
(1970-12/2003), CAB (1984-1/2004), and Biological Abstracts (1984-12/2003)
databases. Note: some article duplication may have occurred. Each
category contains five years, except the last, which contains four years ............................51
Figure 2.10 Schematic outline of the QU-GENE simulation software. The central ellipse
shows the engine and the surrounding boxes show the application modules
(Podlich and Cooper 1997, 1998)...................................................................................52
Figure 3.1 Iterative modelling methodology process used to design simulation
experiments for this thesis..............................................................................................58
Figure 4.1 Schematic outline of the LINKEQ module. Two opposing extreme inbred
individuals with two genes in coupling phase linkage were crossed to form the
F1, which was selfed to form the F2 population. The F2 population was
subjected to a number of generations of random mating until the observed
frequency of recombinant gametes reached R ≥ 0.4. After each cycle of random
mating, if the observed frequency of recombinant gametes was R < 0.4, the F2
population was randomly mated again until R ≥ 0.4 ...................................................................71
Figure 4.2 Number of generations of random mating required to reach an observed
recombination fraction of R = 0.4 between two genes for the simulation (with
standard deviation bars) using QU-GENE and the theoretical values calculated
from Equation (4.1) for a range of per meiosis recombination fractions. The
smaller the per meiosis recombination fraction, the tighter the linkage and the
more generations of random mating required to break the linkage ................................72
Figure 4.3 Number of generations of random mating required to reach an observed
recombination fraction of R = 0.5 between two genes for the simulation (with
standard deviation bars) using QU-GENE for a range of per meiosis
recombination fractions. The smaller the per meiosis recombination fraction, the
tighter the linkage and the more generations of random mating required to
break this linkage ...........................................................................................................73
Figure 4.4 Schematic outline of the PEQ module, (a) mass selection strategy, (b) S1
family (self) and DH line (double) strategy. This example shows a two gene
model in coupling with a base population size of 1000 individuals...............................82
Figure 4.5 Response to selection for the mass selection strategy for the simulation (Sim),
with standard deviation bars, Basic prediction equation (Basic, Equation 4.3)
and Comstock prediction equation (Com, Equation 4.9). Response was
assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50) and
no epistasis (K = 0), with a reference F2 population size of 1000, additive gene
action, and linkage equilibrium......................................................................................87
Figure 4.6 Response to selection for the S1 family selection strategy for the simulation
(Sim), with standard deviation bars, Basic prediction equation (Basic,
Equation 4.4) and Comstock prediction equation (Com, Equation 4.11). Response
was assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50)
and no epistasis (K = 0), with a reference S0 population size of 1000, additive
gene action, and linkage equilibrium. f is the number of progeny tested per S0
plant (level of replication) and b is the number of reserve seed intermated to
create the reference population after selection ...............................................................88
Figure 4.7 Response to selection for the DH line selection strategy for the simulation
(Sim), with standard deviation bars, Basic prediction equation (Basic,
Equation 4.5) and Comstock prediction equation (Com, Equation 4.12). Response
was assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50)
and no epistasis (K = 0), with a reference S0 population size of 1000, additive
gene action, and linkage equilibrium. f is the number of progeny tested per S0
plant (level of replication) and b is the number of reserve seed intermated to
create the reference population after selection ...............................................................90
Figure 4.8 Random mating reduced the effect of linkage disequilibrium for a per meiosis
recombination fraction of c = 0.05 to reach an observed linkage equilibrium of
R = 0.5 for the response to selection of the simulation (Sim) for the mass
selection strategy. Response to selection for the Basic (Basic) and Comstock
(Com) prediction equations is the same across all plots and assumes linkage
equilibrium. A one environment (E = 1), 10 gene (N = 10) and no epistasis
(K = 0) genetic model was tested. A reduction in linkage equilibrium was
observed for both coupling and repulsion phase linkage....................................................92
R = 0.5 for the response to selection of the simulation (Sim) for the S1 family
selection strategy. Response to selection for the Basic (Basic) and Comstock
R = 0.5 for the response to selection of the simulation (Sim) for the DH line
selection strategy. Response to selection for the Basic (Basic) and Comstock
Figure 5.1 The three-step process that allows a QTL detection analysis to be
conducted on a simulated population ...........................................................................102
Figure 5.2 Schematic outline of the Model 1, 2, 3 and 4 linkage groups. For Model 1 and
2 the markers are spaced at 11 cM (c = 0.1) from each QTL or marker. For
Model 3 the markers are spaced at 5.2 cM (c = 0.05) from the QTL and for
Model 4 the markers are spaced at 5.2 cM (c = 0.05) from a marker and 2.5
cM (c = 0.025) from a QTL. The per meiosis recombination fraction was
converted to map distance (cM) using the Haldane mapping function (Haldane 1931)...................................103
Figure 5.3 Schematic outline of artificially zooming in on regions of the wheat genome
containing QTL contributing towards a trait of interest. Simulation of the
wheat genome progressed from the genetic map of wheat (a), which may
contain 12 QTL of interest and can be represented for simulation using 21 linkage
groups, each with eight markers, and 12 linkage groups with one QTL (b). This
can be reduced to 12 chromosomes each containing a QTL (c) and then to 12
chromosomes each with one QTL and two flanking markers (d). The Haldane
mapping function (Haldane 1931) was used to convert from per meiosis
recombination fractions. Wheat genome figures (Nelson et al. 1995a, Nelson et
al. 1995b, Nelson et al. 1995c, Vandeynze et al. 1995, Marino et al. 1996) ...............111
Figure 6.1 A sample of articles (86) on plant QTL analysis was assessed on the basis of
the mapping population size used to find QTL and the number of QTL
detected per trait. The filled bars indicate the percentage of papers that reported a
mapping population size in the indicated range. The error bars indicate the
minimum and maximum number of QTL per trait, with the filled circle
indicating the average. 51% of the papers used a mapping population size between
60 and 140 individuals .................................................................................................120
Figure 6.2 Schematic outline of the simulated linkage groups. Ten chromosomes, each
with one QTL and two flanking markers. The example here has the markers
spaced at 11 cM from the QTL, or a per meiosis recombination fraction of c =
0.1 on either side of the QTL when converted using the Haldane mapping
function (Haldane 1931)...............................................................................................121
Figure 6.3 Percent of QTL detected (averaged over 100 runs) for each significant
experimental variable from the analysis of variance. All levels within
experimental variable factors were significantly different. All 10 QTL were
segregating............................................................................................................................124
Figure 6.4 Significant first-order interactions from the analysis of variance for the
number of QTL detected. h² = heritability, c = per meiosis recombination
fraction, MP = mapping population size.............................................................................125
Figure 7.1 Genotypic values for the six genetic models considered: (a) an additive model,
(b-d) are the random digenic epistatic networks and (e-f) are the McMullen
(2001), maysin and 3-deoxyanthocyanin digenic epistatic networks,
respectively.............................................................................................................................136
Figure 7.2 Number of QTL detected as a percentage of the total runs is shown for four
digenic epistatic models (E(NK) = 1(10:1)) with a heritability of h² = 0.1, per
meiosis recombination fraction of c = 0.01 (a-c) and c = 0.1 (d) with four
mapping population sizes (MP = 100, 200, 500, 1000). False QTL are present
where 11 QTL were detected.................................................................................142
Figure 7.3 Percent of QTL detected (averaged over 100 runs) for the number of
environment-types (a) and significant first-order interactions (b-c). h² =
heritability, MP = mapping population size and E = number of environment-types.............143
Figure 7.4 Number of QTL detected as a percentage of the total runs is shown for
genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two:
E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0)
environment-types in the target population of environments with a heritability of
h² = 0.25, per meiosis recombination fraction of c = 0.01 and four mapping
population sizes (MP = 100, 200, 500, 1000)...............................................................145
E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0) envi-
E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0),
environment-types in the target population of environments with a heritability of
E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0), envi-

h² = 1.0, per meiosis recombination fraction of c = 0.1 and four mapping
Figure 8.1 Schematic outline of the sequence of computer programs used to determine
response to selection in the GEP. QUGENE is the QU-GENE engine;
GEXPV2 uses the output from QUGENE to create input data for PLABQTL.
PLABQTL then conducts the QTL detection analysis. GEPMAS is a
QU-GENE module that conducts S1 recurrent selection by phenotypic selection
and, using the QTL detected by PLABQTL, also conducts
marker selection and marker-assisted selection............................................................161
Figure 8.2 Schematic outline of the sequence of procedures used to simulate the creation
of the mapping population (for QTL detection analysis) and Germplasm
Enhancement Program base population. The orange arrows show the information
from the QTL detection utilised in marker selection (MS) and marker-assisted
selection (MAS) strategies. The two parents used to create the mapping
population are also included in the 10 parent structure used to create the half diallel
population of the Germplasm Enhancement Program S1 recurrent selection
breeding program (see Figure 8.3). PS = phenotypic selection, RIL =
recombinant inbred line.............................................................................................................163
Figure 8.3 Schematic outlines of the simulation of phenotypic selection (PS), marker
selection (MS), and marker-assisted selection (MAS) procedures in the S1
recurrent selection module (GEPMAS) used to simulate the Germplasm
Enhancement Program. For phenotypic selection, 1 indicates random mating of
the reserve seed from the seed increase after multi-environment trials (METs)
have been performed; for marker selection, 2 indicates random mating of
the selected plants from the space plant population based on their marker
profile; and for marker-assisted selection, 3 indicates random mating of the reserve
seed from the seed increase after marker profiles and multi-environment trials
have been performed. The three strategies of the Germplasm Enhancement
Program simulated here can be compared to the more detailed description of
the Germplasm Enhancement Program given in Chapter 2, Figure 2.5 .......................167
Figure 8.4 Significant main effects from the analysis of variance for the number of QTL
detected. All effect levels were significantly different except for those
indicated by the same letter ................................................................................................173
Figure 8.5 Significant main effects from the analysis of variance for response to
selection. Response to selection expressed relative to the maximum potential
response to selection (%TG) where TG = target genotype. All effect levels
were significantly different except for those indicated by the same letter ...................176
Figure 8.6 Significant first-order interactions from the analysis of variance for the
response to selection. Response to selection expressed relative to the
maximum potential response to selection (%TG) where TG = target genotype. SS =
selection strategy, c = per meiosis recombination fraction, h² = heritability,
GF = gene frequency, MP = mapping population size.......................................................177
Figure 8.7 Response to selection expressed as percentage of target genotype (average of
the five bi-parental mapping population replicates) for phenotypic selection
(PS), marker selection (MS) and marker-assisted selection (MAS) over 10
cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.1,
h² = 0.25 (a-c) and h² = 1.0 (d-f), c = 0.01, and three mapping population sizes
(MP = 200, 500, 1000). TG = target genotype.............................................................179

= 0.25 (a-c) and h2
cycles of the GEP. E(NK) = 1(10:0), GF = 0.5, h² = 0.25 (a-c) and h² = 1.0 (d-f),
c = 0.01, and three mapping population sizes (MP = 200, 500, 1000). TG =
target genotype.............................................................................................................181
= 0.25 (a-c) and h2
Figure 9.1 Outline of the structure of investigations of the thesis towards the simulation
of different breeding strategies. Blue indicates the definition of genetic models
and construction of reference and base populations for the Germplasm Enhancement
Program. Yellow indicates the simulation of mapping and QTL experiments
and the green indicates the simulation of the breeding strategies of interest.
The part numbers indicate which parts of the thesis these phases are addressed
in (Replication of Chapter 1, Figure 1.1; included here for ease of reference) ............193
Figure 9.2 Schematic outline of the linkage groups. There were 12 chromosomes each
with one QTL and two flanking markers. The example has the markers spaced
at 11 cM from the QTL, equivalent to a per meiosis recombination fraction of
c = 0.1 on either side of the QTL using the Haldane mapping function
(Haldane 1931).............................................................................................................196
Figure 9.3 Schematic outline of the simulation of phenotypic selection (PS), marker
selection (MS) and marker-assisted selection (MAS) procedures in the DH line
recurrent selection module (GEPMAS) used to simulate the Germplasm
Enhancement Program. For PS, 1 indicates random mating of the reserve seed
from the seed increase after multi-environment trials have been performed, for
marker selection, 2 indicates random mating of the selected plants from the
space plant population based on their marker profile, and for marker-assisted
selection, 3 indicates random mating of the reserve seed from the seed
increase after marker profiles and multi-environment trials have been performed.
The implementation of DH line recurrent selection in the Germplasm
Enhancement Program can be compared to the S1 family implementation in
Chapter 8, Figure 8.3....................................................................................................202
Figure 9.4 Significant main effects from the analysis of variance for the percent of QTL
segregating. All effect levels were significantly different except for those indi-
detected. All effect levels were significantly different except for those indi-
Figure 9.6 Significant first-order interactions from the analysis of variance for the percent
of QTL detected. All effect levels were significantly different except for those
indicated by the same letter. GF = starting gene frequency, K = epistasis level,
E = number of environment-types, c = per meiosis recombination fraction, and
h² = heritability.............................................................................................................209
detected of those segregating. All effect levels were significantly different
except for those indicated by the same letter ...................................................................210
of QTL detected of those segregating. All effect levels were significantly
different except for those indicated by the same letter. GF = starting gene
frequency, K = epistasis level, E = number of environment-types, and h² =
heritability ....................................................................................................................211
Figure 9.9 Significant main effects from the analysis of variance for the percent of
incorrect marker-QTL allele associations. All effect levels were significantly
different except for those indicated by the same letter.
of QTL detected with incorrect marker-QTL allele associations. All effect levels
were significantly different except for those indicated by the same letter.
GF = starting gene frequency, K = epistasis level, E = number of environment-types
and h² = heritability.
Figure 9.11 Percent of QTL detected with incorrect marker-QTL allele associations (IAA)
against the percent of QTL detected, and the percent of replications containing
those combinations for (a) a simple additive case, E(NK) = 1(12:0), (b) increasing
epistasis value E(NK) = 1(12:5), (c) increasing the number of environment-types
E(NK) = 10(12:0), and (d) increasing both epistasis and environment-types
E(NK) = 10(12:5) for a per meiosis recombination fraction of c = 0.05, gene
frequency of GF = 0.1 and heritability of h² = 1.0.
Figure 9.12 Significant main effects from analysis of variance conducted over 10 cycles of
the Germplasm Enhancement Program. All experimental variable levels were
significantly different except epistasis where levels of zero and two were not
significantly different. All effect levels were significantly different except for
those indicated by the same letter.
Figure 9.13 Significant first-order interactions from the analysis of variance conducted
over 10 cycles of the Germplasm Enhancement Program. K = epistasis level, E
= number of environment-types, SS = selection strategy, PT = population type.
Figure 9.14 Significant main effects from analysis of variance conducted at cycle five of
the Germplasm Enhancement Program. All experimental variable levels were
significantly different.
Figure 9.15 Average percent of QTL segregating (Seg), detected (Det), detected of
segregating (D/S) and incorrect marker-QTL allele associations (IAA), with
corresponding trait mean value response as a percent of the target genotype for
phenotypic selection (PS), marker selection (MS) and marker-assisted selection
(MAS) of S1 families and DH lines for an E(NK) = 1(12:0) model with
gene frequency (GF) of 0.1, two per meiosis recombination fractions (c) 0.05
and 0.1 and two heritabilities (h²) 0.1 and 1.0.
Figure 9.17 400 replications of the response to selection for DH and S1 families for the
three selection strategies (phenotypic selection (PS), marker selection (MS)
and marker-assisted selection (MAS)), E(NK) = 1(12:0) model with gene frequency
of 0.1, per meiosis recombination fraction of 0.1 and heritability of
1.0. Corresponds to the set of graphs in Figure 9.15b.

List of Abbreviations
α Critical value
ANOVA Analysis of variance
c Per meiosis recombination fraction
cM centiMorgans
Chr Chromosome
CIM Composite interval mapping
CIMMYT The International Center for Maize and Wheat Improvement
D/S Percent of QTL detected of those segregating
Det Percent of QTL detected
DF Degrees of freedom
DH Doubled haploid
DNA Deoxyribonucleic acid
E Number of environment-types as per the E(NK) model
E(NK) Number of environment-types (E), number of genes (N) and the
level of epistasis (K)
Fn Filial generation n
F value Calculated F statistic value to be compared to a threshold in the F
distribution
GEP Germplasm Enhancement Program
GEXP Genetic Experiments (QU-GENE module)
GEPMAS QU-GENE module used to conduct simulation experiments of the
Germplasm Enhancement Program with phenotypic selection,
marker selection and marker-assisted selection
GF Gene frequency
G×E Genotype-by-environment
h² Heritability of trait on an observational unit basis
IAA Incorrect marker-QTL allele association (Type III QTL detection
error)
IM Interval mapping
K Level of epistasis as per the E(NK) model
LG Linkage group
LINKEQ QU-GENE module used to conduct the linkage equilibrium
experiments
LOD log10 likelihood odds ratio
lsd Least significant difference
M Morgans
MAS Marker-assisted selection
MET Multi-environment trial
MP Mapping population size
MS Marker selection
N Number of genes as per the E(NK) model
NWIP Northern Wheat Improvement Program
PEQ QU-GENE module used to compare simulation against theoretical
prediction equations
PLABQTL QTL detection analysis software (PLAnt breeding and Biology
QTL)
PS Phenotypic selection
QCC QU-GENE computing cluster
QTL Quantitative trait loci
QTL×E Quantitative trait loci-by-environment
QUGENE QU-GENE genotype-environment system engine
QU-GENE Genetic analysis simulation software
RIL Recombinant inbred line
RM Random mating
S1 Self-pollinated for one generation following an inter-individual
cross
Seg Percent of QTL segregating
TG Target genotype
TPE Target population of environments

PART I
BACKGROUND

CHAPTER 1
INTRODUCTION
The motivation for, and focus of, the research reported in this thesis were based on
the need for strategic research to support the continued evolution of a breeding strategy
for yield improvement of wheat in the northern grains region of Australia (Northern
Wheat Improvement Program). There has been, and continues to be, a long-term commitment
to the improvement of yield potential, adaptation and stability of performance of wheat
within the context of the complex target population of environments (TPE: Comstock
1977) in this dryland farming region (e.g. Brennan and Byth 1979, Brennan et al. 1981,
Cooper et al. 1996a). This historical long-term wheat breeding effort, and the associated
research, has provided a large body of empirical data on the important factors that can
impact yield performance of wheat in this region. The evolution to a pedigree breeding
strategy that was in place in the 1990s was an outcome of empirically evaluating
modifications and suggestions for improvements, and where evidence dictated,
adjustments were made to the breeding program. Strengths and weaknesses of the
incumbent pedigree breeding strategy were recognised and the overall breeding effort
was altered to incorporate backcross breeding, targeted at incorporating genes
for specific traits, and recurrent selection methodology, to enhance the pool of locally
adapted inbred lines used as parents in the pedigree breeding program.
During the 1990s the impetus for further enhancements to the overall breeding
effort grew with the availability of molecular marker technology (e.g. restriction
fragment length polymorphisms (RFLP), randomly amplified polymorphic deoxyribo-
nucleic acid (RAPD), amplified fragment length polymorphisms (AFLP) and simple
sequence repeat (SSR); Nadella 1998, Susanto 2004) and doubled haploid (DH) line
production technology (e.g. Jensen and Kammholz 1998). It was recognised that
empirical evaluation of all potential modifications to the incumbent breeding strategy
was impractical for reasons of cost and ability to conduct sufficiently large experiments
to evaluate the power of suggested alternative breeding strategies. Therefore, to support
the empirical research underway on the genetic architecture of yield and the impact of
alternative breeding strategies on improving yield, an investment was made to develop
computer simulation technologies that would enable realistic modelling of the impact
and power of alternative breeding strategies (Podlich and Cooper 1998, Podlich 1999).
This simulation approach gave rise to a co-ordinated research effort with goals to: (i)
obtain empirical results on the genetic control of variation for important traits and their
contributions to yield; (ii) investigate appropriate theoretical models for quantitative
traits; (iii) develop simulation software and high performance computing infrastructure;
and (iv) use these in combination to conduct the strategic research necessary to evolve
the wheat breeding strategies used in the northern grains region.
This thesis is one component of the larger strategic research effort. As such, the
work reported here relies heavily on the empirical genetic research conducted by others
(Cooper et al. 1997, Fabrizius et al. 1997, Nadella 1998, Peake 2002, Jensen 2004,
Susanto 2004) and the simulation infrastructure and methodology developed by others
(Podlich and Cooper 1998, Micallef et al. 2001, Cooper and Podlich 2002). The specific
focus of this thesis was on the use of computer simulation to evaluate the opportunity
to enhance the rate of genetic gain for quantitative traits within the recurrent selection
Germplasm Enhancement Program component of the Northern Wheat Improvement
Program. The technologies of interest to this evaluation were molecular markers, to
enable marker-assisted selection, and DH production, to rapidly generate inbred lines
for evaluation in multi-environment trials. This thesis reports the results of the computer
simulation investigations that were undertaken to make recommendations on how these
two breeding technologies could be used to enhance the long-term genetic gain from the
Germplasm Enhancement Program. A parallel series of investigations has been
undertaken for other components of the Northern Wheat Improvement Program (e.g.
Jensen 2004).

The current structure of the Germplasm Enhancement Program is an S1 (self-pollinated
for one generation following an inter-individual cross) recurrent selection
program operating as a parent building component of the Northern Wheat Improvement
Program of Australia (Fabrizius et al. 1996). Recurrent selection programs are con-
ducted to achieve medium and long-term genetic improvement by increasing the
frequency of favourable alleles for genes and gene combinations (Hallauer and Miranda
1988). Optimising the allocation of resources to activities within the Germplasm
Enhancement Program to achieve its role in the Northern Wheat Improvement Program
is a complex problem. There is interest in how effectively markers can be used to
enhance the current phenotypic selection strategy. Any modified breeding strategy will
need to be robust for multiple traits that differ in their genetic architecture, ranging from
simple additive to more complex situations including epistatic and genotype-by-
environment (G×E) interactions. The importance and influence of G×E interactions and
epistasis in the northern grains region, and specifically for the germplasm of relevance
to the Germplasm Enhancement Program, have been outlined in many studies (Brennan
and Byth 1979, Brennan et al. 1981, Cooper et al. 1994a, 1994b, Cooper and DeLacy
1994, Cooper et al. 1996b, Fabrizius et al. 1997, Basford and Cooper 1998, Peake 2002,
Jensen 2004) and are considered as components for the genetic models investigated in
this thesis.
Marker-assisted selection is a recent technological advancement in wheat breed-
ing programs (Howes et al. 1998). Many species now have a sufficient number of
markers to create dense maps and localise associated QTL (Moreau et al. 2000).
Theoretical studies have shown that marker-assisted selection is capable of improving
the efficiency of selection (Lande and Thompson 1990, Lande 1992, Dudley 1993), and
much of the mapping / marker-assisted selection literature reports that knowing the
position of QTL regions and markers will enable breeders to increase the rate of
response of a breeding program. However, moving from these general statements and
evaluating the impact of marker-assisted selection within an applied breeding program
context is not a simple task. Conducting marker-assisted selection experiments has, in
the past, been an expensive venture for a relatively unknown benefit, so marker-assisted
selection has rarely been evaluated empirically in large field experiments (Young 1999,
Moreau et al. 2000). The ability to use computer simulation to model a plant breeding
program and conduct many cycles of breeding in silico provides a tool that allows a
breeder to determine the impact of a selection strategy on a breeding program with less
time and cost than field experiments require. Computer simulation has been evolving for
more than 40 years (e.g. Fraser 1957a, Kempthorne 1988, Podlich and Cooper 1998), and with the
increase in modern computer speeds, simulation has the potential to be a useful tool in
exploring the response to selection of a breeding program and to help with the decision
making process. Computer simulation research methodologies are also widely applied
outside of the discipline of genetics and plant breeding (e.g. Casti 1997a, Schrage 1999,
Wolfram 2002).
The computer simulation platform QU-GENE was designed for the quantitative
analysis of genetic models and can be used to model plant breeding programs (Podlich
and Cooper 1998). The two-stage architecture of QU-GENE allows many independent
modules, representing alternative breeding strategies, to be attached to multiple genetic
models of a genotype-environment system defined in the QU-GENE engine. These
modules have the ability to explore a range of breeding strategies, construct mapping
populations and produce multiple breeding population structures. QU-GENE has the
ability to simulate generic genetic model problems, but it can also be used to model
specific breeding programs (e.g. Fabrizius et al. 1996, Jensen 2004).
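The two-stage idea described above, in which genetic models are defined once and many breeding strategy modules are then attached to them, can be sketched in a few lines of Python. This is only an illustration of the design pattern: the class `GeneticModel`, the functions `phenotypic_selection`, `marker_assisted_selection` and `run_all`, and their signatures are hypothetical names chosen for this sketch and are not part of the QU-GENE software.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class GeneticModel:
    """Stage 1 (hypothetical): one genotype-environment system in E(NK) terms."""
    n_env_types: int  # E, number of environment-types
    n_genes: int      # N, number of genes
    epistasis: int    # K, level of epistasis

    def label(self) -> str:
        # Same notation as used in the thesis, e.g. 1(12:0)
        return f"{self.n_env_types}({self.n_genes}:{self.epistasis})"


# Stage 2 (hypothetical): a module is any callable that applies a
# breeding strategy to a genetic model and returns a result.
Module = Callable[[GeneticModel], str]


def phenotypic_selection(model: GeneticModel) -> str:
    # Placeholder result; a real module would simulate breeding cycles.
    return f"PS on E(NK) = {model.label()}"


def marker_assisted_selection(model: GeneticModel) -> str:
    # Placeholder result; a real module would simulate breeding cycles.
    return f"MAS on E(NK) = {model.label()}"


def run_all(
    models: List[GeneticModel], modules: Dict[str, Module]
) -> Dict[Tuple[str, str], str]:
    """Attach every module to every genetic model and collect the results."""
    return {
        (name, model.label()): module(model)
        for model, (name, module) in product(models, modules.items())
    }


models = [GeneticModel(1, 12, 0), GeneticModel(10, 12, 5)]
modules = {"PS": phenotypic_selection, "MAS": marker_assisted_selection}
results = run_all(models, modules)
```

Keeping the genetic models separate from the strategy modules means that a new breeding strategy can be compared across all existing genetic models without redefining them, which is the property the paragraph above attributes to the two-stage architecture.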
The question posed at the initiation of this thesis was: “Is there a difference in
the expected response to selection of the Germplasm Enhancement Program for S1
families and DH lines when either phenotypic selection, marker selection or marker-
assisted selection is implemented and both G×E interaction and epistasis influence the
trait of interest?” To answer this question using quantitative genetics theory would be
difficult, because relaxing the many assumptions of the theory makes the algebraic
equations needed to model these systems intractable. To answer this question empirically
is not feasible as it would require many years of field experimentation and significant
resources that are well beyond the scope of the breeding program. Following preliminary
studies (Kruger 1999), and experience gained from other projects (Fabrizius et al. 1996,
Jensen 2004), simulation was identified as an appropriate platform on which to seek
answers to this question and was used for this thesis.
A schematic outline (Figure 1.1) presents an overview of how the parts of the
thesis are interrelated. It was important to undertake the work completed in each of the
proceeding parts to enable the thesis to develop an answer to the key question posed
above. Part I provides the foundation knowledge underlying the concepts examined in
this thesis (not shown on figure). Part II investigates the convergence of simulation and
theory to acquire experience with simulation methods and to determine whether
simulation was an appropriate extension of quantitative genetics theory for the objec-
tives of this thesis. Part II also includes investigations into which QTL detection method
and analysis program to use and to determine whether a reduced genome model could
be used instead of the full wheat genome model for the simulation of a QTL detection
experiment. Part III investigates how QTL detection would be implemented in the
Germplasm Enhancement Program, and how linkage maps would be created. Part III
also evaluates the influence of population size, heritability, per meiosis recombination
fraction, epistasis and G×E interactions on the detection of QTL. This section was
important for the thesis as there was a need to determine the most efficient method for
mapping QTL, conducting a QTL detection analysis using an additional stand-alone
program, and incorporating these results back into QU-GENE to simulate the breeding
strategies considered. In Part IV, the work completed in the previous parts allowed a
detailed investigation to be conducted of the opportunities to implement marker-assisted
selection for S1 families and DH lines into the Germplasm Enhancement Program.

[Figure 1.1 diagram. Box labels: Modelling Methodology (defining and validating a modelling approach); Base Population; Mapping Population; QTL analysis algorithms; QTL information; MS and MAS; Germplasm Enhancement Program (PS, MS, MAS). Sections are marked Part II, Part III and Part IV.]
Figure 1.1 Outline of the structure of investigations conducted to simulate the different
breeding strategies considered for the Germplasm Enhancement Program in this thesis.
Blue indicates the definition of genetic models and construction of reference and base
populations for the Germplasm Enhancement Program. Yellow indicates the simulation of
mapping and QTL experiments, and green indicates the simulation of the breeding
strategies of interest. The part numbers indicate the Parts of the thesis in which these
phases are addressed (Part I refers to the background literature and is not shown in the figure).

This thesis is structured into the following five parts:
Part I: Background (Chapters 1-3): This section gives the foundation and background
to the study and reviews the relevant literature.
Part II: Simulation as a modelling approach (Chapters 4 and 5): The objective of this
section was to introduce the concepts behind the quantitative genetic theory used in
plant breeding programs and how they apply in a computer simulation environment.
This was done by first exploring the convergence between quantitative theory and
computer simulation as two ways of encoding a breeding system into a formal mathematical
system for analysis by quantitative methods (Casti 1997a). To focus this comparison,
selected topics relevant to this thesis were considered. Simulation experiments were
extended from simple genetic models to more complex genetic models for mass selection,
S1 family and DH line population types. Recombination was examined in greater detail
because of its importance in modelling QTL detection and marker-assisted selection.
Preliminary exploration was conducted on how recombination is modelled in simulation
and the effect of generation time on breaking linkages, an important concept in
long-term marker-assisted selection. QTL detection analysis programs were also compared
to determine their reliability and the ease with which they could be run in batch mode.
PLABQTL (Utz and Melchinger 1996) was selected as the program to be used for this
thesis. An experiment was also conducted to determine whether the detection of QTL was
affected by the size of the wheat genome represented in the simulation experiments. A
12-chromosome, 12-QTL genome model with two flanking markers per QTL was compared with a
21-chromosome, 12-QTL wheat genome model with eight flanking markers per QTL.
Part III: Factors affecting the power of QTL detection (Chapters 6 and 7): The objective
of this section was to test a range of factors that may affect the detection of QTL in the
mapping studies underway for the Germplasm Enhancement Program (Nadella 1998,
Cooper et al. 1999a, Susanto 2004). The factors included in this study were mapping
population size, heritability, per meiosis recombination fraction, epistasis, and G×E
interaction. By testing these factors, their influence on QTL detection was determined
and recommended values were established for the variables such as population size,
marker density (defined in terms of per meiosis recombination rate between adjacent
markers) and target heritability for phenotyping. The influence of epistasis and G×E
interactions on QTL detection was also determined.
Part IV: Simulation of phenotypic, marker, and marker-assisted selection in the wheat
Germplasm Enhancement Program (Chapters 8 and 9): The objective of this section
was to apply the outcomes of Parts II and III to a simulation of an applied breeding
situation and determine the effect of marker-assisted selection versus phenotypic
selection and pure marker selection in the Germplasm Enhancement Program. The
response to selection of the Germplasm Enhancement Program for a range of genetic
models, including effects of epistasis and G×E interactions, was examined. The
prospect of using marker-assisted selection to enhance the outcomes of the Germplasm
Enhancement Program for both S1 families and DH lines was determined.
Part V: General discussion and conclusions (Chapter 10): This final section of the
thesis integrates the main findings and developments from Parts I to IV and discusses
issues associated with the design of marker-assisted selection strategies in plant
breeding and the recommendations for the inclusion of marker-assisted selection in the
Germplasm Enhancement Program.

CHAPTER 2
REVIEW OF LITERATURE
2.1 Introduction
This review is structured to give balanced consideration to the literature
relevant to modelling marker-assisted selection in a plant breeding program. These
considerations provide much of the background for the design of the series of simula-
tion experiments conducted in the following Chapters of this thesis. Conventional
selection techniques presently utilised in plant breeding programs are outlined, with an
overview of molecular markers, QTL detection and marker-assisted selection also
given. The Germplasm Enhancement Program goals and strategy are provided as the
specific wheat breeding program case study under investigation. Epistasis, G×E
interaction, and per meiosis recombination fraction are discussed as important factors
that may influence marker-assisted selection as they can introduce potential complica-
tions that can affect the ability to detect true QTL (i.e. QTL that do exist), and define
favourable genotypes for multiple QTL models of traits. This is followed by a review of
computer simulation in genetics, including an overview of the QU-GENE software, the
simulation platform used throughout this thesis. While these review sections build a
foundation for the concepts and experiments used in this thesis, additional relevant
literature is introduced as necessary in the following Chapters.

2.2 Plant breeding programs: a review of traditional
and molecular selection techniques
2.2.1 Traditional selection
For centuries farmers have been improving crop germplasm by visually selecting
plants with the preferred phenotype and using the selected plants to produce seed for the
next generation of cropping. This system of phenotypic selection is commonly referred
to as mass selection. More recently, beginning in the late part of the 19th century and
early part of the 20th century, universities, public institutions, private companies and
corporations have taken over this role by designing and managing plant breeding
programs to produce and supply improved genotypes to farmers. Through this evalua-
tion of breeding strategies, plant breeding programs have evolved from simple mass
selection procedures to sophisticated formal plant breeding programs.
The success of a breeding program can be estimated by monitoring the differ-
ence between the mean phenotypic value of the offspring and the parental generation
before selection (Falconer and Mackay 1996). Any change in the mean genetic value of
a population due to the influence of selective forces is termed the realised response to
selection or genetic gain. The basic principle of any plant breeding program is the
continuous improvement of the target species, achieved by maintaining the long-term
response to selection while sustaining new cultivar development using the short-term
response to selection (Hallauer 1981).
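As a minimal illustration of the comparison described above, the following Python sketch computes a realised response to selection as the difference between the offspring mean and the mean of the parental generation before selection. The function names `truncation_select` and `realised_response`, and all trait values, are hypothetical and chosen only for this example. The offspring values are made up rather than simulated from the selected parents.

```python
from statistics import mean
from typing import List, Sequence


def truncation_select(values: Sequence[float], proportion: float) -> List[float]:
    """Keep the top `proportion` of individuals (at least one), best first."""
    n_keep = max(1, round(len(values) * proportion))
    return sorted(values, reverse=True)[:n_keep]


def realised_response(
    parent_values: Sequence[float], offspring_values: Sequence[float]
) -> float:
    """Offspring mean minus the mean of the parental generation before selection."""
    return mean(offspring_values) - mean(parent_values)


# Illustrative phenotypic values for the parental generation (before selection).
parents = [4.0, 5.0, 6.0, 7.0, 8.0]

# Phenotypic (mass) selection: keep the best 40 percent as parents.
selected = truncation_select(parents, 0.4)

# Illustrative phenotypic values of the offspring generation (assumed, not simulated).
offspring = [5.5, 6.5, 7.0, 7.0, 6.5]

response = realised_response(parents, offspring)
print(f"Selected parents: {selected}")
print(f"Realised response to selection: {response}")
```

In this made-up case the offspring mean exceeds the unselected parental mean but falls short of the mean of the selected parents, which is the usual pattern when only part of the phenotypic superiority of the selected parents is heritable.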
For a given trait, predicted response to selection (ΔG) quantifies the expected
genetic gain achievable in any cycle of selection. Equally, realised response to selection,
measured by comparing the performance of successive cycles of selection, indicates
how much of the predicted gain was obtained in practice (Duvick et al. 2004). The plant
breeder’s role is to control the intensity and speed of this genetic improvement by
changing the genetic structure of a population (Williams 1964). By understanding the
underlying concepts of the components of the direct response to selection prediction
equation for a trait y,
$$\Delta G_y = i_y h_y^2 \sigma_{p_y}, \qquad (2.1)$$
