GRM 2013: Genome-Wide Selection Update -- RK Varshney and A Rathore


  1. Genome-Wide Selection Update. Rajeev K. Varshney and Abhishek Rathore (email: r.k.varshney@cgiar.org, a.rathore@cgiar.org). GCP General Research Meeting, Session IX, 30th September 2013.
  2. ISMU 1.0: Challenges in SNP Detection
     • Most tools are command-line based and run on Linux
     • Multiple steps are involved
     • Pre-processing and cleaning of raw data are difficult
     • Specialized skills are required to process a job
     • Genotyping assays (GoldenGate and KASPar) have to be developed
     • Very few user-friendly software tools exist
  3. Solution: ISMU 1.0 Pipeline. Features:
     – Multicore architecture
     – One-stop shop for SNP detection
     – Graphical user interface
     – Automated cleaning of data
     – Integration of various popular alignment tools
     – Customized operation of tools for advanced users
     – Available in online and standalone versions
     – Easy installation
     – Works on CentOS, RHEL and Fedora
     – Visualization of SNPs and alignments (TABLET/FLAPJACK)
  4. ISMU V1.0 workflow (inputs: raw reads and reference)
     • Assemble & align raw reads
     • Mine SNPs
     • Generate marker matrix
     • Automated visualization in TABLET and FLAPJACK
     • Developing genotyping assays
     • Export in flat files
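
The "generate marker matrix" and "export in flat files" steps can be pictured with a small sketch. This is not ISMU's code: the input layout (one `(line, locus, genotype)` tuple per SNP call), the function names, the line names and the `NA` missing-data code are all assumptions made for illustration. Only the locus names are borrowed from the assay example later in this deck.

```python
def build_marker_matrix(calls, missing="NA"):
    """Pivot (line, locus, genotype) SNP calls into a lines x markers table."""
    lines = sorted({line for line, _, _ in calls})
    loci = sorted({locus for _, locus, _ in calls})
    lookup = {(line, locus): genotype for line, locus, genotype in calls}
    header = ["Line"] + loci
    rows = [[line] + [lookup.get((line, locus), missing) for locus in loci]
            for line in lines]
    return [header] + rows


def to_flat_file(matrix, sep="\t"):
    """Serialize the matrix as delimited text, one text line per row."""
    return "\n".join(sep.join(row) for row in matrix)


# Hypothetical calls for two lines at two loci; Line2 has no call at one locus.
calls = [
    ("Line1", "TC00001_1272", "A/A"),
    ("Line1", "TC00075_852", "C/T"),
    ("Line2", "TC00001_1272", "T/T"),
]
matrix = build_marker_matrix(calls)
print(to_flat_file(matrix))
```

A tab-delimited table with lines as rows and markers as columns is only one plausible flat-file layout; the deck does not say which layout ISMU exports.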
  5. ISMU 1.0 Standalone Edition: Selection of Alignment Tool & SNP Approach
  6. ISMU 1.0 Standalone Edition: Results
  7. Assay design files in KASPar and ILLUMINA formats
     KASPar:
     Locus | Forward | Polymorphism | Reverse
     TC00001_1272 | CGCTCAAGAGAACCAGTGTTGGAATGGTGGCGGCGATGGCTGTATTTCCA | A/T | GAAAAGTAAGGGACTAGAAG
     TC00075_852 | TGAGATGTTCCTATCACCAATGCAAATATCAGGGCAAATGCACTAACATA | C/T | TTGAGTAAATTTCCCATCTT
     TC00118_13765 | AATTAAGTTAGTAATGACTGGACGAAACCAAGAAATAACTACTTACGTGC | T/G | AAATTATAGAAGGTCTCCTG
     TC00130_2668 | GTTGTTGATCGAAAGAAAATTTAATTTCTTGTTCGACTGATCACCTTGCT | G/A | GGTTCCAACTATTCTAAAGT
     TC00191_3430 | TTAATGAATTTGCTTCATCGTCCAAGGTTTACCATTTAGGTGGGTAGAGC | T/C | ACAGAAATTAAGTATCTGGT
     TC00212_866 | CCCATGTCAATCATCCCAATTTTCTTGCATAAATTATCCTTAAATGGATA | G/T | CTTTACGTATGATGCTGATC
     TC00295_2234 | AGCCAGTGGAAGCTCCACCAGCAGCAGTAGCAGAAGTTCCAATTGAGACT | C/T | CTGAAGCTTAGACCAATGGA
     TC00329_2112 | GAGGCGTGAAAAGAAAAAGGCAAAGGAGGAGAGGGAGAAGCAAATAAGGG | A/C | TGCTGAGGAAAGACTACTGG
     TC00336_3122 | CTGAAATGGAGTGTTTTTATACAAGTTGTAAATAGTGATGTTTTGTACAT | C/T | TTTCTGGAAGATGATTCATG
     ILLUMINA (rows are truncated in the source):
     [HEADING]
     Customer_Name
     Company_Name
     Email_Address
     Platform_Type GGGT
     Format_Type Gene; Region; Sequence; Identity; ExistingDesign; or Score [select one]
     Design_iteration prelim
     Species
     Number_of_Assays
     [DATA]
     Locus_name,Target_Type,Sequence,Chromosome,Coordinate,Genome_Build_Version,Source,Source_Version,Sequence_Orienta
     TC00001_1272,SNP,TACTTCATCCCGCTCAAGAGAACCAGTGTTGGAATGGTGGCGGCGATGGCTGTATTTCCA[A/T]GAAAAGTAAGGGACTAGAAGGGCAGAGTGGA 72,0,0,0,Forward,Plus
     TC00075_852,SNP,TTGTCGACATTGAGATGTTCCTATCACCAATGCAAATATCAGGGCAAATGCACTAACATA[C/T]TTGAGTAAATTTCCCATCTTCATTTGCACAAA ,0,0,0,Forward,Plus
     TC00118_13765,SNP,ATCTAAAAATAATTAAGTTAGTAATGACTGGACGAAACCAAGAAATAACTACTTACGTGC[T/G]AAATTATAGAAGGTCTCCTGTAAGATCCAA 3765,0,0,0,Forward,Plus
     TC00130_2668,SNP,TGCGGTCATTGTTGTTGATCGAAAGAAAATTTAATTTCTTGTTCGACTGATCACCTTGCT[G/A]GGTTCCAACTATTCTAAAGTAATACAGGCAT 68,0,0,0,Forward,Plus
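
Both layouts on this slide carry the same information: a flanking sequence on each side of the SNP plus the two alleles. As a rough sketch of how such records could be produced from a sequence and a SNP position, the Python below splits the sequence into a left flank, a `ref/alt` polymorphism and a right flank, then joins them in the bracketed `left[A/T]right` style. The function names and the default flank lengths (50 and 20, read off the first table) are assumptions, and this does not reproduce the full vendor file specifications.

```python
def snp_context(sequence, snp_index, alt_allele, left_len=50, right_len=20):
    """Split a sequence around a 0-based SNP position.

    Returns (left flank, "ref/alt", right flank). Flanks are shorter than
    requested when the SNP lies close to either end of the sequence.
    """
    if not 0 <= snp_index < len(sequence):
        raise IndexError("snp_index outside sequence")
    ref_allele = sequence[snp_index]
    left = sequence[max(0, snp_index - left_len):snp_index]
    right = sequence[snp_index + 1:snp_index + 1 + right_len]
    return left, f"{ref_allele}/{alt_allele}", right


def bracketed(left, polymorphism, right):
    """Join the three parts in the bracketed style, e.g. ...CCA[A/T]GAA..."""
    return f"{left}[{polymorphism}]{right}"


# Sequence rebuilt from the first row of the slide (TC00001_1272).
left = "CGCTCAAGAGAACCAGTGTTGGAATGGTGGCGGCGATGGCTGTATTTCCA"
right = "GAAAAGTAAGGGACTAGAAG"
sequence = left + "A" + right

row = snp_context(sequence, snp_index=len(left), alt_allele="T")
print("\t".join(("TC00001_1272",) + row))  # three-column style
print(bracketed(*row))                      # bracketed style
```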
  8. Novel breeding approaches for developing countries
     • MABC, MARS and GS approaches seem to be the most promising for crop improvement
     • Genomic resources and cost-effective genotyping platforms are needed
     • Breeder-friendly pipelines and decision support tools are required for prediction of phenotype
     [Diagram labels: MBDT, MBDT, OptiMAS, GS ?]
  9. Breeding Cycle
     [Figure: breeding cycle with stages labelled Crossing, Inbreeding, Seed Increase, Field evaluation, Multi-location/multi-year testing, and Line Selection]
     $R_t = \dfrac{i \, r \, \sigma_A}{y}$, where $R_t$ = genetic gain over time, $y$ = years per cycle, $i$ = selection intensity, $r$ = selection accuracy, $\sigma_A$ = genetic variance (entered as its square root, the genetic standard deviation)
     NEW:
     • Cheaper to genotype = larger populations for the same budget
     • Make selections in 'off target' years
     • Maintain favorable rare alleles
     • Select years earlier on a single-plant basis
     Based on discussions with several colleagues, e.g. Jesse Poland, J-L Jannink, Gary Atlin
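
A short worked example may help show why a shorter cycle matters. The gain relation on this slide is read here as gain per year = selection intensity × selection accuracy × genetic standard deviation / years per cycle. The numbers below are hypothetical and chosen only to illustrate the arithmetic; they are not results from the presentation.

```python
def gain_per_year(i, r, sigma_a, years_per_cycle):
    """Expected genetic gain per year: i * r * sigma_A / y."""
    if years_per_cycle <= 0:
        raise ValueError("years_per_cycle must be positive")
    return i * r * sigma_a / years_per_cycle


# Hypothetical scenarios with the same intensity and genetic standard deviation:
# phenotypic selection is more accurate but takes four years per cycle,
# genomic selection is less accurate but completes a cycle every year.
phenotypic = gain_per_year(i=1.75, r=0.8, sigma_a=1.0, years_per_cycle=4)
genomic = gain_per_year(i=1.75, r=0.5, sigma_a=1.0, years_per_cycle=1)
print(f"phenotypic: {phenotypic:.3f}, genomic: {genomic:.3f}")
```

Under these assumed values the lower accuracy is more than offset by the shorter cycle (about 0.35 versus 0.875 units per year). The conclusion depends entirely on the inputs chosen.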
  10. GS-Models
      • Usually involve a relatively high number of markers
      • Statistical methods that can handle high-dimensional data have been developed to meet this challenge
      • However, their respective properties are still not fully understood, causing considerable uncertainty about the choice of models for genomic prediction
      • Factors affecting GS are also not very clear
  11. ISMU V2.0 workflow
      [Figure labels: Raw Reads; Reference; Assemble & Align Raw Reads; Mine SNPs; Generate Marker Matrix; Visualize in TABLET and FLAPJACK; Export in FLAT Files; External Genotyping Platforms; Called SNPs; GDMS; Genotypic Matrix & QTLs; GS; Lines selected for further crossing in GS]
  12. GS-Models
      • Statistical methods that can handle high-dimensional data have been developed to meet the challenges
      • However, their respective properties are still not fully understood, causing considerable uncertainty about the choice of models for genomic prediction
      • Factors affecting GS are also not very clear
  13. Factors Affecting GS-Models
      • Marker density, genome size and structure
      • Size of the training population
      • Historical effective population size
      • Trait heritability
      • Relationship between training population and selection candidates
      • Number of genes and distribution of their effects
      • Method used for the estimation of marker effects
      • G×E (genotype-by-environment interaction)
  14. Validation Studies
      • Fit available models
      • Cross-validation
      • Prepare a matrix of validation scores
      • Compare over multiple environments
      • Select final model
      [Figure: K (= 5)-fold cross-validation, showing training set and testing set]
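
The K-fold procedure on this slide can be sketched in plain Python. The example below is an illustration under stated assumptions, not the ISMU implementation. It uses ridge regression as the model, which corresponds to RR-BLUP when the penalty equals the ratio of residual variance to marker-effect variance; here the penalty `lam` is simply fixed. The simulated genotypes, marker effects and all function names are hypothetical. The validation score is the Pearson correlation between predicted and observed values in each held-out fold.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X, y, lam=1.0):
    """Ridge marker effects from (X'X + lam I) b = X'(y - mean(y))."""
    p = len(X[0])
    mu = sum(y) / len(y)
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * (yi - mu) for row, yi in zip(X, y)) for i in range(p)]
    return mu, solve(xtx, xty)


def ridge_predict(model, X):
    """Predicted values: mean plus the sum of marker effects times genotypes."""
    mu, b = model
    return [mu + sum(bi * xi for bi, xi in zip(b, row)) for row in X]


def pearson(u, v):
    """Pearson correlation; NaN if either vector has no variance."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu_u) * (c - mu_v) for a, c in zip(u, v))
    su = sum((a - mu_u) ** 2 for a in u) ** 0.5
    sv = sum((c - mu_v) ** 2 for c in v) ** 0.5
    return cov / (su * sv) if su > 0 and sv > 0 else float("nan")


def cross_validate(X, y, k=5, lam=1.0, seed=0):
    """Return one validation score (prediction accuracy) per held-out fold."""
    scores = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        pred = ridge_predict(model, [X[i] for i in test])
        scores.append(pearson(pred, [y[i] for i in test]))
    return scores


# Simulated data: 40 lines, 6 markers coded -1/0/1, hypothetical effects.
rng = random.Random(1)
effects = [1.0, -0.5, 0.8, 0.0, 0.3, -1.2]
X = [[rng.choice((-1, 0, 1)) for _ in effects] for _ in range(40)]
y = [sum(b * x for b, x in zip(effects, row)) + rng.gauss(0, 0.1) for row in X]
scores = cross_validate(X, y, k=5)
print([round(s, 2) for s in scores])
```

Repeating `cross_validate` for each candidate model and each environment would give the "matrix of validation scores" mentioned on the slide, from which a final model can be chosen.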
  15. ISMU 2.0 Pipeline: Adding Analysis Capabilities to ISMU 1.0
      • GUI for genomic selection
      • Multicore support
      • R and Fortran libraries for GS
      • Project-mode development
      • IDE support
      • Multiple methods and traits at once
      • Platform support
        – Windows x64 and x32
        – CentOS x64 and Ubuntu x64
        – Mac (under testing…)
      In collaboration with J L Jannink, John Hickey and Aaron Lorenz
  16. ISMU 2.0 Pipeline: Analysis Capabilities
      • Data diagnostics
        – Graphical summary
        – Tabular summary
      • Subset data
        – Missing %
        – MAF (minor allele frequency)
        – PIC (polymorphism information content)
      • Genomic selection
        – RR-BLUP
        – Kinship Gauss
        – Bayesian LASSO
        – BayesB and BayesCπ
        – Random Forest Regression (RFR)
      • HTML & PDF output
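
The three subsetting statistics listed on this slide are simple to compute for a biallelic SNP. The sketch below is only an illustration: the genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call), the function names and the filter thresholds are assumptions, not ISMU defaults. PIC uses the usual biallelic expression 1 − (p² + q²) − 2p²q².

```python
def marker_summary(genotypes):
    """Summarize one biallelic SNP across lines.

    genotypes: copies of allele B per line (0, 1 or 2); None = missing call.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes given")
    called = [g for g in genotypes if g is not None]
    missing_pct = 100.0 * (n - len(called)) / n
    if not called:
        return {"missing_pct": missing_pct, "maf": None, "pic": None}
    p = sum(called) / (2 * len(called))  # frequency of allele B
    q = 1.0 - p
    maf = min(p, q)
    pic = 1.0 - (p * p + q * q) - 2.0 * p * p * q * q
    return {"missing_pct": missing_pct, "maf": maf, "pic": pic}


def keep_marker(summary, max_missing_pct=20.0, min_maf=0.05):
    """Apply hypothetical thresholds; markers with no calls are dropped."""
    if summary["maf"] is None:
        return False
    return summary["missing_pct"] <= max_missing_pct and summary["maf"] >= min_maf


balanced = marker_summary([0, 2, 0, 2])            # both alleles at 50 %
monomorphic = marker_summary([0, 0, None, 0])      # one allele only, one missing
print(balanced, keep_marker(balanced))
print(monomorphic, keep_marker(monomorphic))
```

With these assumed thresholds the balanced marker is kept and the monomorphic one is dropped, since it carries no information for prediction.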
  17. ISMU 2.0
  18. ISMU 2.0
  19. Browse Data
  20. Data in ISMU 2.0
  21. Calculation of Marker Summary
  22. Summary Plots
  23. Various Statistics
  24. Export to MS-Excel (Windows)
  25. GS Methods
  26. GS Methods
  27. GS Results
  28. GS Results
  29. Export to PDF
  30. Export to High-Quality Graphics (300 DPI)
  31. Future Plans
      • Customized parameters for GS scripts
      • Integrating more algorithms
      • Implementation of cross-validation
      • Linking with IBWS
      • Data import/export module
      • Online version of ISMU 2.0
      • Linking with Agricultural Genomics Network
      • Making available on more OS
      • Average GEBVs
      • Multi-trait GS
      • Capacity building in NARS partners
        – 4th International Workshop on Next Generation Genomics and Integrated Breeding for Crop Improvement, Feb 19th-21st 2014
  32. Acknowledgements: Many Friends & Collaborators
  33. Thanks…
  34. Thanks…
