GRM 2011: ISMU pipeline for NGS data analysis and facilitating molecular breeding


  1. ISMU pipeline for NGS data analysis and facilitating molecular breeding
     http://hpc.icrisat.cgiar.org/NGS/
  2. Challenges
     • Short read length of sequences
     • Availability of many tools
     • Platform dependency and command-line driven
     • No direct ways for prediction of SNPs between genotypes
     • Quality scores vary depending on version and technology
  3. ISMU version 1
     • SNP discovery from NGS data
       – Pipeline for mapping / assembling
       – Calling SNPs between genotypes
       – Visualisation
  4. ISMU version 2
     • Application of identified SNPs to breeding
  5. Objectives of NGS Pipeline
     • Benchmark available open-source short-read assembly and downstream analysis programs/software
     • Assembly and polymorphism detection between genotypes, and visualization
     • Assay design (Illumina GoldenGate Assay), genotype calling and visualization, and analysis of SNP genotyping and haplotype data
     • Identify parental lines for use in MABC or MARS
     • Discovery of SNP markers for use in foreground and background selection of MABC or MARS
     • Documentation of the pipeline and the integrated software
  6. Control Flowchart (diagram; labels in reading order)
     • ICRISAT crops? (Yes / No)
     • Input data & validation
     • Upload reference & data
     • Mapping (Maq, Novo)
     • Mapped reads
     • Assembly visualization
     • Consensus calling
     • Report SNPs
       – Extract sequences with SNPs
       – Design primers
       – In silico validation by SNP2CAPS
     • Database
     • ADT score
     • G.G Assay
     • Bead Studio
     • Flapjack
  7. Standard Methodology (diagram; labels in reading order)
     • Programme: Maq, Novo
     • Genotype 1 and Genotype 2: mapping, then assembly, for each genotype
     • SNP calling against reference
     • Predicted SNPs against reference (example row):
         Chrom1  Pos  RefAllele  Gtyp1  Gtyp2
         5       303  A          G      ?
     • SNPs between genotypes
       – Base calling in 2nd genotype
       – Compare allele between genotypes
       – Check the inverse combination
       – Remove duplicates
     • ADT scoring
     • Reporting
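
The between-genotype comparison on the slide above can be sketched in Python. This is only one reading of the diagram, not the pipeline's actual script: the function name, the dictionary inputs and the `chr1` label are hypothetical. It assumes each genotype has a set of SNPs predicted against the reference, plus a lookup of called bases at any covered site, so that a `?` (no SNP reported in the second genotype) can be resolved by a base call.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (sequence name, 1-based position); hypothetical layout


def snps_between_genotypes(
    ref_alleles: Dict[Site, str],
    snps_g1: Dict[Site, str],   # SNPs of genotype 1 predicted against the reference
    snps_g2: Dict[Site, str],   # SNPs of genotype 2 predicted against the reference
    bases_g1: Dict[Site, str],  # called base of genotype 1 at each covered site
    bases_g2: Dict[Site, str],  # called base of genotype 2 at each covered site
) -> List[Tuple[Site, str, str, str]]:
    """Return (site, ref allele, genotype 1 base, genotype 2 base) where the genotypes differ."""
    # The union covers both directions (G1 -> G2 and the inverse combination);
    # using a set removes sites that were predicted in both genotypes.
    candidate_sites = set(snps_g1) | set(snps_g2)
    report = []
    for site in sorted(candidate_sites):
        base1 = bases_g1.get(site)
        base2 = bases_g2.get(site)
        if base1 is None or base2 is None:
            continue  # no base call in one genotype: the '?' stays unresolved
        if base1 != base2:
            report.append((site, ref_alleles.get(site, "N"), base1, base2))
    return report


if __name__ == "__main__":
    site = ("chr1", 303)
    print(snps_between_genotypes({site: "A"}, {site: "G"}, {}, {site: "G"}, {site: "A"}))
```

With the slide's example row (reference A, genotype 1 G), the site is reported only once genotype 2 has a base call that differs from G.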
  8. Customized Methodology (Consensus Base Calling, cc) (diagram; labels in reading order)
     • Programme: ccMaq, ccNovo
     • Genotype 1 and Genotype 2: mapping for each genotype
     • SNP calling by in-house script
       – Genotype 1: fmaj = 38/40 = 0.95
       – Genotype 2: fmaj = 21/28 = 0.75
     • ADT scoring
  9. Consensus Base Calling Parameters (Default)
     • Max number of mismatches <= 7
     • Sum of mismatch scores <= 60
     • Min mapping quality >= 0
     • Read depth threshold >= 5
     • Major base frequency threshold >= 0.75
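
A minimal sketch of how the last two defaults could be applied at a single position, assuming the reads covering that position have already passed the mismatch and mapping-quality filters (those three filters are not modelled here). The function and constant names are hypothetical, and the bases in the usage lines are made up; only the counts 38/40 and 21/28 come from the slides.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

MIN_READ_DEPTH = 5      # "Read depth threshold >= 5"
MIN_MAJOR_FREQ = 0.75   # "Major base frequency threshold >= 0.75"


def consensus_base(
    bases: Sequence[str],
    min_depth: int = MIN_READ_DEPTH,
    min_fmaj: float = MIN_MAJOR_FREQ,
) -> Tuple[Optional[str], float]:
    """Return (consensus base or None, fmaj) for the bases observed at one position."""
    depth = len(bases)
    if depth == 0:
        return None, 0.0
    major, count = Counter(bases).most_common(1)[0]
    fmaj = count / depth  # frequency of the most common base
    if depth < min_depth or fmaj < min_fmaj:
        return None, fmaj  # too shallow or too mixed: no call
    return major, fmaj


if __name__ == "__main__":
    print(consensus_base("G" * 38 + "A" * 2))   # 38 of 40 reads agree
    print(consensus_base("A" * 21 + "G" * 7))   # 21 of 28 reads agree
```

Both slide examples pass the default thresholds: 38/40 gives fmaj = 0.95 and 21/28 gives fmaj = 0.75, which sits exactly on the cutoff.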
  10. What if more than 2 genotypes?
     • Genotype1, Genotype2, Genotype3, Genotype4
     • Pairwise comparison matrix (1 = pair is compared):
             G1  G2  G3
         G1   0   1   1
         G2   0   0   1
         G3   0   0   0
     • Combination of genotypes = (n² - n)/2
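
The pair count on the slide above is the number of unordered pairs of n genotypes. A short illustration using the standard library; the function names are hypothetical and this is not taken from the pipeline itself.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def genotype_pairs(genotypes: Sequence[str]) -> List[Tuple[str, str]]:
    """All unordered pairs of genotypes, i.e. the upper triangle of the matrix."""
    return list(combinations(genotypes, 2))


def pair_count(n: int) -> int:
    """Number of pairwise comparisons for n genotypes: (n^2 - n) / 2."""
    return (n * n - n) // 2


if __name__ == "__main__":
    names = ["Genotype1", "Genotype2", "Genotype3", "Genotype4"]
    for first, second in genotype_pairs(names):
        print(first, "vs", second)
    print(pair_count(len(names)), "comparisons")
```

For the three genotypes in the matrix this gives 3 comparisons, matching its three 1 entries; for the four genotypes listed it gives 6.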
  11. NGS pipeline input data
     • Reads format
       – fna and qual
       – (Standard/Sanger) Fastq
       – SCARF format
       – Solexa fastq, Solexa export
       – AB SOLiD read format
       – FASTA
     • Reference sequence
       – Chickpea transcript assembly
       – Pearl millet transcript assembly
       – Pigeonpea transcript assembly
       – Medicago genome
       – Sorghum genome
  12. NGS pipeline (Input 1)
     http://hpc.icrisat.cgiar.org/NGS/
  13. NGS pipeline (Input 2)
  14. NGS pipeline (Help page)
  15. NGS pipeline (Results)
  16. NGS pipeline (Visualisation)
  17. Pipeline Editions
     Available in 2 editions:
       1. Server Edition
       2. Desktop Edition
  18. Server Edition
     • User-friendly web interface
       – Installation on the following Linux platforms
         • Fedora 13
         • CentOS 5
     • Clients can be any OS with a web browser
     • Communication resources
       – SMTP (Email)
     • Session-specific job processing
       – Avoids file overwriting
  19. Desktop Edition
     • All functionalities of Server Edition on a desktop
     • Supported OS
       – Fedora 13
       – RHEL 5
     • Single command installation
     • Available on installable CD
  20. Future plans
     • Consideration of new tools to integrate / update, e.g. BWA, Bowtie
     • Implementation of the extension to the pipeline
     • Evaluate cloud computing and high performance computing cluster options
     • Initiatives such as iPlant (discovery environment, genotype to phenotype)
  21. Future Plans: ISMU v 2
     • Identification of appropriate modules for MARS, GWS and GBS
     • Integration of MARS and GWS module
     • Linking of ISMU pipeline with DMS of IBP
     • Documentation & training of ISMU pipeline
  22. Architecture: NGS Data Analysis pipeline at ICRISAT (diagram; labels in reading order)
     • Internet
     • Reference sequences
     • Velvet
     • Perl Prog
     • Maq
     • Novo
     • CGI
     • SNP Database
     • Files downloading
     • Dynamic querying
     • Assembly visualization
     • Input data validation
     • Apache Server hosting web pages
     • SMTP Server
  23. Contributors
     • Rajeev K. Varshney
     • Abhishek Rathore
     • Jayashree B
     • Vivek Thakur
     • R. Pradeep
     • A. Bhanu Prakash
     • Sarwar Azam
     • G. Meenakshi
     • David Marshall
     • Iain Milne
     • Jonathan Jones
     • David Studholme
     • Greg May
     • Andrew Farmer
     • Jimmy Woodward
     • Dave Edwards
