Church sfaf13

  1. Keep Calm and Carry on Sequencing. Deanna M. Church, Staff Scientist, NCBI (@deannachurch)
  2. http://genomereference.org; Valerie Schneider, NCBI
  3. Photograph: Paul Popper/Popperfoto/Getty Images
  4. GRCh38 is coming (September 2013)
  5. http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
  6. [Figure: contig N50 for GRCh37.p12, CHM1.0, HuRef, HsapALLPATHS1 and YH1; a second panel repeats the comparison for CHM1.0, HuRef, HsapALLPATHS1 and YH1 only]
  7. [Figure: number of contigs for GRCh37.p12, CHM1.0, HuRef, HsapALLPATHS1 and YH1; a second panel shows GRCh37.p12, CHM1.0, HuRef and HsapALLPATHS1 only]
  8. http://www.bioplanet.com/gcat
  9. http://genomereference.org
  10. Dennis et al., 2012: 1q32, 1q21, 1p21; 1p21 patch alignment to chromosome 1
  11. http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes; CDC27; 1KG Phase 1 strict accessibility mask; SNP (all); SNP (not 1KG)
  12. Sudmant et al., 2010
  13. Kidd et al., 2007: APOBEC cluster; part of chr22 assembly; alternate locus for chr22 (white: insertion; black: deletion)
  14. http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
  15. Mouse Ren1, chr1 (CM000994.2/NC_000067.6): 133350674-133360320. 129S6/SvEvTac tiling path; alignment to C57BL/6J chr1; B6 genes; 129S6/SvEvTac genes; +32 kb in 129S6/SvEvTac
  16. Mouse Ren1, chr1 (CM000994.2/NC_000067.6): 133350674-133360320. NM_031192.3: transcript from C57BL/6J; NM_031193.2: transcript from FVB/N; 129S6/SvEvTac alt locus alignment (allelic); FVB/N transcript alignment (paralog)
  17. 129S6/SvEvTac Ren1; FVB Ren2 transcript; paralogous diff; SNP + paralogous diff. Mouse Ren1, chr1 (CM000994.2/NC_000067.6): 133350674-133360320. NM_031192.3: transcript from C57BL/6J; NM_031193.2: transcript from FVB/N
  18. An assembly is a MODEL of the genome
  19. Assembly Model
  20. BAC insert; BAC vector; shotgun sequence; assemble; gaps; finishing
  21. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-21; NCBI36 (hg18); GRCh37 (hg19)
  22. NCBI35 (hg17); GRCh37 (hg19); AL139246.20; AL139246.21
  23. Daly et al., 2013
  24. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012
  25. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321
  26. Fixing Rare/Incorrect Bases
  27. Fixing Rare/Incorrect Bases
  28. Fixing Rare/Incorrect Bases. GRCh37B sites for update: n = 1164; sites with a unique successful contig: 1148 (98.6%); average length: 448 bp; min/max successful length: 51/791 bp; average coverage: 80x; read source (all contigs): high coverage 32%, low coverage 57%, exome 10%
  29. Build sequence contigs based on the contigs defined in the TPF (Tiling Path File); check for orientation consistencies; select switch points; instantiate the sequence for further analysis. Diagram labels: switch point; representative chromosome sequence
  30. RP11-34P13, 64E8, RP4-669L17, RP5-857K21, RP11-206L10, RP11-54O7; gaps
  31. NCBI36
  32. nsv832911 (nstd68), submitted on NCBI35 (hg17)
  33. NCBI35 (hg17) tiling path; GRCh37 (hg19) tiling path; gap inserted; moved approximately 2 Mb distal on chr15; NC_000015.8 (chr15); NC_000015.9 (chr15); removed from assembly; added to assembly; http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-24
  34. Sequences from haplotype 1; sequences from haplotype 2. Old assembly model: compress into a consensus. New assembly model: represent both haplotypes
  35. NCBI36 NC_000004.10 (chr4) tiling path: AC074378.4, AC079749.5, AC134921.2, AC147055.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7 (TMPRSS11E, TMPRSS11E2; Xue Y et al., 2008). GRCh37 NC_000004.11 (chr4) tiling path: AC074378.4, AC079749.5, AC134921.1, AC147055.2, AC093720.2, AC021146.7 (TMPRSS11E). GRCh37 NT_167250.1 (UGT2B17 alternate locus): AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7 (TMPRSS11E2). nsv532126 (nstd37)
  36. Adding Novel Sequence. 1000G Phase 1 decoy sequence, viewed by: GenBank alignment; percent masked by RepeatMasker; RepeatMasker repeat type; sequence source (HTG, HuRef, ALLPATHS)
  37. Adding Novel Sequence
  38. Adding Novel Sequence
  39. Genovese et al., 2013
  40. Adding Novel Sequence. Karen Hayden and Jim Kent
  41. Human Resolved for GRCh38; http://genomereference.org
  42. Examples
  43. Preview of GRCh38 (scheduled Fall 2013): TEX28; TKTL1; LOC101060233 (opsin related); LOC101060234 (TEX28 related). GRCh37 (current reference assembly), chrX
  44. Hydin: chr16 (16q22.2); Hydin2: chr1 (1q21.1). Missing in NCBI35/NCBI36, unlocalized in GRCh37, finished in GRCh38. Alignment to Hydin2 genomic, 300 kb, 99.4% ID; alignment to Hydin1 CHM1_1.0, >99.9% ID (Doggett et al., 2006)
  45. FAM23_MRC1 region, chr10: segmental duplications; 1KG accessibility mask; novel patch; 250 kb of artificial duplication
  46. Adding Novel Sequence
  47. Richa Agarwala: MHC alternate locus; alignment to chr6
  48. Making the assembly accessible to existing tools: masking. Query set: 439,109,084 NA12878 HiSeq reads
  49. Masking effectively blocks alignments in regions with high identity. Simulated reads from GRCh37.p9: unpaired reads; 101 bp; 1x coverage; default wgsim parameters. Masking parameters: percent identity 100%; step size 5 bp; minimum length 101 bp; center SNPs in unmasked regions
  50. Masking improves alignments in regions with alternate loci or patches
  51. NA12878 reads whose best alignment was on an alt locus or patch in the masked assembly were evaluated for their alignment location when aligned to the primary assembly alone. Masking effectively reduces the increase in NA12878 reads with MAPQ=0 alignments that occurs when the full assembly is used as an alignment substrate
  52. GRCh38 is coming (September 2013)
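
Slides 6 and 7 compare assemblies by contig N50 and contig count. As a reminder of what N50 measures, here is a minimal Python sketch of the usual definition: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembled length. The function name and the toy lengths are illustrative only; they are not data from the slides.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the largest length L such that contigs of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths or min(lengths) <= 0:
        raise ValueError("need one or more positive contig lengths")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating assembled length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


# Toy contig lengths (illustrative only, not data from the slides).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Tools differ slightly in whether the halfway test is ">=" or ">", so small N50 discrepancies between reports can come from the convention rather than the assembly.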

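
Slide 29 lists the steps for turning a TPF into sequence: build contigs from the components the TPF defines, check orientation consistency, select switch points, and instantiate the sequence. The toy Python sketch below illustrates the last three steps on two overlapping clones. The `Component` class, the `(end, next_start)` switch-point encoding and the `check_len` agreement test are hypothetical simplifications, not the GRC's software or the real TPF format.

```python
from dataclasses import dataclass

# Complement table for reverse-complementing (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Component:
    """One clone on a toy tiling path (hypothetical, not the TPF format)."""
    name: str
    seq: str
    orientation: str  # "+" (as submitted) or "-" (reverse complement)

    def oriented(self) -> str:
        if self.orientation not in ("+", "-"):
            raise ValueError(f"{self.name}: orientation must be '+' or '-'")
        return self.seq if self.orientation == "+" else revcomp(self.seq)


def join_at_switch_points(components, switch_points, check_len=10):
    """Stitch oriented components into one contig sequence.

    switch_points[i] = (end, next_start): keep component i up to `end`
    (0-based, exclusive) and resume component i+1 at `next_start`; both
    offsets are in oriented coordinates and name the same genomic base.
    """
    if len(switch_points) != len(components) - 1:
        raise ValueError("need exactly one switch point per adjacent pair")
    seqs = [c.oriented() for c in components]
    pieces = []
    start = 0
    for i, seq in enumerate(seqs):
        if i == len(seqs) - 1:
            pieces.append(seq[start:])  # last component runs to its end
            break
        end, next_start = switch_points[i]
        # Consistency check: bases just after the switch point must agree in
        # both components; a mismatch suggests a wrong orientation or overlap.
        tail = seq[end:end + check_len]
        head = seqs[i + 1][next_start:next_start + len(tail)]
        if tail != head:
            raise ValueError(
                f"{components[i].name}/{components[i + 1].name}: "
                "sequences disagree at switch point")
        pieces.append(seq[start:end])
        start = next_start
    return "".join(pieces)


a = Component("cloneA", "AAAACCGT", "+")
b = Component("cloneB", revcomp("CCGTTTTT"), "-")  # submitted in reverse
print(join_at_switch_points([a, b], [(6, 2)]))  # AAAACCGTTTTT
```

Recording the orientation on each component, rather than rewriting the stored sequence, keeps the submitted clone sequence unchanged while still letting the orientation check fail loudly when a component is flipped the wrong way.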
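
Slide 49 gives the masking parameters (100% identity, 5 bp step, 101 bp minimum length, SNPs centered in unmasked regions) but not the procedure. The Python sketch below is one hypothetical way those parameters could fit together for masking an alternate locus against the primary assembly. The function name, the exact-match window index and the run-widening step are assumptions of this sketch, not the NCBI implementation.

```python
def mask_identical_windows(alt: str, primary: str,
                           window: int = 101, step: int = 5) -> str:
    """Mask (with N) alt-locus windows that occur verbatim in the primary.

    Hypothetical reconstruction of the slide's parameters: 100% identity,
    `step`-bp window spacing, `window`-bp minimum length. Short unmasked runs
    (e.g. a lone SNP) are then widened to `window` bases centred on the run.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    n = len(alt)
    # Toy exact-match index of every window-length substring of the primary;
    # a real pipeline would use an aligner or k-mer index instead.
    primary_windows = {primary[i:i + window]
                       for i in range(len(primary) - window + 1)}
    starts = list(range(0, n - window + 1, step))
    if starts and starts[-1] != n - window:
        starts.append(n - window)  # make sure the 3' end is also tested
    masked = [False] * n
    for s in starts:
        if alt[s:s + window] in primary_windows:
            masked[s:s + window] = [True] * window

    # Collect maximal unmasked runs [run_start, run_end).
    runs, i = [], 0
    while i < n:
        if masked[i]:
            i += 1
            continue
        j = i
        while j < n and not masked[j]:
            j += 1
        runs.append((i, j))
        i = j
    # Widen each short run so the difference sits near the middle of a
    # window-length unmasked stretch (hypothetical "center SNPs" step).
    for run_start, run_end in runs:
        pad = window - (run_end - run_start)
        if pad > 0:
            lo = max(0, run_start - pad // 2)
            hi = min(n, run_end + pad - pad // 2)
            masked[lo:hi] = [False] * (hi - lo)
    return "".join("N" if m else base for base, m in zip(alt, masked))


# Toy example: one C->T SNP at index 7, with a 4 bp window instead of 101 bp.
print(mask_identical_windows("AAAAACCTCCGGGGG", "AAAAACCCCCGGGGG",
                             window=4, step=1))  # NNNNNNCTCCNNNNN
```

With the real parameters (101 bp windows, 5 bp step), the widening step leaves a read-length stretch of alt-specific sequence around each difference, so a 101 bp read spanning the SNP can still align uniquely to the alternate locus while reads from identical stretches align only to the primary.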