3. gEVAL Browser
• Evaluation of Genome Assemblies
• Based on Ensembl Framework
• Run by Sanger Institute
• Navigation of current assemblies
• View different annotations
• Punchlists of Assembly Issues
http://geval.sanger.ac.uk/
4. gEVAL Browser - Annotation
• Intra-species alignments
• Optical Map data
• Clone End mapping
• Black Tags/Clone sequence anomalies
• Self Comparisons
• CCDS Alignments
• RefSeq Alignments
• Repeat Annotation
– Centromeric repeats
– Telomeric repeats
– 1000 Genomes identified Mobile Element Insertions
5. Human Build Info
• Primary Reference
– GRCh38
• Current Build
– Human 20140826
• Older Versions
– GRCh37.p13
– NCBI36
• Other Assemblies
– CHM1_1.1
– NA12878
– HuRef
– YH2.0
12. Clone End Library Mappings
19 Human Clone Libraries
• Reveals state of assembly
• Orientation
• Mis-assemblies
• Incorrect location
• Source of new sequence
Mapped 1 time
Mapped multiple times
Wrong direction (<<, <>, >>)
Wrong distance from partner
Spanning partner in the vicinity
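The placement categories above can be illustrated with a minimal Python sketch that classifies one clone-end pair. The rules here are simplified assumptions, not gEVAL's actual pipeline: `EndHit`, `classify_pair`, and the insert-size limits are hypothetical names and parameters. An end pair is treated as correct when the two ends face each other ("><"), and any other arrow pattern (<<, <>, >>) is reported as wrong direction. The "spanning partner in the vicinity" case is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EndHit:
    """One alignment of a clone end to the assembly (hypothetical record)."""
    chrom: str
    pos: int     # leftmost aligned coordinate
    strand: str  # "+" drawn as ">", "-" drawn as "<"


def arrow(hit: EndHit) -> str:
    return ">" if hit.strand == "+" else "<"


def classify_pair(hits_a: List[EndHit], hits_b: List[EndHit],
                  min_insert: int, max_insert: int) -> str:
    """Classify one clone-end pair using assumed, simplified rules."""
    if not hits_a or not hits_b:
        return "unmapped end"
    if len(hits_a) > 1 or len(hits_b) > 1:
        return "mapped multiple times"
    a, b = hits_a[0], hits_b[0]
    if a.chrom != b.chrom:
        # Partner on another sequence: treat as an extreme distance error.
        return "wrong distance from partner"
    left, right = sorted((a, b), key=lambda h: h.pos)
    pattern = arrow(left) + arrow(right)
    if pattern != "><":
        return f"wrong direction ({pattern})"
    # Approximate insert size from the leftmost coordinates (ignores read length).
    distance = right.pos - left.pos
    if not (min_insert <= distance <= max_insert):
        return "wrong distance from partner"
    return "mapped 1 time (consistent pair)"
```

For example, with an assumed fosmid insert range of 30 to 50 kb, ends at chr2:1,000 (+) and chr2:40,000 (-) are a consistent pair, while the same ends both on "+" are reported as "wrong direction (>>)".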
15. Using Clone End Library Mappings
[Figure panels: before / after]
16. Optical Maps
• "...ordered restriction maps from single stained molecules of DNA."
• D. Schwartz (UW-Madison)
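To compare an assembly against an optical map, the assembly sequence is typically digested in silico into the same kind of ordered restriction map. The minimal Python sketch below shows that step under stated assumptions: it cuts at the start of each recognition site, assumes a palindromic site (so scanning one strand suffices), and uses the BamHI site GGATCC purely as an example. The enzyme used for any real optical map, and the function names, are not taken from this talk.

```python
import re
from typing import List


def restriction_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    # Zero-width lookahead so overlapping occurrences are all reported.
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def ordered_fragment_sizes(seq: str, site: str) -> List[int]:
    """Cut at the start of each site; return fragment lengths in sequence order.

    Assumes a palindromic recognition site, so the reverse strand adds no new cuts.
    """
    cuts = restriction_sites(seq, site)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces (e.g. a site at position 0).
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]
```

For instance, `ordered_fragment_sizes("AAGGATCCTTTTGGATCCA", "GGATCC")` returns `[2, 10, 7]`: two sites at positions 2 and 12 split the 19 bp sequence into three ordered fragments.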
17. TrackHub
http://ngs.sanger.ac.uk/production/grit/track_hub/hub.txt
Display GRC issues in Ensembl and UCSC
• Genome issues under review by the GRC
• Genomic regions defined by the GRC
• Alignments between the primary assembly and alternate loci or patches
• Clone sequence anomalies
• Human regions with clones from the CHORI-17 library (CHM1tert)
18. Adding a TrackHub to Ensembl
http://ngs.sanger.ac.uk/production/grit/track_hub/hub.txt
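Before attaching a hub, it can help to check what its hub.txt declares. Under the UCSC Track Hub specification, hub.txt is a set of "key value" lines, and relative paths such as genomesFile are resolved against the location of hub.txt. The Python sketch below parses such a file and resolves genomesFile. The sample contents are hypothetical and are not the GRIT hub's actual file. The helper names are also illustrative, and the required-key list reflects the standalone (non-useOneFile) hub layout.

```python
from typing import Dict, List
from urllib.parse import urljoin

# Hypothetical hub.txt contents for illustration; NOT the GRIT hub's real file.
SAMPLE_HUB = """\
hub exampleHub
shortLabel Example hub
longLabel Illustrative hub.txt used only for this sketch
genomesFile genomes.txt
email someone@example.org
"""

# Keys the UCSC Track Hub spec lists as required in a standalone hub.txt.
REQUIRED_KEYS = ("hub", "shortLabel", "longLabel", "genomesFile", "email")


def parse_hub_txt(text: str) -> Dict[str, str]:
    """Parse 'key value' lines into a dict, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as the value
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def missing_keys(settings: Dict[str, str]) -> List[str]:
    """Required keys absent from the parsed settings."""
    return [k for k in REQUIRED_KEYS if k not in settings]


def genomes_url(hub_url: str, settings: Dict[str, str]) -> str:
    """Resolve genomesFile relative to the URL the hub.txt was loaded from."""
    return urljoin(hub_url, settings["genomesFile"])
```

With the sample above, `missing_keys(parse_hub_txt(SAMPLE_HUB))` is empty, and resolving against the hub URL on this slide yields a genomes.txt path in the same track_hub directory.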
25. The 1000 Genomes Project
• IGSR established to maintain 1000 Genomes data
• Phase 3 release is out on GRCh37
• Reads will be remapped in 1Q 2015
• dbSNP now has GRCh38 mappings for most sites
• Plan to re-call/re-map in 2Q 2015
26. Acknowledgments
GRIT (Team135)
• Kate Auger
• Joanna Collins
• Guy Griffith
• Glenn Harden
• Paul Heath
• Britt Kilian
• Kerstin Howe
• Sarah Pelan
• Glen Threadgold
• James Torrance
• Jo Wood
Alumni
• Kim Brugger
• Mario Caccamo
• Ian Sealy
• Tina Eyre
• Ed Zuiderwijk
• James Smith
• Paul Bevan
• Simon Brent
• Harpreet Riat
• David Harper
• David Schwartz (UW-Madison)
• Steve Goldstein (UW-Madison)
• Evan Eichler (UWashington)
• Jeff Kidd (UWashington)
Editor's Notes
Top BOX is GRCh38
Middle BOX is HuRef Human Assembly
Bottom BOX is GRCh37.p13
In the GRCh37.p13 window there is a gap between components AC144441/AC147553 and AC232988. Aligning to the HuRef assembly shows that the size of the gap can potentially be estimated from sequence (even though there appears to be a small gap). This allows the user to define a specific strategy to close the gap (i.e. small gap -> PCR, or insert WGS sequence, or choose a fosmid vs. BAC clone, or other).
The gap is roughly 10kb.
RESOLUTION: See ticket HG-1312 (https://ncbijira.ncbi.nlm.nih.gov/browse/HG-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel):
Milinn sees this as well, and the region is important because of the gene HIPK2 (GeneID:28996). Milinn initially edits the TPF to include the HuRef components: SEE NEXT SLIDE
Revisiting the first slide, the scaffold is now placed in the gap region in GRCh38. The scaffold itself is 60kb, as seen in the clone track (yellow box: scaf00473_reg01_ctg01), but it contributes only 9,948 bp.
The gap is now closed, and the associated gene that was split by the gap (HIPK2, GeneID:28996) is now fixed.
As seen above, the ticket entries show the TPF with the WGS sequence added.
However, Deanna Church later finds an unplaced scaffold in the RP11 WGS assembly, and Milinn proceeds to accession it and use it in the next release (see next slide).
As predicted from the first slide, the gap was roughly 10kb!
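The gap-sizing idea in these notes, which aligns the components flanking a gap to another assembly such as HuRef and measures the distance between them, can be sketched in Python as follows. This is a simplified illustration that assumes both flanks align collinearly, in the same orientation, to one target sequence. `FlankAlignment`, the coordinates, and the strategy thresholds are hypothetical and do not represent GRC policy.

```python
from dataclasses import dataclass


@dataclass
class FlankAlignment:
    """Where a gap-flanking component aligns on the other assembly (hypothetical)."""
    target: str  # scaffold/chromosome name in the other assembly
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def estimate_gap(left: FlankAlignment, right: FlankAlignment) -> int:
    """Bases lying strictly between the two flank alignments on the target.

    Assumes both flanks hit the same target, same orientation, left before right.
    """
    if left.target != right.target:
        raise ValueError("flanks align to different sequences; gap not spanned")
    if right.start <= left.end:
        raise ValueError("flanks overlap or are out of order on the target")
    return right.start - left.end - 1


def suggest_strategy(gap_bp: int) -> str:
    """Map an estimated gap size to a closure approach (illustrative thresholds)."""
    if gap_bp < 2_000:
        return "PCR"
    if gap_bp < 40_000:
        return "fosmid or WGS sequence"
    return "BAC clone"
```

With made-up coordinates where the left flank ends at 500,000 and the right flank starts at 510,001 on the same HuRef scaffold, `estimate_gap` gives 10,000 bp, and `suggest_strategy` returns "fosmid or WGS sequence".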
Clone end placements reveal sequence that can be placed in the gap region. The assembly shows the newly sequenced clone in the path.
The clone component AL596089 contains a deletion, which is highlighted by the three-cell-line optical map analysis (right). The deletion would not have been detected from clone overlaps alone, because they do not extend far enough to reveal it. The issue is tagged and reported in GRC ticket HG-1482.
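Flagging a deletion such as the one in AL596089 comes down to comparing fragment sizes between the assembly's in-silico map and the optical maps. The Python sketch below shows this comparison under a strong simplifying assumption: the maps are already aligned one-to-one. Real optical-map alignment must also handle missing or extra cuts. A fragment is flagged when its size differs by more than an assumed tolerance. A positive difference means the assembly is shorter than the molecules, which suggests a deletion in the assembly. Requiring the flag in every cell line loosely mirrors the three-cell-line analysis. All names, tolerances, and sizes here are hypothetical.

```python
from typing import List, Sequence, Tuple


def flag_size_discrepancies(assembly_frags: Sequence[int],
                            optical_frags: Sequence[int],
                            rel_tol: float = 0.10,
                            abs_tol: int = 1_000) -> List[Tuple[int, int]]:
    """Return (fragment index, optical - assembly) for fragments that disagree.

    Assumes the maps are aligned one-to-one; positive = possible assembly deletion.
    """
    if len(assembly_frags) != len(optical_frags):
        raise ValueError("this sketch needs one-to-one aligned fragments")
    flagged = []
    for i, (asm, opt) in enumerate(zip(assembly_frags, optical_frags)):
        diff = opt - asm
        # Tolerance grows with fragment size, with a fixed floor for small fragments.
        if abs(diff) > max(abs_tol, rel_tol * opt):
            flagged.append((i, diff))
    return flagged


def flagged_in_all(assembly_frags: Sequence[int],
                   optical_maps: Sequence[Sequence[int]]) -> List[int]:
    """Fragment indices flagged against every supplied optical map (e.g. cell line)."""
    index_sets = [{i for i, _ in flag_size_discrepancies(assembly_frags, m)}
                  for m in optical_maps]
    return sorted(set.intersection(*index_sets)) if index_sets else []
```

For example, an assembly map of `[12_000, 30_000, 8_500]` compared against an optical map of `[12_200, 45_000, 8_400]` flags only the middle fragment, `(1, 15_000)`. This result points to about 15 kb missing from the assembly at that position.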