Presentation by Justin Zook at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on benchmarks for indels and structural variants.
171017 giab for giab grc workshop
1. Genome in a Bottle:
Developing benchmark sets for large indels and
structural variants
Justin Zook, Marc Salit, and the GIAB Consortium
NIST Genome-Scale Measurements Group
Joint Initiative for Metrology in Biology (JIMB)
Oct 16, 2017
2. Take-home Messages
• Genome in a Bottle is authoritatively characterizing human
genomes
• Current characterization enables benchmarking of “easier”
variants/regions in germline genomes
– Clinical validation
– Technology development, optimization, and demonstration
• Now working on difficult variants and regions
– Draft variant calls >=20bp available and feedback requested
– Many challenges remain and collaborations welcome!
3. Why are we doing this?
• Technologies evolving rapidly
• Different sequencing and
bioinformatics methods give
different results
• Now have concordance in easy
regions, but not in difficult
regions
• Challenge:
– How do we characterize 6 billion
bases in the genome with high
confidence?
O’Rawe et al, Genome Medicine, 2013
https://doi.org/10.1186/gm432
4. GIAB is evolving
2012
• No human benchmark calls available
• GIAB Consortium formed
2014
• Small variant genotypes for ~77% of pilot genome NA12878
2015
• NIST releases first human genome Reference Material
2016
• 4 new genomes
• Small variants for 90% of 5 genomes for GRCh37/38
2017+
• Characterizing difficult variants
5. Genome in a Bottle Consortium
Authoritative Characterization of Human Genomes
Sample
gDNA isolation
Library Prep
Sequencing
Alignment/Mapping
Variant Calling
Confidence Estimates
Downstream Analysis
• gDNA reference materials to
evaluate performance
• GIAB is developing:
– reference materials
– Reference data
– Methods
– Tools to calculate performance
metrics
Generic measurement process
www.slideshare.net/genomeinabottle
6. Bringing Principles of Metrology
to the Genome
• Reference materials
– DNA in a tube from NIST
• Extensive state-of-the-art
characterization
• “Upgradable” as technology
develops
• Commercial innovation
– PGP genomes suitable for
commercial derived products
• Benchmarking tools and software
– with GA4GH
• Enhance new technologies
7. GIAB has characterized 5 human genome RMs
• Pilot genome
– NA12878
• PGP Human Genomes
– Ashkenazi Jewish son
– Ashkenazi Jewish trio
– Chinese son
• Parents also characterized
National Institute of Standards & Technology
Report of Investigation
Reference Material 8391
Human DNA for Whole-Genome Variant Assessment
(Son of Eastern European Ashkenazim Jewish Ancestry)
This Reference Material (RM) is intended for validation, optimization, and process evaluation purposes. It consists
of a male whole human genome sample of Eastern European Ashkenazim Jewish ancestry, and it can be used to assess
performance of variant calling from genome sequencing. A unit of RM 8391 consists of a vial containing human
genomic DNA extracted from a single large growth of human lymphoblastoid cell line GM24385 from the Coriell
Institute for Medical Research (Camden, NJ). The vial contains approximately 10 µg of genomic DNA, with the peak
of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA, and the DNA is in TE buffer
(10 mM TRIS, 1 mM EDTA, pH 8.0).
This material is intended for assessing performance of human genome sequencing variant calling by obtaining
estimates of true positives, false positives, true negatives, and false negatives. Sequencing applications could include
whole genome sequencing, whole exome sequencing, and more targeted sequencing such as gene panels. This
genomic DNA is intended to be analyzed in the same way as any other sample a lab would process and analyze
extracted DNA. Because the RM is extracted DNA, it is not useful for assessing pre-analytical steps such as DNA
extraction, but it does challenge sequencing library preparation, sequencing machines, and the bioinformatics steps of
mapping, alignment, and variant calling. This RM is not intended to assess subsequent bioinformatics steps such as
functional or clinical interpretation.
Information Values: Information values are provided for single nucleotide polymorphisms (SNPs), small insertions
and deletions (indels), and homozygous reference genotypes for approximately 88 % of the genome, using methods
similar to described in reference 1. An information value is considered to be a value that will be of interest and use to
the RM user, but insufficient information is available to assess the uncertainty associated with the value. We describe
and disseminate our best, most confident, estimate of the genotypes using the data and methods currently available.
These data and genomic characterizations will be maintained over time as new data accrue and measurement and
informatics methods become available. The information values are given as a variant call file (vcf) that contains the
high-confidence SNPs and small indels, as well as a tab-delimited “bed” file that describes the regions that are called
high-confidence. Information values cannot be used to establish metrological traceability. The files referenced in this
report are available at the Genome in a Bottle ftp site hosted by the National Center for Biotechnology Information
(NCBI). The Genome in a Bottle ftp site for the high-confidence vcf and high confidence regions is:
8. Integration of diverse data types and analyses
• Data publicly available
– Deep short reads
– Linked reads
– Long reads
– Optical/nanopore mapping
• Analyses
– Small variant calling
– SV calling
– Local and global assembly
• Discover & refine sequence-resolved calls from multiple datasets & analyses
• Compare variant and genotype calls from different methods
• Evaluate/genotype calls with other data
• Identify features associated with reliability of calls from each method
• Form benchmark calls using heuristics & machine learning
• Compare benchmarks to high-quality callsets and examine differences
10. Evolution of high-confidence small variants
Calls | HC Regions | HC Calls | HC indels | Concordant with PG | NIST-only in beds | PG-only in beds | PG-only | Variants Phased
v2.19 | 2.22 Gb | 3153247 | 352937 | 3030703 | 87 | 404 | 1018795 | 0.3%
v3.2.2 | 2.53 Gb | 3512990 | 335594 | 3391783 | 57 | 52 | 657715 | 3.9%
v3.3 | 2.57 Gb | 3566076 | 358753 | 3441361 | 40 | 60 | 608137 | 8.8%
v3.3.2 | 2.58 Gb | 3691156 | 487841 | 3529641 | 47 | 61 | 469202 | 99.6%
Callouts on the slide: "5-7 errors in NIST" and "1-7 errors in NIST"
~2 FPs and ~2 FNs per million NIST variants in PG and NIST bed files
11. Global Alliance for Genomics and Health Benchmarking Task
Team
• Developed standardized
definitions for performance
metrics like TP, FP, and FN.
• Developing sophisticated
benchmarking tools
• Integrated into a single framework
with standardized inputs and
outputs
• Standardized bed files with
difficult genome contexts for
stratification
https://github.com/ga4gh/benchmarking-tools
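The TP, FP, and FN counts named above are usually summarized as precision, recall, and F1. The following is a minimal sketch using the standard formulas, not the GA4GH tools' own code; the function name `benchmark_metrics` and the counts in the usage line are hypothetical.

```python
def benchmark_metrics(tp, fp, fn):
    """Summarize benchmarking counts as precision, recall, and F1.

    tp: query calls that match the benchmark (true positives)
    fp: query calls absent from the benchmark (false positives)
    fn: benchmark calls missed by the query (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = benchmark_metrics(tp=90, fp=10, fn=30)
print(metrics)
```

Counts like these are only meaningful inside the benchmark's high-confidence regions, which is why the bed files accompany the calls.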
Variant types can change when decomposing
or recomposing variants:
Complex variant:
chr1 201586350 CTCTCTCTCT CA
DEL + SNP:
chr1 201586350 CTCTCTCTCT C
chr1 201586359 T A
Credit: Peter Krusche, Illumina
GA4GH Benchmarking Team
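One way to see why representation matters is to apply each set of records to the reference and compare the resulting haplotype sequences rather than the records themselves. The sketch below does this on a toy 12-base reference with 1-based positions. The reference string, the coordinates, and the helper `apply_variants` are all hypothetical and are not the chr1 records shown on the slide; the toy records are constructed so that the deletion and the SNP do not overlap.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping (pos, ref, alt) records to a reference string.

    pos is 1-based, as in VCF. Raises ValueError if a record's REF does not
    match the reference or if two records overlap.
    """
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"record at {pos} overlaps a previous record")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at {pos}")
        pieces.append(reference[cursor:start])
        pieces.append(alt)
        cursor = start + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


# Toy reference (hypothetical): A at position 1, a CT repeat, then G.
toy_reference = "ACTCTCTCTCTG"

# One complex record versus a deletion plus a SNP.
complex_form = [(2, "CTCTCTCTCT", "CA")]
decomposed_form = [(2, "CTCTCTCTC", "C"), (11, "T", "A")]

same_haplotype = (
    apply_variants(toy_reference, complex_form)
    == apply_variants(toy_reference, decomposed_form)
)
print(same_haplotype)
```

A record-by-record comparison would count these as different calls of different types, while the haplotype comparison treats them as the same sequence change.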
13. What are we accessing and what is still
challenging?
Type of variant | Genome context | Fraction of variants called* | Number of variants missing* | How to improve?
Simple SNPs | Not repetitive | ~97% | >100k | Machine learning
Simple indels | Not repetitive | ~93% | >10k | Machine learning
All variants | Low mappability | <30% | >170k | Use linked reads and long reads
All variants | Regions not in GRCh37/38 | 0 | >>100k??? | De novo assembly; long reads
Small indels | Tandem repeats and homopolymers | <50% | >200k | STR/homopolymer callers; long reads; better handle complex and compound variants
Indels 15-50bp | All | <25% | >30k | Assembly-based callers; integrate larger variants differently; long reads
Indels >50bp | All | <1% | >20k |
* Approximate values based on fraction of variants in GATKHC or FermiKit that are inside v3.3.2 High-confidence regions
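The footnote's "fraction of variants inside high-confidence regions" can be sketched as a lookup of variant positions against bed intervals. This is an illustrative sketch, not the GIAB pipeline: the helper names and the toy intervals are hypothetical, the intervals are assumed not to overlap, and only each variant's start position is tested (a real comparison would also consider the full REF span of indels). It relies on bed intervals being 0-based and half-open while VCF positions are 1-based.

```python
from bisect import bisect_right


def build_index(bed_intervals):
    """Group (chrom, start, end) bed intervals by chromosome, sorted by start."""
    index = {}
    for chrom, start, end in bed_intervals:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_regions(index, chrom, pos):
    """True if a 1-based position falls in a 0-based half-open interval."""
    intervals = index.get(chrom, [])
    zero_based = pos - 1
    # Find the last interval whose start is <= the 0-based position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    if i < 0:
        return False
    start, end = intervals[i]
    return start <= zero_based < end


def fraction_in_regions(index, variants):
    """Fraction of (chrom, pos) variants whose position is inside the regions."""
    if not variants:
        return 0.0
    inside = sum(in_regions(index, chrom, pos) for chrom, pos in variants)
    return inside / len(variants)


# Toy data (hypothetical), for illustration only.
toy_index = build_index([("chr1", 100, 200), ("chr1", 300, 400)])
toy_variants = [("chr1", 150), ("chr1", 250), ("chr1", 350), ("chr2", 10)]
print(fraction_in_regions(toy_index, toy_variants))
```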
14. How can we extend our approach to structural
variants?
Similarities to small variants
• Collect callsets from multiple
technologies
• Compare callsets to find calls
supported by multiple technologies
Differences from small variants
• Callsets have limited sensitivity
• Variants are often imprecisely
characterized
– breakpoints, size, type, etc.
• Representation of variants is poorly
standardized, especially when complex
• Comparison tools in infancy
15. Our strategy
Collect many candidate calls for AJ Trio
• Gather candidate calls from a variety of
approaches
– Many technologies
• Short, linked, and long reads
• Optical and nanopore mapping
– Many approaches
• Small variant callers
• Structural variant callers
• Local and global de novo assemblies
• Community submitted >1 million calls
from 30+ methods using 5+ technologies
Refine/evaluate/genotype candidates
• Obtain sequence-resolved calls as
often as possible using assembly-based
approaches
• Compare sequence predictions of
candidate calls and merge similar calls
• Determine raw data’s support of each
sequence-resolved call and its
genotype
16. Evaluation/genotyping suite of methods
Current approaches
• svviz – maps reads to REF or ALT alleles
– PacBio
– Illumina paired end and mate-pair
– 10X haplotype-separated
• BioNano – compare size predictions
• Nabsys – evaluates large deletions
Future approaches
• Separate haplotypes on other data
types for svviz using whatshap
• Online manual curation of svviz, IGV,
dotplots, gEVAL, etc.
– Volunteers needed!
• PCR-Sanger targeted sequencing
– Collaborations welcome!
17. Integrating Sequence-resolved Calls >=20bp
• >1 million calls from 30+ sequence-resolved callsets from 4 techs for AJ Trio
• >500k unique sequence-resolved calls
• 30k INS and 32k DEL with 2+ techs or 5+ callers predicting sequences <20% different or BioNano/Nabsys support
• 28k INS and 29k DEL genotyped by svviz in 1+ individuals
v0.4.0
http://tinyurl.com/GIABSV0-4-0
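The "2+ techs or 5+ callers predicting sequences <20% different" criterion can be sketched as a filter over candidate calls at one site. This is an assumption-laden illustration, not the method used to build v0.4.0: it assumes "different" means edit distance divided by the longer sequence length, it compares every candidate to one chosen sequence, and it leaves out the BioNano/Nabsys support branch. All function names and the toy calls are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def relative_distance(a, b):
    """Edit distance scaled by the longer sequence (assumed definition)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def is_supported(candidate_seq, calls, max_rel_dist=0.2, min_techs=2, min_callers=5):
    """True if similar calls come from enough technologies or enough callers.

    calls: list of dicts with hypothetical keys "tech", "caller", and "seq".
    """
    similar = [c for c in calls
               if relative_distance(candidate_seq, c["seq"]) < max_rel_dist]
    techs = {c["tech"] for c in similar}
    callers = {c["caller"] for c in similar}
    return len(techs) >= min_techs or len(callers) >= min_callers


# Toy calls (hypothetical), for illustration only.
toy_calls = [
    {"tech": "PacBio", "caller": "caller_a", "seq": "ACGTACGTAC"},
    {"tech": "Illumina", "caller": "caller_b", "seq": "ACGTACGTAA"},
]
print(is_supported("ACGTACGTAC", toy_calls))
```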
18. Size Distribution of v0.4.0 Calls
[Figure: size distributions of v0.4.0 deletions and insertions, shown separately for tandem-repeat and non-tandem-repeat calls, with Alu and LINE labels]
20. Insertion sequence prediction accuracy differs
between methods
[Figure: relative distance from exact match for insertion sequences predicted by Illumina local assembly, PacBio raw read, and PacBio consensus assembly]
22. Outstanding challenges and future work
• Large sequence-resolved insertions
• Many fewer multi-kb insertions
than multi-kb deletions
• Dense calls
• ~1/3 v0.4.0 calls are within 1kb of
another v0.4.0 call
• Sequence-resolved insertion size
doesn’t always match BioNano
• Phasing will be important for
these (e.g., with 10X, whatshap)
• Calls with inaccurate or incomplete
sequence change
• Exploring training a model to
predict sequence accuracy
• Homozygous Reference calls
• Can we definitively state there is
no SV in some regions?
• E.g., using diploid assembly?
• Benchmarking tool development
• How to compare SVs to a
benchmark?
• What performance metrics are
important?
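The "dense calls" statistic above (calls within 1kb of another call) can be sketched as a neighbor check on sorted positions. This is a simplified illustration: it uses only each call's start position and ignores SV length, the function name is hypothetical, and the toy calls are made up.

```python
from collections import defaultdict


def fraction_near_another(calls, window=1000):
    """Fraction of (chrom, pos) calls with another call within `window` bp."""
    by_chrom = defaultdict(list)
    for chrom, pos in calls:
        by_chrom[chrom].append(pos)

    near = 0
    total = 0
    for positions in by_chrom.values():
        positions.sort()
        for i, pos in enumerate(positions):
            # After sorting, only the immediate neighbors need checking.
            left = i > 0 and pos - positions[i - 1] <= window
            right = i + 1 < len(positions) and positions[i + 1] - pos <= window
            near += bool(left or right)
            total += 1
    return near / total if total else 0.0


# Toy calls (hypothetical), for illustration only.
toy_sv_calls = [("chr1", 1000), ("chr1", 1500), ("chr1", 5000), ("chr2", 1200)]
print(fraction_near_another(toy_sv_calls))
```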
23. New public data planned for late 2017
• PacBio Sequel sequencing of
GIAB Chinese trio
– Collaboration with Mt. Sinai
– 60x/30x/30x coverage planned
– Potentially >15kb N50 read length
• Oxford Nanopore sequencing of
Ashkenazim trio
– Collaboration with Nick Loman and
Matt Loose
– ~50x/25x/25x coverage planned
– Ultralong read sequencing (50-
100kb+ N50 read length)
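For readers unfamiliar with the N50 read lengths quoted above: N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch follows; the function name and the read lengths are hypothetical.

```python
def n50(read_lengths):
    """Return the N50 of a non-empty collection of read lengths."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    total = sum(read_lengths)
    running = 0
    # Walk from the longest read down until half of all bases are covered.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical read lengths in bases, for illustration only.
print(n50([2000, 3000, 4000, 5000, 6000]))
```

Because N50 is weighted by bases rather than by read count, a small number of very long reads can raise it well above the median read length.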
24. New Samples
Additional ancestries
• Shorter term
– Use existing PGP individual samples
– Use existing integration pipeline
• Data-based selection
– Proportion of potential genomes from
different ancestries
• 3 to 8 new samples
• Longer term
– Recruit large family
– Recruit trios from other ancestry groups
Cancer samples
• Longer term
• Make PGP-consented tumor and
normal cell lines from same individual
• Select tumor with diversity of mutation
types
25. Take-home Messages
• Genome in a Bottle is authoritatively characterizing human
genomes
• Current characterization enables robust benchmarking of “easier”
variants/regions
• Actively working on difficult variants and regions
– Draft variant calls >=20bp available – feedback requested!
• New public long and ultralong read datasets coming!
• What can we help enable?
– Clinical applications – precision medicine
– Research applications – how to know new methods are measuring difficult
regions/variants well
http://tinyurl.com/GIABSV0-4-0
26. Acknowledgements
• NIST/JIMB
– Marc Salit
– Jenny McDaniel
– Lindsay Vang
– David Catoe
– Lesley Chapman
• Genome in a Bottle Consortium
• GA4GH Benchmarking Team
• FDA
27. For More Information
www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group emails
github.com/genome-in-a-bottle – Guide to GIAB data & ftp
www.slideshare.net/genomeinabottle
SVs: http://tinyurl.com/GIABSV0-4-0
Data: http://www.nature.com/articles/sdata201625
Global Alliance Benchmarking Team
– https://github.com/ga4gh/benchmarking-tools
– precision.fda.gov – GA4GH benchmarking app
Biweekly Analysis Team calls (open to all)
– https://groups.google.com/forum/#!forum/giab-analysis-team
Public workshops
– Next workshop Jan 25-26, 2018 in Stanford, CA
– http://jimb.stanford.edu/giabworkshops for info and registration
NIST/JIMB postdoc opportunities available!
Justin Zook: jzook@nist.gov
Marc Salit: salit@nist.gov