Genome in a Bottle
1. Genome in a Bottle
Justin Zook and Marc Salit
NIST Genome-Scale Measurements Group
JIMB
October 18, 2016
2. Genome in a Bottle Consortium
Whole Genome Variant Calling
Sample
gDNA isolation
Library Prep
Sequencing
Alignment/Mapping
Variant Calling
Confidence Estimates
Downstream Analysis
• gDNA reference materials to
evaluate performance
– materials certified for their variants
against a reference sequence, with
confidence estimates
• established consortium to develop
reference materials, data, methods,
performance metrics
• Characterized Pilot Genome
NA12878
• Ashkenazim Trio, Asian son from
PGP released in September!
generic measurement process
3. In September, we released 4 new
GIAB RM Genomes.
• PGP Human Genomes
– AJ son
– AJ trio
– Asian son
• Parents also characterized
National Institute of Standards & Technology
Report of Investigation
Reference Material 8391
Human DNA for Whole-Genome Variant Assessment
(Son of Eastern European Ashkenazim Jewish Ancestry)
This Reference Material (RM) is intended for validation, optimization, and process evaluation purposes. It consists
of a male whole human genome sample of Eastern European Ashkenazim Jewish ancestry, and it can be used to assess
performance of variant calling from genome sequencing. A unit of RM 8391 consists of a vial containing human
genomic DNA extracted from a single large growth of human lymphoblastoid cell line GM24385 from the Coriell
Institute for Medical Research (Camden, NJ). The vial contains approximately 10 µg of genomic DNA, with the peak
of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA, and the DNA is in TE buffer
(10 mM TRIS, 1 mM EDTA, pH 8.0).
This material is intended for assessing performance of human genome sequencing variant calling by obtaining
estimates of true positives, false positives, true negatives, and false negatives. Sequencing applications could include
whole genome sequencing, whole exome sequencing, and more targeted sequencing such as gene panels. This
genomic DNA is intended to be analyzed in the same way as any other extracted DNA sample a lab would process and
analyze. Because the RM is extracted DNA, it is not useful for assessing pre-analytical steps such as DNA
extraction, but it does challenge sequencing library preparation, sequencing machines, and the bioinformatics steps of
mapping, alignment, and variant calling. This RM is not intended to assess subsequent bioinformatics steps such as
functional or clinical interpretation.
Information Values: Information values are provided for single nucleotide polymorphisms (SNPs), small insertions
and deletions (indels), and homozygous reference genotypes for approximately 88 % of the genome, using methods
similar to those described in reference 1. An information value is considered to be a value that will be of interest and use to
the RM user, but insufficient information is available to assess the uncertainty associated with the value. We describe
and disseminate our best, most confident, estimate of the genotypes using the data and methods currently available.
These data and genomic characterizations will be maintained over time as new data accrue and measurement and
informatics methods become available. The information values are given as a variant call file (vcf) that contains the
high-confidence SNPs and small indels, as well as a tab-delimited “bed” file that describes the regions that are called
high-confidence. Information values cannot be used to establish metrological traceability. The files referenced in this
report are available at the Genome in a Bottle ftp site hosted by the National Center for Biotechnology Information
(NCBI). The Genome in a Bottle ftp site for the high-confidence vcf and high confidence regions is:
4. We’re also releasing a
Microbial Genome RM
National Institute of Standards & Technology
Report of Investigation
Reference Material 8375
Microbial Genomic DNA Standards for Sequencing Performance Assessment
(MG-001, MG-002, MG-003, MG-004)
This Reference Material (RM) is intended for validation, optimization, process evaluation, and performance
assessment of whole genome sequencing. A unit of RM 8375 consists of four vials. Each vial contains a different
microbial genomic DNA sample (MG-001 Salmonella Typhimurium LT2, MG-002 Staphylococcus aureus, MG-003
Pseudomonas aeruginosa, and MG-004 Clostridium sporogenes). Each vial contains approximately 2 µg of microbial
genomic DNA; with the peak of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA;
in TE buffer (10 mM TRIS, 0.1 mM EDTA, pH 8.0).
This material is intended to help assess performance of high-throughput DNA sequencing methods. This genomic
DNA is intended to be analyzed in the same way as any other extracted DNA sample a laboratory would analyze, such
as through the use of genome assembly or variant calling bioinformatics pipelines. Because the RM is extracted
DNA, it does not assess pre-analytical steps such as DNA extraction. It does, however, challenge sequencing library
preparation, sequencing machines, base calling algorithms, and the subsequent bioinformatics analyses such as variant
calling. This RM is not intended to assess other bioinformatics steps such as genome assembly, strain identification,
phylogenetic analysis, or genome annotation.
Information Values: Information values are currently provided for the whole genome sequence to enable
performance assessment of variant calling and assembly methods. An information value is considered to be a value
that will be of interest and use to the RM user, but insufficient information is available to assess the uncertainty
associated with the value. We describe and disseminate our best, most confident, estimate of the assembly using the
data and methods available at present [1]. Information values cannot be used to establish metrological traceability.
The genome sequence files referenced in this Report of Investigation are available at:
MG-001 Salmonella Typhimurium LT2
https://github.com/usnistgov/NIST_Micro_Genomic_RM_Data/MG001/ref_genome/MG001_v1.00.fasta
MG-002 Staphylococcus aureus
This Reference Material (RM) is
intended for validation,
optimization, process
evaluation, and performance
assessment of whole genome
sequencing.
• Salmonella Typhimurium
• Pseudomonas aeruginosa
• Staphylococcus aureus
• Clostridium sporogenes
5. Bringing Principles of Metrology
to the Genome
• Reference materials
– DNA in a tube you can buy from NIST
– NA12878 pilot sample, now 2 PGP-
sourced trios
• Extensive state-of-the-art
characterization
– as good as we can get for small
variants
– arbitrated “gold standard” calls for
SNPs, small indels
• “Upgradable” as technology
develops
• Analysis of all samples ongoing as
technology develops
• PGP genomes suitable for
commercial derived products
• Developing benchmarking tools and
software
– with GA4GH
• Samples being used to develop and
demonstrate new technology
6. NIST Reference Materials
Genome | PGP ID | Coriell ID | NIST ID | NIST RM #
CEPH Mother/Daughter | N/A | GM12878 | HG001 | RM8398
AJ Son | huAA53E0 | GM24385 | HG002 | RM8391 (son)/RM8392 (trio)
AJ Father | hu6E4515 | GM24149 | HG003 | RM8392 (trio)
AJ Mother | hu8E87A9 | GM24143 | HG004 | RM8392 (trio)
Asian Son | hu91BD69 | GM24631 | HG005 | RM8393
Asian Father | huCA017E | GM24694 | N/A | N/A
Asian Mother | hu38168C | GM24695 | N/A | N/A
7. Data for GIAB PGP Trios
Dataset | Characteristics | Coverage | Availability | Most useful for…
Illumina Paired-end WGS | 150x150bp; 250x250bp | ~300x/individual; ~50x/individual | on SRA/FTP | SNPs/indels/some SVs
Complete Genomics | | 100x/individual | on SRA/FTP | SNPs/indels/some SVs
SOLiD 5500W WGS | 50bp single end | 70x/son | on FTP | SNPs
Illumina Paired-end WES | 100x100bp | ~300x/individual | on SRA/FTP | SNPs/indels in exome
Ion Proton Exome | | 1000x/individual | on SRA/FTP | SNPs/indels in exome
Illumina Mate pair | ~6000 bp insert | ~30x/individual | on FTP | SVs
Illumina “moleculo” | Custom library | ~30x by long fragments | on FTP | SVs/phasing/assembly
Complete Genomics LFR | | 100x/individual | on SRA/FTP | SNPs/indels/phasing
10X | Linked reads | 30-45x/individual | on FTP | SNPs/SVs/phasing/assembly
PacBio | ~10kb reads | ~70x on AJ son, ~30x on each AJ parent | on SRA/FTP | SVs/phasing/assembly/STRs
Oxford Nanopore | 5.8kb 2D reads | 0.05x on AJ son | on FTP | SVs/assembly
Nabsys 2.0 | ~100kbp N50 nanopore maps | 70x on AJ son | | SVs/assembly
BioNano Genomics | 200-250kbp optical map reads | ~100x/AJ individual; 57x on Asian son | on FTP | SVs/assembly
8. Dataset | AJ Son | AJ Parents | Chinese son | Chinese parents | NA12878
Illumina Paired-end | X X X X X
Illumina Long Mate pair | X X X X X
Illumina “moleculo” | X X X X X
Complete Genomics | X X X X X
Complete Genomics LFR | X X X
Ion exome | X X X X
BioNano | X X X X
10X | X X X
PacBio | X X X
SOLiD single end | X X X
Illumina exome | X X X X
Nabsys | X X
Oxford Nanopore | X
10. Integration Methods to Establish Benchmark Variant
Calls
Candidate variants
Concordant variants
Find characteristics of bias
Arbitrate using evidence of bias
Confidence Level
Zook et al., Nature Biotechnology, 2014.
12. New Integration Methods to Establish Benchmark
Variant Calls for GRCh38
• Comparison with PG
– ~300 differences not near filtered sites
in either callset (3x GRCh37)
– Appears to result from fewer input
callsets into PG
• Future work
– How can we use ALT loci?
– How to represent variation with
respect to ALT loci?
– How to benchmark variants called on
ALT loci?
• Illumina and 10X
– Map reads to GRCh38 with decoy but
no ALT loci
– Call variants vs. GRCh38
• Complete Genomics, SOLiD, Ion
– Convert vcf and callable bed from
GRCh37 to GRCh38
– Use GenomeWarp by Cory McLean, Verily
• Accounts for changed bases
• https://github.com/verilylifesciences/genomewarp
• ~100k fewer calls than GRCh37
13. Evolution of high-confidence calls
Calls | HC Regions | HC Calls | HC indels | Concordant with PG | NIST-only in beds | PG-only in beds | PG-only
v2.19 | 2.22 Gb | 3153247 | 352937 | 3030703 | 87 | 404 | 1018795
v3.1 | 2.55 Gb | 3453085 | - | 3330275 | 71 | 82 | 719223
v3.2.2 | 2.53 Gb | 3512990 | 335594 | 3391783 | 57 | 52 | 657715
v3.3 | 2.57 Gb | 3566076 | 358753 | 3441361 | 40 | 60 | 608137
v3.3.1 | 2.58 Gb | 3746191 | 505169 | 3550914 | 50 | 67 | 499023
14. Newest calls (v3.3.1) vs. 2015 calls (v2.19)
V3.3.1
• 2.584Gb high-confidence
• 3550914 match PG
• 499023 PG calls outside high conf
• 195277 calls not in PG
• After excluding low confidence regions
and regions around filtered PG calls:
– 50 calls not in PG
– 67 extra PG calls
V2.19
• 2.216 Gb high-confidence
• 3030717 match PG
• 1018795 PG calls outside high conf
• 122359 calls not in PG
• After excluding low confidence regions
and regions around filtered PG calls:
– 87 calls not in PG
– 404 extra PG calls
15. Newest calls (v3.3.1) vs. 2015 calls (v2.19)
Example vcf (verily) Stratified
V3.3.1
• 16% of SNPs not assessed
– 23% of SNPs in RefSeq coding
– 52% of SNPs in “bad promoters”
• 68% of indels not assessed
– 2.0% error rate
• 17% FP rate in regions homologous to
decoy
V2.19
• 27% of SNPs not assessed
– 36% of SNPs in RefSeq coding
– 82% of SNPs in “bad promoters”
• 78% of indels not assessed
– 1.2% error rate
• 0.2% FP rate in regions homologous to
decoy
16. Principles of Integration Process
• Form sensitive variant calls from
each dataset
• Define “callable regions” for each
callset
• Filter calls from each method
with annotations unlike
concordant calls
• Compare high-confidence calls to
other callsets and manually
inspect subset of differences
– vs. pedigree-based calls
– vs. common pipelines
– Trio analysis
• When benchmarking a new
callset against ours, most
putative FPs/FNs should actually
be FPs/FNs
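The comparison step above can be illustrated with a minimal sketch. It assumes calls are plain (chrom, pos, ref, alt) tuples with 1-based VCF positions, that the high-confidence regions are sorted, non-overlapping 0-based half-open BED intervals, and that variants are compared by exact match only. The names in_regions and count_matches and all coordinates are hypothetical; real benchmarking tools compare variant representations far more carefully.

```python
from bisect import bisect_right

def in_regions(regions, chrom, pos0):
    """True if 0-based position pos0 lies in a sorted, non-overlapping interval list."""
    intervals = regions.get(chrom, [])
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]

def count_matches(benchmark, query, regions):
    """Count TP/FP/FN by exact match, restricted to the high-confidence regions."""
    # VCF positions are 1-based, BED intervals are 0-based half-open.
    bench = {c for c in benchmark if in_regions(regions, c[0], c[1] - 1)}
    qry = {c for c in query if in_regions(regions, c[0], c[1] - 1)}
    return {"TP": len(bench & qry), "FP": len(qry - bench), "FN": len(bench - qry)}

# Hypothetical toy data, not taken from any GIAB release.
regions = {"chr1": [(100, 200), (500, 600)]}
benchmark = [("chr1", 150, "A", "G"), ("chr1", 550, "C", "T"), ("chr1", 300, "G", "A")]
query = [("chr1", 150, "A", "G"), ("chr1", 180, "T", "C"), ("chr1", 300, "G", "A")]

counts = count_matches(benchmark, query, regions)
print(counts)
```

Calls outside the bed file are ignored on both sides, which is why the shared call at position 300 does not count as a true positive here.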
17. Criteria for including new callsets
• Form sensitive variant calls from
each dataset
• Define “callable regions” for each
callset
• Good coverage and MapQ
• Use knowledge about technology and
manual inspection to exclude repetitive
regions difficult for each dataset
• For new callsets, ensure most FNs in
callable regions relative to current high-
confidence calls are questionable in the
current calls
• Filter calls from each method
with annotations unlike
concordant calls
– Annotations for which outliers are
expected to indicate bias should be
selected for each callset
18. Global Alliance for Genomics and Health Benchmarking Task
Team
• Developed standardized
definitions for performance
metrics like TP, FP, and FN.
• Developing sophisticated
benchmarking tools
• Integrated into a single framework
with standardized inputs and
outputs
• Standardized bed files with
difficult genome contexts for
stratification
https://github.com/ga4gh/benchmarking-tools
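As a rough illustration of how TP, FP, and FN counts turn into performance metrics per stratum, here is a minimal sketch using the textbook definitions of precision and recall. The function name performance, the stratum labels, and every count are hypothetical; the team's standards document may define the underlying counts more precisely than this.

```python
def performance(tp, fp, fn):
    """Return precision, recall, and F1 from raw counts (NaN when undefined)."""
    nan = float("nan")
    precision = tp / (tp + fp) if tp + fp else nan
    recall = tp / (tp + fn) if tp + fn else nan
    if tp + fp and tp + fn and precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = nan
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts per stratification region, for illustration only.
strata = {
    "all high-confidence regions": (90, 10, 30),
    "tandem repeats": (40, 10, 40),
}

for name, (tp, fp, fn) in strata.items():
    m = performance(tp, fp, fn)
    print(f"{name}: precision={m['precision']:.3f} recall={m['recall']:.3f} f1={m['f1']:.3f}")
```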
Variant types can change when decomposing
or recomposing variants:
Complex variant:
chr1 201586350 CTCTCTCTCT CA
DEL + SNP:
chr1 201586350 CTCTCTCTCT C
chr1 201586359 T A
Credit: Peter Krusche, Illumina
GA4GH Benchmarking Team
20. GA4GH benchmarking on Github
In-progress benchmarking standards document: doc/standards
Description of intermediate formats: doc/ref-impl
Truthset descriptions and download links: resources/high-confidence-sets
Stratification bed files and descriptions: resources/stratification-bed-files
Python-code for HTML reporting and running benchmarks: reporting/basic
Please contribute / join the discussion!
https://github.com/ga4gh/benchmarking-tools
Credit: Peter Krusche, Illumina
GA4GH Benchmarking Team
22. FN rates high in some tandem repeats
[Figure: FN rate vs. average (scale 0.3x, 1x, 3x, 10x, 30x) for 2bp, 3bp, and 4bp unit repeats, shown for repeats of 11 to 50 bp and 51 to 200 bp.]
23. Approaches to Benchmarking Variant Calling
• Well-characterized whole genome Reference Materials
• Many samples characterized in clinically relevant regions
• Synthetic DNA spike-ins
• Cell lines with engineered mutations
• Simulated reads
• Modified real reads
• Modified reference genomes
• Confirming results found in real samples over time
24. Challenges in Benchmarking Variant Calling
• It is difficult to do robust benchmarking of tests designed to detect
many analytes (e.g., many variants)
• Easiest to benchmark only within high-confidence bed file, but…
• Benchmark calls/regions tend to be biased towards easier variants
and regions
– Some clinical tests are enriched for difficult sites
• Always manually inspect a subset of FPs/FNs
• Stratification by variant type and region is important
• Always calculate confidence intervals on performance metrics
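One common way to put a confidence interval on a metric such as sensitivity is the Wilson score interval for a binomial proportion. The sketch below is an assumption about how this could be done, not a method named in the slides; the function name wilson_interval and the example counts are hypothetical.

```python
from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical example: 98 of 100 benchmark variants detected.
low, high = wilson_interval(98, 100)
print(f"sensitivity 0.98, 95% CI ({low:.3f}, {high:.3f})")
```

With only 100 benchmark variants in a stratum, the interval is several percentage points wide, which is one reason small strata need cautious interpretation.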
25. How can we extend this approach to structural
variants?
Similarities to small variants
• Collect callsets from multiple
technologies
• Compare callsets to find calls
supported by multiple technologies
Differences from small variants
• Callsets have limited sensitivity
• Variants are often imprecisely
characterized
– breakpoints, size, type, etc.
• Representation of variants is poorly
standardized, especially when complex
• Comparison tools in infancy
26. Preliminary process for integrated deletions
• Merge deletions within 1kb
• Rank calls by closeness of predicted size to median size and select call in each region from best callset
• Find calls supported by 2+ technologies with size within 20%
• Filter calls overlapping seg dups, reference N’s, or with call with predicted size 2x larger
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_DraftIntegratedDeletionsgt19bp_v0.1.8
Size range | <50bp | 50-100bp | 100-1000bp | 1kb-3kb | >3kbp
Pre-filtered calls | 2627 | 1600 | 2306 | 385 | 389
Post-filtered calls | 2548 | 1448 | 1996 | 297 | 262
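The merge and support steps above can be sketched roughly as follows. This is a simplified illustration, not the actual GIAB pipeline: it assumes each call is a dict with chrom, start, end, and tech keys, picks the call closest to the median size rather than ranking callsets, and omits the final filtering step. The names cluster_calls and integrate and all coordinates are hypothetical.

```python
from statistics import median

def size(call):
    return call["end"] - call["start"]

def cluster_calls(calls, max_gap=1000):
    """Group deletions on one chromosome that lie within max_gap of the cluster so far."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c["chrom"], c["start"])):
        last = clusters[-1] if clusters else None
        if (last and last[-1]["chrom"] == call["chrom"]
                and call["start"] - max(c["end"] for c in last) <= max_gap):
            last.append(call)
        else:
            clusters.append([call])
    return clusters

def integrate(calls, max_gap=1000, size_tol=0.2, min_tech=2):
    """Keep one call per cluster when 2+ technologies agree on size within size_tol."""
    kept = []
    for cluster in cluster_calls(calls, max_gap):
        med = median(size(c) for c in cluster)
        close = [c for c in cluster if abs(size(c) - med) <= size_tol * med]
        if len({c["tech"] for c in close}) >= min_tech:
            kept.append(min(close, key=lambda c: abs(size(c) - med)))
    return kept

# Hypothetical toy calls, for illustration only.
calls = [
    {"chrom": "chr1", "start": 10000, "end": 10300, "tech": "illumina"},
    {"chrom": "chr1", "start": 10020, "end": 10330, "tech": "pacbio"},
    {"chrom": "chr1", "start": 10050, "end": 11050, "tech": "bionano"},
    {"chrom": "chr1", "start": 50000, "end": 50200, "tech": "illumina"},
    {"chrom": "chr2", "start": 10100, "end": 10400, "tech": "pacbio"},
]
integrated = integrate(calls)
print(integrated)
```

In this toy input only the first cluster survives, because it is the only region where two technologies report a deletion of similar size.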
27. Proposed improved integration process
SV Discovery
• “sequence-resolved” calls
• Imprecise SV calls
SV Comparison
• Sequence-based comparison
• Comparison of all candidate calls (SURVIVOR/svcompare)
SV Corroboration
• SV corroboration methods (e.g., parliament, svviz, nabsys, bionano)
Form SV benchmark calls
• Heuristics to form tiers of benchmark SVs
• Machine learning to form benchmark SVs
SV refinement? (e.g., parliament?, others?)
28. Sequence-resolved candidates
Currently sequence-resolved output
• MSPacMon
• Spiral (now only small have sequence)
• Fermikit (now only small have sequence)
• Cortex
• CG (small)
• GATK (small)
• Freebayes (small)
• Pindel
• manta
Potentially sequence-resolved output
• Newly submitted
– PBRefine
– Some MetaSV
– Assemblytics
– 10X deletions
• Possible
– Parliament?
– PBHoney
– Smrt-sv.dip
– Breakseq?
29. Draft de novo assemblies for AJ Son
Data | Method | Contig N50 | Scaffold N50 | Number Scaffolds | Total Size
PacBio | Falcon | 5.3 Mb | 5.3 Mb | 13231 | 3.04 Gb
PacBio | PBcR | 4.5 Mb | 4.5 Mb | 12523 | 2.99 Gb
PacBio+BioNano | Falcon+BioNano | 4.1 Mb | 22.7 Mb | 478 | 2.38 Gb
PacBio+Dovetail | Falcon+HiRise | 5.3 Mb | 12.9 Mb | 12459 | 3.04 Gb
PacBio+Dovetail | PBcR+HiRise | 4.1 Mb | 20.6 Mb | 10491 | 2.99 Gb
Illumina | DISCOVAR | 81 kb | 149 kb | 1.06M | 3.13 Gb
Illumina+Dovetail | DISCOVAR+HiRise | 85 kb | 12.9 Mb | 1.03M | 3.15 Gb
10X | Supernova | 106 kb | 15.2 Mb | 1360 | 2.73 Gb
Credits for assemblies:
Ali Bashir, Mt. Sinai
Jason Chin, PacBio
Alex Hastie, BioNano
Serge Koren, NHGRI
Adam Phillippy, NHGRI
Kareina Dill, Dovetail
Noushin Ghaffari, TAMU
10X Genomics
Assembly-based SV calls:
MSPAC
Assemblytics
PBRefine
IMPORTANT NOTE: These are draft assemblies and statistics should not be used to
compare quality of assembly methods.
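For readers unfamiliar with the N50 statistic in the table above, the sketch below shows the usual definition: the length of the contig (or scaffold) at which the running total, taken from longest to shortest, first reaches half of the total assembly size. The function name n50 and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")

# Hypothetical contig lengths: total 290, so half is 145.
# 80 + 70 = 150 reaches half, so the N50 is 70.
example = [80, 70, 50, 40, 30, 20]
print(n50(example))
```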
30. New Samples
Additional ancestries
• Shorter term
– Use existing PGP individual samples
– Use existing integration pipeline
• Data-based selection
– E.g., PCA of existing samples
• 3 to 8 new samples
• Longer term
– Recruit large family
– Recruit trios from other ancestry groups
Cancer samples
• Longer term
• Make PGP-consented tumor and
normal cell lines from same individual
• Select tumor with diversity of mutation
types
31.
32. Acknowledgements
• NIST
– Marc Salit
– Jenny McDaniel
– Lindsay Vang
– David Catoe
• Genome in a Bottle Consortium
• GA4GH Benchmarking Team
• FDA
– Liz Mansfield
– Zivana Tezak
– David Litwack
33. For More Information
www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group
emails
github.com/genome-in-a-bottle – Guide to GIAB data & ftp
www.slideshare.net/genomeinabottle
www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser
Data: http://www.nature.com/articles/sdata201625
Global Alliance Benchmarking Team
– https://github.com/ga4gh/benchmarking-tools
Public workshops
– Possible SV integration mini-workshop in Spring 2017
– Next large workshop in Fall 2017
NIST postdoc opportunities available!
Justin Zook: jzook@nist.gov
Marc Salit: salit@nist.gov