WGS data for bacterial typing
This document discusses using whole genome sequencing (WGS) data for bacterial strain typing and phylogenetic analysis. It covers:
1) Bacterial genomes consist of DNA made up of 4 nucleotides (A, C, T, G) that can be sequenced. Genes encode proteins and make up most of bacterial genomes.
2) Mutations like single nucleotide changes can be used to differentiate bacterial strains. Molecular methods like MLST, MLVA, and core genome MLST analyze categorical or continuous differences in bacterial sequences.
3) As sequencing technology advanced, it became possible to generate and analyze whole bacterial genomes, allowing highly discriminatory strain typing and reconstruction of bacterial phylogenies based on single nucleotide polymorphisms (SNPs).
2015 12-09 nmdd
1. WGS data for bacterial typing
Karin Lagesen
@karinlag
NMDD presentation
2015-12-09
2. Bacterial genomes
Four letters: A, C, T, G
Two strands complementary:
A : T, C : G
Genes: DNA that encodes proteins
Often regarded as the “functional”
regions of the genome
Bacteria: genes approx 90% of the genome
ATCCGGAG GAGGACGG
Mutations: single letter
character changes
TGAGGGACCAAACCGAT
TGAGGGACGAAACCGAT
Bacterial
genomes are
most often
circular
Campylobacter
jejuni genome:
1.68 million
basepairs
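The base pairing and single-letter mutation described on this slide can be illustrated with a minimal Python sketch. The function names are made up for this example; the two 17-letter sequences are the ones shown on the slide.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def point_differences(seq_a, seq_b):
    """List (position, base_a, base_b) for every mismatch (0-based positions)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


original = "TGAGGGACCAAACCGAT"
mutated = "TGAGGGACGAAACCGAT"

print(point_differences(original, mutated))  # [(8, 'C', 'G')]
print(reverse_complement("ATCCGGAG"))        # CTCCGGAT
```

Counting such differences between two aligned sequences is the simplest form of the distance measure that the typing methods on the following slides build on.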
3. Bacterial typing
Typing: identifying a bacterial isolate at the strain
level
Goal: discriminate between different bacterial
isolates
● Effectively: a distance measure is often sought
Traditionally done by distinguishing isolates based on
phenotypic characteristics
Molecular strain typing has taken over
Goal: figure out how different sequences are
4. Advances in bacterial genomics
Phylum                          Number of genomes   % of total
Actinobacteria                  4059                13
Bacteroidetes/Chlorobi group    932                 3
Cyanobacteria                   340                 1
Firmicutes                      9628                31
Proteobacteria                  14268               46
Spirochaetes                    525                 2
Other                           1500                5
Number of sequenced genomes for 6 selected phyla and the percentage of all genomes found in each phylum
Source: GenBank prokaryotes.txt file downloaded 4 February 2015
Land et al., Functional & Integrative Genomics, 2015
8. MLVA – Multi-locus VNTR analysis
Find loci with known repeats
Discover copy number of repeat – becomes identifier for the locus
Strain identified by copy numbers for defined set of loci
Similarity is # of identical locus numbers
http://www.applied-maths.com/applications/mlva
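A rough sketch of the MLVA idea in Python: count the copies of a repeat unit at a locus, then compare isolates by the number of loci with identical copy numbers. The locus names, repeat unit and profiles below are invented for illustration and do not come from any real MLVA scheme.

```python
import re


def tandem_copy_number(sequence, repeat_unit):
    """Largest number of back-to-back copies of repeat_unit found in sequence."""
    runs = re.findall(f"(?:{re.escape(repeat_unit)})+", sequence)
    return max((len(run) // len(repeat_unit) for run in runs), default=0)


def count_identical_loci(profile_a, profile_b):
    """MLVA similarity: number of loci where both isolates have the same copy number."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared_loci if profile_a[locus] == profile_b[locus])


# Hypothetical locus sequence with four copies of the unit "CAT".
print(tandem_copy_number("GGCATCATCATCATTT", "CAT"))  # 4

# Hypothetical copy-number profiles for three isolates.
isolate_1 = {"VNTR1": 4, "VNTR2": 7, "VNTR3": 2, "VNTR4": 11}
isolate_2 = {"VNTR1": 4, "VNTR2": 6, "VNTR3": 2, "VNTR4": 11}
isolate_3 = {"VNTR1": 3, "VNTR2": 9, "VNTR3": 5, "VNTR4": 11}

print(count_identical_loci(isolate_1, isolate_2))  # 3
print(count_identical_loci(isolate_1, isolate_3))  # 1
```

Note that the comparison is categorical: a locus with 6 copies and one with 7 copies count as different, no matter how close the numbers are.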
9. Multi Locus Sequence Typing
Set of genes
Each variant is assigned
a categorical number
Cluster types on #
shared variants
Numbers become
Sequence type (ST)
Similarity is # of identical
locus numbers
MLST: 7 genes
rMLST: ribosomal genes
http://www.applied-maths.com/applications/mlst
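The numbering scheme can be sketched as follows. This is a toy illustration only: real MLST schemes take allele and ST numbers from a curated database, whereas here the numbers are assigned locally in order of first appearance, and only three made-up loci are used instead of seven genes.

```python
def assign_allele_numbers(sequences_by_isolate):
    """Map {isolate: {locus: sequence}} to {isolate: {locus: allele number}}.

    Each distinct sequence at a locus gets the next free number for that locus.
    """
    known_alleles = {}  # locus -> {sequence: allele number}
    profiles = {}
    for isolate, loci in sequences_by_isolate.items():
        profile = {}
        for locus, sequence in loci.items():
            alleles = known_alleles.setdefault(locus, {})
            if sequence not in alleles:
                alleles[sequence] = len(alleles) + 1
            profile[locus] = alleles[sequence]
        profiles[isolate] = profile
    return profiles


def assign_sequence_types(profiles):
    """Give every distinct allele profile its own sequence type (ST) number."""
    known_types = {}  # allele profile -> ST number
    sequence_types = {}
    for isolate, profile in profiles.items():
        key = tuple(sorted(profile.items()))
        if key not in known_types:
            known_types[key] = len(known_types) + 1
        sequence_types[isolate] = known_types[key]
    return sequence_types


# Hypothetical, very short "gene" sequences; iso3 differs by one letter in geneB.
isolates = {
    "iso1": {"geneA": "ATGC", "geneB": "GGTA", "geneC": "TTAC"},
    "iso2": {"geneA": "ATGC", "geneB": "GGTA", "geneC": "TTAC"},
    "iso3": {"geneA": "ATGC", "geneB": "GGTT", "geneC": "TTAC"},
}

profiles = assign_allele_numbers(isolates)
print(profiles["iso3"])                 # {'geneA': 1, 'geneB': 2, 'geneC': 1}
print(assign_sequence_types(profiles))  # {'iso1': 1, 'iso2': 1, 'iso3': 2}
```

A single nucleotide change in one gene is enough to give iso3 a new allele number and therefore a new ST, which is why the numbers are categorical labels rather than distances.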
11. Phylogeny – tracing ancestry
Many algorithms
● Distance matrix methods (sequence similarity)
● Maximum parsimony methods
● Maximum likelihood methods
Based on similarity between sequences
Can become very computationally intensive,
especially for longer sequences (e.g. WGS)
Examples:
● 16S rRNA phylogenetic trees
● Multi Locus Sequence Analyses – phylogenies of
concatenated MLST genes
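As an illustration of the distance matrix family, here is a minimal UPGMA sketch in Python. UPGMA is just one simple distance method, chosen here because it is short; the slides do not say which algorithm was used. The four sequences are invented, and the tree is returned as nested tuples without branch lengths.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of positions that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def upgma(sequences):
    """Cluster aligned sequences with UPGMA; return the topology as nested tuples."""
    names = list(sequences)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                       # cluster id -> leaf count
    dist = {
        (i, j): p_distance(sequences[names[i]], sequences[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        merged = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted average distance from the merged cluster to k.
            merged[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(merged)
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Hypothetical aligned sequences.
seqs = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACCA",
    "D": "TCGAACCA",
}
print(upgma(seqs))  # (('A', 'B'), ('C', 'D'))
```

The distance matrix alone already needs one comparison per pair of sequences, which hints at why these methods become expensive for many long sequences.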
14. Ideal whole genome comparisons
Bacterial species definition:
● 70% of the two genomes should be able to anneal to each
other, i.e. «match»
Converted to whole genome sequences:
● Based on % identity between conserved regions
● Average Nucleotide Identity (ANI) ~95%
All-against-all sequence alignment is required
● Time complexity: O(n^2)
● Not feasible in most cases
Alternatives:
● Focus on core regions of the genome (core genes)
● Find just the variations (SNPs), make trees from those
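The quadratic growth is easy to make concrete: comparing every genome with every other genome takes n(n-1)/2 alignments. A small Python check, with genome counts picked only as examples:

```python
from itertools import combinations


def pairwise_comparisons(n_genomes):
    """Number of unordered genome pairs in an all-against-all comparison."""
    return n_genomes * (n_genomes - 1) // 2


# Sanity check against explicit enumeration for a small collection.
assert pairwise_comparisons(5) == len(list(combinations(range(5), 2)))

for n in (10, 100, 1000):
    print(n, pairwise_comparisons(n))
# 10 45
# 100 4950
# 1000 499500
```

Ten times more genomes means roughly a hundred times more whole-genome alignments, which is what makes the alternatives listed above attractive.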
15. Core genome – # ”shared genes”
Sequences q and s have a matching region
● n: length of match
● k: % of matching characters in matching region
Regarded as ”shared” iff k and n are large enough
Similarity = # ”shared” genes
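The "shared" rule can be written as a simple threshold test. In this Python sketch the cut-off values (100 bp, 90% identity), the gene names and the match values are all assumptions for illustration; the slides do not give specific thresholds.

```python
def is_shared(match_length, percent_identity, min_length=100, min_identity=90.0):
    """A gene counts as shared only if both n and k clear their thresholds."""
    return match_length >= min_length and percent_identity >= min_identity


def count_shared_genes(matches, min_length=100, min_identity=90.0):
    """Similarity between two genomes: number of genes regarded as shared.

    matches maps gene name -> (n, k): match length and % identity in the match.
    """
    return sum(
        is_shared(n, k, min_length, min_identity) for n, k in matches.values()
    )


# Hypothetical best matches between genome q and genome s.
matches = {
    "gene_1": (2400, 98.5),  # long and similar: shared
    "gene_2": (1050, 91.0),  # long and similar: shared
    "gene_3": (80, 99.0),    # similar but too short
    "gene_4": (900, 72.0),   # long but too divergent
}
print(count_shared_genes(matches))  # 2
```

The result depends directly on where the thresholds are set, so two analyses of the same genomes can report different core sizes.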
17. Core SNP trees
Approach A: External core gene set
● Map each genome’s reads to genes
● Examine reads mapping to the same gene to
find sequence variations (variant calling)
● Create genome/SNP matrix
Approach B: Intrinsic core set
● Use suffix graphs to get Maximal Unique Matches
● Extend alignments from MUMs to get shared
core set
● Find variants in alignments
● Create genome/SNP matrix
Similarity: genomes that share the same SNPs
Tools: Snippy, snpTree, Parsnp
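Both approaches end with the same step: turning aligned core sequence into a genome/SNP matrix. The Python sketch below shows only that last step on an invented alignment. It is not how Snippy, snpTree or Parsnp are implemented; the read mapping, variant calling and MUM-based alignment that precede it are left out.

```python
def core_snp_matrix(alignment):
    """Extract variable core positions from aligned sequences.

    alignment maps genome name -> aligned sequence (all of equal length).
    A column is kept only if every genome has A, C, G or T there (core)
    and at least two different bases occur (SNP).
    Returns (positions, matrix) where matrix maps genome -> string of SNP bases.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    positions = []
    for pos in range(length):
        column = [alignment[name][pos] for name in names]
        if any(base not in "ACGT" for base in column):
            continue  # gap or ambiguous base: not part of the core
        if len(set(column)) > 1:
            positions.append(pos)
    matrix = {name: "".join(alignment[name][p] for p in positions) for name in names}
    return positions, matrix


# Hypothetical core alignment of a reference and three isolates.
alignment = {
    "ref":   "ATGCCGTAAC",
    "iso_1": "ATGCCGTAAC",
    "iso_2": "ATGACGTAGC",
    "iso_3": "ATGAC-TAGC",
}
positions, matrix = core_snp_matrix(alignment)
print(positions)  # [3, 8]
print(matrix)     # {'ref': 'CA', 'iso_1': 'CA', 'iso_2': 'AG', 'iso_3': 'AG'}
```

The resulting matrix is far smaller than the genomes themselves, and it is this reduced alignment that is passed on to tree building.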
18. Campylobacter jejuni, core SNP tree
Maximum likelihood phylogeny derived from the core-genome alignment of 131 C. jejuni
isolates. Isolates with a known hyper-invasive phenotype have their taxa identifier names
highlighted in red. The three clades identified as containing hyper-invasive strains have
branches indicated in red
Baig et al. BMC Genomics 2015 16:852 doi:10.1186/s12864-015-2087-y
19. k-mer based SNP trees
k-mer: piece of sequence, k nucleotides long
Split genomes/reads into k-mers
Find k-mers in different genomes that vary in their
middle character
Create genome/SNP matrix
● Note: this is not core, but pairwise all-against-all
Create trees
Similarity is # shared SNPs
Genome A: TGAGGGACCAAACCGAT
Genome B: TGAGGGACGAAACCGAT
kSNP
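The middle-character idea can be tried on the two genomes shown on this slide. This Python sketch is a simplified illustration, not kSNP itself: it ignores the reverse strand, uses k = 9 as an arbitrary choice, and skips flanks that occur with more than one middle base within a genome.

```python
def kmer_middles(sequence, k):
    """Map (left flank, right flank) -> set of middle bases seen for that flank pair."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a middle character")
    half = k // 2
    middles = {}
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        flanks = (kmer[:half], kmer[half + 1:])
        middles.setdefault(flanks, set()).add(kmer[half])
    return middles


def kmer_snps(seq_a, seq_b, k):
    """Find k-mers that have identical flanks in both genomes but a different middle base."""
    middles_a = kmer_middles(seq_a, k)
    middles_b = kmer_middles(seq_b, k)
    snps = []
    for flanks in middles_a.keys() & middles_b.keys():
        bases_a, bases_b = middles_a[flanks], middles_b[flanks]
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            left, right = flanks
            snps.append((left, next(iter(bases_a)), next(iter(bases_b)), right))
    return sorted(snps)


genome_a = "TGAGGGACCAAACCGAT"
genome_b = "TGAGGGACGAAACCGAT"
print(kmer_snps(genome_a, genome_b, k=9))  # [('GGAC', 'C', 'G', 'AAAC')]
```

No alignment or reference genome is needed: the shared flanks act as the anchor that says the two middle characters are the same position.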
21. Classification of distance measures
Categorical
● Loci defined as either equal/different
● Similarity calculated as # shared loci
Ordinal
● Regions defined as “shared” based on sequence
similarity levels
● Similarity calculated as # shared sequences
Continuous
● Find all sequence differences (SNPs)
● Similarity calculated as # shared SNPs
22. (Some) sources of variation
Small changes
● Nucleotide substitution
● Insertions and deletions
Recombination
● Shuffling regions of the genome
“Jumping genes”: insertion sequences and transposons
● Small sequences that jump
● Can move other sequences with them
24. Gene tree != genome tree
Rose et al., Biology Direct 2007
25. So… what do we do?
No real answers (yet)
Could sequence the lot, but it is expensive
However: gain so much more with sequencing
● Very high discriminatory power (resolution)
● Access to virulence genes, ++
Be aware of possible fragility in MLST data
● One mutation = changed ST
● Should probably double check STs with MLSA
Compare MLSTs with WGS data, see how stable the
MLSTs are relative to the whole genome