Keynote presentation from Plant and Pathogen Bioinformatics workshop at EMBL-EBI, 8-11 July 2014
Slides and teaching material are available at https://github.com/widdowquinn/Teaching-EMBL-Plant-Path-Genomics
Slides from a Comparative Genomics and Visualisation course (part 2) presented at the University of Dundee, 11th March 2014. Other materials are available at GitHub (https://github.com/widdowquinn/Teaching)
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
EBI
Event: Plant and Animal Genomes Conference
Speaker: Bert Overduin
The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. As of Ensembl release 65 (December 2011), 56 species are fully supported. Ensembl data are accessible through an interactive web site, flat files, the data mining tool BioMart, direct database querying and a set of Perl APIs. Moreover, Ensembl is not just a data visualisation tool, but a suite of programs for data production (e.g. gene calling and comparative genomics) that can be deployed individually according to the needs of an individual community. Ensembl Genomes (http://www.ensemblgenomes.org) consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the genomes available in Ensembl. It currently contains data for over 300 species. Many of the databases that support Ensembl Genomes have been built by, or in close collaboration with, groups that maintain specialist data resources for individual species, and we are actively seeking to extend the range of these collaborations. Together Ensembl and Ensembl Genomes offer a single unified interface across the taxonomic space. This presentation will consist of a short introduction to Ensembl and Ensembl Genomes followed by a demonstration of the respective websites and the BioMart data retrieval tool. Special attention will be given to recently developed functionality like the Variant Effect Predictor, which predicts the consequences of substitutions, insertions and deletions on transcripts and protein sequences, and the possibility to visualize your own data by attaching BAM and VCF files (for example).
Genomics on the Half Shell: Making Science more Open
sr320
Abstract
Technology has significantly changed how research is done in biology. Along with this shift, it is increasingly easy and advantageous to operate in an open science framework. In this presentation I will begin by providing an overview of our research efforts, with particular attention to challenges in data analysis. Research in our lab focuses on characterizing physiological responses of shellfish to environmental change, examining impacts and adaptive potential from the nucleotide to organism level. A core component of this includes investigating the functional relationship of genetics, epigenetics, and transcription. In our research we leverage several computing infrastructure solutions that I will describe. In addition, our lab practices Open Notebook Science. I will describe the practical aspects of how we accomplish this, including addressing some of the concerns and realized advantages. Beyond online lab notebooks, we are continually experimenting with different ways to use online resources to engage with a larger audience and improve science communication. I have found this is a complex balance of time and effort versus impact and will discuss how our lab group attempts to reach this balance.
Bio
Steven Roberts is an Associate Professor in the School of Aquatic and Fishery Sciences where his research centers around characterizing the response of aquatic organisms to environmental change. Prior to coming to the University of Washington, in 2007 he was at the Marine Biological Laboratory in Woods Hole, Massachusetts and received his PhD from the University of Notre Dame. In graduate school he spent most of his time transferring agarose gels, and now he spends most of his time transferring files.
This is the webinar presented on 14th April as part of the Ensembl Online Webinar series. You can view the recorded webinar on the Ensembl Helpdesk YouTube channel: https://www.youtube.com/watch?v=blbhuqiiDoA
Guest lecture on comparative genomics for University of Dundee BS32010, delivered 21/3/2016
Workshop/other materials available at DOI:10.5281/zenodo.49447
Slides from a Comparative Genomics and Visualisation course (part 1) presented at the University of Dundee, 7th March 2014. Other materials are available at GitHub (https://github.com/widdowquinn/Teaching)
Ontologies for life sciences: examples from the gene ontology
Melanie Courtot
A half day course presented during the Earlham Institute summer school on bioinformatics 2016, in Norwich, UK, http://www.earlham.ac.uk/earlham-institute-summer-school-bioinformatics
The NCBI Boot Camp for Beginners was designed to offer an overview of the NCBI suite of resources. In the first half of the presentation, highlighted databases were covered in four main categories: literature, sequences, genes & genomes and expression & structure. The second half of the class used the apolipoprotein A as a query that was explored through many of the NCBI databases, from identifying the reference sequences to a structural analysis of the Cys130Arg variant.
Introduction to an online resource that displays pre-computed phylogenetic trees of gene families alongside experimental gene function data to facilitate inference of unknown gene function in plants. From the same team that brings you TAIR (The Arabidopsis Information Resource)
GRC Workshop held at Churchill College on Sep 21, 2014. Talk by Bronwen Aken discussing the Ensembl approach to annotating the complete human reference assembly.
Short tutorials on how to use the web-based tool DAVID (Database for Annotation, Visualization and Integrated Discovery) - http://david.abcc.ncifcrf.gov/
DAVID provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.
What's in a name? Better vocabularies = better bioinformatics?
Keith Bradnam
Most of the pain and suffering that occurs in bioinformatics happens when database identifier 'A' in file 1 doesn't quite match database identifier 'B' in file 2... even when they are supposed to be the same identifier.
Things don't always match up for a number of reasons, most of which *should* be under our control. This talk covers a few points relating to this and briefly discusses how we should all be using curated ontologies to describe our data.
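The identifier mismatch described above can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names, the normalisation rules (trimming whitespace, upper-casing, dropping a trailing `.N` version suffix) and the example identifiers are all assumptions made for illustration. Dropping a version suffix is itself a judgment call that may be wrong for some datasets.

```python
import re


def normalise_id(raw: str) -> str:
    """Trim whitespace, upper-case, and drop a trailing '.N' version suffix.

    These rules are illustrative assumptions, not a general standard.
    """
    cleaned = raw.strip().upper()
    return re.sub(r"\.\d+$", "", cleaned)


def compare_ids(ids_a, ids_b):
    """Return (shared, only_in_a, only_in_b) after normalisation."""
    set_a = {normalise_id(i) for i in ids_a}
    set_b = {normalise_id(i) for i in ids_b}
    return set_a & set_b, set_a - set_b, set_b - set_a


# Hypothetical identifiers from two files that "should" agree.
file1_ids = ["AT1G01010.1", "at1g01020 ", "AT1G01030"]
file2_ids = ["AT1G01010", "AT1G01020.2", "AT1G01040"]

shared, only_1, only_2 = compare_ids(file1_ids, file2_ids)
print("shared:", sorted(shared))
print("only in file 1:", sorted(only_1))
print("only in file 2:", sorted(only_2))
```

Reporting the unmatched identifiers on both sides, rather than silently joining on the matches, makes the kind of near-miss described above visible before it affects downstream analysis.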
Inferring microbial gene function from evolution of synonymous codon usage bi...
Fran Supek
Introduction: Thousands of microbial genomes are available, yet even for the model organisms, a sizable portion of the genes have unknown function. Phyletic profiling is a technique that can predict their function by comparing the presence/absence profiles of their homologs across genomes. In addition, prokaryotic genomes contain an evolutionary signature of gene expression levels in the codon usage biases, where highly expressed genes prefer the codons better adapted to the cellular tRNA pools.
Objectives: We aimed to augment the existing phyletic profiling approaches by incorporating more detailed knowledge of gene evolutionary history, and create a very large database of predicted gene functions directly usable for microbiologists.
Materials & methods: We used the OMA groups of orthologs and the paralogy relationships inferred through OMA's "witness of non-orthology" rule. Genes were assigned to Gene Ontology categories and the phyletic profiles compared using the CLUS classifier that performs a hierarchical multilabel classification using decision trees. We quantified significant codon biases using a Random Forest randomization test that compares against the composition of intergenic DNA. Codon biases in COG gene families were contrasted between microbes inhabiting different environments, while controlling for phylogenetic inertia.
Results: The genomic co-occurrence patterns of both the orthologs and the paralogs (the homologs separated by a speciation and by a duplication event, respectively) were informative and synergistic in a phylogenetic profiling setup, even though paralogy relationships are thought to conserve function less well. The resulting ~400,000 gene function predictions for 998 prokaryotes (at FDR<10%) [...] method to systematically link codon adaptation within COG gene families to microbial phenotypes and environments (thus functionally characterizing the COGs) and experimentally validated the predictions for novel E. coli genes relevant for surviving oxidative, thermal or osmotic stress.
Conclusion: Our work towards enhancing phylogenetic profiling, as well as developing complementary genomic context approaches, will contribute to prioritizing experimental investigation of microbial gene function, cutting time and cost needed for discovery.
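The basic idea of phyletic profiling mentioned in the introduction can be sketched in a few lines of Python. This is a toy illustration only, not the authors' method: the abstract describes a CLUS decision-tree classifier over OMA groups, whereas the sketch below simply compares presence/absence vectors with a Jaccard similarity. The gene names and profiles are invented for the example.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (lists of 0/1)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def most_similar(query, profiles):
    """Return (gene, similarity) for the profile closest to the query gene."""
    scores = {
        gene: jaccard(profiles[query], profile)
        for gene, profile in profiles.items()
        if gene != query
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical profiles: one column per genome, 1 = homolog present.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}

best_gene, score = most_similar("geneA", profiles)
print(best_gene, score)
```

Genes whose homologs appear and disappear together across genomes score highly, which is the signal a profiling approach uses to suggest shared function for an uncharacterized gene.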
Keynote presentation, 4th February 2015, León, México - part of the 2015 Genomics Research on Plant-Parasite Interactions to Increase Food Production UK-MX Workshop.
The flood of nextgen sequencing data is changing the landscape of computational biology, pushing the need for more robust infrastructures, tools, and visualization techniques.
Ensuring Chemical Structure, Biological Data and Computational Model Quality
A talk given at SLAS 2016 on Mon Jan 25th in San Diego.
Covers published work and recent forays with BIA 10-2474.
Endosymbiont hunting in the metagenome of Asian citrus psyllid (Diaphorina ci...
Surya Saha
The Asian citrus psyllid (D. citri Kuwayama or ACP) is host to 7+ bacterial endosymbionts and is the insect vector of Ca. Liberibacter asiaticus (Las), causal agent of citrus greening. To gain a better understanding of endosymbiont and pathogen ecology and develop improved detection strategies for Las, DNA from D. citri was sequenced to 108X coverage. Initial analyses have focused on Wolbachia, an alpha-proteobacterial primary endosymbiont typically found in the reproductive tissues of ACP and other arthropods. The metagenomic sequences were mined for wACP reads using BLAST and 4 sequenced Wolbachia genomes as bait. Putative wACP reads were then assembled using the Velvet and MIRA3 assemblers over a range of parameter settings. The resulting wACP contigs were annotated using the RAST pipeline and compared to the Wolbachia endosymbiont of Culex quinquefasciatus (wPip). MIRA3 was able to reconstruct a majority of the wPip CDS regions and was selected for scaffolding with Minimus2, SSPACE and SOPRA using large insert mate-pair libraries. The wACP scaffolds were compared to wPip using Abacas and Mauve contig mover to orient and order the contigs. The functional annotation of scaffolds was evaluated by comparing it to the wPip genome using RAST. The draft assembly was verified using an OrthoMCL-based comparison to the 4 sequenced Wolbachia genomes. We expanded the scope of endosymbiont characterization beyond wACP using 16S rDNA and partial 23S rDNA analysis as a guide. Results will be presented regarding endosymbionts, their potential interactions and their impact on the disease of citrus greening.
Similar to What makes the enterobacterial plant pathogen Pectobacterium atrosepticum different to its animal pathogenic relatives? (20)
Presentation delivered 8th August 2016, at the European Association for Potato Research (EAPR) meeting, Dundee - outlining classification of bacterial plant pathogens with
Introductory slides for the Python hands-on session of the Research Data Visualisation Workshop run by the Software Sustainability Institute, University of Manchester 28th July 2016.
Materials for the session are available at https://github.com/widdowquinn/Teaching-Data-Visualisation
Highly Discriminatory Diagnostic Primer Design From Whole Genome Data
Leighton Pritchard
Presented at the GMI (Global Microbial Identifier) satellite meeting, sponsored by the UK Department for Environment, Food and Rural Affairs (DEFRA), organised by the Food and Environment Research Agency (FERA), Bedern Hall, York, 10th September 2014.
Presentation summarising the 2013 ICSB conference in Copenhagen, a requirement of James Hutton Institute Visits Abroad funding. Presented at the Cellular and Molecular Sciences seminar series.
Golden Rules of Bioinformatics.
Presented as part of a full-day introductory bioinformatics course - the example data and source for the slides can be found at https://github.com/widdowquinn/Teaching-Intro-to-Bioinf
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Leighton Pritchard
Presentation on use of Galaxy for plant pathology bioinformatics, presented by Peter Cock, at the Genomics for Non-Model Organisms workshop, ISMB/ECCB, Vienna, Austria, 19 July 2011
Presentation delivered 29th October 2012, at the CoZee workshop in Dundee (see CoZee zoonosis network site for more information: http://www.cozee-zoonosis.net/).
[For clarity: our diagnostics work did not at the time form part of the excellent E.coli O104:H4 genome analysis crowd-sourcing consortium work, which can be found at https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki - we talked about it here because it was good work, and without their efforts we couldn't have done what we did]
Slides for the afternoon session on "Introduction to Bioinformatics", delivered at the James Hutton Institute, 29th, 20th May and 5th June 2014, by Leighton Pritchard and Peter Cock.
Slides cover introductory guidance and links to resources, theory and use of BLAST tools, and a workshop featuring some common tools and tasks.
Presentation given as part of the EMBO Workshop on Plant-Microbe Interactions, at The Sainsbury Laboratory, Norwich, 20th June 2012. This presentation describes bioinformatic and statistical considerations for the prediction of plant pathogen effectors from genome sequences and annotation, with several literature examples.
A Systems Biology Perspective on Plant-Pathogen Interactions 2012-05-08, Turin
Leighton Pritchard
My presentation from 8th May 2012, at a workshop on Plant-Microbe Interactions, held at the Turin Botanical Gardens, University of Turin. The talk expands on concepts from this paper: Pritchard L, Birch P (2011) A systems biology perspective on plant-microbe interactions: Biochemical and structural targets of pathogen effectors. Plant Science 180: 584–603. doi:10.1016/j.plantsci.2010.12.008.
Securing your Kubernetes cluster: a step-by-step guide to success!
KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed.pdf
Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
DevOps and Testing slides at DASA Connect
Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference, 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed in releasing software to market, along with traditionally slow and manual security checks, has left gaps in continuous security as an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
What makes the enterobacterial plant pathogen Pectobacterium atrosepticum different to its animal pathogenic relatives?
1. Comparative genomics:
What makes the enterobacterial plant pathogen
Pectobacterium atrosepticum different to its animal
pathogenic relatives? And other questions.
Leighton Pritchard
Paul Birch
Ian Toth
2. Pectobacterium atrosepticum (Pba, formerly Erwinia carotovora subsp. atroseptica):
•potato pathogen: blackleg (stem rot), rotting of stored tubers
•major rot symptoms due to plant cell wall-degrading enzymes (PCWDEs)
•also has T3SS and effectors, and phytotoxins
•stealth (host manipulation) and brute force
3. Pectobacterium atrosepticum (Pba, formerly Erwinia carotovora subsp. atroseptica):
•plant, rather than animal-associated enterobacterium
•soft-rot enterobacterium (with Dickeya spp., Pectobacterium carotovorum etc).
•temperature/climate-related disease profiles
•Pba-centric genomic and transcriptomic comparisons
4. Genome sequenced 2004:
•SCRI/Sanger
•Lab strain SCRI1043
•5 Mb
•4472 CDS
•51% (G + C)
•17 putative horizontally-acquired islands
Bell et al. (2004) Proc. Natl. Acad. Sci. USA 101 11105-11110
6. Extended comparison to all
available genomes
Colours indicate similarity:
•red: high similarity
•blue: low similarity
Most similar organisms on
outer rings
7. Extended comparison to all
available genomes
Colours indicate similarity:
•red: high similarity
•blue: low similarity
Most similar organisms on
outer rings
Radial gaps, with highly-similar
sequences in less-similar
organisms indicate potential
HGT or gene loss
8. Selection pressure in all
environments
Loss of functions important
only in a former niche
Gain of function on adaptation
to a novel niche
Acquisition from organisms
inhabiting novel niche
What does Pba have that
plant-associated bacteria have,
but animal-associated
enterobacteria do not?
9. Marked features:
more similar to PAB
than to AAE
[>1.5X mean bit score]
(497, >10% of genome)
Plant-associated bacteria
(PAB)
Animal-associated
enterobacteria (AAE)
Toth et al. (2006) Ann. Rev. Phytopath. 44 305-336
11. AggA contributes
to root adhesion
in Pseudomonas
putida
Pba ECA3266, aggA are similar to PAB, not AAE, and found in HAI
See Poster PS2-176 (Sonia HUMPHRIS)
12. cfa synthesis genes similar to PAB, not AAE, and found in HAI
A series of cfa synthesis
gene knockouts was
constructed
Lesion length much
reduced in the
knockouts compared to
WT
WT cfa-
Bell et al. (2004) Proc. Natl. Acad. Sci. USA 101 11105-11110
See Poster PS13-642 (Michael RAVENSDALE)
13. 11k Agilent arrays Pba,
Dda
Challenge Pba array with
Dda and Pcc gDNA
(Complementary analysis
to RBH of prepublication
Dda genome)
Ravirala et al. (2007) Mol. Plant-Microbe Int. 20 313-320
See Poster PS13-637 (Hui LIU)
14. Lane labels, repeated for each panel: Pba, Pcc, Dda
Whole-genome
Southern hybridisation
Exact match of 20
probes to Dda genome
Over 900 probes
hybridise strongly to
Dda gDNA (3200 total)
15. PAB AAE
No RBH in Dda
(1351)
No hyb to Pcc
(1035)
30 selected islands of
interest that don’t hyb to
Pcc, or make no RBH to Dda
16. cfa synthesis genes
Pcc BAC spanning
SPI-7/cfa genes
sequenced and
annotated (Sanger)
Blue bars indicate
matches (BLAST)
cfa gene probes do not
hyb to Pcc; have no
counterpart in
syntenous SPI-7
region
Pba
Pcc
17. Region of Pba:
•in HAI
•no hyb to Pcc
•RBH matches to Dda
•RBHs to other PAB
nif: nitrogen fixation
•Prediction:
•WT Pba fixes N
•Pba nif knockouts do
not fix N
•WT Dda fixes N
•WT Pcc does not fix N
18. Pba Pcc Dda
•three WT Pba strains fix N
•two WT Dickeya spp. fix N
•one of six tested Pcc WT
strains fixes N
•Pba nifA- mutant does not fix N
•Prediction:
•WT Pba fixes N
•Pba nif knockouts do
not fix N
•WT Dda fixes N
•WT Pcc does not fix N
19. What makes Pba different from animal-associated enterobacteria, and
from other soft-rotting plant pathogens?
HGT activity
Putatively acquired functions:
•coronafacic acid synthesis
•root adhesion
•nitrogen fixation
about 25% of genome also distinguishes Pba from close relatives…
What else to find out:
•What has Pba lost, in respect to animal-associated enterobacteria?
•(and closer relatives)?
•What do they all have in common?
20. SCRI
Paul Birch
Ian Toth
Hui Liu
Sonia Humphris
Lucy Moleleki
Michael Ravensdale
Pete Hedley
Eduard Venter
Gunnhild Takle
Beth Hyman
Jennifer White
Sanger
Pathogen Sequencing Unit
Funding
SEERAD, BBSRC
21. Comparisons against other bacterial genomes:
Reciprocal best hits
(FASTA, 30% ID,
80% overlap)
linear representation
Coloured by taxonomic
grouping
Bell et al. (2004) Proc. Natl. Acad. Sci. USA 101 11105-11110
22. database .gbk .crunch …
Python Script GenomeDiagram
Reportlab
229 bacterial comparisons
185970 RBH
23Gb of data
24h on 50-node cluster
Visualisation issues
Pritchard et al. (2006) Bioinformatics 22 616-617
23. Failure to hybridise does not
imply that the Pba gene is
absent
Dda: 2910 RBH to Pba; 949
strongly hybridising probes
(ca. 3200 total)
Weakly-hybridising probes
are seen with > 90% amino
acid identity
Good afternoon. I’m Leighton Pritchard. I’m a bioinformatician in the plant pathology programme at SCRI in Scotland, and I’m going to talk to you about some of the comparative genomics work we’ve been doing on the plant pathogenic bacterium Pectobacterium atrosepticum.
Pectobacterium atrosepticum (Pba) is an enterobacterial plant pathogen that causes blackleg and soft rot diseases on potato in temperate climates.
The major rotting symptoms of disease caused by Pba are due to plant cell wall degrading enzymes, such as pectinases and pectate lyases. Nevertheless, Pba also possesses a type III secretion system and associated effectors (about which much has been said this week) and synthesises phytotoxins. This implies a more stealthy role for Pba in terms of its interaction with the plant host. It doesn’t just exist to blast holes in plant cell walls – there is the potential for subtler interaction.
Pba is also, unlike many of its enterobacterial relatives that invade gut epithelial cells, plant- and not animal-associated. It is thus reasonable to anticipate substantial differences between the genomes of the animal- and plant-associated members of the Enterobacteriaceae. It is also one of a number of plant pathogenic, soft-rotting enterobacteria, such as Pectobacterium carotovorum and Dickeya species. These bacteria vary in their host ranges and climate-dependent disease profiles. Pba has a relatively narrow host range for disease, and causes symptoms only in temperate climates. The genes differentiating host range, geographical distribution and disease development remain largely unknown, however.
In order to identify candidate genes associated with these differences, we have employed Pba-centric genomic and transcriptomic comparative analyses, and I will report some of our results in this presentation.
Our starting point for genomic comparisons is the sequenced Pba genome. The Pba genome is fairly typical for an enterobacterium in terms of size, gene count and GC content. One unusual feature, however, is the large number of putative horizontally-acquired islands, marked with red and green blocks on this circular diagram. These were mostly identified on the basis of the presence of insertion sequences, phage, and aberrant GC content. It has been noted (Kunin et al., 2006) that Pba appears to be a ‘hub’ of bacterial horizontal gene transfer.
Genome Overview
IMAGE: Circular diagram of genome
LOGIC:
O Lab-amenable strain 1043 sequenced by Sanger/SCRI collaboration in 2004
O Typical size for free-living enterobacterium, 5Mb with 4472 CDS, and 51% G+C content
O There appears to be an unusually large degree of horizontal gene transfer, with 17 putative horizontally-acquired islands
In order to find out what makes Pba different from other bacteria, we carried out reciprocal best hit analysis using FASTA, taking reciprocal protein matches with 30% sequence identity over at least 80% of the longer sequence. We plotted each of these hits on a circular diagram of the Pba genome.
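A minimal sketch of this kind of reciprocal best hit filter is given below. It is not the original analysis script: the `Hit` record, the function names and the toy protein identifiers are hypothetical, the two thresholds are read as minimums, and applying the filters before choosing the best hit is an assumption.

```python
from collections import namedtuple

# Hypothetical record for one pairwise protein match (e.g. parsed from FASTA output).
Hit = namedtuple("Hit", "query subject identity aln_len bitscore")


def best_hits(hits, lengths, min_identity=30.0, min_coverage=0.8):
    """Map each query to its top-scoring subject among hits that pass both filters."""
    best = {}
    for hit in hits:
        # Coverage is measured against the longer of the two sequences.
        longer = max(lengths[hit.query], lengths[hit.subject])
        if hit.identity < min_identity or hit.aln_len / longer < min_coverage:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(forward, reverse, lengths, **thresholds):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    fwd = best_hits(forward, lengths, **thresholds)
    rev = best_hits(reverse, lengths, **thresholds)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Toy data: sequence lengths in residues, and hits in both search directions.
lengths = {"pba1": 100, "pba2": 200, "x1": 105, "x2": 190}
forward = [
    Hit("pba1", "x1", 85.0, 98, 180.0),   # passes both filters
    Hit("pba1", "x2", 40.0, 90, 60.0),    # alignment covers too little of x2
    Hit("pba2", "x2", 55.0, 100, 210.0),  # alignment covers only half of pba2
]
reverse = [
    Hit("x1", "pba1", 85.0, 98, 180.0),
    Hit("x2", "pba2", 55.0, 100, 210.0),
]
rbh = reciprocal_best_hits(forward, reverse, lengths)
print(rbh)
```

Each pair returned this way would correspond to one coloured block on the ring for that comparison genome.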
Each inner ring represents a comparison with a different bacterial genome, and each coloured block indicates a reciprocal best hit. Here, we’ve coloured organisms by taxonomic class, and grouped them together on the diagram.
Two features stand out immediately in these diagrams. Firstly, radial bands of colour where Pba shares a block of sequences with many other genomes. And secondly, radial ‘gaps’ with no colour, where sequences are unique to Pba, or common with only a few other organisms.
Introduction to comparative approach – RBH against other genomes II
IMAGE: Circular diagram of genome comparisons
LOGIC:
o We can wrap the image round to represent a circular bacterial chromosome
o The colours mean the same as in the previous diagram, but it becomes clearer where regions of overall similarity and dissimilarity lie
We have since extended these comparisons to all available genomes, to generate images such as this.
The colour scheme here is that individual reciprocal best hits are marked in a colour indicating the percentage sequence identity to the Pba sequence – red for near 100%, blue for near 30%.
We’ve also ordered the comparison rings so that the most similar genomes to Pba (on average) are outermost.
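The colour scale and ring ordering can be sketched as follows. This is an illustration only: the linear blue-to-red ramp, the use of mean RBH identity as the measure of ‘most similar on average’, and all names are assumptions rather than the method behind the figure.

```python
def identity_to_rgb(identity, low=30.0, high=100.0):
    """Linear ramp from blue (low identity) to red (high identity)."""
    clamped = min(max(identity, low), high)
    frac = (clamped - low) / (high - low)
    return (frac, 0.0, 1.0 - frac)  # (red, green, blue), each in 0..1


def order_rings(rbh_identities):
    """Genome names sorted from most to least similar by mean RBH identity."""
    def mean_identity(name):
        values = rbh_identities[name]
        return sum(values) / len(values) if values else 0.0
    return sorted(rbh_identities, key=mean_identity, reverse=True)


# Hypothetical comparison genomes with the % identity of each RBH to Pba.
example = {
    "distant": [35.0, 40.0],
    "close": [95.0, 90.0],
    "intermediate": [60.0, 70.0],
}
ring_order = order_rings(example)  # first entry would be drawn outermost
print(ring_order, identity_to_rgb(65.0))
```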
Even with 400 comparison sequences, we can still see many large radial gaps, indicating sequences that Pba does not share with many other organisms.
We can zoom in on a region to get a closer look…
Description of more recent comparisons against many genomes, and introduction to interpreting the images with colours, etc… particularly gaps
IMAGE: Circular diagram of comparison against 229 bacteria
LOGIC:
o We need to familiarise the audience with how to interpret the image for the next few slides
o This diagram represents the Pba chromosome, and the reciprocal best hits present in 229 other bacterial genomes, each one represented by a concentric circle
o Coloured blocks represent individual RBHs, coloured on a scale from blue to red by % identity (blue = low identity, red = high identity)
o The concentric circles for organisms are ordered from most to least similar to Pba from the outer edge (red ones are enterobacteria)
o Immediately, we can see regions of radial gaps in the genome, associated with the putative HAIs, and that these gaps get filled in the closer you get to the outer edge
o These gaps are informative in terms of Pba’s evolutionary history and its adaptation to a novel functional niche
In this region we can see the blocks, and their individual colours more clearly.
Focusing on this radial gap here, we can see that on either side of the gap the trend is that the genomes that are more similar to Pba also have the most similar sequences.
Within the gap however, we can see sequences that are highly similar to Pba, but that lie within organisms whose genomes are, on the whole, not very similar to Pba.
We reason that these sequences have therefore potentially been acquired by horizontal gene transfer or, as in the case here, potentially indicate gene loss.
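One way to make this visual reasoning explicit is to flag hits whose identity is far above the average for the genome they come from. The sketch below is a hypothetical heuristic, not the procedure used for the talk: the 30-point margin, the data layout and the gene and genome names are all invented for illustration.

```python
def flag_outlier_hits(identities, margin=30.0):
    """Return (gene, genome) pairs whose RBH identity exceeds that genome's
    mean RBH identity by at least `margin` percentage points.

    identities: {genome: {gene: percent identity of the RBH to Pba}}
    """
    flagged = []
    for genome, hits in identities.items():
        if not hits:
            continue
        genome_mean = sum(hits.values()) / len(hits)
        for gene, identity in hits.items():
            if identity - genome_mean >= margin:
                flagged.append((gene, genome))
    return sorted(flagged)


# Hypothetical data: g4 is highly similar only in an otherwise distant genome.
identities = {
    "close_relative": {"g1": 90.0, "g2": 92.0, "g3": 88.0},
    "distant_genome": {"g1": 40.0, "g2": 38.0, "g4": 95.0, "g5": 35.0},
}
candidates = flag_outlier_hits(identities)
print(candidates)
```

A flagged pair is only a candidate: as noted above, the same pattern can arise from gene loss in the closer relatives as well as from horizontal transfer.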
Description of more recent comparisons against many genomes, and introduction to interpreting the images with colours, etc… particularly gaps
IMAGE: Circular diagram of comparison against 229 bacteria - zoomed
LOGIC:
o We need to familiarise the audience with how to interpret the image for the next few slides
o This diagram represents the Pba chromosome, and the reciprocal best hits present in 229 other bacterial genomes, each one represented by a concentric circle
o Coloured blocks represent individual RBHs, coloured on a scale from blue to red by % identity (blue = low identity, red = high identity)
o The concentric circles for organisms are ordered from most to least similar to Pba from the outer edge (red ones are enterobacteria)
o Immediately, we can see regions of radial gaps in the genome, associated with the putative HAIs, and that these gaps get filled in the closer you get to the outer edge
o These gaps are informative in terms of Pba’s evolutionary history and its adaptation to a novel functional niche
When considering the origin of HGT or gene loss, we consider the environments in which the organism must survive.
Here we have a typical agricultural environment involving animals and plants. I’m told that this is a cow, but I’m no biologist.
We can imagine an organism that divides its time between the animal and the surrounding environment.
The organism enters the animal at the sharp end, here… eventually leaves through the blunt end, here… and hangs around near plants for a while, before going back into the animal at the sharp end and repeating the cycle.
There are many possible trajectories for the genome, depending on circumstances. If the animal-plant cycle persists, we expect the organism to carry functions that benefit it in both environments – as we might for E. coli O157:H7, for example.
If we remove one or other of the environments, let’s say we take the cow out of the picture, then those functions that were previously relevant to survival in the cow may be lost.
Also, we might expect that the organism might somehow acquire functions specifically to survive better in the environment. A quick way to do so would be to acquire them directly from organisms that are already successful in that environment, for example by HGT.
If we want to find out what has made Pba a successful plant pathogen when many of its relatives are successful animal-associated bacteria that occasionally enter the environment, we can ask: what functions does Pba have that plant-associated bacteria also have, but that animal-associated bacteria do not?
(and if we’re interested in how animal-associated bacteria persist in the environment, we can ask what they still share)
Interpretation of gaps in terms of HGT and niche adaptation – gene acquisition and loss
IMAGE: A cow, some grass, and the cycle of the bacterium as it passes from one environment (enteric) to another (the wider environment)
LOGIC:
o Each bacterial population is under selection pressure to thrive in all the environments to which it is exposed
o In order to thrive better in the gut, we expect to see adaptations to the gut
o In order to thrive better in the environment, we expect to see adaptations to the environment
o In the transition from one environment to the other, we expect to see relative loss of functions advantageous in the ‘old’ environment, and relative acquisition of functions advantageous in the ‘new’ environment
o If we assume that Pba is an enteric bacterium that has adapted to the wider environment, then we expect to find that it has acquired novel functionality relative to other, enteric-based bacteria
o We might expect the gaps in the Pba diagram to correspond to some of the features that Pba has acquired to help it live outside the animal gut.
o Moreover, the organisms with similar genes to those in the gaps may be the originators of that functionality for Pba, elucidating Pba’s evolutionary history
Here we have another circular diagram showing reciprocal best hits.
We’ve clustered the comparison genomes into two groups: plant-associated bacteria and animal-associated enterobacteria.
Note that the AAEs are much more similar, on the whole, to Pba than are the PABs
We’ve also marked, in red around the outside, the locations of those Pba genes that are more similar to PAB than to AAE.
These account for over 10% of the Pba genome, indicating that the scale of acquisition of function from plant associated bacteria may have been very great.
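The slide for this diagram gives the criterion as ‘>1.5X mean bit score’. A minimal sketch of one reading of that criterion follows, assuming it means that a gene’s mean bit score against PAB genomes exceeds 1.5 times its mean bit score against AAE genomes. The function name, the data layout and the example scores are hypothetical; only the counts 497 and 4472 come from the slides.

```python
def pab_like_genes(pab_scores, aae_scores, ratio=1.5):
    """Genes whose mean PAB bit score is more than `ratio` times the mean AAE one.

    pab_scores, aae_scores: {gene: [bit score of the match in each genome]}
    A gene with no matches in a group is given a mean score of 0 for it.
    """
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    selected = []
    for gene, scores in pab_scores.items():
        mean_pab = mean(scores)
        mean_aae = mean(aae_scores.get(gene, []))
        if mean_pab > 0 and mean_pab > ratio * mean_aae:
            selected.append(gene)
    return sorted(selected)


# Invented scores for three illustrative genes.
pab = {"gene_a": [300.0, 280.0], "gene_b": [400.0], "gene_c": [900.0, 950.0]}
aae = {"gene_a": [100.0, 120.0], "gene_b": [], "gene_c": [2000.0, 2100.0]}
marked = pab_like_genes(pab, aae)

# Figures quoted on the slides: 497 marked features and 4472 CDS.
percent_marked = round(100 * 497 / 4472, 1)
print(marked, percent_marked)
```

The last line reproduces the ‘over 10%’ figure from the slide counts, on the assumption that the 497 marked features are counted against the 4472 CDS.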
Interpretation of gene acquisition and loss as niche adaptation in the context of adaptation of Pba to plant environment.
IMAGE: Circular diagram of Pba against AAE and PAB, with genes more similar to PAB than AAE sequences marked
LOGIC:
o Acquisition of environment-associated function may be by adaptation of existing functionality to the new environment, or by HGT – direct acquisition from other plant-associated bacteria.
o If adaptation is by HGT, then we expect to see genes/functions that are found in plant-associated bacteria, but not in animal-associated enterobacteria.
O Outer set of rings are AAE, inner set of rings are PAB – overall, Pba is most similar to AAE (seen from red colour)
o Nevertheless, we find XX genes that are more similar to PAB than to AAE – indicated by the red markers on the diagram, and they are frequently associated with islands previously suspected to be horizontally-transferred
O This implies a role for HGT and the acquisition of genes from PABs
Returning to the Pba comparisons, we can identify clusters of these genes that appear to be PAB-associated, and note that they are colocated with regions of putative HGT.
We find that, in many cases, these genes encode functions which we would reasonably expect to be useful in plant interactions:
Root attachment
Coronafacic acid synthesis
T3SS and effectors
And Nitrogen fixation
In silico predictions are all fine, but in this modern age of systems biology, we need to close the loop between prediction and experiment.
Locations of some of the genes associated with niche adaptation on the comparative diagram
IMAGE: Circular diagram of Pba against AAE and PAB, with genes more similar to PAB than AAE sequences marked. Labels for environmental function fade in
LOGIC:
o We have established that genes associated with PABs are located in sites we previously saw were potentially associated with HGT, but this might be coincidence. Are there functionalities that seem biologically to be associated with this new environment?
o We can identify a number of gene clusters, apparently transferred from PAB, with functionalities that make sense as adaptations in biological terms: e.g. attachment, coronafacic acid and T3SS (P. syringae people should be interested), and nitrogen fixation
o But can we trust these annotations? How well does Pba persist in the environment, or manage at causing disease if we knock out some of these genes?
We consider two genes in a region of the Pba sequence that is similar to PAB, but not AAE, and is also part of a putative HAI
One of those genes is a homologue of P. putida AggA, which contributes to root adhesion in that organism
This is a plot of the recovery of Pba – WT, and ECA3266 and aggA knockouts from a range of plant roots. For each knockout, and each plant, we recover significantly less Pba from plant roots than the wild type, supporting the functionality of root adhesion.
Notably, adhesion occurs on more plants than just potato – the only one on which it causes disease – implying a wider reservoir, and maybe even a more benign lifestyle for Pba than was previously suspected.
Here, then we have evidence supporting the acquisition of functionality relevant to a plant-associated lifestyle for Pba.
This work was carried out mostly by Sonia Humphris, and was presented in a poster that has now been taken down, but you should still be able to catch her if you’re interested.
Adherence to roots I
IMAGE: Plot of colony-forming units per millilitre against plant (potato)
LOGIC:
O The aggA gene is one of those in an HGT region, and found in PAB but not AAE, and contributes to root attachment in Pseudomonas putida. Pba aggA knockouts are expected to show less root adhesion than wild-type.
O aggA- knockouts show no significant reduction in root adhesion on potato
aggA complemented
Another region of the genome that has genes similar to PAB but not AAE, and that lies in an HAI, contains coronafacic acid synthesis genes.
This work was mostly done by Michael Ravensdale, whose poster is still up outside the Auditorium Sirene and, again, if you’d like to hear more about it, just catch him at some point.
Coronafacic acid is a precursor of coronatine, which is a Pseudomonas phytotoxin.
cfa synthesis gene knockouts are attenuated in virulence, as seen by the reduction in lesion length.
So here we also have evidence supporting the horizontal acquisition of virulence function, as a feature of Pba’s plant-associated lifestyle
Coronafacic acid
IMAGE: Plants with WT and cfa- Pba
LOGIC:
o Knocking out cfa genes attenuates Pba virulence on potato
Several cfa knockouts – picture representative; needs ligase
We haven’t just made in silico comparisons.
We have designed an 11k Pba microarray (and a similar Dda array), and have challenged it with both Dda and Pcc genomic DNA, in order to investigate what makes Pba different from other soft-rotting plant pathogenic enterobacteria.
The genome sequence for Dda is also available for download from Nicole Perna and Jeremy Glasner’s group’s ASAP database, in Wisconsin, which gave us the opportunity to calibrate our transcriptomic and genomic comparisons using the complementary analyses for Dda and Pba.
Hui Liu has a poster up describing some work using these arrays that you might be interested in seeing.
Comparison with Pcc and Dda I
IMAGE: Pcc bugs, Dda bugs, microarray slide
LOGIC:
O We can see broad trends in niche adaptation from comparisons against large numbers of comparator genomes, but differences in infection profile also exist between Pba and its close relatives Dda and Pcc
O We have constructed an 11k Pba microarray, and challenged it with Dda and Pcc genomic DNA
O The Dda sequence hasn’t been published, but is available at ASAP, and reciprocal best hit analysis of the Pba genome against this sequence provides a complementary analysis, and a check on the hybridisation results
When we do whole-genome Southern hybridisations, we can see that in general Pba and Pcc are much more similar to each other than either is to Dda.
Indeed, an in silico analysis identifies only 20 probes from the Pba genome that make exact matches to the Dda genome.
However, over 900 probes on the array hybridise strongly to Dda gDNA, and about 3200 hybridise in total.
Some Dda sequences with over 90% aa identity to their Pba counterparts hybridise very weakly.
This illustrates a general point that gDNA hybridisations are not very robust. At best they indicate only a set of divergent and potentially absent sequences in the comparator genome, so we proceed with the Dda reciprocal best hit comparison, rather than the array hybridisations.
Comparison with Pba and Dda II
IMAGE: Pba-Pcc-Dda Southerns
LOGIC:
O Pba is phylogenetically more closely related to Pcc than to Dda, so we would expect that Pcc would hybridise better to the microarray than Dda does, and this is borne out by the Southern blots, and by the hybridisation data.
O Indeed, it proved impossible to carry out our original aim to construct a microarray that featured common probes to Pba and Dda genes, as the genomes are so dissimilar. An in silico analysis shows exact matches of Dda genomic DNA to only 20 probes on the Pba array.
Common bands = rRNA/16S? Not entirely sure.
This is the Eye of Sauron.
The diagram is similar to those you saw before, with PAB and AAE reciprocal best hit comparisons marked.
We’ve also added rings indicating those Pba sequences whose probes don’t hybridise to Pcc gDNA, and that don’t make reciprocal best hits to Dda. This is about 25% of the genome in each case.
We’ve also marked 30 islands that we’ve been interested in that appear not to have counterparts in either Pcc or Dda.
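Combining the two rings and grouping the result into islands can be sketched with set operations. This is a hypothetical illustration: genes are represented by their position in chromosome order, the positions are invented, and treating directly adjacent genes as one island is an assumption rather than the rule used to pick the 30 islands.

```python
def group_islands(gene_indices, max_gap=1):
    """Group gene positions into runs; positions at most `max_gap` apart share a run."""
    islands = []
    for index in sorted(gene_indices):
        if islands and index - islands[-1][1] <= max_gap:
            islands[-1][1] = index  # extend the current island
        else:
            islands.append([index, index])  # start a new island
    return [tuple(island) for island in islands]


# Invented positions of Pba genes along the chromosome.
no_hyb_to_pcc = {3, 4, 5, 9, 10, 20, 31}  # probes do not hybridise to Pcc gDNA
no_rbh_in_dda = {3, 4, 5, 9, 10, 20, 42}  # no reciprocal best hit in Dda

missing_from_both = no_hyb_to_pcc & no_rbh_in_dda
islands = group_islands(missing_from_both)
print(islands)
```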
We’re going to focus in on this region first…
Comparison with Pba and Dda IV
IMAGE: Pba-Dda-Pcc circular diagram
LOGIC:
O When we examine the comparisons between Pba, Dda and Pcc, we identify over 30 clusters on the genome of functional significance where genes from Pba appear to have no counterparts in one or both of Dda and Pcc – indicated by the outermost red bars.
O The previously-identified HAIs are mostly associated with these Pba-associated regions
O One of these regions is the SPI-7 homologous region, containing the coronafacic acid synthesis genes. We have already shown this to be involved in virulence, and know that the genes have no homologues in Dda. It remains to show that the failure of Pcc to hybridise to the probes for these genes on the Pba array implies that the genes are absent, or highly divergent.
This is the region of Pba (on the top) containing the cfa synthesis genes we saw earlier.
The bottom row is the corresponding region of Pcc.
The blue bars indicate reciprocal best hits, and we can see that the two regions are mostly syntenous.
However, the region containing the cfa synthesis genes is missing entirely in Pcc, though there has been some interesting expansion with other genes in that region.
In this case at least, then, the absence of Pcc hybridising gDNA does appear to indicate that the region is absent in that organism.
SPI-7 comparison between Pba and Pcc
IMAGE: Pba vs Pcc SPI-7 regions, in ACT
LOGIC:
O We have an annotated Pcc BAC sequence spanning the SPI-7 region from a collaboration with the Sanger Institute
O Pcc regions = resistance to ox. Stress? Being investigated
Now we consider a second region, making no hybridisation to Pcc, but having RBH matches to Dda, and to other PAB.
This is found in an HAI.
It encodes genes that are homologous to those for nitrogen fixation in other organisms.
This analysis in particular enabled us to make some predictions:
WT Pba fixes N
WT Dda fixes N
WT Pcc doesn’t fix N
If we knock out the nif genes in that region, the mutant Pba will not fix N
Comparison with Pba and Dda V
IMAGE: Pba-Dda-Pcc nif region
LOGIC:
O This diagram indicates a number of features:
That there is a region of putative horizontal gene transfer from PAB to Pba
This region does not have counterpart genes in AAE
That part of this region is Pba-specific, in that genes are shared with Dda, but not Pcc
O The genes in yellow are the key genes – they are nif genes, contributing to nitrogen fixation, including nifA. They are present in Pba and Dda, but not Pcc, which allows us to make some predictions:
WT Pba and Dda both fix nitrogen
WT Pcc does not fix nitrogen
Pba nifA knockouts should not fix nitrogen
And these are the results.
On the right is an Azotobacter control that really does fix N
On the left we see three WT strains of Pba that fix N, and the nifA mutant that does not
On the right we see a Dda strain, and another Dickeya strain, that both fix N
In the middle we see five Pcc strains, including the one we used for our comparisons, Pcc193, that do not fix N, and one that does.
This confirms our predictions made by comparative genomics and transcriptomics.
Comparison with Pba and Dda VI
IMAGE: Nitrogen fixation graph
LOGIC:
O This graph shows the nitrogen fixing ability of a number of Pba, Pcc and Dda strains, with an Azotobacter control
- Three WT Pbas fix nitrogen
- Two WT Ddas fix nitrogen
- Only one of six WT Pccs fixes nitrogen
O This validates our predictions from comparative genomics
Looking to see if NF occurs on roots/in environment as well as in vitro
Did we complement nifs? No.
In summary then, we asked what differentiated Pba from its close and distant relatives, and have found that the answers are, at least in part:
HGT activity
The apparent acquisition of verified plant-associated lifestyle functions: coronafacic acid synthesis, root adhesion, and N fixation.
The differences between Pba and closely-related genomes comprise about 25% of the Pba sequence, so there is still plenty left to investigate.
And we now have some new questions to ask:
Doing the reciprocal comparison – what has Pba lost with respect to AAE?
What does it not have that its closer relatives have?
And, particularly interesting for food safety, what does Pba have in common with animal/human pathogens that might permit them to persist in the wider environment?
Conclusions
IMAGE: None
LOGIC:
O In silico comparative genomics against large numbers of bacteria reveals information about overall niche adaptation
O Comparative genomics, both in silico and experimental, against close relatives reveals information about species/strain-specific differences
O Pba, in adapting to an environmental niche, has acquired nitrogen fixation and root adhesion capability, and a number of virulence-related functionalities associated with host defence manipulation
O Some of these functionalities are absent in other soft-rotting plant pathogens
Acknowledgements
Introduction to comparative approach – RBH against other genomes I
IMAGE: Linear diagram of genome comparisons
LOGIC:
o Particularly given the number of putative HGT events, we thought it interesting to identify, where possible, the most closely-related genes to Pba genes in other organisms
o We identified reciprocal best hits using FASTA, and plotted them on a diagram:
blocks indicate where Pba genes are present in other organisms
each row represents a different organism.
blocks are coloured by the phylogenetic class of the organism in which the match is found
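The reciprocal best hit idea can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the talk: it assumes the search output has already been reduced to (query, subject, score) tuples, and the gene names and scores below are invented.

```python
def best_hits(hits):
    """Map each query gene to its highest-scoring subject gene.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return sorted (a_gene, b_gene) pairs that are each other's best hit."""
    fwd = best_hits(forward_hits)  # genome A gene -> best genome B gene
    rev = best_hits(reverse_hits)  # genome B gene -> best genome A gene
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical hit tables (names and scores are made up for illustration).
pba_vs_other = [("pba1", "oth1", 950.0), ("pba1", "oth2", 400.0),
                ("pba2", "oth2", 610.0), ("pba3", "oth1", 300.0)]
other_vs_pba = [("oth1", "pba1", 940.0), ("oth2", "pba2", 600.0),
                ("oth2", "pba1", 390.0)]

rbh_pairs = reciprocal_best_hits(pba_vs_other, other_vs_pba)
print(rbh_pairs)
```

Here pba3 finds oth1 as its best match, but oth1 prefers pba1, so pba3 makes no RBH and would leave a gap in that organism's row of the diagram.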
Problems of visualisation
IMAGE: GenomeDiagram flowchart
LOGIC:
o The data generated for each comparison of Pba against multiple organisms is considerable:
229 bacterial comparisons
185970 RBH
23Gb of data
24h on a 50-node cluster
o Visualisation of this much data was a problem, so we developed the GenomeDiagram software to integrate with Biopython and produce publication-quality, poster-sized images (some of those at ICPPB or IEW last year will have seen some of the output)
Comparison with Pba and Dda III
IMAGE: Pba-Dda hyb/RBH plots
LOGIC:
o The hybridisation data on the last slide only identifies gDNA that is not highly-divergent from Pba, so a failure of Dda gDNA to hybridise does not imply that the Pba gene is absent in Dda, for example.
o Dda is still functionally-similar to Pba, and makes 2910 reciprocal best hits.
o We don’t have the Pcc genome sequence, so can’t say how many false negatives come from the microarray data
o As we already have hybridisation data, we can explore the relationship between hybridisation intensity and RBH identity
o As you might expect, the relationship is not entirely clear, but there is a general tendency for stronger hybridisations to correspond to better RBHs, though this tendency is not absolute.
o Nevertheless, very few probes with raw hyb scores of over 2000, or normalised scores of greater than 1, fail to make RBHs. This allows us to estimate the number of Pcc RBHs to be…. XX
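One way the threshold argument above could be turned into an estimate is sketched here. It is an assumption about the calculation, not the method actually used: the rate at which strongly hybridising probes make RBHs is measured in the sequenced comparison (Dda), then applied to the strongly hybridising probes in the unsequenced one (Pcc). All scores below are invented, and the function names are hypothetical.

```python
def rbh_rate_above_threshold(probes, raw_cutoff=2000.0, norm_cutoff=1.0):
    """Fraction of probes passing either cutoff that also make an RBH.

    probes: iterable of (raw_score, normalised_score, has_rbh) tuples.
    Returns None if no probe passes a cutoff.
    """
    passing = [has_rbh for raw, norm, has_rbh in probes
               if raw > raw_cutoff or norm > norm_cutoff]
    if not passing:
        return None
    return sum(passing) / len(passing)


def estimate_rbh_count(scores, rate, raw_cutoff=2000.0, norm_cutoff=1.0):
    """Estimate RBHs among strongly hybridising probes, given scores only.

    scores: iterable of (raw_score, normalised_score) tuples.
    """
    n_passing = sum(1 for raw, norm in scores
                    if raw > raw_cutoff or norm > norm_cutoff)
    return rate * n_passing


# Hypothetical probes from the sequenced comparison: (raw, normalised, has_rbh).
dda_probes = [(3500.0, 1.8, True), (2600.0, 1.2, True), (2100.0, 0.9, True),
              (2500.0, 1.1, False), (800.0, 0.4, True), (300.0, 0.1, False)]
# Hypothetical probes from the unsequenced comparison: (raw, normalised).
pcc_scores = [(4000.0, 2.0), (2200.0, 1.05), (1500.0, 0.7), (900.0, 1.3)]

rate = rbh_rate_above_threshold(dda_probes)
estimate = estimate_rbh_count(pcc_scores, rate)
print(rate, estimate)
```

Because weakly hybridising genes can still make RBHs, a figure obtained this way is better read as a lower bound than as a full count.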
We can ask if this kind of pattern is found in other bacterial comparisons, and we find that sometimes it is, and sometimes it isn’t.
The human uropathogenic E. coli CFT073 has many radial gaps, perhaps not so many as Pba, but it certainly looks similar. However, the bovine pathogen Pasteurella multocida does not.
These patterns of gaps in the comparison data are not present in all comparisons, but are present in some other comparisons
IMAGES: The whole-genome comparisons of uropathogenic E. coli and P. multocida.
LOGIC:
o The uropathogenic E. coli image shows that the pattern of gaps, potentially indicating large-scale HGT, is not unique to Pba.
o The P. multocida image, which has fewer gaps, indicates that putative HGT, visible in this way, is not the norm for all genome comparisons
o This implies that we can pick up differences between the evolutionary histories of organisms in this way
Again, we can ask whether this is usually the case.
This is a similar comparison of Salmonella Typhi CT18 – a chromosome and two plasmids - against PAB and AAE.
Again we’ve marked the locations of genes that are more similar to PABs than to AAEs, and the number isn’t nearly as great as for Pba.
However, there are hotspots in one of the plasmids. This is a story with implications for the acquisition of niche-adaptive function, but I don’t have time to go into detail about it, so we’ll just note it and move on.
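Marking genes that are more similar to PABs than to AAEs amounts to comparing each gene's best match in the two groups. The sketch below assumes the best percentage identity per group has already been tabulated; the data structure, the margin parameter and the gene names are hypothetical, and real criteria may well be stricter.

```python
def closer_to_pab(best_identity, margin=0.0):
    """Return genes whose best PAB match beats their best AAE match.

    best_identity: dict mapping gene -> {"PAB": identity or None,
                                         "AAE": identity or None},
    where None means no match was found in that group.
    """
    flagged = []
    for gene, ident in best_identity.items():
        pab = ident.get("PAB")
        aae = ident.get("AAE")
        if pab is None:
            continue  # no PAB match at all, so nothing to flag
        if aae is None or pab - aae > margin:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical best identities (percent); values are invented for illustration.
identities = {
    "geneA": {"PAB": 82.0, "AAE": 45.0},
    "geneB": {"PAB": 60.0, "AAE": 91.0},
    "geneC": {"PAB": 70.0, "AAE": None},
    "geneD": {"PAB": None, "AAE": 88.0},
}

pab_like = closer_to_pab(identities)
print(pab_like)
```

Plotting the flagged genes by position on each replicon is then enough to show whether they cluster into hotspots, as seen on one of the plasmids.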
These kinds of gaps and HGT also vary between plasmid and chromosome
IMAGE: Salmonella image from the Annual Reviews paper
LOGIC:
o HGT is not confined to the chromosome, and may have hotspots on plasmids
o Note that Pba SCRI1043 doesn’t carry a plasmid