Surya Saha presented on the history and current state of DNA sequencing technologies. The presentation covered first generation Sanger sequencing and more recent next generation sequencing technologies such as Illumina, Ion Torrent, PacBio, and Oxford Nanopore. Key points included the increasing throughput and decreasing costs of sequencing over time, factors to consider when choosing a sequencing technology, and potential future applications of sequencing in medicine and environmental studies. The presentation concluded by discussing opportunities for students in computational biology.
This was presented on Mar 11, 2014 at Boyce Thompson Institute, Ithaca, NY at the 3rd BTI Bioinformatics Course http://btiplantbioinfocourse.wordpress.com/
Sequencing, Genome Assembly and the SGN Platform - Surya Saha
This talk was presented at IASRI Pusa on June 13th, 2014.
Centre for Agricultural Bioinformatics
Indian Agricultural Statistics Research Institute
Library Avenue, Pusa, New Delhi - 110012 (INDIA)
http://cabgrid.res.in/cabin/
The document provides an overview of the history and development of DNA sequencing technologies. It discusses early methods like Sanger sequencing and Maxam-Gilbert sequencing. It then summarizes major next-generation sequencing platforms like Illumina, Pacific Biosciences, and Oxford Nanopore. The document also covers sequencing trends, costs, and considerations for choosing a sequencing platform.
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc... - Lex Nederbragt
An update of the previous talk with the same title, given at the Computational Life Science initiative (University of Oslo), about new High Throughput Sequencing instruments at the Norwegian Sequencing Centre. I also mentioned future upgrades and the upcoming nanopore sequencing platform from Oxford Nanopore.
The document discusses synthetic biology and modular engineering approaches. It describes how standardized biological parts and interchangeable genetic components allow for hierarchical assembly of genetically engineered systems in a way analogous to electrical engineering. This modular approach is facilitated by standards for biological parts, assembly methods, and abstraction levels that allow for complex designs. The document also summarizes the work of the Debrecen-Hungary iGEM team, which developed a toolkit of lipid sensors and expression vectors to expand synthetic biology tools in eukaryotic systems.
The document summarizes a study that used Illumina HiSeq sequencing to analyze taxon diversity in bulk insect samples. The researchers tested two approaches: 1) PCR-based amplification of the COI barcode region followed by Illumina sequencing, and 2) direct shotgun sequencing of total mitochondrial DNA without PCR. Both approaches showed potential for high-throughput environmental barcoding, though methodological improvements are still needed to address issues like taxonomic and biomass biases. The study demonstrates that Illumina sequencing can perform comparably to other platforms for analyzing mixed insect samples and that a PCR-free method may help avoid amplification biases.
This was presented on Mar 31, 2015 at Boyce Thompson Institute, Ithaca, NY at the 3rd BTI Bioinformatics Course http://btiplantbioinfocourse.wordpress.com/
"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergree... - Ramy K. Aziz
Tools and methods developed under the PhAnToMe (http://www.phantome.org) project between 2009 and 2012 using the Subsystems Technology, the SEED (http://theseed.org) environment, and the RAST server (http://rast.nmpdr.org)
Third presentation at the Phage Genomics Workshop at the 20th Biennial Evergreen International Phage Meeting
The Genome in a Bottle Consortium is developing reference materials, reference methods, and reference data to assess confidence in human whole genome variant calls. The Consortium is characterizing several human genomes including the NA12878 genome, an Ashkenazi Jewish trio, and a Chinese trio from the Personal Genome Project. Data generated for these genomes includes various sequencing technologies from Illumina, Complete Genomics, PacBio, BioNano, and others. The Consortium is developing high-confidence variant calls for SNPs, indels, structural variants, and phasing. Individual datasets and integrated variant calls will be made publicly available on the GIAB FTP site.
BAR.utoronto.ca is a bio-analytic resource that allows exploration of large biological datasets for hypothesis generation. It contains over 128,000 SNPs, 150 million gene expression measurements, subcellular localizations for 9,300 proteins and predicted localizations for the Arabidopsis proteome, over 70,000 predicted and 36,000 documented protein-protein interactions, and over 67,000 predicted and 700 experimentally determined protein structures. The site provides easy-to-use tools for exploring these datasets to facilitate research.
Rapid automatic microbial genome annotation using Prokka
Dr Torsten Seemann presents on Prokka, a tool he developed for rapid automatic annotation of microbial genomes. Prokka uses existing gene prediction tools like Prodigal and Infernal along with database searches to identify features like protein coding genes, tRNAs, and rRNAs. Prokka aims to annotate genomes quickly in under 15 minutes while providing standardized GFF3 and Genbank output files along with provenance on the sources of annotations. Prokka has been used to annotate over 50,000 draft genomes and is an ongoing project aimed at improving accuracy, modularity, and performance.
This document outlines exercises for quality control of NGS data from an Illumina sequencing experiment on tomato ripening stages. The exercises include: 1) evaluating raw fastq files for format and number of sequences; 2) using FastQC to analyze read quality scores, lengths, duplication levels, and k-mer content; and 3) preprocessing the reads using fastq-mcf to trim low quality ends and remove short reads before reanalyzing with FastQC. The goal is to learn how to evaluate NGS read quality and preprocess data prior to downstream analysis.
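The first exercise above (checking FASTQ format and counting sequences) can be sketched in a few lines of Python. This is only an illustrative sketch, assuming the common four-lines-per-record FASTQ layout; the function name `count_fastq_records` and the two example reads are hypothetical and do not come from the tomato dataset.

```python
import io


def count_fastq_records(handle):
    """Count records in a 4-line-per-record FASTQ stream, with basic format checks."""
    count = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records or at the end
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        # A record starts with '@' and its third line starts with '+'.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record {count + 1}")
        # Every base needs exactly one quality character.
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {count + 1}")
        count += 1
    return count


# Hypothetical two-read example held in memory instead of a file on disk.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nII#I\n")
n_reads = count_fastq_records(example)
print(n_reads)
```

For a real file, the same function could be called on an open text handle, for example one returned by `open(path)` or `gzip.open(path, "rt")`.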
The document discusses genomic research and sequencing technologies. It provides a history of genomic research from early sequencing methods like Sanger sequencing to modern massively parallel sequencing. It describes several next-generation sequencing platforms, their read lengths, accuracy, applications, and differences. It emphasizes that data analysis is a major challenge and recommends consulting sequencing facilities and having bioinformaticians available for analysis.
Next generation-sequencing.ppt-converted - Shweta Tiwari
The advanced approach, which sequences whole genomes efficiently at high speed, high throughput, and reduced cost, is termed Next Generation Sequencing (NGS) or massively parallel sequencing (MPS).
Jan2015 GIAB intro, Update, and Data Analysis Planning - GenomeInABottle
The Genome in a Bottle Consortium is developing reference materials, methods, and data to evaluate accuracy of human genome sequencing. The goal is to enable validation and regulation of clinical genome sequencing. For their pilot genome, NA12878, they have generated high-confidence variant calls that are being used for benchmarking. Moving forward they plan to analyze and develop reference materials from trios in the Personal Genome Project, starting with an Ashkenazi Jewish trio.
The document discusses the Genome in a Bottle Consortium's efforts to establish benchmark variant calls for several human genomes to help evaluate the accuracy of sequencing technologies and bioinformatics pipelines. The Consortium has generated extensive sequencing and reference data for several samples, including NA12878 and trios from the Personal Genome Project. Multiple groups are analyzing this data to generate integrated calls for SNPs, indels, structural variants, and long-range phasing. The goal is to provide a high-accuracy set of variant calls across variant types to help validation of sequencing analyses.
The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen... - Ramy K. Aziz
Tools and methods developed under the PhAnToMe (http://www.phantome.org) project between 2009 and 2012 and updated in 2015. The tools and database rely on the Subsystems Technology, the SEED (http://theseed.org) environment, and the RAST server (http://rast.nmpdr.org)
This is the fourth presentation at the Phage Genomics Workshop at the 21st Biennial Evergreen International Phage Meeting, Aug 2 2015.
The document discusses genomic research and sequencing technologies. It provides a history of genomic research from early sequencing methods like Sanger sequencing to modern massively parallel sequencing. It describes several next-generation sequencing platforms, including their read lengths, accuracy, applications, and differences. It emphasizes that data analysis is a major challenge and advises consulting sequencing facilities and having dedicated bioinformaticians for projects.
Integrated omics analysis pipeline for model organism with Cytoscape, Kozo Ni... - Kozo Nishida
This document presents an integrated omics analysis pipeline for model organisms using Cytoscape. The pipeline aims to be reproducible and modifiable in a single IPython notebook environment. It controls Cytoscape using cyREST for network analysis and imports pathway data from KEGG using the KEGGscape app. Examples demonstrate mapping differentially expressed genes and drug targets in E. coli to KEGG pathways. The pipeline can integrate other pathway databases and its utility functions will be packaged for wider use.
This document summarizes work analyzing meta-transcriptomics data from rice stem borer gut bacteria to study gene expression levels and annotate carbohydrate-active enzymes (CAZymes). It discusses quality checking the next-generation sequencing data, predicting proteins, and annotating genes like glycoside hydrolase families GH48, GH9, and GH6. Expression levels of these families are analyzed across time points and diversity and domain arrangements are examined. Future plans include annotating additional CAZy families and correlating transcriptome and proteome data.
This document discusses nanopore sequencing technology. It provides an overview of nanopore sequencing, including what nanopore sequencing is, the types of nanopores used (biological and solid state), advantages such as not requiring amplification or labeling, and challenges with processing large amounts of raw data. The document then examines raw nanopore data and the initial steps needed to process the data, including creating a training data set to predict genomic bases and releasing analysis packages to the community.
Bio2RDF: A biological knowledge base for the Semantic Web - Michel Dumontier
A presentation given at the University of Toronto on June 18, 2009 describing the current state of Bio2RDF with respect to biological knowledge representation on the semantic web as linked data with services to describe and answer questions.
An open access resource portal for arthropod vectors and agricultural pathosy... - Surya Saha
AgriVectors.org is a systems biology resource for vector biologists that aims to provide omics resources and databases to identify targets for interdiction molecules. It utilizes a distributed data schema to rapidly release genome assemblies and transcriptomes. Undergraduate students manually curate genes and pathways of interest from NCBI gene models. The site also provides web-based tools to visualize and analyze high-dimensional experimental data like proteomics and gene expression networks. The goal is to build an ecosystem of integrated resources and tools to study vector-pathogen-host systems important for agriculture.
Functional annotation of invertebrate genomes - Surya Saha
Functional annotation of the Asian citrus psyllid genome identified genes, assigned gene ontology terms, and mapped genes to pathways. Gene ontology and pathway analysis of differentially expressed genes between infected and uninfected psyllids identified enriched terms involved in the cytoskeleton, endocytosis, and mitochondrial dysfunction. Improved functional annotation using GOanna added depth to the gene ontology annotation and identified additional enriched pathways related to response to hypoxia and regulation of cytoskeletal remodeling.
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ... - Surya Saha
Rapidly spreading invasive diseases in systems with little or no prior experimental data or resources pose a unique set of challenges for growers, scientists as well as regulators. As a part of a USDA NIFA CAPS project focused on the psyllid, Diaphorina citri, we have released improved genomics resources including high quality genome assemblies and annotation. We have also created an open access web portal for analyses around the Citrus Greening/Huanglongbing disease complex. Citrusgreening.org includes pathosystem-wide resources and bioinformatics tools for multiple Citrus spp. hosts, the Asian citrus psyllid vector (ACP, Diaphorina citri), and multiple pathogens including Candidatus Liberibacter asiaticus (CLas). To the best of our knowledge, this is the first example of a database to use the pathosystem as a holistic framework to understand an insect transmitted plant disease. Users can submit relevant data sets to enable sharing and allow the community to leverage their data within an integrated system. The system includes the metabolic pathway databases CitrusCyc and DiaphorinaCyc with organism specific pathways that can be used to mine metabolomics, transcriptomics and proteomics results to identify pathways and regulatory mechanisms involved in disease response. The Psyllid Expression Network (PEN) contains expression profiles of ACP genes from multiple life stages, tissues, conditions and hosts. The Citrus Expression Network (CEN) contains public expression data from multiple tissues and conditions for various citrus hosts. All tools connect to a central database. The portal also includes electrical penetration graph (EPG) recordings, information about citrus rootstock trials and metabolomics data in addition to traditional omics data types with a goal of combining and mining all information related to the Huanglongbing pathosystem. 
User-friendly manual curation tools will allow continuous improvement of the knowledge base as more experimental research is published. The portal can be accessed at https://citrusgreening.org/.
Updates on Citrusgreening.org database from USDA NIFA project meeting - Surya Saha
The document discusses the citrusgreening.org portal and its resources for researching citrus greening disease. It provides pathway databases for the Asian citrus psyllid vector and citrus pathogens, as well as expression networks showing gene expression data. It outlines current and future work including a psyllid annotation update, new citrus and psyllid RNA-seq data, and potential methods for studying the insect-pathogen interaction like genomics, transcriptomics, and epigenomics. The document envisions an AgriVectors knowledge base to integrate pathosystem data from multiple sources.
Updates on the ACP v3 genome and annotation from USDA NIFA project meeting - Surya Saha
ACP version 3 genome, official gene set version 3 and Isoseq transcriptome
Prashant Hosmani, Mirella Flores-Gonzalez, Lukas Mueller, Surya Saha
5th Annual Meeting
Indian River State College
Fort Pierce, FL
AgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant Diseases - Surya Saha
Arthropod vectors of pathogens cause enormous economic losses and are a fundamental challenge for sustainable increases in food production, yet agricultural pathosystems remain an underserved area of research. To more effectively fight plant diseases, data pertaining to a disease system needs to be consolidated, made searchable and amenable to data mining. The AgriVectors platform is an open access and comprehensive resource for growers, researchers and industry working on plant pathogens and pathosystems spread by arthropod vectors. The portal connects established public repositories with pathosystem-specific data repositories. The AgriVectors system will provide tools to enable technologies such as RNAi, CRISPR, screening bioassays, etc. to leverage current and emerging knowledge across disciplines. It will also include private and unpublished data, using passwords and secure protocols for restricted access. The portal will be based on the Citrusgreening.org (https://citrusgreening.org/) community resource that was developed as a model for systems biology of tritrophic disease complexes. Citrusgreening.org provides omics and biology resources for the Huanglongbing pathosystem. In addition, it includes a biochemical pathway database for each organism in this disease complex, and an expression atlas with proteomics and RNAseq data from psyllids (http://pen.citrusgreening.org) and citrus (http://cen.citrusgreening.org) across multiple infection states. The AgriVectors portal will extend this model beyond gene-centric omics data to the broader Pathosystem-wide information, with integrated pest management, behavioral, plant health, soil health and climate data to incorporate rapid phenotyping information from research trials, building a foundation for more effectively identifying solutions to combat plant diseases.
Visualization of insect vector-plant pathogen interactions in the citrus gree... - Surya Saha
This document summarizes Surya Saha's presentation on using omics approaches to study the interactions between the Asian citrus psyllid vector, Candidatus Liberibacter asiaticus pathogen, and citrus plants in the citrus greening pathosystem. Key points include the generation of a new reference genome for the Asian citrus psyllid, assembly of genomes for its endosymbionts, development of an online annotation platform for manual gene curation, generation of an isoform-level psyllid transcriptome, analysis of gene expression networks in the psyllid in response to different conditions, and discovery of differences in how psyllid life stages respond transcriptionally to the citrus
Deciphering the genome of Diaphorina citri to develop solutions for the citru... - Surya Saha
The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the causal agent of citrus greening or Huanglongbing disease, which threatens the citrus industry worldwide. This vector is the primary target of approaches to stop the transmission of the pathogen. Accurate structural and functional annotation of the psyllid’s gene models and an understanding of its interactions with the pathogenic bacterium, CLas, are required for precise targeting using molecular methods such as RNAi. We opted for manual curation of gene families in the draft genome of D. citri (Diaci v1.1, contig N50 34.4Kb) that have key functional roles in D. citri biology and pathology. The community effort resulted in Official Gene Set v1.0 with more than 500 manually curated gene models across developmental, RNAi regulatory, and immune-related pathways.
Single copy marker analysis of the current genome shows that a significant proportion (25%) of the 3,350 markers conserved in Hemipterans is missing, with only 74% present as full-length copies. The manual genome annotation also identified a number of misassemblies and missing genes in the current genome. This is, in part, due to the complexity introduced when assembling a heterogeneous sample containing DNA from multiple psyllids, exacerbated by the use of short reads. This challenge is common with insect genomes due to the size of individuals. To improve the quality of the genome assembly, we generated 36.2Gb of PacBio long reads with a coverage of 80X for the 450Mb psyllid genome. The Canu assembler followed by Dovetail Chicago-based scaffolding was used to create an improved assembly (Diaci v2.0) with a contig N50 of 758.7kb and 1906 contigs. The assembly was polished with PacBio and Illumina paired-end reads to remove indel and SNP errors. We are employing Dovetail Chicago and 10X Illumina libraries generated from a single psyllid in conjunction with Bionano optical maps to achieve long-range scaffolding of the genome. We have also generated full-length cDNA transcripts from diseased and healthy tissue from multiple life stages with the PacBio IsoSeq technology. This will be the first time all these methods have been applied to resolve a complex insect genome from a highly heterogeneous sample. The new assembly will be available on https://citrusgreening.org/, which is our portal for all omics resources for the citrus greening disease. We are continuing with the manual curation effort using the improved genome. We will also present how the improved genome and annotation are contributing to the development of molecular interdiction methods to disrupt the vectoring ability of D. citri.
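The 80X coverage quoted in this abstract is consistent with the usual back-of-the-envelope definition of mean coverage, total sequenced bases divided by genome size. The short sketch below reproduces that arithmetic from the two figures given in the abstract (36.2Gb of reads, 450Mb genome); the function name `mean_coverage` is a hypothetical helper, and the calculation ignores read-length distribution and unmapped reads.

```python
def mean_coverage(total_bases, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


# Figures taken from the abstract: 36.2 Gb of long reads, 450 Mb genome.
coverage = mean_coverage(36.2e9, 450e6)
print(f"{coverage:.1f}X")  # prints 80.4X
```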
The document discusses quality control of sequencing data. It covers exploration of data files using command line tools, evaluation of read quality metrics like quality scores and length distributions using FastQC, and preprocessing reads by trimming low quality ends and removing short reads using fastq-mcf. Exercises guide exploring a protein fasta file, evaluating quality of Illumina datasets for tomato, and preprocessing the reads.
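The preprocessing step described here (trimming low-quality ends, then dropping short reads) can be sketched in Python as follows. This is a simplified illustration and not the algorithm used by fastq-mcf; the function names, the thresholds and the Phred+33 assumption are all choices made for this example.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred score is below min_q.
    Assumes qual is an ASCII quality string with the given offset."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(reads, min_q=20, min_len=20, offset=33):
    """Yield trimmed (id, seq, qual) records and drop any read that is
    shorter than min_len after trimming."""
    for read_id, seq, qual in reads:
        seq, qual = trim_3prime(seq, qual, min_q, offset)
        if len(seq) >= min_len:
            yield read_id, seq, qual


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33
reads = [("r1", "ACGTACGT", "IIIIII##"), ("r2", "ACGT", "I###")]
for record in preprocess(reads, min_len=4):
    print(record)
```

Only the 3' end is trimmed in this sketch because quality usually drops toward the end of a read; real tools also handle adapters and other cases.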
CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis...Surya Saha
CitrusCyc is a metabolic pathway database for the Citrus clementina and Citrus sinensis genomes. It was constructed using the Pathway Tools software and contains pathways, reactions, enzymes and genes derived from the annotated citrus genomes and the MetaCyc database. The database contains over 25,000 proteins and 40,000 transcripts with EC numbers for both citrus species. It provides visualizations of metabolic pathways and allows for overlay of RNA-seq expression data. Future work includes manual curation of pathways and development of a Meta-CitrusCyc database.
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...Surya Saha
The document discusses efforts to improve the genome assembly of the Asian citrus psyllid (Diaphorina citri), the insect vector of citrus greening disease. It describes using long read sequencing data from PacBio to generate a new assembly with an N50 of 83kb, a significant improvement over the previous N50 of 34kb. It further discusses additional efforts using technologies like Dovetail scaffolding, 10X Genomics, and optical mapping to further improve scaffolding and resolve haplotypes, with the goal of generating a high-quality reference genome for D. citri.
The document summarizes updates to the tomato genome sequence, including:
1) The tomato genome build SL3.0 integrated over 1000 BAC sequences into the previous build SL2.50, improving contiguity and reducing gaps.
2) The BAC sequences were assembled, aligned to SL2.50, and automatically integrated using a published workflow. Integrated BACs then underwent manual and NCBI validation.
3) Compared to SL2.50, the new build SL3.0 has fewer and smaller sequence gaps, representing an improved tomato genome assembly. Future plans include integrating additional sequences and producing new gene annotations.
This was presented on Mar 31, 2015 at Boyce Thompson Institute, Ithaca, NY at the 3rd BTI Bioinformatics Course http://btiplantbioinfocourse.wordpress.com/
The tomato reference genome is one of the most widely used genomic resources in the Solanaceae as well as the wider plant research community. We frequently receive questions from the community regarding the assembly versions. This session will explain the changes in the current version of the tomato genome (SL2.50). The current tomato genome build contains numerous inter-contig gaps (median 931bp, mean 1869bp) and inter-scaffold gaps (median 210Kbp, mean 525Kbp). Updates will be provided regarding the forthcoming tomato genome build (SL3.0) that will include finished BACs (HTGS phase 3) for closing the gaps.
Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequen...Surya Saha
The availability of high-throughput next generation sequencing technologies presents an opportunity for in-silico discovery of endosymbionts. We describe a method for mining a whole genome shotgun metagenome to identify members of the endosymbiont community followed by reconstruction and validation of a high-quality draft microbial genome.
The Asian citrus psyllid (D. citri Kuwayama or ACP) is host to 7+ bacterial endosymbionts and is the insect vector of Ca. Liberibacter asiaticus, causal agent of citrus greening, a disease that has cost the Florida citrus industry $3.63 billion since 2006.
DNA from D. citri was sequenced to 108X coverage to produce paired-end and mate-pair Illumina libraries. The sequences were mined for Wolbachia (wACP) reads using 4 sequenced Wolbachia genomes as bait. Putative wACP reads were then assembled using the Velvet and MIRA3 assemblers. The resulting wACP contigs were annotated using RAST and compared to the closest sequenced Wolbachia from an insect genome, Wolbachia endosymbiont of Culex quinquefasciatus (wPip). MIRA3 was able to reconstruct a majority of the wPip CDS regions and was therefore selected for scaffolding using large-insert mate-pair libraries. The wACP scaffolds were further improved using wPip as a reference genome to orient and order the contigs.
To determine the presence of the core Wolbachia proteins in our wACP scaffolds, we compared them to core Wolbachia proteins identified by OrthoMCL. 1164 of 1213 wACP proteins had matches, of which 669 were to core proteins. This number compares favorably to the number of core proteins (670) found in sequenced Wolbachia genomes.
Endosymbiont hunting in the metagenome of Asian citrus psyllid (Diaphorina ci...Surya Saha
The Asian citrus psyllid (D. citri Kuwayama or ACP) is host to 7+ bacterial endosymbionts and is the insect vector of Ca. Liberibacter asiaticus (Las), causal agent of citrus greening. To gain a better understanding of endosymbiont and pathogen ecology and to develop improved detection strategies for Las, DNA from D. citri was sequenced to 108X coverage. Initial analyses have focused on Wolbachia, an alpha-proteobacterial primary endosymbiont typically found in the reproductive tissues of ACP and other arthropods. The metagenomic sequences were mined for wACP reads using BLAST and 4 sequenced Wolbachia genomes as bait. Putative wACP reads were then assembled using the Velvet and MIRA3 assemblers over a range of parameter settings. The resulting wACP contigs were annotated using the RAST pipeline and compared to Wolbachia endosymbiont of Culex quinquefasciatus (wPip). MIRA3 was able to reconstruct a majority of the wPip CDS regions and was selected for scaffolding with Minimus2, SSPACE and SOPRA using large-insert mate-pair libraries. The wACP scaffolds were compared to wPip using Abacas and the Mauve contig mover to orient and order the contigs. The functional annotation of the scaffolds was evaluated by comparing it to the wPip genome using RAST. The draft assembly was verified using an OrthoMCL-based comparison to the 4 sequenced Wolbachia genomes. We expanded the scope of endosymbiont characterization beyond wACP using 16S rDNA and partial 23S rDNA analysis as a guide. Results will be presented regarding endosymbionts, their potential interactions and their impact on citrus greening disease.
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesSurya Saha
Presented at the Cornell Symbiosis symposium. Workflows for processing amplicon-based 16S/ITS sequences as well as whole genome shotgun sequences are described. Slides include a short description and links for each tool.
DISCLAIMER: This is a small subset of tools out there. No disrespect to methods not mentioned.
1. Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
ss2489@cornell.edu // @SahaSurya
BTI PGRP Summer Internship Program 2014
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
2. Why Sequencing?
• Targeted interrogation of genome
• Economical
• Technological developments
• High-throughput assays
• But requires subsequent validation
7/8/2014 BTI PGRP Summer Internship Program 2014 2
3. Sequencing over the Ages
Slide credit: Aureliano Bombarely
[Timeline figure]
1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 Sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005 onward (years marked on the figure: 2005, 2007, 2011, 2012, 2013): 454, Solexa, SOLiD, Illumina, Ion Torrent, PacBio, Illumina HiSeq X, Pinus taeda (24 Gb)
5. Sanger method
Frederick Sanger
13 Aug 1918 – 19 Nov 2013
Won the Nobel Prize for Chemistry in 1958 and 1980. Published the dideoxy chain termination method or “Sanger method” in 1977
http://dailym.ai/1f1XeTB
7. First generation sequencing
• Very high quality sequences (99.999%)
• Very low throughput
Capillary Sequencing (ABI3730xl)
Run time: 20 min to 3 h
Read length: 400-900 bp
Reads per run: 96 or 384
Total nucleotides sequenced: 1.9-84 Kb
Cost per Mb: $2400
http://bit.ly/1clLps3
http://1.usa.gov/1cLqIRd
8. Name the specific technology used to generate the data
– Illumina Hiseq/Miseq/NextSeq
– Pacific Biosciences RS1/RSII
– Ion Torrent Proton/PGM
– SOLiD
http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
9. 454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://bit.ly/1ehwxWN
GS FLX
Titanium
http://bit.ly/1ehAcEh
10. Illumina
Columns are four Illumina instruments, left to right; the instrument names were not captured in this transcript.
Output: 15 Gb | 120 Gb | 1000 Gb | 1800 Gb
Number of reads: 25 million | 400 million | 4 billion | 6 billion
Read length: 2x300 bp | 2x150 bp | 2x125 bp (2x250 update mid-2014) | 2x150 bp
Cost: $99K | $250K | $740K | $10M
Source: Illumina
13. Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://bit.ly/1naxgTe
14. Pacific Biosciences SMRT sequencing
Error correction methods
Hierarchical genome-assembly process (HGAP)
PBJelly
English et al., PLoS ONE, 2012
15. Pacific Biosciences SMRT sequencing
Read Lengths
7/8/2014 Centre for Agricultural Bioinformatics, Pusa 15
16. Oxford Nanopore
https://www.nanoporetech.com/
• No data yet??
• Error model
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
22. Real cost of Sequencing!!
Sboner, Genome Biology, 2011
23. Library Types
Single end
Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2)
Mate pair (MP, 2 Kb to 20 Kb)
[Figure: orientation of forward (F) and reverse (R) reads for single-end, paired-end and mate-pair libraries, with separate mate-pair diagrams for 454/Roche and Illumina]
Slide credit: Aureliano Bombarely
24. Implications of Choice of Library
Slide credit: Aureliano Bombarely
[Figure: assembly hierarchy]
Reads -> Consensus sequence (Contig)
Contigs + paired-read information -> Scaffold (or Supercontig), with gaps shown as NNNNN
Scaffolds + genetic information (markers) -> Pseudomolecule (or ultracontig)
25. Multiplexing Libraries
Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector.
Slide credit: Aureliano Bombarely
[Figure: reads from two samples, tagged AGTCGT and TGAGCA, pooled and sequenced together]
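As a rough illustration of how multiplexed reads are separated again after sequencing, here is a minimal Python sketch. It assumes the tag sits at the very start of each read and must match exactly; the function name, sample names and reads are hypothetical, and real demultiplexers usually tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the tag at their start and strip the tag.

    barcodes maps tag -> sample name. Reads that start with no known
    tag are collected under 'unassigned' and left unchanged.
    """
    bins = defaultdict(list)
    for read in reads:
        for tag, sample in barcodes.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])
                break
        else:
            # no tag matched this read
            bins["unassigned"].append(read)
    return dict(bins)


# The two tags shown on the slide, with made-up sample names and reads
barcodes = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}
reads = ["AGTCGTAAAA", "TGAGCACCCC", "GGGGGGTTTT"]
print(demultiplex(reads, barcodes))
```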
26. File Formats
Fasta files:
It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
-Wikipedia
Slide credit: Aureliano Bombarely
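A minimal Python sketch of reading the FASTA format follows. It relies on the standard convention, not stated on the slide, that each record starts with a header line beginning with ">" and that the sequence may wrap over several lines. The function name and sample records are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.
    The leading '>' is removed from the header; wrapped sequence lines
    are joined into one string."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records: one nucleotide, one peptide
example = [">seq1 demo", "ACGT", "TTGA", ">pep1", "MKV"]
for name, seq in parse_fasta(example):
    print(name, len(seq))
```

With a real file, the same function can be called on an open file handle, since a file object yields its lines one at a time.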
27. File Formats
Fastq files:
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
-Wikipedia
• Single ID line with the at symbol (“@”) in the first column.
• The sequence can span multiple lines after the ID line.
• Single line with the plus symbol (“+”) in the first column to mark the start of the quality values.
• The “+” line may repeat the ID.
• Quality values can span multiple lines after the “+” line, but their total length must be identical to the sequence length.
Slide credit: Aureliano Bombarely
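The FASTQ rules listed on this slide can be turned into a small Python parser. This is a sketch under the slide's own rules (multi-line sequence, "+" separator, quality length equal to sequence length); the function name and the sample records are hypothetical. Counting quality characters rather than looking for the next "@" matters because "@" is itself a valid quality character.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of FASTQ lines.
    Sequence and quality may each span several lines."""
    it = (line.rstrip("\n") for line in lines)
    for line in it:
        if not line:
            continue  # skip blank lines between records
        if not line.startswith("@"):
            raise ValueError(f"expected '@' ID line, got: {line!r}")
        read_id = line[1:]
        # collect sequence lines until the '+' separator line
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        # collect quality lines until they match the sequence length
        qual = ""
        while len(qual) < len(seq):
            try:
                qual += next(it)
            except StopIteration:
                raise ValueError(f"truncated record: {read_id!r}") from None
        if len(qual) != len(seq):
            raise ValueError(f"quality/sequence length mismatch: {read_id!r}")
        yield read_id, seq, qual


# Hypothetical records; the second quality line of r1 starts with '@'
example = ["@r1", "ACGT", "ACGT", "+", "IIII", "@III",
           "@r2 desc", "GG", "+r2 desc", "#I"]
for record in parse_fastq(example):
    print(record)
```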
28. Quality control: Encoding
Fastq files:
!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)
KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
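Because the two encodings use different ASCII ranges, the offset of a file can often be guessed from the lowest quality character it contains. The Python sketch below is a heuristic only, with a hypothetical function name: characters below ASCII 64 cannot occur in Phred+64, so their presence implies Phred+33, but a Phred+33 file containing only very high scores would be misclassified, and older Solexa-scaled files are not handled.

```python
def guess_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from quality strings.

    Heuristic: any character below ASCII 64 rules out Phred+64.
    Raises ValueError if no quality characters are supplied.
    """
    lowest = min(ord(c) for q in quality_strings for c in q)
    return 33 if lowest < 64 else 64


def decode(quality, offset):
    """Convert a quality string to a list of integer Phred scores."""
    return [ord(c) - offset for c in quality]


# Characters taken from the two ranges shown on the slide
print(guess_offset(['!"#$%', "5678"]))
print(guess_offset(["KLMNOP", "efgh"]))
print(decode("Kh", 64))
```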
30. Quality control: Encoding
http://bit.ly/N28yUd
Phred score of a base is:
Qphred = -10 × log10(e)
where e is the estimated probability of a base being wrong
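The formula can be checked numerically with a short Python sketch. The function names are hypothetical, and the Phred+33 default is an assumption for the example.

```python
import math


def error_to_phred(e):
    """Phred score from error probability: Q = -10 * log10(e)."""
    return -10 * math.log10(e)


def phred_to_error(q):
    """Inverse of the formula: e = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mean_error(quality, offset=33):
    """Average per-base error probability of one quality string."""
    probs = [phred_to_error(ord(c) - offset) for c in quality]
    return sum(probs) / len(probs)


for q in (10, 20, 30, 40):
    print(q, phred_to_error(q))

# '5' encodes Q20 and 'I' encodes Q40 under Phred+33
print(mean_error("5I"))
```

Each step of 10 in Q is a tenfold drop in error probability: Q20 is 1 error in 100 and Q30 is 1 in 1000. The 99.999% accuracy quoted for first generation sequencing corresponds to e = 0.00001, or about Q50.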