This talk was presented at IASRI Pusa on June 13th, 2014.
Centre for Agricultural Bioinformatics
Indian Agricultural Statistics Research Institute
Library Avenue, Pusa, New Delhi - 110012 (INDIA)
http://cabgrid.res.in/cabin/
Surya Saha presented on the history and current state of DNA sequencing technologies. The presentation covered first generation Sanger sequencing and more recent next generation sequencing technologies such as Illumina, Ion Torrent, PacBio, and Oxford Nanopore. Key points included the increasing throughput and decreasing costs of sequencing over time, factors to consider when choosing a sequencing technology, and potential future applications of sequencing in medicine and environmental studies. The presentation concluded by discussing opportunities for students in computational biology.
This was presented on Mar 11, 2014 at Boyce Thompson Institute, Ithaca, NY at the 3rd BTI Bioinformatics Course http://btiplantbioinfocourse.wordpress.com/
The document provides an overview of the history and development of DNA sequencing technologies. It discusses early methods like Sanger sequencing and Maxam-Gilbert sequencing. It then summarizes major next-generation sequencing platforms like Illumina, Pacific Biosciences, and Oxford Nanopore. The document also covers sequencing trends, costs, and considerations for choosing a sequencing platform.
This was presented on Mar 31, 2015 at Boyce Thompson Institute, Ithaca, NY at the 3rd BTI Bioinformatics Course http://btiplantbioinfocourse.wordpress.com/
The document summarizes a study that used Illumina HiSeq sequencing to analyze taxon diversity in bulk insect samples. The researchers tested two approaches: 1) PCR-based amplification of the COI barcode region followed by Illumina sequencing, and 2) direct shotgun sequencing of total mitochondrial DNA without PCR. Both approaches showed potential for high-throughput environmental barcoding, though methodological improvements are still needed to address issues like taxonomic and biomass biases. The study demonstrates that Illumina sequencing can perform comparably to other platforms for analyzing mixed insect samples and may help solve amplification biases through a PCR-free method.
1. BioSMACK is a Linux Live CD customized for analysis of genome-wide association studies (GWAS).
2. It provides pre-compiled, installed and configured software for GWAS analysis like PLINK, EIGENSTRAT, STRUCTURE, and others from a bootable CD/USB without installing on the hard disk.
3. Future work includes supporting cloud and cluster computing for parallel GWAS analysis on large datasets.
Next-generation sequencing technologies have rapidly advanced since 2005. Key developments include massively parallel sequencing reactions that enabled sequencing of entire human genomes for less than $1,000 by 2015. While Illumina dominates the market, other platforms like Ion Torrent and PacBio are increasing capabilities. Routine human whole genome sequencing is now used in research and medicine, enabling new opportunities like liquid biopsies and single-cell analysis. However, data storage and analysis remain challenges due to the large volumes of sequencing data.
This document summarizes research on the structural biology of cytochrome P450 monooxygenases in the phenylpropanoid pathway of Arabidopsis thaliana. It discusses homology modeling and protein engineering methods used to study the structures of phenylpropanoid P450s. It also describes efforts to develop X-ray crystallography and NMR techniques to determine the crystal structures of these P450s to better understand their functions in metabolic pathways.
H Mishima - Biogem, Ruby UCSC API, and BioRuby | Jan Aerts
BioRuby is an open-source bioinformatics library for the Ruby programming language that has been in development since 2000. It utilizes a centralized approach where code changes are reviewed by core committers. In recent years, efforts have been made to decentralize development through the use of GitHub and biogems - plug-ins that can be developed and maintained independently while still following standard guidelines. There are now over 60 biogems covering various domains. The biogems framework aims to further expand and motivate contributions to BioRuby.
CSU Next Generation Sequencing Core 06/09/2015 | Richard Casey
The Next Generation Sequencing Core at Colorado State University provides next generation sequencing and bioinformatics services and support on Illumina and Ion Torrent platforms, for applications such as DNA sequencing, RNA sequencing, epigenetics, and metagenomics. It draws on over 10 years of experience and on resources including on-campus laboratories, computational infrastructure, and staff expertise. Services include laboratory preparation, sequencing runs, bioinformatics analysis, and training.
The document outlines a schedule of bioinformatics lessons taking place from September to December 2011. It includes topics such as biological databases, sequence similarity, sequence alignments, database searching, phylogenetics, protein structure, gene prediction, and bio- and cheminformatics in drug discovery. Dates and times are provided for each lesson.
Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl... | Joe Parker
A short presentation to the British Society for Plant Pathology's 'Grand Challenges in Plant Pathology' workshop on the uses of real-time DNA/RNA sequencing technology for plant health applications.
Doctoral Training Centre, University of Oxford, 14th September 2016.
How scientists and artists are discovering musical patterns in protein struct... | Willard Van De Bogart
This document discusses how scientists are discovering musical patterns in protein structures by converting their molecular vibrations into sound. Researchers at MIT have created a "protein synthesizer" that can take a protein's structure and transform it into an audible musical composition. They have also developed techniques to take a musical composition and use it to design hypothetical new protein structures. The goal is to better understand protein folding and dynamics through analogies to music. Other researchers are exploring combining sounds of proteins with cosmic sounds, voices, and thought forms to create "neuroacoustic architecture" and "space music".
This document provides information about an upcoming training workshop on using the Ensembl genome browser and related tools. The workshop will cover introductory topics in the morning, including browsing genes and genomic features, exercises with the browser, and an introduction to BioMart. The afternoon sessions will cover genetic variation and gene regulation. Participants are encouraged to bring their own questions about specific genes or genomic regions. Reference materials for the workshop are available under a Creative Commons license.
This is the webinar presented on 14th April as part of the Ensembl Online Webinar series. You can view the recorded webinar on the Ensembl Helpdesk YouTube channel: https://www.youtube.com/watch?v=blbhuqiiDoA
This document provides an introduction to next generation sequencing (NGS) technologies. It begins with an outline of topics to be covered, including the evolution of NGS technologies, their descriptions and comparisons, bioinformatics challenges of NGS data analysis, and some aspects of NGS data analysis workflows and tools. The document then delves into explanations of specific NGS platforms, their performance characteristics, and the sequencing processes. It discusses the large computational infrastructure and data management needs of NGS, as well as quality control, preprocessing of NGS data, and popular analysis tools and workflows.
This document provides an overview and comparison of popular next-generation sequencing platforms. It discusses the common sequencing pipeline including library preparation, massively parallel sequencing, and bioinformatics analysis. Popular platforms like Roche 454, Illumina, and SOLiD are described in detail focusing on their specific sequencing chemistries and performance characteristics. Newer third-generation platforms such as Ion Torrent, PacBio, and Oxford Nanopore are also introduced. A wide range of NGS applications from whole genome sequencing to RNA-seq are outlined.
Next-generation sequencing techniques such as Illumina and 454 pyrosequencing were discussed for applications including microbial genome sequencing and metagenomic profiling of microbial communities from targeted gene markers or shotgun sequencing. Key steps include library preparation, sequencing, and downstream bioinformatics analysis of sequencing data for tasks like genome assembly, gene annotation, and taxonomic classification of microbial taxa.
The Molecular Diagnostics Laboratory processes around 20,000 specimens annually for various testing categories including infectious diseases, hematological malignancies, solid tumor malignancies, and inherited disorders. Next generation sequencing is proposed as a solution to provide more comprehensive genetic testing by simultaneously evaluating variations in multiple genes from a single sample. While next generation sequencing offers advantages over traditional testing methods, there are also technical and clinical challenges to address in implementation, including optimizing detection of structural variations and interpretation of variants with unclear clinical significance.
The document describes the steps of Illumina sequencing. Genomic DNA is first fragmented and adapters are ligated to create single-stranded DNA fragments. These fragments are attached to a flow cell and undergo bridge amplification to create clusters of identical DNA fragments. Sequencing occurs through cycles of reversible terminator-based sequencing using fluorescently labeled dNTPs, imaging of the fluorescence, and cleavage of the label and terminator to allow the next cycle. After multiple cycles, the sequenced reads are aligned to the reference genome to determine the original sequence.
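The fragment-then-align idea in the summary above can be sketched as a toy simulation. This is a minimal illustration under strong assumptions, not how a real pipeline works: the reference string is made up, reads are error-free, adapters and bridge amplification are not modeled, and exact string matching stands in for a real read aligner. The function names `sample_reads`, `align_exact`, and `coverage` are hypothetical.

```python
import random


def sample_reads(genome, read_len, n_reads, seed=0):
    """Draw fixed-length reads from random positions of the genome.

    Toy stand-in for fragmentation plus sequencing; no errors are introduced.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads


def align_exact(reference, read):
    """Return the first 0-based position where the read matches exactly, or -1."""
    return reference.find(read)


def coverage(reference, reads):
    """Count how many aligned reads cover each reference position."""
    depth = [0] * len(reference)
    for read in reads:
        pos = align_exact(reference, read)
        if pos >= 0:
            for i in range(pos, pos + len(read)):
                depth[i] += 1
    return depth


# Made-up 26-base reference, for illustration only.
reference = "ACGTTGCAAGGCTTACCGATAGGCTA"
reads = sample_reads(reference, read_len=8, n_reads=20)
depth = coverage(reference, reads)
```

Because every simulated read is an exact substring of the reference, each one aligns somewhere; with real data, sequencing errors and repeats are what make alignment hard.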
The document summarizes molecular characterization of Puccinia striiformis f.sp. tritici (Pst) isolates from Western Canada. Pst isolates were sequenced using Illumina platforms and assembled de novo. Phylogenetic trees were constructed based on rRNA sequences and whole genome assemblies. Comparisons between old isolates from 1990-1993 and new isolates from 2007-2012 identified unique and enriched gene sequences, suggesting genome reorganization in Pst. Functional annotation revealed differences in biological processes between old and new isolates, such as transport and response to exogenous molecules in new isolates.
New High Throughput Sequencing technologies at the Norwegian Sequencing Centr... | Lex Nederbragt
A talk I gave at the Microbiology Research Group (University of Oslo) about new High Throughput Sequencing instruments at the Norwegian Sequencing Centre. I also mentioned future upgrades and the upcoming nanopore sequencing platform from Oxford Nanopore.
Microbiome research is undergoing a crisis due to issues like the correlation-causation fallacy in studies and poor experimental design. The document discusses challenges with studying the microbiome, including biases and errors introduced from DNA extraction methods, sample storage conditions, and contamination from extraction kits. It emphasizes that every step in microbiome research, from sample collection to analysis, needs careful consideration to draw accurate conclusions.
The document describes Phase II of the ABRF Next Generation Sequencing Study which aims to establish reference data sets for evaluating DNA sequencing performance across multiple platforms and laboratories. Phase II will sequence various human and bacterial genomic samples to assess accuracy, coverage, and limits of detection using different platforms and library preparation methods. A collaboration with NIST Genome in a Bottle will provide standardized samples to the participating laboratories. The study aims to provide a resource for ongoing method development and evaluation of sequencing performance.
Next-generation sequencing (NGS) has revolutionized the way we analyze diseases and commercial outfits such as Illumina, Helicos, QIAGEN and Pacific Biosciences have made significant contributions. In addition, the launch of direct-to-consumer genetic testing solutions has dramatically changed the way consumers access genomics data. Until a few years ago, the cost of sequencing was a major bottleneck. Recent developments have reduced the cost from thousands of dollars to a couple of cents per megabase. When did these changes start? What were the changes in the commercial sector in the last 15 years? This infographic is a timeline of the NGS commercial marketplace.
Phylogenomic methods for comparative evolutionary biology - University Colleg... | Joe Parker
This document outlines Joe Parker's research interests in phylogenomics and high-throughput comparative genomics at Queen Mary University London. It discusses why phylogenomics is important, provides examples of past studies, and describes the lab's workflow and tools for sequencing, assembly, alignment, phylogeny inference, and phylogenetic analysis. It also presents a case study on detecting genome-wide convergence and discusses future directions including environmental metagenomics, cloud computing models, and real-time phylogenetics.
The document describes the ABRF Next Generation Sequencing Study which aims to produce reference data sets to establish baseline performance of sequencing platforms and methods. Phase I focused on RNA-Seq and produced major conclusions about intraplatform and interplatform concordance. Phase II will focus on DNA sequencing including performance using different platforms/protocols, damaged DNA, small genomes, and oncogenic mutations. Samples and sequencing plans are described for three projects. The study is a collaboration between ABRF and Genome in a Bottle to generate standardized reference data.
Molecular QC: Interpreting your Bioinformatics Pipeline | Candy Smellie
What is the impact of assay failure in your laboratory and how do you monitor for it?
The most heavily degraded samples are not suitable for standard exome coverage: sometimes it’s not even a matter of getting bad sequencing, you might get nothing at all!
FFPE artifacts increase with storage time
Artifacts reduce the statistical power of your variant calling analysis
Molecular reference standards help filter out bad mappings and spurious variants
Bioinformatics pipelines let you include Molecular Reference Standards in joint variant calling
Genome In A Bottle Reference Standards are invaluable for validating variant calling analysis
NIST and its collaborators shared datasets created with most NGS technologies
Horizon Diagnostics shared annotated, merged variant calls from NIST for the Ashkenazim Trio
~35K variants are predicted to have high or moderate impact within the Trio
GM24385 (Ashkenazim Son) includes 352 small variants with high/moderate impact which are absent in Father and Mother
Routinely monitor the performance of your workflows and assays with independent external controls
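The point above about variants present in the son but absent in both parents can be illustrated with a small set-difference sketch. All variants below are invented for illustration and are not taken from the Genome In A Bottle data; the `(chrom, pos, ref, alt)` tuple key and the function name `private_to_child` are assumptions made for this example.

```python
def private_to_child(child, father, mother):
    """Return variants called in the child but in neither parent.

    Each variant is a (chrom, pos, ref, alt) tuple, so set operations apply.
    """
    return child - (father | mother)


# Hypothetical call sets, for illustration only.
son = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 500, "C", "T"),
    ("chr3", 42, "G", "A"),
}
father = {("chr1", 1000, "A", "G")}
mother = {("chr2", 500, "C", "T"), ("chr5", 7, "T", "C")}

candidates = private_to_child(son, father, mother)
print(sorted(candidates))
```

A variant that survives this filter is only a candidate: it may be a true de novo change, a call missed in a parent, or an artifact in the child, which is one reason independent reference standards are useful.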
This document provides an overview of next generation sequencing technologies and applications. It summarizes an upcoming webinar series on next generation sequencing and its role in cancer biology. The first webinar will provide an introduction to next generation sequencing technologies and applications and be presented by Quan Peng on April 4, 2013. The following two webinars will focus on next generation sequencing for cancer research and data analysis and be presented on April 11 and 18, 2013 respectively.
Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An... | John Blue
Use of Next Generation Sequencing for Whole Genome Analysis of Pathogens - Dr. Douglas Marthaler, Veterinary Diagnostic Laboratory, College of Veterinary Medicine, University of Minnesota, from the 2016 Allen D. Leman Swine Conference, September 17-20, 2016, St. Paul, Minnesota, USA.
More presentations at http://www.swinecast.com/2016-leman-swine-conference-material
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016 | Prof. Wim Van Criekinge
This document provides an overview of variant analysis from next-generation sequencing data. It begins with introductions to the CCA-Drylab@VUmc, TraIT, and Galaxy projects. The focus of the lecture is explained to be variant analysis from NGS data using interactive demos in Galaxy. Background is provided on Illumina sequencing technology and properties of sequencing reads. Key steps in variant analysis are outlined, including quality control and read mapping, variant calling and annotation using tools like FastQC, BWA, FreeBayes, and SnpEff. Formats for storing sequencing data and variants are also introduced, such as FASTQ, SAM/BAM, and VCF.
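As a small illustration of the FASTQ format mentioned above, the sketch below reads records and decodes per-base quality scores. It assumes the common layout of four unwrapped lines per record and Phred+33 quality encoding; older files may use a different offset. The helper name `read_fastq` and the example record are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a 4-line-per-record FASTQ stream.

    Assumes Phred+33 encoding: quality = ord(character) - 33.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, [ord(c) - 33 for c in qual]


# A single made-up record; 'I' decodes to Q40, '5' to Q20, '!' to Q0.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records)
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so Q20 means roughly a 1 in 100 chance that the base call is wrong.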
This document provides an overview of analyzing RNA-Seq data using the Tuxedo protocol in Galaxy. It describes experimental design considerations, quality control of sequencing data using FastQC, mapping reads to a reference genome using Tophat, determining differential expression with Cuffdiff, and visualizing results using IGV and CummeRbund. The tutorial walks through an example analysis on Drosophila melanogaster RNA-Seq data, covering topics such as setting file formats, running alignment and expression tools, extracting workflows, and useful Galaxy resources.
The document discusses Illumina's role in advancing precision medicine through next-generation sequencing and data analytics. It notes that while sequencing costs have decreased dramatically, challenges remain in interpreting, integrating, and analyzing the large volumes of genomic and other healthcare data. Illumina aims to develop comprehensive, patient-centric analytics platforms and knowledgebases to help address these challenges and enable more effective prevention, diagnosis, and treatment based on a patient's genetics, environment, and lifestyle. The success of these efforts will be measured by improvements in patient outcomes, healthcare costs and efficiencies, and changes in clinical practice guided by integrated genomic and clinical data analysis.
My students use ideas from my class on business models to develop a business model for Ion Proton's DNA sequencer. This sequencer uses semiconductor technology to read an organism's DNA sequence and is faster and cheaper than existing sequencers. This presentation describes the value proposition, customer selection, method of value capture, and other aspects of a business model for Ion Proton's DNA sequencer.
Improving Torsion Library Patterns with SMARTScompare | Patrick Penner
The document appears to be a presentation on improving torsion library patterns using SMARTScompare. It discusses classifying torsions, matching patterns from databases like CSD and PDB, provides examples of sorting patterns with SMARTScompare, and describes available torsion analysis tools and software. The presentation is intended to improve understanding of torsion libraries and their automated enhancement using substructure pattern matching algorithms.
The document discusses the development of Phytophthora and Pythium databases to support the identification and monitoring of these major plant pathogen groups. It describes the objectives of building a cyberinfrastructure to archive genotype, phenotype and distribution data on Phytophthora species/isolates. The Phytophthora Database provides tools for sequence analysis, phylogenetic analysis and molecular identification. Future directions include expanding to other plant pathogen databases and integrating genomic and geospatial data.
This document summarizes a presentation given by A. Dürauer at the 9th Annual BioInnovation Leaders Summit on February 11th, 2016. The presentation discusses how high-throughput screening platforms can be used to address key questions in early biopharmaceutical process development, specifically around cell disruption, inclusion body recovery and processing, solubilization, and refolding/oxidation. The document outlines experimental workflows that could be used at a microscale to rapidly screen conditions across these unit operations in parallel. It suggests this approach could significantly reduce the time and materials needed for early process development activities compared to traditional benchscale methods.
Bioinformatic tools in Pheromone technologyTHILAKAR MANI
This document discusses the role of bioinformatics tools in pheromone technology. It provides an introduction to bioinformatics and describes some commonly used bioinformatics tools, including UniProt, DDBJ, KEGG, BLAST, and PyMol. UniProt is a database of protein sequences that is composed of UniProtKB/Swiss-Prot which contains manually annotated entries and UniProtKB/TrEMBL which contains automatically annotated entries. DDBJ is a nucleotide sequence database in Japan that collaborates with EMBL and GenBank to share data. KEGG is a database that integrates genomic and chemical information and contains pathway maps and functional hierarchies. BLAST is used for sequence alignment and comparison.
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Michel Dumontier
The document discusses how the semantic web can help power scientific discovery. It proposes building a massive network of interconnected data and software using web standards to 1) generate and test hypotheses by discovering associations in the data, 2) gather evidence to support or dispute hypotheses, and 3) contribute new knowledge back to the global network. This network, called the semantic web, treats data as a web of facts that can be shared and queried using semantic web standards. The document provides examples of how linked open data in the life sciences is being created and used via semantic web technologies to integrate data from multiple sources and answer complex queries.
Similar to Sequencing, Genome Assembly and the SGN Platform (6)
An open access resource portal for arthropod vectors and agricultural pathosy...
Surya Saha
AgriVectors.org is a systems biology resource for vector biologists that aims to provide omics resources and databases to identify targets for interdiction molecules. It utilizes a distributed data schema to rapidly release genome assemblies and transcriptomes. Undergraduate students manually curate genes and pathways of interest from NCBI gene models. The site also provides web-based tools to visualize and analyze high-dimensional experimental data like proteomics and gene expression networks. The goal is to build an ecosystem of integrated resources and tools to study vector-pathogen-host systems important for agriculture.
Functional annotation of invertebrate genomes
Surya Saha
Functional annotation of the Asian citrus psyllid genome identified genes, assigned gene ontology terms, and mapped genes to pathways. Gene ontology and pathway analysis of differentially expressed genes between infected and uninfected psyllids identified enriched terms involved in the cytoskeleton, endocytosis, and mitochondrial dysfunction. Improved functional annotation using GOanna added depth to the gene ontology annotation and identified additional enriched pathways related to response to hypoxia and regulation of cytoskeletal remodeling.
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Surya Saha
Rapidly spreading invasive diseases in systems with little or no prior experimental data or resources pose a unique set of challenges for growers, scientists as well as regulators. As a part of a USDA NIFA CAPS project focused on the psyllid, Diaphorina citri, we have released improved genomics resources including high quality genome assemblies and annotation. We have also created an open access web portal for analyses around the Citrus Greening/Huanglongbing disease complex. Citrusgreening.org includes pathosystem-wide resources and bioinformatics tools for multiple Citrus spp. hosts, the Asian citrus psyllid vector (ACP, Diaphorina citri), and multiple pathogens including Candidatus Liberibacter asiaticus (CLas). To the best of our knowledge, this is the first example of a database to use the pathosystem as a holistic framework to understand an insect transmitted plant disease. Users can submit relevant data sets to enable sharing and allow the community to leverage their data within an integrated system. The system includes the metabolic pathway databases CitrusCyc and DiaphorinaCyc with organism specific pathways that can be used to mine metabolomics, transcriptomics and proteomics results to identify pathways and regulatory mechanisms involved in disease response. The Psyllid Expression Network (PEN) contains expression profiles of ACP genes from multiple life stages, tissues, conditions and hosts. The Citrus Expression Network (CEN) contains public expression data from multiple tissues and conditions for various citrus hosts. All tools connect to a central database. The portal also includes electrical penetration graph (EPG) recordings, information about citrus rootstock trials and metabolomics data in addition to traditional omics data types with a goal of combining and mining all information related to the Huanglongbing pathosystem. 
User-friendly manual curation tools will allow the continuous improvement of the knowledge base as more experimental research is published. The portal can be accessed at https://citrusgreening.org/.
Updates on Citrusgreening.org database from USDA NIFA project meeting
Surya Saha
The document discusses the citrusgreening.org portal and its resources for researching citrus greening disease. It provides pathway databases for the Asian citrus psyllid vector and citrus pathogens, as well as expression networks showing gene expression data. It outlines current and future work including a psyllid annotation update, new citrus and psyllid RNA-seq data, and potential methods for studying the insect-pathogen interaction like genomics, transcriptomics, and epigenomics. The document envisions an AgriVectors knowledge base to integrate pathosystem data from multiple sources.
Updates on the ACP v3 genome and annotation from USDA NIFA project meeting
Surya Saha
ACP version 3 genome, official gene set version 3 and Isoseq transcriptome
Prashant Hosmani, Mirella Flores-Gonzalez, Lukas Mueller, Surya Saha
5th Annual Meeting
Indian River State College
Fort Pierce, FL
AgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant Diseases
Surya Saha
Arthropod vectors of pathogens cause enormous economic losses and are a fundamental challenge for sustainable increases in food production, yet agricultural pathosystems remain an underserved area of research. To more effectively fight plant diseases, data pertaining to a disease system needs to be consolidated, made searchable and amenable to data mining. The AgriVectors platform is an open access and comprehensive resource for growers, researchers and industry working on plant pathogens and pathosystems spread by arthropod vectors. The portal connects established public repositories with pathosystem-specific data repositories. The AgriVectors system will provide tools to enable technologies such as RNAi, CRISPR, screening bioassays, etc. to leverage current and emerging knowledge across disciplines. It will also include private and unpublished data, using passwords and secure protocols for restricted access. The portal will be based on the Citrusgreening.org (https://citrusgreening.org/) community resource that was developed as a model for systems biology of tritrophic disease complexes. Citrusgreening.org provides omics and biology resources for the Huanglongbing pathosystem. In addition, it includes a biochemical pathway database for each organism in this disease complex, and an expression atlas with proteomics and RNAseq data from psyllids (http://pen.citrusgreening.org) and citrus (http://cen.citrusgreening.org) across multiple infection states. The AgriVectors portal will extend this model beyond gene-centric omics data to the broader Pathosystem-wide information, with integrated pest management, behavioral, plant health, soil health and climate data to incorporate rapid phenotyping information from research trials, building a foundation for more effectively identifying solutions to combat plant diseases.
Visualization of insect vector-plant pathogen interactions in the citrus gree...
Surya Saha
This document summarizes Surya Saha's presentation on using omics approaches to study the interactions between the Asian citrus psyllid vector, Candidatus Liberibacter asiaticus pathogen, and citrus plants in the citrus greening pathosystem. Key points include the generation of a new reference genome for the Asian citrus psyllid, assembly of genomes for its endosymbionts, development of an online annotation platform for manual gene curation, generation of an isoform-level psyllid transcriptome, analysis of gene expression networks in the psyllid in response to different conditions, and discovery of differences in how psyllid life stages respond transcriptionally to the citrus
Deciphering the genome of Diaphorina citri to develop solutions for the citru...
Surya Saha
The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the causal agent of citrus greening or Huanglongbing disease, which threatens the citrus industry worldwide. This vector is the primary target of approaches to stop the transmission of the pathogen. Accurate structural and functional annotation of the psyllid’s gene models and an understanding of its interactions with the pathogenic bacterium, CLas, are required for precise targeting using molecular methods such as RNAi. We opted for manual curation of gene families in the draft genome of D. citri (Diaci v1.1, contig N50 34.4Kb) that have key functional roles in D. citri biology and pathology. The community effort resulted in Official Gene Set v1.0 with more than 500 manually curated gene models across developmental, RNAi regulatory, and immune-related pathways.
Single copy marker analysis of the current genome shows that a significant proportion (25%) of the 3,350 markers conserved in Hemipterans are missing, with only 74% present in full-length copies. The manual genome annotation also identified a number of misassemblies and missing genes in the current genome. This is, in part, due to the complexity introduced when assembling a heterogeneous sample containing DNA from multiple psyllids and is exacerbated by the use of short reads. This challenge is common with insect genomes due to the size of individuals. To improve the quality of the genome assembly, we generated 36.2Gb of PacBio long reads with a coverage of 80X for the 450Mb psyllid genome. The Canu assembler followed by Dovetail Chicago-based scaffolding was used to create an improved assembly (Diaci v2.0) with a contig N50 of 758.7kb and 1906 contigs. The assembly was polished with PacBio and Illumina paired-end reads to remove indel and SNP errors. We are employing Dovetail Chicago and 10X Illumina libraries generated from a single psyllid in conjunction with Bionano optical maps to achieve long-range scaffolding of the genome. We have also generated full-length cDNA transcripts from diseased and healthy tissue from multiple life stages with the PacBio IsoSeq technology. This will be the first time all these methods have been applied to resolve a complex insect genome from a highly heterogeneous sample. The new assembly will be available on https://citrusgreening.org/, which is our portal for all omics resources for the citrus greening disease. We are continuing the manual curation effort using the improved genome. We will also present how the improved genome and annotation are contributing to the development of molecular interdiction methods to disrupt the vectoring ability of D. citri.
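The coverage figure quoted in this abstract can be checked with simple arithmetic: sequencing depth is the total number of sequenced bases divided by the genome size. The sketch below uses only the two numbers given in the abstract (36.2 Gb of reads, 450 Mb genome); the function and variable names are illustrative assumptions.

```python
def coverage(total_bases, genome_size):
    """Average sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values taken from the abstract: 36.2 Gb of long reads, 450 Mb genome.
pacbio_bases = 36.2e9
psyllid_genome_size = 450e6

depth = coverage(pacbio_bases, psyllid_genome_size)
print(f"Approximate coverage: {depth:.1f}X")
```

The result rounds to the 80X quoted in the abstract.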
The document discusses quality control of sequencing data. It covers exploration of data files using command line tools, evaluation of read quality metrics like quality scores and length distributions using FastQC, and preprocessing reads by trimming low quality ends and removing short reads using fastq-mcf. Exercises guide exploring a protein fasta file, evaluating quality of Illumina datasets for tomato, and preprocessing the reads.
CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis...
Surya Saha
CitrusCyc is a metabolic pathway database for the Citrus clementina and Citrus sinensis genomes. It was constructed using the Pathway Tools software and contains pathways, reactions, enzymes and genes derived from the annotated citrus genomes and the MetaCyc database. The database contains over 25,000 proteins and 40,000 transcripts with EC numbers for both citrus species. It provides visualizations of metabolic pathways and allows for overlay of RNA-seq expression data. Future work includes manual curation of pathways and development of a Meta-CitrusCyc database.
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...
Surya Saha
The document discusses efforts to improve the genome assembly of the Asian citrus psyllid (Diaphorina citri), the insect vector of citrus greening disease. It describes using long read sequencing data from PacBio to generate a new assembly with an N50 of 83kb, a significant improvement over the previous N50 of 34kb. It further discusses additional efforts using technologies like Dovetail scaffolding, 10X Genomics, and optical mapping to further improve scaffolding and resolve haplotypes, with the goal of generating a high-quality reference genome for D. citri.
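For readers unfamiliar with the N50 statistic used to compare these assemblies: N50 is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch follows; the contig lengths in the example are made up for illustration and are not from the psyllid assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the running sum reaches half the assembly.
        if running >= half_total:
            return length


# Hypothetical toy assembly: total 20, half 10, so 6 + 5 = 11 reaches it.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

A larger N50 means that more of the assembly sits in long contigs, which is why the jump from 34kb to 83kb is reported as an improvement.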
The document summarizes updates to the tomato genome sequence, including:
1) The tomato genome build SL3.0 integrated over 1000 BAC sequences into the previous build SL2.50, improving contiguity and reducing gaps.
2) The BAC sequences were assembled, aligned to SL2.50, and automatically integrated using a published workflow. Integrated BACs then underwent manual and NCBI validation.
3) Compared to SL2.50, the new build SL3.0 has fewer and smaller sequence gaps, representing an improved tomato genome assembly. Future plans include integrating additional sequences and producing new gene annotations.
This was presented on Mar 31, 2015 at Boyce Thompson Institute, Ithaca, NY at the 3rd BTI Bioinformatics Course http://btiplantbioinfocourse.wordpress.com/
The tomato reference genome is one of the most widely used genomic resources in the Solanaceae as well as the wider plant research community. We frequently receive questions from the community regarding the assembly versions. This session will explain the changes in the current version of the tomato genome (SL2.50). The current tomato genome build contains numerous inter-contig gaps (median 931bp, mean 1869bp) and inter-scaffold gaps (median 210Kbp, mean 525Kbp). Updates will be provided regarding the forthcoming tomato genome build (SL3.0) that will include finished BACs (HTGS phase 3) for closing the gaps.
This document outlines exercises for quality control of NGS data from an Illumina sequencing experiment on tomato ripening stages. The exercises include: 1) evaluating raw fastq files for format and number of sequences; 2) using FastQC to analyze read quality scores, lengths, duplication levels, and k-mer content; and 3) preprocessing the reads using fastq-mcf to trim low quality ends and remove short reads before reanalyzing with FastQC. The goal is to learn how to evaluate NGS read quality and preprocess data prior to downstream analysis.
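The first exercise (checking the format of a FASTQ file and counting its sequences) can be sketched in a few lines. This is an illustrative parser only, assuming the simple four-line FASTQ layout with no wrapped sequence lines; it is not a replacement for FastQC or fastq-mcf, and the example records are invented.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed record near {header!r}")
        yield header[1:], seq, qual


def summarize(handle, min_length=0):
    """Count reads and report how many are at least min_length long."""
    lengths = [len(seq) for _, seq, _ in read_fastq(handle)]
    return {
        "reads": len(lengths),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "kept": sum(1 for n in lengths if n >= min_length),
    }


# Invented two-record example in Phred+33 encoding.
EXAMPLE = "@read1/1\nACGTACGT\n+\nIIIIIIII\n@read2/1\nACG\n+\nII#\n"
summary = summarize(io.StringIO(EXAMPLE), min_length=5)
print(summary)
```

The `min_length` filter mirrors the "remove short reads" step in spirit only; the threshold of 5 is an arbitrary value for the toy data.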
Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequen...
Surya Saha
The availability of high-throughput next generation sequencing technologies presents an opportunity for in-silico discovery of endosymbionts. We describe a method for mining a whole genome shotgun metagenome to identify members of the endosymbiont community followed by reconstruction and validation of a high-quality draft microbial genome.
The Asian citrus psyllid (D. citri Kuwayama or ACP) is host to 7+ bacterial endosymbionts and is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening, a disease that has cost the Florida citrus industry $3.63 billion since 2006.
DNA from D. citri was sequenced to 108X coverage to produce paired-end and mate-pair Illumina libraries. The sequences were mined for Wolbachia (wACP) reads using 4 sequenced Wolbachia genomes as bait. Putative wACP reads were then assembled using the Velvet and MIRA3 assemblers. The resulting wACP contigs were annotated using RAST and compared to the closest sequenced Wolbachia from an insect genome, Wolbachia endosymbiont of Culex quinquefasciatus (wPip). MIRA3 was able to reconstruct a majority of the wPip CDS regions and was therefore selected for scaffolding using large insert mate-pair libraries. The wACP scaffolds were further improved using wPip as a reference genome to orient and order the contigs.
In order to determine the presence of the core Wolbachia proteins in our wACP scaffold, we compared them to core Wolbachia proteins identified by OrthoMCL. 1164/1213 wACP proteins had matches of which 669 were to core proteins. This number compares favorably to the number of core proteins (670) found in sequenced Wolbachias.
Endosymbiont hunting in the metagenome of Asian citrus psyllid (Diaphorina ci...
Surya Saha
The Asian citrus psyllid (D. citri Kuwayama or ACP) is host to 7+ bacterial endosymbionts and is the insect vector of Ca. Liberibacter asiaticus (Las), the causal agent of citrus greening. To gain a better understanding of endosymbiont and pathogen ecology and develop improved detection strategies for Las, DNA from D. citri was sequenced to 108X coverage. Initial analyses have focused on Wolbachia, an alpha-proteobacterial primary endosymbiont typically found in the reproductive tissues of ACP and other arthropods. The metagenomic sequences were mined for wACP reads using BLAST and 4 sequenced Wolbachia genomes as bait. Putative wACP reads were then assembled using the Velvet and MIRA3 assemblers over a range of parameter settings. The resulting wACP contigs were annotated using the RAST pipeline and compared to Wolbachia endosymbiont of Culex quinquefasciatus (wPip). MIRA3 was able to reconstruct a majority of the wPip CDS regions and was selected for scaffolding with Minimus2, SSPACE and SOPRA using large insert mate-pair libraries. The wACP scaffolds were compared to wPip using Abacas and Mauve contig mover to orient and order the contigs. The functional annotation of the scaffolds was evaluated by comparing it to the wPip genome using RAST. The draft assembly was verified using an OrthoMCL-based comparison to the 4 sequenced Wolbachia genomes. We expanded the scope of endosymbiont characterization beyond wACP using 16S rDNA and partial 23S rDNA analysis as a guide. Results will be presented regarding endosymbionts, their potential interactions and their impact on citrus greening disease.
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Surya Saha
Presented at Cornell Symbiosis symposium. Workflows for processing amplicon-based 16S/ITS sequences as well as whole genome shotgun sequences are described. Slides include a short description and links for each tool.
DISCLAIMER: This is a small subset of tools out there. No disrespect to methods not mentioned.
1. Surya Saha, Ph.D.
Cornell University & Boyce Thompson Institute
suryasaha@cornell.edu @SahaSurya
Centre for Agricultural Bioinformatics
Pusa, New Delhi
June 13, 2014
Slides: http://bit.ly/CABin_Pusa_2014
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
Genome Assembly
Jason Chin http://www.bit.ly/SZPKIG
2. 6/15/2014 Centre for Agricultural Bioinformatics, Pusa 2
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation. Provided that:
You attribute the work to its author and respect the rights
and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with
permission from originals by Christopher Ross. Original images are available under GPL at
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
4. Sequencing over the Ages
[Timeline figure]
1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005 onward, "The Next Generation" (axis marks 2005, 2007, 2011, 2012, 2013, 2014): 454, Solexa, SOLiD, Illumina, Ion Torrent, PacBio, Illumina HiSeq X, MinION; Pinus taeda (24 Gb)
Slide credit: Aureliano Bombarely
5. It's all about the $£€¥
http://www.genome.gov/sequencingcosts/
6. First generation sequencing
7. Sanger method
Frederick Sanger
13 Aug 1918 – 19 Nov 2013
Won the Nobel Prize for Chemistry in 1958 and
1980. Published the dideoxy chain termination
method or “Sanger method” in 1977
http://dailym.ai/1f1XeTB
9. First generation sequencing
• Very high quality sequences (99.999%)
• Very low throughput
Capillary sequencing (ABI3730xl)
Run time: 20 min-3 h
Read length: 400-900 bp
Reads / run: 96 or 384
Total nucleotides sequenced: 1.9-84 Kb
Cost / Mb: $2400
http://bit.ly/1clLps3
http://1.usa.gov/1cLqIRd
10. Use the name of the specific technology used to generate the data
– Illumina HiSeq/MiSeq/NextSeq
– Pacific Biosciences RS I/RS II
– Ion Torrent Proton/PGM
– SOLiD
– 454
http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
11. 454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://bit.ly/1ehwxWN
GS FLX
Titanium
http://bit.ly/1ehAcEh
12. Illumina
Comparison of four Illumina instruments (one column per instrument, left to right):
Output: 15 Gb | 120 Gb | 1000 Gb | 1800 Gb
Number of reads: 25 million | 400 million | 4 billion | 6 billion
Read length: 2x300 bp | 2x150 bp | 2x125 bp (2x250 update mid-2014) | 2x150 bp
Cost: $99K | $250K | $740K | $10M
Source: Illumina
$1000 human
genome??
15. Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://bit.ly/1naxgTe
16. Pacific Biosciences SMRT sequencing
Error correction methods
Hierarchical genome-assembly process (HGAP)
PBJelly (English et al., PLoS ONE, 2012)
17. Pacific Biosciences SMRT sequencing
Read Lengths
18. Oxford Nanopore
https://www.nanoporetech.com/
• No data yet??
• Error model
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
19. Others
• Ion Torrent Proton/PGM
• Nabsys
• SOLiD
24. Real cost of Sequencing!!
Sboner et al., Genome Biology, 2011
25. Library Types
Single end
Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2)
Mate pair (MP, 2 Kb to 20 Kb)
[Figure: forward (F) and reverse (R) read layouts for each library type, with 454/Roche and Illumina examples]
Slide credit: Aureliano Bombarely
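The "/1" and "/2" suffixes noted for paired-end reads are what lets software match a forward read to its reverse mate. A small sketch of that bookkeeping follows; the function names and read names are hypothetical, and newer Illumina headers that encode the mate differently are not handled.

```python
def split_read_name(name):
    """Return (template_name, mate) where mate is 1, 2, or None."""
    if name.endswith("/1"):
        return name[:-2], 1
    if name.endswith("/2"):
        return name[:-2], 2
    return name, None


def find_pairs(forward_names, reverse_names):
    """Return (paired, orphaned) template names from two lists of read names."""
    fwd = {t for t, mate in map(split_read_name, forward_names) if mate == 1}
    rev = {t for t, mate in map(split_read_name, reverse_names) if mate == 2}
    # Templates seen in both files are proper pairs; the rest lost a mate.
    return sorted(fwd & rev), sorted(fwd ^ rev)


# Hypothetical read names for illustration.
paired, orphaned = find_pairs(
    ["r1/1", "r2/1", "r3/1"],
    ["r1/2", "r3/2", "r4/2"],
)
print(paired, orphaned)
```

Orphaned reads typically appear after quality trimming removes one mate of a pair, so a check like this is often run before assembly.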
26. Implications of Choice of Library
Reads
Consensus sequence (Contig)
Scaffold (or Supercontig): contigs joined using paired-read information, gaps shown as NNNNN
Pseudomolecule (or ultracontig): scaffolds ordered using genetic information (markers)
Slide credit: Aureliano Bombarely
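To make the contig-to-scaffold step on this slide concrete, here is a toy sketch of how ordered contigs are joined with runs of N that stand for gaps of estimated size. The function name, sequences, and gap sizes are invented for illustration; real scaffolders estimate order, orientation, and gap size from the paired-read data.

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs, inserting 'N' * gap between neighbours."""
    if not contigs:
        return ""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each contig pair")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)  # unknown bases spanning the gap
        parts.append(contig)
    return "".join(parts)


# Hypothetical contigs and a gap estimate of 5 bases.
toy_scaffold = build_scaffold(["ACGT", "GGCC"], [5])
print(toy_scaffold)
```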
27. Quality control: Encoding
http://bit.ly/N28yUd
Phred score of a base is:
Qphred = -10 log10(e)
where e is the estimated probability of a base being incorrect
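The Phred formula can be applied directly in code. The sketch below converts between error probability and Phred score, and decodes a quality string under the assumption of Phred+33 (Sanger) ASCII encoding; other encodings use a different offset, so the offset is a parameter. Function names are illustrative.

```python
import math


def error_to_phred(e):
    """Qphred = -10 * log10(e), with e the probability the base is wrong."""
    if not 0 < e <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(e)


def phred_to_error(q):
    """Inverse of the Phred formula: e = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(quality_string, offset=33):
    """Decode an ASCII quality string; offset=33 assumes Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


q_for_one_in_thousand = error_to_phred(0.001)
scores = decode_quality("I5!")
print(q_for_one_in_thousand, scores)
```

An error probability of 1 in 1,000 corresponds to Q30, and each additional 10 points means a tenfold lower error probability.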
36. The BEST?? Genome Assembler for YOU
• You have the expertise to install and run it
• You have the suitable infrastructure (CPU & RAM) to run the assembler
• You have sufficient time to run the assembler
• Is designed to work with the specific mix of NGS data that you have generated
• Best addresses what you want to get out of a genome assembly (bigger overall assembly, more genes, most accuracy, longer scaffolds, most resolution of haplotypes, most tolerant of repeats, etc.)
http://haldanessieve.org/2013/01/28/our-paper-making-pizzas-and-genome-assemblies/
38. Which technology to use??
• Microbial genomes
• Eukaryotic genomes
• Resequencing genomes
• RNAseq and other XXXseq methods
http://bit.ly/1ko9Kgh
43. Main web page (front page):
WEB ICONS
TOOL BAR
44. Main web page (front page):
TOOL BAR
(MENUS)
45. Community Data Curation
But the DATA can also be edited
Locus, Locus Editor, Data
46. To edit data, you need:
• An SGN account
• Submitter / Locus Editor privileges activated by an SGN curator
[Screenshot labels: Locus; Locus Editor Data]
51. CassavaBase
http://cassavabase.org/
Slide credit: Jeremy Edwards
52. NextGen Cassava Project
● Project: Adapt SGN database for Cassava Breeding
● Goal: Apply Genomic Selection to cassava breeding
● Predict breeding values from genotype information
● Shorten the breeding cycle
● Massive amounts of genotypic data (GBS)
● Phenotypic data
● Data management challenge
● Improve flowering
● http://nextgencassava.org
Slide credit: Jeremy Edwards
53. SGN/Cassavabase behind the scenes
● Perl/Catalyst MVC Framework
● PostgreSQL Database
● Generic Model Organism Database (GMOD)
– Chado relational database schema
– GBrowse
– JBrowse
● R
– Experimental design
– QTL mapping
– Genomic selection
Slide credit: Jeremy Edwards
54. Objectives
Provide cassava breeders and researchers access
to data and tools in a centralized, user-friendly
and reliable database.
– Improve partner breeding program information
tracking
– Streamline management of genotypic and
phenotypic data
– Pipeline genotypic and phenotypic data through
Genomic Selection prediction analyses
Slide credit: Jeremy Edwards
55. Genomic Selection
The 'training population' is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBV) for these lines. From Heffner et al. 2009 Crop Sci. 49:1–12
Information from a majority of lines in the breeding population (the training set) is used to create the prediction model. The model is then used to predict the phenotypes of the remaining lines (the validation set), using genotypic information only. The results from the model are compared to the actual data to give the prediction accuracy. Image courtesy of Martha Hamblin, Cornell University
Flow diagram of a genomic selection breeding program. Breeding cycle time is shortened by removing phenotypic evaluation of lines before selection as parents for the next cycle. From Heffner et al. 2009 Crop Sci. 49:1–12
Slide credit: Jeremy Edwards
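The training/validation idea can be illustrated with a small simulation. The sketch below is not the Cassavabase pipeline: it assumes ridge regression on marker genotypes as a stand-in prediction model, uses simulated genotypes and phenotypes, and all function names and parameter values (120 lines, 30 markers, an 80/20 split, penalty 1.0) are hypothetical.

```python
import random


def simulate(n_lines=120, n_markers=30, noise_sd=1.0, seed=1):
    """Simulate marker genotypes (-1/0/1) and phenotypes; illustrative data only."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, 1) for _ in range(n_markers)]
    X = [[rng.choice((-1, 0, 1)) for _ in range(n_markers)] for _ in range(n_lines)]
    y = [sum(g * e for g, e in zip(row, true_effects)) + rng.gauss(0, noise_sd)
         for row in X]
    return X, y


def ridge_fit(X, y, lam):
    """Solve (X'X + lam*I) b = X'y by Gaussian elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def prediction_accuracy(X, y, train_fraction=0.8, lam=1.0):
    """Train on the first lines, predict GEBVs for the rest from genotypes only."""
    n_train = int(len(X) * train_fraction)
    mu = sum(y[:n_train]) / n_train
    effects = ridge_fit(X[:n_train], [v - mu for v in y[:n_train]], lam)
    gebv = [mu + sum(g * e for g, e in zip(row, effects)) for row in X[n_train:]]
    return pearson(gebv, y[n_train:]), gebv


X, y = simulate()
accuracy, gebv = prediction_accuracy(X, y)
```

`accuracy` is the correlation between predicted and observed phenotypes in the validation set, which is the prediction accuracy described in the caption above. Real programs usually repeat the split several times (cross-validation) and use dedicated R packages rather than hand-written solvers.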
56. Data collection in the field
● Android tablets
● Field Book app
– Jesse Poland's group at USDA-ARS / Kansas State University
Slide credit: Jeremy Edwards
57. Cassava Trait Ontology
Kulakow et al. 2011
● Standard terminology
● Facilitate the sharing of information
● Allow users to query keywords related to traits
Slide credit: Jeremy Edwards
58. Position available at Solgenomics
Cassavabase project
Plant Breeding + Bioinformatician
● Familiar with breeding
● Programming in Perl, R, SQL, Hadoop
● Linux
● Africa
● Genius
http://www.cassavabase.org/forum/posts.pl?topic_id=9