Talk in the gene discovery session at PAG XXII (https://pag.confex.com/pag/xxii/webprogram/Session2128.html)
Joint work with Jonas Behr, Gabriele Schweikert, Andre Kahles and others.
Abstract: High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and transcripts. However, the immense dynamic range of gene expression, the limitations and biases of the sequencing technology, and the observed complexity of the transcriptional landscape pose profound computational challenges. We discuss several of these challenges and, based on illustrative simulation examples, identify the limits of state-of-the-art tools in reconstructing multiple alternative transcripts even when sufficient information is provided. We propose a novel framework, called MiTie, for simultaneous transcript reconstruction and quantification based on combinatorial optimization. We use the negative binomial distribution to define a likelihood function and a regularization approach to select a small number of transcripts that quantitatively explain the observed read data. We show that the resulting regularized maximum likelihood problem can be formulated as a mixed integer programming (MIP) problem, which can be solved optimally using standard optimization approaches. We will also describe an extension of the discriminative gene finding system mGene that takes advantage of RNA-seq reads. We demonstrate that the extended system, mGene.ngs, predicts transcript annotations significantly more accurately when using RNA-seq data, and also more accurately than transcriptome reconstruction tools that are based solely on RNA-seq data. Finally, we illustrate how a combination of gene finding and transcriptome reconstruction methods like MiTie can be used to accurately annotate newly sequenced genomes without prior annotations.
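To make the idea of a regularized negative binomial likelihood more concrete, here is a minimal toy sketch in Python. It is not the MiTie implementation: the segment counts, candidate transcripts, abundance grid, dispersion value and penalty weight are all hypothetical, and a brute-force search over small subsets stands in for the mixed integer program described in the abstract.

```python
import itertools
import math


def nb_log_pmf(k, mean, dispersion):
    """Log-probability of count k under a negative binomial distribution
    parameterized by mean and dispersion (variance = mean + dispersion * mean**2)."""
    mean = max(mean, 1e-9)  # avoid log(0) for segments no transcript covers
    r = 1.0 / dispersion
    p = r / (r + mean)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def best_transcript_set(counts, candidates, abundances=(5, 10, 20, 40),
                        penalty=1.0, dispersion=0.1):
    """Exhaustively pick the transcripts and abundances that minimize
    negative log-likelihood + penalty * (number of transcripts used).
    Only feasible for tiny examples; a MIP solver would replace this search."""
    names = list(candidates)
    best_objective, best_choice = float("inf"), {}
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            for levels in itertools.product(abundances, repeat=size):
                # Expected coverage of each segment is the summed abundance
                # of the selected transcripts that contain it.
                expected = [sum(a * candidates[t][s] for t, a in zip(subset, levels))
                            for s in range(len(counts))]
                nll = -sum(nb_log_pmf(k, m, dispersion)
                           for k, m in zip(counts, expected))
                objective = nll + penalty * size
                if objective < best_objective:
                    best_objective, best_choice = objective, dict(zip(subset, levels))
    return best_choice


# Hypothetical gene with three exonic segments and three candidate transcripts;
# each candidate is a 0/1 vector saying which segments it contains.
candidates = {"t1": [1, 1, 1], "t2": [1, 0, 1], "t3": [0, 1, 1]}
counts = [30, 10, 30]  # made-up observed coverage per segment
selection = best_transcript_set(counts, candidates)
print(selection)
```

The penalty term plays the role of the regularizer: raising it favors explanations with fewer transcripts, at the cost of a poorer fit to the observed counts.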
1. The document describes a method called Anchored Assembly for detecting structural variants from short-read sequencing data using read overlap assembly and reference removal.
2. The method was validated against other SV detection tools using validated SVs from fosmid/PacBio sequencing, detecting 15 previously undetected SVs with high sensitivity and specificity.
3. Examples are given of validated deletions and insertions detected in an Ashkenazi Jewish trio that were identical in the offspring and followed expected inheritance patterns from parents.
The document describes BioNano Genomics' Irys system for generating genome maps using single molecule imaging. The Irys system labels sites in native genomic DNA, linearizes and images the molecules to create digital maps over 100kb in length. These maps can then be assembled into consensus maps over 30Mb long and used for structural variation detection, genome finishing by aligning sequencing data, and validation of genome assemblies. Examples are provided analyzing data from the NIST GIAB trio to validate structural variants and correct conflicts between sequencing and genetic maps.
The National Center for Biotechnology Information (NCBI) provides one of the most extensive sets of web-based tools for biological research. The tools are indispensable when planning genomics experiments, including for qPCR, NGS, and CRISPR. In this presentation, Dr Matt McNeill takes a practical look at getting started with the wealth of NCBI tools, and shares some relevant tips to help you sift through the tools and options that we regularly use. In particular, he focuses on commonly adjusted parameters that will allow you to more effectively use the powerful Basic Local Alignment Search Tool (BLAST) to identify off-target hybridization/annealing events. Dr McNeill also covers practical examples using NCBI tools to design assays.
1. The document discusses variant calling from NGS data and prioritizing variants. It covers calling variants, identifying somatic mutations by comparing tumor and normal samples, and identifying inherited variants using trio analysis.
2. Key steps include calling variants, filtering, identifying somatic mutations as variants present in tumor but not normal, and identifying inherited variants by applying models of inheritance to family trio data.
3. Prioritization considers functional impact, population frequencies, and visual inspection to select candidates for follow up.
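The filtering logic summarized in these steps can be sketched with plain set operations. This is only an illustration under simplifying assumptions: the `Variant` type, the function names and the example variants are hypothetical, variants are matched by exact position and alleles, and genotypes are reduced to alternate-allele counts (0, 1 or 2).

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Variants called in the tumor sample but absent from the matched normal."""
    return tumor - normal


def de_novo_candidates(child: Set[Variant], mother: Set[Variant],
                       father: Set[Variant]) -> Set[Variant]:
    """Variants called in the child but in neither parent."""
    return child - mother - father


def recessive_candidates(child: Dict[Variant, int], mother: Dict[Variant, int],
                         father: Dict[Variant, int]) -> Set[Variant]:
    """Recessive model: child homozygous for the alternate allele,
    both parents heterozygous carriers. Values are alternate-allele counts."""
    return {v for v, n_alt in child.items()
            if n_alt == 2 and mother.get(v, 0) == 1 and father.get(v, 0) == 1}


# Made-up example calls.
a = Variant("chr1", 1000, "A", "G")
b = Variant("chr2", 2000, "C", "T")
c = Variant("chr3", 3000, "G", "A")

somatic = somatic_candidates(tumor={a, b}, normal={a})
de_novo = de_novo_candidates(child={a, b, c}, mother={a}, father={b})
recessive = recessive_candidates(child={a: 2, b: 1}, mother={a: 1, b: 1}, father={a: 1})
print(somatic, de_novo, recessive)
```

In practice, candidates that pass such filters would still go through the prioritization described above (functional impact, population frequency, visual inspection) before follow-up.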
Enabling CNV Studies from Single Cells Using Whole Genome Amplification and L... (QIAGEN)
DNA copy number variations (CNVs) play an important role in the pathogenesis and progression of cancer. While array comparative genomic hybridization (aCGH) has generally been used to identify CNVs in the whole genome, next-generation sequencing (NGS) provides an opportunity to characterize CNVs genome-wide with unprecedented resolution, even at the single cell level.
However, CNV detection in single cells is faced with various challenges, such as incomplete genome coverage, introduction of sequence errors, GC bias and false positives.
In this new poster, we show a method for capturing the entire genomic complexity of a single cell, overcoming these challenges and ensuring accurate detection of CNVs.
The document discusses using Genome in a Bottle (GIAB) data on the DNAnexus cloud platform. It describes two examples: 1) Comparing different mapper and variant caller combinations using GIAB pilot genome data. Benchmarking shows BWA and GATK Haplotype Caller performed best. 2) Assessing structural variation detection in the Ashkenazi Jewish Trio, combining data from Illumina and PacBio sequencing. DNAnexus is working with GIAB to develop benchmark datasets for structural variants.
Genome editing as a tool for enhancing disease resistance in crops - Vladimir... (OECD Environment)
CRISPR-Cas genome editing can be used to engineer disease resistance in crops by targeting susceptibility genes. The document discusses using CRISPR to knock out the S-gene SlMLO1 in tomato, conferring resistance to powdery mildew. It also describes improving resistance to RNA and DNA viruses in Nicotiana benthamiana by targeting viral sequences or genes involved in viral infection. CRISPR allows generating mutations without transgenes for non-GM disease resistant crops.
The document describes a presentation given by Gunnar Rätsch on tools for RNA-seq analysis and isoform characterization. It discusses the increasing amounts of biological data and challenges in developing accurate analysis algorithms. The presentation covers multiple tools developed by Rätsch's group for analyzing RNA-seq data, including tools for transcript quantification, multiple read mapping, alternative splicing analysis and detection of novel isoforms. The tools aim to improve RNA-seq analysis for large datasets and characterization of transcript isoforms and splicing.
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic... (Elia Brodsky)
This workshop will address critical issues related to Transcriptomics data:
Processing raw Next Generation Sequencing (NGS) data:
1. Next Generation Sequencing data preprocessing:
Trimming technical sequences
Removing PCR duplicates
2. RNA-seq based quantification of expression levels:
Conventional pipelines (looking at known transcripts)
Identification of novel isoforms
Analysis of Expression Data Using Machine Learning:
3. Unsupervised analysis of expression data:
Principal Component Analysis
Clustering
4. Supervised analysis:
Differential expression analysis
Classification, gene signature construction
5. Gene set enrichment analysis
The workshop will include hands-on exercises utilizing public domain datasets:
breast cancer cell lines transcriptomic profiles (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110),
patient-derived xenograft (PDX) mouse model of tumor and stroma transcriptomic profiles (http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=8014&path[]=23533), and
processed data from The Cancer Genome Atlas samples (https://cancergenome.nih.gov/).
Team: The workshops are designed by the researchers at the Tauber Bioinformatics Research Center at University of Haifa, Israel in collaboration with academic centers across the US. Technical support for the workshops is provided by the Pine Biotech team. https://edu.t-bio.info/a-critical-approach-to-transcriptomic-data-analysis/
The document provides an update on the Genome in a Bottle (GIAB) Consortium. Key points include:
- New benchmark sets have been developed for mosaic variants, tandem repeats, and chromosomes X and Y using whole genome assemblies.
- Additional reference materials and samples are available, including a new tumor/normal cell line and over 50 products based on broadly consented genomes.
- Benchmarking methods are improving to better evaluate variant calling, including for structural variants and different data types like RNA sequencing.
- Future plans include developing more somatic benchmarks, assembling the HG002 genome to near perfection, and a searchable public data registry.
Presentation given by Sergi Beltran Agulló, from the CNAG, at the course: Identification and analysis of sequence variants in sequencing projects: fundamentals and tools.
The document describes Phase II of the ABRF Next Generation Sequencing Study which aims to establish reference data sets for evaluating DNA sequencing performance across multiple platforms and laboratories. Phase II will sequence various human and bacterial genomic samples to assess accuracy, coverage, and limits of detection using different platforms and library preparation methods. A collaboration with NIST Genome in a Bottle will provide standardized samples to the participating laboratories. The study aims to provide a resource for ongoing method development and evaluation of sequencing performance.
This document discusses using presence-absence variation (PAV) analysis to assess genome assemblies and identify foreign DNA. It describes the ScanPAV pipeline which extracts and analyzes PAV sequences between assemblies. The document provides examples of ScanPAV being used to evaluate human genome assemblies and identify contaminants in Tasmanian devil cancer genomes. Key findings include Mycoplasma and Streptococcus contamination identified in some devil cancer assemblies but no exogenous contribution found to the emergence of transmissible devil facial tumors.
Bioinformatics tools are essential for analyzing next-generation sequencing (NGS) data. The summary describes the typical stages of NGS data analysis:
1. Primary analysis involves demultiplexing, base calling and quality control to produce fastq files.
2. Secondary analysis maps reads to a reference genome to produce SAM/BAM files and calls variants to produce VCF files.
3. Tertiary analysis annotates and filters variants to prioritize those relevant to disease.
This document discusses next-generation sequencing and its applications in genomics and pathology. It begins with an overview of common NGS terms and technologies. It then covers the typical NGS analysis workflow including quality control, mapping reads to a reference genome, variant calling and annotation. Challenges such as data storage, sharing and reporting are also addressed. The document concludes that clinical sequencing is becoming established but requires ongoing collaboration between pathologists, geneticists and bioinformaticians to realize its potential.
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop (Nuria Lopez-Bigas)
The document describes an oncogenomics workshop discussing methods for identifying cancer driver genes from tumor sequencing data. It introduces two computational methods developed by the speaker's group called OncodriveFM and OncodriveCLUST that identify drivers by looking at the functional impact of mutations and regional mutation clustering, respectively. These methods can be applied across multiple cancer sequencing projects in a scalable way without needing raw sequencing data. The International Cancer Genome Consortium's IntOGen database currently analyzes over 3,000 tumor samples across 27 cancer projects using these and other methods.
The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi... (ExternalEvents)
The document describes the NCBI Pathogen Analysis Pipeline which supports real-time sequencing of foodborne pathogens. The pipeline performs k-mer analysis, genome assembly, annotation, placement, clustering, SNP analysis, and tree construction on sequencing data submitted to NCBI. It provides automated bacterial assembly and a SNP analysis pipeline for clustering isolates and identifying outbreaks. The pipeline is demonstrated on examples of outbreaks linked to stone fruit and chicken kiev. NCBI aims to build a database of sequenced antibiotic resistant isolates with standardized metadata and maintain reference databases of antibiotic resistance genes.
ReComp: optimising the re-execution of analytics pipelines in response to cha... (Paolo Missier)
Paolo Missier presented on optimizing the re-execution of analytics pipelines in response to changes in input data. The talk discussed using provenance to selectively re-run parts of workflows impacted by changes. ProvONE combines process structure and runtime provenance to enable granular re-execution. The ReComp framework detects and quantifies data changes, estimates impact, and selectively re-executes relevant sub-processes to optimize re-running workflows in response to evolving data.
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research... (Golden Helix Inc)
This document summarizes Golden Helix's capabilities for handling big data in genomics. It discusses how Golden Helix's software tools like VarSeq and the Variant Storage Warehouse can scale to handle large volumes of genomic and clinical data from whole genome sequencing, exome sequencing, and large population studies. It provides examples of how these tools have been used for clinical and research applications like variant filtering, annotation, and analysis of rare variants. The presentation concludes with a demonstration of the software.
The document provides information about a short course on next generation sequencing and analysis of sequence variants. It includes an agenda with sessions on introduction to NGS applications in medical research, data analysis pipelines, interpretation of variants, and tools for predicting pathogenicity. It also provides background on the organizing institutions, the CNAG sequencing center and its projects, and an overview of bioinformatics analysis pipelines and resources.
Apac distributor training series 3 swift product for cancer study (Swift Biosciences)
This document provides an overview of Swift Biosciences' product training series for their APAC distributors on using Swift products for cancer studies. It summarizes different Swift library preparation and sequencing kits that can be used for various cancer applications, including genomic sequencing, RNA sequencing, amplicon panels, hybridization capture, and DNA methylation. The document also reviews types of mutations found in cancer, considerations for cancer clinical workflows, and provides an example of using Swift's 2S Turbo kit for targeted sequencing of formalin-fixed, paraffin-embedded tissue samples.
The document provides details on analyzing genomic sequencing data from a breast cancer patient to identify somatic mutations. It describes extracting over 100 features from the normal and tumor sequencing data, selecting the most informative features, and using machine learning algorithms on principal components of the data to classify SNPs as somatic or germline. Several known breast cancer driver genes were found among the identified somatic mutations, including RAP1A, PARP1, and TACSTD2. Future work could involve analyzing more data from similar patients to increase training data and address sequencing errors.
This document summarizes the process used to benchmark large deletion calls from multiple sequencing technologies and bioinformatics pipelines. Researchers merged deletion calls from 14 datasets into regions and evaluated call size accuracy. Calls supported by two or more technologies were identified as draft benchmark calls. Sensitivity to these calls was calculated for each method. The results provide insight into strengths and weaknesses of different approaches to structural variant detection.
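The merge-and-support step described here can be illustrated with a short sketch. The call tuples, technology labels and coordinates below are invented, and the overlap rules (any overlap merges calls into one region; any overlap with a benchmark region counts as a detection) are simplifying assumptions, not the criteria used in the summarized work.

```python
def merge_calls(calls, min_technologies=2):
    """Merge overlapping deletion calls into regions and keep regions
    supported by at least `min_technologies` distinct technologies.
    Each call is a (chrom, start, end, technology) tuple."""
    regions = []
    for chrom, start, end, tech in sorted(calls):
        last = regions[-1] if regions else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Call overlaps the current region: extend it and record support.
            last["end"] = max(last["end"], end)
            last["technologies"].add(tech)
        else:
            regions.append({"chrom": chrom, "start": start, "end": end,
                            "technologies": {tech}})
    return [r for r in regions if len(r["technologies"]) >= min_technologies]


def sensitivity(benchmark, calls, technology):
    """Fraction of benchmark regions overlapped by any call from one technology."""
    if not benchmark:
        return 0.0
    own = [c for c in calls if c[3] == technology]
    hits = sum(
        any(c[0] == r["chrom"] and c[1] <= r["end"] and c[2] >= r["start"] for c in own)
        for r in benchmark
    )
    return hits / len(benchmark)


# Made-up deletion calls from three hypothetical data sources.
calls = [
    ("chr1", 100, 500, "illumina"),
    ("chr1", 450, 900, "pacbio"),
    ("chr1", 5000, 5600, "illumina"),
    ("chr2", 100, 300, "pacbio"),
    ("chr2", 250, 400, "bionano"),
]
draft_benchmark = merge_calls(calls)
illumina_sensitivity = sensitivity(draft_benchmark, calls, "illumina")
print(draft_benchmark, illumina_sensitivity)
```

Because the draft benchmark is built from the same calls it is used to evaluate, sensitivity computed this way is relative to the consensus rather than to an independent truth set.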
Kshivets O. Lung Cancer Surgery: Prognosis (Oleg Kshivets)
1) The study investigated how homeostasis networks influence 5-year survival and lifespan in 404 lung cancer patients after radical surgery. 2) It was revealed that survival significantly depended on blood cell levels, the ratio of cancer to blood cells, cancer characteristics, biochemical homeostasis, coagulation, and anthropometric data. 3) Structural equation modeling confirmed significant relationships between survival and these homeostasis factors.
Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ... (ICRISAT)
Integration of various molecular breeding approaches (MABC, MARS, and GS) in the product development process at ICRISAT. Accelerated rate of genetic gain across all mandate crops by leveraging expertise from various groups inside and outside ICRISAT.
Karen Miga centromere sequence characterization and variant detection (GenomeInABottle)
Centromeric regions contain significant human genetic variation that is not represented in current reference genomes. This document proposes a two-part approach to characterize sequence variation in centromeric regions: (1) construct chromosome-specific reference maps of centromeric DNA, and (2) expand the human variation reference map to include centromeric regions. Key aspects include using long reads to assemble higher-order repeats, short reads to estimate array sizes and variant frequencies, and graph representations to model structural variation while retaining haplotype information. This would provide new insights into centromeric biology and identify centromeric variants associated with disease.
Next generation sequencing (NGS) allows for the massively parallel sequencing of DNA sequences. NGS technologies can sequence entire genomes in a single run and provide information useful for pathogen identification, outbreak investigation, and molecular diagnostics. NGS workflows involve sample preparation, sequencing using platforms such as Illumina or Ion Torrent, and bioinformatics analysis to assemble and interpret the large amounts of sequencing data produced. NGS has many applications including mutation discovery, microbial genome mapping, and metagenomics.
- Video recording of this lecture in English language: https://youtu.be/kqbnxVAZs-0
- Video recording of this lecture in Arabic language: https://youtu.be/SINlygW1Mpc
- Link to download the book free: https://nephrotube.blogspot.com/p/nephrotube-nephrology-books.html
- Link to NephroTube website: www.NephroTube.com
- Link to NephroTube social media accounts: https://nephrotube.blogspot.com/p/join-nephrotube-on-social-media.html
Cell Therapy Expansion and Challenges in Autoimmune Disease (Health Advances)
There is increasing confidence that cell therapies will soon play a role in the treatment of autoimmune disorders, but the extent of this impact remains to be seen. Early readouts on autologous CAR-Ts in lupus are encouraging, but manufacturing and cost limitations are likely to restrict access to highly refractory patients. Allogeneic CAR-Ts have the potential to broaden access to earlier lines of treatment due to their inherent cost benefits, however they will need to demonstrate comparable or improved efficacy to established modalities.
In addition to infrastructure and capacity constraints, CAR-Ts face a very different risk-benefit dynamic in autoimmune compared to oncology, highlighting the need for tolerable therapies with low adverse event risk. CAR-NK and Treg-based therapies are also being developed in certain autoimmune disorders and may demonstrate favorable safety profiles. Several novel non-cell therapies such as bispecific antibodies, nanobodies, and RNAi drugs, may also offer future alternative competitive solutions with variable value propositions.
Widespread adoption of cell therapies will not only require strong efficacy and safety data, but also adapted pricing and access strategies. At oncology-based price points, CAR-Ts are unlikely to achieve broad market access in autoimmune disorders, with eligible patient populations that are potentially orders of magnitude greater than the number of currently addressable cancer patients. Developers have made strides towards reducing cell therapy COGS while improving manufacturing efficiency, but payors will inevitably restrict access until more sustainable pricing is achieved.
Despite these headwinds, industry leaders and investors remain confident that cell therapies are poised to address significant unmet need in patients suffering from autoimmune disorders. However, the extent of this impact on the treatment landscape remains to be seen, as the industry rapidly approaches an inflection point.
The document provides details on analyzing genomic sequencing data from a breast cancer patient to identify somatic mutations. It describes extracting over 100 features from the normal and tumor sequencing data, selecting the most informative features, and using machine learning algorithms on principal components of the data to classify SNPs as somatic or germline. Several known breast cancer driver genes were found among the identified somatic mutations, including RAP1A, PARP1, and TACSTD2. Future work could involve analyzing more data from similar patients to increase training data and address sequencing errors.
This document summarizes the process used to benchmark large deletion calls from multiple sequencing technologies and bioinformatics pipelines. Researchers merged deletion calls from 14 datasets into regions and evaluated call size accuracy. Calls supported by two or more technologies were identified as draft benchmark calls. Sensitivity to these calls was calculated for each method. The results provide insight into strengths and weaknesses of different approaches to structural variant detection.
Kshivets O. Lung Cancer Surgery: PrognosisOleg Kshivets
1) The study investigated how homeostasis networks influence 5-year survival and lifespan in 404 lung cancer patients after radical surgery. 2) It was revealed that survival significantly depended on blood cell levels, the ratio of cancer to blood cells, cancer characteristics, biochemical homeostasis, coagulation, and anthropometric data. 3) Structural equation modeling confirmed significant relationships between survival and these homeostasis factors.
Research Program Genetic Gains (RPGG) Review Meeting 2021: Forward Breeding: ...ICRISAT
Integration of various molecular breeding approaches (MABC, MARS, and GS) in the product development process at ICRISAT. Accelerated rate of genetic gain across all mandate crops by leveraging expertise from various groups inside and outside ICRISAT.
Karen miga centromere sequence characterization and variant detectionGenomeInABottle
Centromeric regions contain significant human genetic variation that is not represented in current reference genomes. This document proposes a two-part approach to characterize sequence variation in centromeric regions: (1) construct chromosome-specific reference maps of centromeric DNA, and (2) expand the human variation reference map to include centromeric regions. Key aspects include using long reads to assemble higher-order repeats, short reads to estimate array sizes and variant frequencies, and graph representations to model structural variation while retaining haplotype information. This would provide new insights into centromeric biology and identify centromeric variants associated with disease.
Next generation sequencing (NGS) allows for the massively parallel sequencing of DNA sequences. NGS technologies can sequence entire genomes in a single run and provide information useful for pathogen identification, outbreak investigation, and molecular diagnostics. NGS workflows involve sample preparation, sequencing using platforms such as Illumina or Ion Torrent, and bioinformatics analysis to assemble and interpret the large amounts of sequencing data produced. NGS has many applications including mutation discovery, microbial genome mapping, and metagenomics.
Similar to RNA-seq based Genome Annotation with mGene.ngs and MiTie (20)
RNA-seq based Genome Annotation with mGene.ngs and MiTie
1. RNA-Seq-based Genome Annotation
using mGene.ngs and MiTie
Gunnar Rätsch
Biomedical Data Science Group
Computational Biology Center
Memorial Sloan-Kettering Cancer Center
gxr #mGene #MiTie #PAGXXII
2. Memorial Sloan-Kettering Cancer Center
Acknowledgements and Disclosures
Main contributors
Gabriele Schweikert
Jonas Behr
Andre Kahles
Funding
Financial interest disclosure
© Gunnar Rätsch (cBio@MSKCC)
RNA-Seq-based Annotation using mGene.ngs and MiTie
PAG XXII Gene Discovery Workshop
6. Memorial Sloan-Kettering Cancer Center
Genome Annotation Pipeline(s)
11. Proposed new gene finding method (mGene.ngs) for reannotation of
19 A. thaliana genomes (and genome assembly + analysis).
12. Memorial Sloan-Kettering Cancer Center
mGene.ngs Overview
Goal: Predict annotation based on RNA-seq and genomic sequence information
Learn function $f(y \mid x)$ that scores gene models $y$ based on different sources of information $x$
Train parameters such that $f(y \mid x) \gg f(y' \mid x)$ for all $y' \neq y$ ("large margin")
Hidden semi-Markov Support Vector Machines (HsM-SVMs) [Altun et al., 2003; Rätsch and Sonnenburg, 2007]
Automatically adapts to quality of RNA-seq data/alignments
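The large-margin condition can be illustrated with a small sketch. This is not mGene's training code (the slides name HsM-SVMs for that); the function `margin_violation`, the toy score table and the margin value of 1.0 are assumptions made only for illustration. It measures how far the best wrong gene model is from being separated from the true one by the required margin.

```python
def margin_violation(score, x, y_true, candidates, margin=1.0):
    """Return the largest margin violation over all wrong candidates.

    A value of 0.0 means f(y_true | x) beats every other candidate
    by at least `margin`. All names here are hypothetical.
    """
    s_true = score(x, y_true)
    worst = 0.0
    for y in candidates:
        if y == y_true:
            continue  # the true gene model is not compared with itself
        worst = max(worst, margin - (s_true - score(x, y)))
    return worst


# Toy scores for three candidate gene models of one locus (made-up numbers).
toy_scores = {"true_model": 3.0, "wrong_model_1": 1.5, "wrong_model_2": 2.5}
toy_score = lambda x, y: toy_scores[y]

violation = margin_violation(toy_score, None, "true_model", toy_scores)
print(violation)
```

In training, parameters would be adjusted until this violation is zero (or small) for the annotated gene models; here `wrong_model_2` scores only 0.5 below the true model, so the margin of 1.0 is violated by 0.5.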
15. Memorial Sloan-Kettering Cancer Center
Training of mGene
[Figure: true gene model along the genomic position; STEP 1: SVM signal predictions for tss, tis, acc, don and stop; score f(y|x) plotted against genomic position]
17. Memorial Sloan-Kettering Cancer Center
Training of mGene
[Figure: true gene model and wrong gene model along the genomic position; STEP 1: SVM signal predictions for tss, tis, acc, don and stop; score f(y|x) with a large margin between the true and the wrong gene model]
18. Memorial Sloan-Kettering Cancer Center
Training of mGene.ngs
[Figure: true gene model and wrong gene model along the genomic position; STEP 1: SVM signal predictions for tss, tis, acc, don and stop; RNA-seq tracks for coverage and for intron support from spliced reads; score f(y|x) with a large margin between the true and the wrong gene model]
19. Memorial Sloan-Kettering Cancer Center
Training of mGene.ngs
[Figure: true gene model and wrong gene model along the genomic position; STEP 1: SVM signal predictions for tss, tis, acc, don and stop; RNA-seq tracks for coverage and for intron support from spliced reads; score f(y|x) with a larger margin between the true and the wrong gene model]
20. Memorial Sloan-Kettering Cancer Center
Results for C. elegans
RNA-seq:
paired-end, strand-specific protocol based on RNA ligation
76 bp reads, 50 million reads
Alignment with PALMapper
Evaluation:
Transcript-level F-score of coding transcripts
... for different expression levels
Comparison of mGene (ab initio), mGene.ngs and Cufflinks
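A transcript-level F-score can be computed as sketched here. The slides do not define when a predicted transcript counts as correct; this sketch assumes an exact match of the full exon structure, and the function name `transcript_f_score` and the toy coordinates are hypothetical.

```python
def transcript_f_score(predicted, annotated):
    """F-score over whole transcripts.

    Each transcript is a tuple of (start, end) exon pairs. A prediction
    counts as correct only if it matches an annotated transcript exactly
    (an assumption; other matching criteria are possible).
    """
    pred, ann = set(predicted), set(annotated)
    true_positives = len(pred & ann)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # fraction of predictions that are correct
    recall = true_positives / len(ann)      # fraction of annotated transcripts found
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up coordinates.
t1 = ((100, 200), (300, 400))
t2 = ((100, 200), (350, 400))
t3 = ((500, 650),)
t4 = ((500, 700),)

f_score = transcript_f_score(predicted=[t1, t2, t4], annotated=[t1, t2, t3])
print(f_score)
```

Evaluating this separately on transcripts binned by expression percentile would give one F-score per expression level.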
21. Memorial Sloan-Kettering Cancer Center
Results for C. elegans
22. Memorial Sloan-Kettering Cancer Center
Digestion
Observations:
RNA-seq helps to improve performance
Genomic signals help considerably (compare with Cufflinks, which relies on RNA-seq data only)
Problems:
Need existing annotation for training
Cannot predict non-coding transcripts
23. Memorial Sloan-Kettering Cancer Center
Skimming and Non-coding Transcripts
25. Memorial Sloan-Kettering Cancer Center
Learning Strategy
28. Memorial Sloan-Kettering Cancer Center
Results for C. elegans
[Figure: F-score (y-axis, 0 to 0.7) versus expression percentile (x-axis, 0 to 100). Curves: mGene (ab initio, w/ annotation); mGene.ngs (w/ annotation); Cufflinks (Trapnell et al. 2010); mGene.ngs (w/o annotation)]
29. Memorial Sloan-Kettering Cancer Center
Results for C. elegans
[Figure: F-score (y-axis, 0 to 0.7) versus expression percentile (x-axis, 0 to 100). Curves: mGene (ab initio, w/ annotation); mGene.ngs (w/ annotation); Cufflinks (Trapnell et al. 2010); mGene.ngs (w/o annotation); mGene.nc (w/o annotation)]
30. Memorial Sloan-Kettering Cancer Center
Results for C. elegans
[Figure: F-score (y-axis, 0 to 0.7) versus expression percentile (x-axis, 0 to 100). Curves: mGene (ab initio, w/ annotation); mGene.ngs (w/ annotation); Cufflinks (Trapnell et al. 2010); mGene.ngs (w/o annotation); mGene.nc (w/o annotation)]
De novo prediction works!
Modeling noncoding transcripts improves coding transcript prediction.
31. Memorial Sloan-Kettering Cancer Center
Gene Finding vs. Transcript Assembly
Gene expression level: low → high
Gene finding + RNA-seq ⇒ only one transcript
RNA transcript assembly ⇒ multiple transcripts
32. BIOINFORMATICS, ORIGINAL PAPER, Genome analysis
Vol. 29 no. 20 2013, pages 2529–2538
doi:10.1093/bioinformatics/btt442
Advance Access publication August 25, 2013
MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples
Jonas Behr1,2,*,†, André Kahles1, Yi Zhong1, Vipin T. Sreedharan1, Philipp Drewe1 and Gunnar Rätsch1,*
1Computational Biology Center, Sloan-Kettering Institute, 1275 York Avenue, New York, NY 10065, USA and 2Friedrich Miescher Laboratory, Max Planck Society, Spemannstr. 39, 72076 Tübingen, Germany
Associate Editor: Ivo Hofacker
ABSTRACT
Motivation: High-throughput sequencing of mRNA (RNA-Seq) has led
to tremendous improvements in the detection of expressed genes and
reconstruction of RNA transcripts. However, the extensive dynamic
range of gene expression, technical limitations and biases, as well
as the observed complexity of the transcriptional landscape, pose
profound computational challenges for transcriptome reconstruction.
Results: We present the novel framework MITIE (Mixed Integer
Transcript IdEntification) for simultaneous transcript reconstruction
and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few
transcripts collectively explaining the observed read data and show
how to find the optimal solution using Mixed Integer Programming.
MITIE can (i) take advantage of known transcripts, (ii) reconstruct
and quantify transcripts simultaneously in multiple samples, and
(iii) resolve the location of multi-mapping reads. It is designed for
genome- and assembly-based transcriptome reconstruction. We
present an extensive study based on realistic simulated RNA-Seq
data. When compared with state-of-the-art approaches, MITIE
proves to be significantly more sensitive and overall more accurate.
Moreover, MITIE yields substantial performance gains when used with
multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of
reconstructing omitted transcript annotations and the specificity with
respect to annotated transcripts. Our results corroborate that a well-[…]
[…] genic locus by means of alternative splicing, transcription start
and termination (e.g. Nilsen and Graveley, 2010; Rätsch et al.,
2007; Schweikert et al., 2009). A comprehensive catalog of all
transcripts encoded by a genomic locus is essential for downstream analyses that aim at a more detailed understanding of
gene expression and RNA processing regulation.
RNA-Seq is a method for parallel sequencing of a large number of RNA molecules based on high-throughput sequencing
technologies (ENCODE Project Consortium et al., 2012;
Mortazavi et al., 2008; Wang et al., 2009). Currently available
sequencing platforms typically provide several 10–100 millions of
sequence fragments (reads) with a typical length of 50–150 bases.
By mapping these reads back to the genome, one can determine
where gene products are encoded in the genome (e.g. Denoeud
et al., 2008; Guttman et al., 2010; Trapnell et al., 2010; Xia et al.,
2011) and collect evidence of RNA processing such as splicing
(Bradley et al., 2012; Sonnenburg et al., 2007) or RNA-editing
(Bahn et al., 2012).
In many cases, the RNA-Seq reads are first aligned to a reference genome using an alignment tool that identifies possible
read origins within the genome. Contiguous regions covered with
read alignments (possibly with small gaps) are candidates for
exonic segments. Alignment tools for RNA-Seq reads, such as
PALMapper (De Bona et al., 2008; Jean et al., 2010), TopHat […]
MiTie
Transcript prediction via combinatorial optimization that combines
evidence from multiple experiments and achieves higher accuracy.
33. Memorial Sloan-Kettering Cancer Center
Transcript Reconstruction with RNA-seq
[Figure: overview of transcript reconstruction. Labels: Reads; Genomic DNA; Read alignments; Genome-based assembly (Cufflinks, Scripture); De novo assembly (Trinity, Oases); Data processing; Segment graph; Optimization]
108 possible transcripts, 1028 possible subsets of transcripts
35. Memorial Sloan-Kettering Cancer Center
Enumerate and Quantify all Transcripts
Segment Graph
Potential Transcripts
[Behr et al., 2013]
36. Memorial Sloan-Kettering Cancer Center
Enumerate and Quantify all Transcripts
[Figure: segment graph and the binary matrix of potential transcripts; an entry is 1 if the transcript contains the segment and 0 otherwise]
[Behr et al., 2013]
37. Memorial Sloan-Kettering Cancer Center
Enumerate and Quantify all Transcripts
[Figure: segment graph, binary matrix of potential transcripts, the abundance of each potential transcript in each sample, and the resulting expected coverage]
Abundance, Sample1: 0.0, 0.2, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0
Abundance, Sample2: 0.0, 0.0, 0.1, 0.9, 0.0, 0.0, 0.0, 0.0
R. Bohnert and G. Rätsch, NAR (2010)
[Behr et al., 2013]
38. Memorial Sloan-Kettering Cancer Center
Enumerate and Quantify all Transcripts
[Figure: segment graph, binary matrix of potential transcripts, the abundance of each potential transcript in each sample, and the resulting expected coverage]
Abundance, Sample1: 0.0, 0.2, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0
Abundance, Sample2: 0.0, 0.0, 0.1, 0.9, 0.0, 0.0, 0.0, 0.0
$\min_{W} \; L(U^{T} \times W, C) + \gamma \times \lVert W \rVert_{1}$, where $U^{T} \times W$ is the expected coverage and $C$ is the observed coverage
R. Bohnert and G. Rätsch, NAR (2010)
[Behr et al., 2013]
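The quantification step with a fixed transcript matrix can be sketched as follows. This is a toy illustration, not the method of the cited papers: it swaps the likelihood L for a squared-error loss, uses coordinate descent as a simple solver for the nonnegative L1-penalised problem, and all names (`fit_abundances`, the three candidate transcripts, the coverage values) are made up.

```python
def fit_abundances(U, c, gamma, sweeps=5000):
    """Minimise 0.5 * ||U^T w - c||^2 + gamma * sum(w) subject to w >= 0.

    U[j][s] is 1 if candidate transcript j contains segment s, else 0.
    c[s] is the observed coverage of segment s (single sample).
    Squared error stands in for the likelihood L (an assumption).
    """
    n_transcripts, n_segments = len(U), len(c)
    w = [0.0] * n_transcripts
    for _ in range(sweeps):
        for j in range(n_transcripts):
            # Coverage not explained by the other transcripts.
            r = [c[s] - sum(w[k] * U[k][s] for k in range(n_transcripts) if k != j)
                 for s in range(n_segments)]
            num = sum(U[j][s] * r[s] for s in range(n_segments)) - gamma
            den = sum(U[j][s] ** 2 for s in range(n_segments))
            # Soft-threshold and clip at zero: abundances cannot be negative.
            w[j] = max(0.0, num / den) if den > 0 else 0.0
    return w


# Three candidate transcripts over three segments (toy data).
U = [[1, 0, 1],   # skips the middle segment
     [1, 1, 1],   # contains all segments
     [1, 1, 0]]   # ends after the middle segment
observed = [1.0, 0.2, 1.0]

abundances = fit_abundances(U, observed, gamma=0.01)
print([round(a, 3) for a in abundances])
```

In this toy case the penalty drives the third candidate's abundance to exactly zero and slightly shrinks the first, which is the sparsifying effect the L1 term is meant to have.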
39. Memorial Sloan-Kettering Cancer Center
Simultaneous Identification & Quantification
[Figure: segment graph; transcripts matrix with k rows, one per transcript to be identified; abundance of each transcript in each sample; expected coverage]
Abundance, Sample1: 0.8, 0.2, 0.0, 0.0
Abundance, Sample2: 0.9, 0.0, 0.1, 0.0
[Behr et al., 2013]
40. Memorial Sloan-Kettering Cancer Center
Simultaneous Identification & Quantification
[Figure: segment graph; transcripts matrix U with N rows; abundance matrix W with one column per sample; expected coverage]
Abundance, Sample1: 0.8, 0.2, 0.0, 0.0
Abundance, Sample2: 0.9, 0.0, 0.1, 0.0
$\min_{U,W} \; L(U^{T} \times W, C) + \gamma \times N$, where $U^{T} \times W$ is the expected coverage and $C$ is the observed coverage
[Behr et al., 2013]
42. Memorial Sloan-Kettering Cancer Center
Simultaneous Identification & Quantification
[Figure: segment graph; transcripts matrix U with N rows; abundance matrix W with one column per sample; expected coverage]
Abundance, Sample1: 0.8, 0.2, 0.0, 0.0
Abundance, Sample2: 0.9, 0.0, 0.1, 0.0
$\min_{U,W} \; L(U^{T} \times W, C) + \gamma \times N$ s.t. $U$ is valid
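The joint choice of transcripts and abundances can be illustrated on a toy locus. This sketch is not MiTie: it replaces the mixed integer program with exhaustive enumeration over subsets of a small candidate list, replaces the likelihood L with squared error, and treats "U is valid" as "every row is taken from the candidate list". All names and numbers are hypothetical.

```python
from itertools import combinations


def nnls_fit(rows, c, sweeps=2000):
    """Nonnegative least squares by coordinate descent (toy solver)."""
    w = [0.0] * len(rows)
    for _ in range(sweeps):
        for j, row in enumerate(rows):
            # Coverage left over after subtracting the other transcripts.
            r = [c[s] - sum(w[k] * rows[k][s] for k in range(len(rows)) if k != j)
                 for s in range(len(c))]
            den = sum(v * v for v in row)
            w[j] = max(0.0, sum(v * x for v, x in zip(row, r)) / den) if den else 0.0
    return w


def select_transcripts(candidates, c, gamma):
    """Minimise squared error + gamma * N over all candidate subsets."""
    best = None
    for n in range(len(candidates) + 1):
        for subset in combinations(range(len(candidates)), n):
            rows = [candidates[i] for i in subset]
            w = nnls_fit(rows, c)
            predicted = [sum(w[k] * rows[k][s] for k in range(n)) for s in range(len(c))]
            loss = sum((p - o) ** 2 for p, o in zip(predicted, c))
            objective = loss + gamma * n  # each selected transcript costs gamma
            if best is None or objective < best[0]:
                best = (objective, subset, w)
    return best


candidates = [[1, 0, 1], [1, 1, 1], [1, 1, 0]]  # toy segment usage per transcript
coverage = [1.0, 0.2, 1.0]                      # toy observed coverage

objective, chosen, weights = select_transcripts(candidates, coverage, gamma=0.01)
print(chosen, [round(x, 3) for x in weights])
```

Enumeration is only feasible for a handful of candidates; the number of subsets grows exponentially, which is why a combinatorial optimization formulation is used in practice.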
46. Memorial Sloan-Kettering Cancer Center
MiTie’s Main Features
Uses a likelihood function L based on a probabilistic model for
the read coverage.
Uses combinatorial optimization to find transcripts that explain
data from multiple RNA-seq libraries
Newly predicted transcripts are penalized (once).
Can use already known/confirmed transcripts without penalty.
Provides a p-value for each transcript providing a confidence
measure for presence of predicted transcript.
Log-likelihood ratio test: $T_t = -2 \log \frac{p(D \mid M)}{p(D \mid M_t)}$
[Behr et al., 2013]
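A minimal sketch of how a log-likelihood ratio statistic of this form could be turned into a p-value. The slide does not state how MiTie computes the p-value, so the following are assumptions made for illustration: the model in the numerator is treated as nested in the model in the denominator, the statistic is compared against a chi-square distribution with one degree of freedom, and the function names and log-likelihood values are hypothetical.

```python
import math

def lrt_statistic(loglik_nested, loglik_full):
    """T = -2 * log(p(D | nested) / p(D | full)), from log-likelihoods."""
    return -2.0 * (loglik_nested - loglik_full)

def chi2_sf_df1(x):
    """Survival function of the chi-square distribution with 1 degree of
    freedom: P(X > x) = erfc(sqrt(x / 2))."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical log-likelihoods of the two models for one transcript.
T = lrt_statistic(loglik_nested=-105.0, loglik_full=-100.0)
p_value = chi2_sf_df1(T)
print(T, p_value)
```

A small p-value here means the data are explained noticeably better by the larger model, which is the sense in which the statistic can serve as a per-transcript confidence measure.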
47. Memorial Sloan-Kettering Cancer Center
MiTie Results
[Figure: F-score on transcript level versus number of samples. Panel A, human simulated data (1 to 5 samples): MITIE + MMO, MITIE, Cufflinks + Cuffmerge, Cufflinks. Panel B, D. melanogaster modENCODE data (1 to 7 samples): MITIE, Cufflinks + Cuffmerge.]
[Behr et al., 2013]
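A minimal sketch of a transcript-level F-score, the quantity plotted on this slide. The matching criterion is an assumption: a predicted transcript counts as correct only if its exon coordinates are identical to those of an annotated transcript. The evaluation in the cited paper may use a different criterion, and the function name and coordinates are hypothetical.

```python
def transcript_f_score(predicted, annotated):
    """Return (precision, recall, F-score) for two collections of transcripts.
    Each transcript is a tuple of (start, end) exon coordinates."""
    pred, ann = set(predicted), set(annotated)
    true_positives = len(pred & ann)
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ann)
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical example: one of two predictions matches one of three annotations.
annotated = [((100, 200), (300, 400)),
             ((100, 250), (300, 400)),
             ((700, 800),)]
predicted = [((100, 200), (300, 400)),
             ((100, 200), (500, 600))]

precision, recall, f_score = transcript_f_score(predicted, annotated)
print(precision, recall, f_score)
```

Exact matching is strict: a prediction that differs from the annotation by a single exon boundary counts as wrong, which is one reason transcript-level scores tend to be lower than exon-level or gene-level scores.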
48. Memorial Sloan-Kettering Cancer Center
Gene Finding vs. Transcript Assembly
[Figure: genes along an axis of gene expression level, from low to high.]
mGene.ngs: only one transcript
MiTie: multiple transcripts
Confidence for alternative transcripts: from low to high
49. Memorial Sloan-Kettering Cancer Center
Conclusions
Genome annotation pipeline:
  Transcript Skimmer identifies highly expressed genes for training
  mGene.ngs predicts coding and non-coding transcripts
  MiTie predicts alternative transcripts for highly expressed genes
Genome annotation pipeline requires only:
  Genome sequence
  RNA-seq alignments
Good for annotating new genomes or improving existing ones
Sources are free: http://bioweb.me/mgene, http://bioweb.me/mitie
Functionality partially available in Galaxy instance (http://galaxy.cbio.mskcc.org)
Thank you!
56. References I
Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden Markov Support Vector Machines. In Proc. 20th Int. Conf. Mach. Learn., pages 3–10, 2003.
J. Behr, G. Schweikert, J. Cao, F. De Bona, G. Zeller, S. Laubinger, S. Ossowski, K. Schneeberger, D. Weigel, and G. Rätsch. RNA-seq and tiling arrays for improved gene finding. Oral presentation at the CSHL Genome Informatics Meeting, September 2008. URL http://www.fml.tuebingen.mpg.de/raetsch/lectures/RaetschGenomeInformatics08.pdf.
Jonas Behr, André Kahles, Yi Zhong, Vipin T Sreedharan, Philipp Drewe, and Gunnar Rätsch. MiTie: Simultaneous RNA-seq-based transcript identification and quantification in multiple samples. Bioinformatics, 29(20):2529–38, Oct 2013. doi: 10.1093/bioinformatics/btt442.
RM Clark, G Schweikert, C Toomajian, S Ossowski, G Zeller, P Shinn, N Warthmann, TT Hu, G Fu, DA Hinds, H Chen, KA Frazer, DH Huson, B Schölkopf, M Nordborg, G Rätsch, JR Ecker, and D Weigel. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science, 317(5836):338–342, 2007. ISSN 1095-9203 (Electronic). doi: 10.1126/science.1138632.
G. Rätsch and S. Sonnenburg. Accurate splice site detection for Caenorhabditis elegans. In K. Tsuda, B. Schoelkopf, and J.-P. Vert, editors, Kernel Methods in Computational Biology. MIT Press, 2004.
G Rätsch and S Sonnenburg. Large scale hidden semi-Markov SVMs. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems (NIPS’06), volume 19, pages 1161–1168, Cambridge, MA, 2007. MIT Press. URL http://www.fml.tuebingen.mpg.de/raetsch/projects/HSMSVM.
G. Rätsch, S. Sonnenburg, and B. Schölkopf. RASE: recognition of alternatively spliced exons in C. elegans. Bioinformatics, 21(Suppl. 1):i369–i377, June 2005.
Gabriele Schweikert, Alexander Zien, Georg Zeller, Jonas Behr, Christoph Dieterich, Cheng Soon Ong, Petra Philips, Fabio De Bona, Lisa Hartmann, Anja Bohlen, Nina Krüger, Sören Sonnenburg, and Gunnar Rätsch. mGene: Accurate SVM-based gene finding with an application to nematode genomes. Genome Research, 2009. URL http://genome.cshlp.org/content/early/2009/06/29/gr.090597.108.full.pdf+html. Advance access June 29, 2009.
S. Sonnenburg, G. Rätsch, A. Jagota, and K.-R. Müller. New methods for splice-site recognition. In Proc. International Conference on Artificial Neural Networks, 2002.
Sören Sonnenburg, Alexander Zien, and Gunnar Rätsch. ARTS: Accurate Recognition of Transcription Starts in Human. Bioinformatics, 22(14):e472–480, 2006.
57. References II
VT Sreedharan, SJ Schultheiss, G Jean, A Kahles, R Bohnert, P Drewe, P Mudrakarta, N Görnitz, G Zeller, and Gunnar Rätsch. Oqtans: The RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. Bioinformatics, 2014. Bioinformatics Advance Access published January 11, 2014.
G Zeller, RM Clark, K Schneeberger, A Bohlen, D Weigel, and G Rätsch. Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays. Genome Res, 18(6):918–929, 2008. ISSN 1088-9051 (Print). doi: 10.1101/gr.070169.107.
A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, and K.-R. Müller. Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites. Bioinformatics, 16(9):799–807, September 2000.