This session follows up on transcript quantification of RNAseq data, discusses statistical means of identifying differentially regulated transcripts and isoforms, and contrasts these with microarray analysis approaches.
Differential gene expression
1. RNAseq analysis: Differential gene expression (2/2)
   Hopscotch and isoforms
   August 25, 2011
   [Image credit: Pink Sherbet Photography]
2. Quick recap: Mapping and transcript assembly
   - Reads -> alignment to reference genome -> transcript assembly
   - Resulting file types: BAM, gff/bed
   - “What transcripts are in my samples?”
   - [Diagram labels: Projects, Fastq, Mapping, Transcript assembly]
   Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
3. RNAseq analysis question
   - Is there a difference in the transcriptome of two different conditions?
   - Quantify expression
   - Quantify difference
   - [Diagram labels: Condition1, Condition2]
4. RNAseq vs Expression Array
   - RNAseq can capture a larger dynamic range
   - RNAseq can handle degraded samples
   - Gain additional information:
     - New transcripts
     - (New) isoforms
     - Variants
   - [Figure labels: Flattening out, Array, RNA-seq]
   Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 PMID: 19015660
5. Challenges
   - Strand-specific methods are still biased
   - The number of reads does not necessarily correlate with transcript abundance:
     - Longer transcripts have more reads (fragmentation).
     - Technical variability between runs causes different numbers of total reads.
   - Lowly abundant does not mean non-functional
   - How to quantify expression of isoforms
   Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011 PMID: 21191423
   Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
6. Production Informatics and Bioinformatics
   Per one-flowcell project:
   - Basic Production Informatics: produce raw sequence reads
   - Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs)
   - Bioinformatics Research: analyze the data; uncover the biological meaning
7. Quantifying expression in RNAseq
   - Long genes get more reads
   - Normalize: fragments per kilobase of transcript per million mapped reads (FPKM)
   - FPKM accounts for the dependency between paired-end reads
   Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
   Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct. 2009 PMID: 19371405
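The FPKM normalization named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular tool; the function name `fpkm` and the toy numbers are assumptions made for the example. It shows why two transcripts with the same fragment count but different lengths end up with different expression values.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragment_count: fragments assigned to the transcript (a read pair counts once)
    transcript_length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) gives the 1e9 factor.
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Toy values (hypothetical): same fragment count, different transcript lengths.
library_size = 20_000_000
short_transcript = fpkm(500, 1_000, library_size)
long_transcript = fpkm(500, 5_000, library_size)
print(short_transcript, long_transcript)
```

With equal raw counts, the longer transcript receives the lower FPKM, which is the length correction the slide refers to.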
8. Quantifying expression of overlapping isoforms
   - We do not know where reads of overlapping isoforms actually belong
   - Alexa-Seq: counts only the reads that map uniquely to a single isoform
   - Isoform-expression methods (cufflinks): likelihood function modeling the sequencing process (not very accurate for lowly expressed transcripts)
   - 'Exon intersection method' (analogous to expression microarrays): counts reads mapped to the gene's constitutive exons (reduces power for differential expression analysis)
   - 'Exon union method': counts all reads mapped to any exon in any of the gene's isoforms (underestimates expression for alternatively spliced genes)
   Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
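The difference between the exon union and exon intersection methods can be illustrated with a toy gene. This is a sketch under simplifying assumptions: reads are already assigned to exons, exon identifiers such as `e1` and the isoform names are hypothetical, and no length normalization is applied.

```python
def exon_union_count(isoforms, reads_per_exon):
    """Count reads on every exon that appears in any isoform of the gene."""
    if not isoforms:
        return 0
    exons = set().union(*isoforms.values())
    return sum(reads_per_exon.get(exon, 0) for exon in exons)


def exon_intersection_count(isoforms, reads_per_exon):
    """Count reads only on constitutive exons (present in all isoforms)."""
    if not isoforms:
        return 0
    exons = set.intersection(*isoforms.values())
    return sum(reads_per_exon.get(exon, 0) for exon in exons)


# Hypothetical gene with two isoforms; exon e2 is skipped in iso_B.
isoforms = {
    "iso_A": {"e1", "e2", "e3"},
    "iso_B": {"e1", "e3"},
}
reads_per_exon = {"e1": 100, "e2": 40, "e3": 120}

union_reads = exon_union_count(isoforms, reads_per_exon)
intersection_reads = exon_intersection_count(isoforms, reads_per_exon)
print(union_reads, intersection_reads)
```

The intersection method discards the reads on the alternatively spliced exon, which is where the loss of power comes from; the union method keeps them but attributes them to the gene as a whole rather than to a specific isoform.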
9. Differential expression
   - What is a statistically significant difference between a set of measurements (expression of a gene) of two populations (conditions)?
   - First, estimate variability:
     - Observe biological variability (needs large numbers of replicates to sample the population), or
     - Model biological variability: model the count variance across replicates as a nonlinear function of the mean counts, using various parametric approaches such as the normal and negative binomial distributions (edgeR, DESeq, Cuffdiff)
   Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
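To make "variance as a nonlinear function of the mean" concrete, the sketch below uses the common negative binomial parameterization, variance = mean + dispersion × mean². The per-gene method-of-moments dispersion estimate is a deliberate simplification for illustration and is an assumption of this example; tools such as edgeR and DESeq estimate dispersion differently, typically by sharing information across genes. The replicate counts are made up.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Negative binomial variance for a given mean and dispersion.

    With dispersion == 0 this reduces to the Poisson case (variance == mean).
    """
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Naive method-of-moments dispersion estimate from replicate counts.

    Illustrative only: solves variance = mean + dispersion * mean**2 for the
    dispersion, clipped at zero when the data are not over-dispersed.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # sample variance, needs at least two replicates
    return max(0.0, (v - m) / m ** 2)


# Hypothetical counts for one gene across four replicates of one condition.
replicate_counts = [90, 110, 130, 70]
dispersion = moment_dispersion(replicate_counts)
print(dispersion, nb_variance(mean(replicate_counts), dispersion))
```

With only a handful of replicates, a per-gene estimate like this is noisy, which is one reason the slide lists modeling the variability as the alternative to observing it directly.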
10. Three things to remember
   - RNAseq captures a larger dynamic range (more sensitive)
   - Additional information compared to arrays (e.g. isoforms)
   - Need to make assumptions/compromises (quantification, few replicates)
   [Image credit: cabbit]
11. Next Weeks
   - NGS Discussion group: Jake’s topic
   - Two Weeks: Abstract: This session will focus on identifying SNPs from whole genome, exome capture or targeted resequencing data. The approaches of mapping, local realignment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.