The document discusses next generation sequencing methods and RNA sequencing. It covers topics like sequencing formats, data analysis workflows including mapping, clustering, assembly programs, finding new genes and correcting existing ones. It discusses input file types, calculating sequencing depth, available tools for alignment, output file formats, assembly programs, splice junction prediction, and applications of RNA sequencing like gene expression analysis and annotation.
5. RNASeq
Catalogue all species of transcripts.
mRNA
Non-coding RNA
Small RNA
Splicing patterns or other post-transcriptional modifications.
Quantify the expression levels.
6. Topics covered
Sequence formats
Calculate the sequencing depth of coverage
Data Analysis Workflow
Mapping programs
Output data files
SAM
SHRiMP
MAQ
Clustering and assembly programs
Finding new genes and correction of existing genes
Annotation of RNAseq data
8. Calculate the sequencing depth of coverage
Inputs:
Read length
Number of reads
GeneSpace size/genome size
Depth of coverage = (Read Length * Number of Reads) / GeneSpace (or genome size)
Problem: 12 million reads, read length = 50 bases, total GeneSpace = 8 Mb
(12 * 10^6 * 50) / (8 * 10^6) = 75X
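The worked problem above can be checked with a few lines of Python. This is a minimal sketch; the function name `depth_of_coverage` and its parameter names are illustrative and do not come from the slides.

```python
def depth_of_coverage(read_length, number_of_reads, target_size):
    """Return average depth of coverage (X).

    target_size is the GeneSpace or genome size in bases.
    """
    if target_size <= 0:
        raise ValueError("target_size must be a positive number of bases")
    return read_length * number_of_reads / target_size


# Values from the slide: 12 million reads of 50 bases over an 8 Mb GeneSpace.
coverage = depth_of_coverage(read_length=50,
                             number_of_reads=12_000_000,
                             target_size=8_000_000)
print(f"{coverage:g}X")  # prints 75X
```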
9. Part 1: Alignment of the reads to the reference genome
Raw sequence data files (FASTQ/colorspace)
QC with the R ShortRead package
Reads mapped to reference: Bowtie, BWA, SHRiMP
Processing of mapped reads:
1. Filter out spike-ins
2. Filter reads mapping to multiple locations
3. SAM -> BAM
4. Remove PCR duplicates
5. Sort, view, pileup, merge
BEDTools:
1. Read depth of coverage
2. Manipulation of BED, SAM, BAM, GTF, GFF files
SNP discovery, indel
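The "Remove PCR duplicates" step can be illustrated with a simplified sketch. It assumes that two alignments are duplicates when they share the same reference, start position and strand, and it keeps the one with the highest mapping quality. The `Alignment` record and `remove_duplicates` function are hypothetical names for illustration; production tools apply more careful rules (for example, for paired-end reads).

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record used only for this sketch."""
    read_id: str
    reference: str
    position: int
    reverse: bool
    mapq: int


def remove_duplicates(alignments):
    """Keep one alignment per (reference, position, strand), preferring high MAPQ."""
    best = {}
    for aln in alignments:
        key = (aln.reference, aln.position, aln.reverse)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    # Return survivors in coordinate order.
    return sorted(best.values(),
                  key=lambda a: (a.reference, a.position, a.reverse))


reads = [
    Alignment("r1", "scaffold_1", 100, False, 30),
    Alignment("r2", "scaffold_1", 100, False, 60),  # duplicate of r1, better MAPQ
    Alignment("r3", "scaffold_1", 250, True, 40),
]
for aln in remove_duplicates(reads):
    print(aln.read_id, aln.position, aln.mapq)
```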
10. Part 2: Data Analysis
Assembly of mapped reads (Cufflinks)
Assembly of raw QC'd reads by de novo methods: ABySS, Velvet
Merging Cufflinks outputs from different libraries (cuffcompare)
Align assembled reads back to genome (BLAT)
Gene model correction/junction finding: TopHat, Trans-ABySS
Splice variants
Expression analysis and differential expression (cuffdiff, DEGseq, edgeR)
Copy number variation
12. Mapping
One or two mismatches for reads < 35 bases.
One insertion/deletion.
K-mer based seeding.
• Identification of novel transcripts.
• Transcript abundance.
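K-mer based seeding with a small mismatch budget can be sketched as follows. This is a toy illustration, not the algorithm of any particular mapper: it indexes every k-mer of the reference, uses exact k-mer hits from the read as seeds, and then verifies the full read at each candidate position, accepting at most two mismatches. Indels are not handled, and the names `build_kmer_index` and `map_read` are assumptions made for this example.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Return sorted (start, mismatches) pairs for placements within the mismatch budget."""
    hits = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for ref_pos in index.get(seed, []):
            start = ref_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue  # read would hang off the reference
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


reference = "ACGTACGTTAGCCGATAGGCTTAACG"
index = build_kmer_index(reference, k=4)
# A 12-base read copied from position 8 with one substituted base.
read = reference[8:13] + "T" + reference[14:20]
print(map_read(read, reference, index, k=4))
```

Longer seeds give fewer spurious candidates but tolerate fewer mismatches near the seed, which is one reason mappers trade speed against sensitivity.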
13. Available tools for Nextgen sequence alignment
BFAST: BLAT-like Fast Accurate Search Tool.
Bowtie: Burrows-Wheeler Transform (BWT) index.
BWA: Gapped global alignment with respect to query sequences.
ELAND: Part of the Illumina distribution; runs on a single processor; local alignment.
SOAP: Short Oligonucleotide Alignment Program.
SSAHA: Sequence Search and Alignment by Hashing Algorithm.
SHRiMP: Short Read Mapping Package.
SOCS: Rabin-Karp string search algorithm, which
14. Integrated Pipeline
• SOLiD™ System Analysis Pipeline Tool (Corona Lite)
• CLC bio Genomics Workbench.
• Partek
• Galaxy Server.
• ERANGE: a full package for RNA-Seq and ChIP-Seq data analysis
• DESeq (differential expression; a Bioconductor package comparable to edgeR)
15. Output File Formats
SAM (Sequence Alignment/Map)
SAM -> BAM conversion
Sorting/indexing BAM/SAM files
Extracting and viewing alignments
SNP calling (mpileup)
Text viewer (tview)
Example SAM record:
1082_1988_1406_F3 16 scaffold_1 31452 255 48M * 0 0 TCCACGTCACCAGCAAGCCTCCGGTCAATCCGTCTGACTTGTCCTGTC 8E/./:R*$BIG/!%GP9@MMK;@FMJIXVNSWNNUUOTXQNGFQUPN XA:i:0 MD:Z:48 NM:i:0 CM:i:5
FLAG values:
0 -> the read is not paired and is mapped to the forward strand
4 -> unmapped read
16 -> mapped to the reverse strand
http://samtools.sourceforge.net/SAM1.pdf
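To make the column layout and the FLAG values concrete, here is a minimal sketch that splits one SAM record into its eleven mandatory fields and decodes three FLAG bits (0x1 paired, 0x4 unmapped, 0x10 reverse strand). The record is the one shown on the slide, rebuilt with tab separators as real SAM files use. The helper names `parse_sam_line` and `describe_flag` are illustrative and are not part of SAMtools.

```python
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]


def parse_sam_line(line):
    """Split one tab-separated SAM alignment line into named fields plus optional tags."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    for name in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[name] = int(record[name])
    record["tags"] = columns[11:]
    return record


def describe_flag(flag):
    """Decode the three FLAG bits discussed on the slide."""
    return {
        "paired": bool(flag & 0x1),
        "unmapped": bool(flag & 0x4),
        "reverse_strand": bool(flag & 0x10),
    }


example_line = "\t".join([
    "1082_1988_1406_F3", "16", "scaffold_1", "31452", "255", "48M",
    "*", "0", "0",
    "TCCACGTCACCAGCAAGCCTCCGGTCAATCCGTCTGACTTGTCCTGTC",
    "8E/./:R*$BIG/!%GP9@MMK;@FMJIXVNSWNNUUOTXQNGFQUPN",
    "XA:i:0", "MD:Z:48", "NM:i:0", "CM:i:5",
])

record = parse_sam_line(example_line)
print(record["rname"], record["pos"], record["cigar"])
print(describe_flag(record["flag"]))
```

Because FLAG is a bit field, values combine: a flag of 20 (16 + 4) would set both the reverse-strand and unmapped bits.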
16. SHRiMP and MAQ Format
SHRiMP:
>947_1567_1384_F3 reftig_991 + 22901 22923 3 25 25 2020 18x2x3
Edit String
A perfect match for 25-bp tags is: "25"
A SNP at the 16th base of the tag is: "15A9"
A four-base insertion in the reference: "3(TGCT)20"
A four-base deletion in the reference: "5----20"
Two sequencing errors: "4x15x6" (i.e. 25 matches with 2 crossovers)
http://compbio.cs.toronto.edu/shrimp/README
MAQ:
ID19_190907_6_195_127_427 Contig0_2091311 60 + 0 0 30 30 30 0 0 1 4 35 GTGCAGCCATTTGCGTACaAGCaTCtCaaGctACt ?IIIIIIIIIIIIII@EI6<II6HB9I(8I6.G<-
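The SHRiMP edit-string examples can be split into tokens with a short regular expression. This is a sketch based only on the examples on the slide: a number is a run of matches, a base letter is a mismatch, parentheses and dashes mark gaps, and `x` marks a crossover. The token names and the function `tokenize_edit_string` are assumptions for illustration; the SHRiMP README remains the authoritative definition.

```python
import re

# One alternative per token type; "x" is tested before base letters.
TOKEN = re.compile(
    r"(?P<match>\d+)"
    r"|(?P<crossover>x)"
    r"|(?P<mismatch>[ACGTN])"
    r"|\((?P<paren_gap>[ACGTN]+)\)"
    r"|(?P<dash_gap>-)"
)


def tokenize_edit_string(edit):
    """Split an edit string into (kind, value) tuples; raise on unknown symbols."""
    tokens = []
    pos = 0
    while pos < len(edit):
        m = TOKEN.match(edit, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {edit[pos]!r}")
        kind = m.lastgroup
        value = m.group(kind)
        tokens.append((kind, int(value) if kind == "match" else value))
        pos = m.end()
    return tokens


for example in ["25", "15A9", "3(TGCT)20", "5----20", "4x15x6"]:
    print(example, tokenize_edit_string(example))
```

Summing the `match` values of "4x15x6" gives 25, which agrees with the slide's note of 25 matches with 2 crossovers.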
17. Assembly programs
ABySS
Supports multiple K values
Fast
Assemblies built with different K values can be merged
The Trans-ABySS pipeline runs on it
MIRA (Mimicking Intelligent Read Assembly)
Hybrid de novo assembler
Genome mapper
Velvet
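The K value mentioned for ABySS is the k-mer length used to build a de Bruijn graph, the data structure that both ABySS and Velvet are based on. The sketch below builds such a graph from a few reads: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It is a toy illustration with an assumed function name, `de_bruijn_graph`, and it omits everything a real assembler does afterwards (error removal, graph simplification, contig extraction).

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Return {prefix: set(suffixes)} for every k-mer found in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


reads = ["ACGTG", "CGTGA"]
for k in (3, 4):
    graph = de_bruijn_graph(reads, k)
    print(k, {node: sorted(nexts) for node, nexts in sorted(graph.items())})
```

Running the same reads with several K values, as in the loop above, shows why the choice matters: a small K merges more reads into shared nodes, while a large K keeps repeats apart but needs longer overlaps.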
22. Cufflinks
Transcript assembly
Expression levels with a reference GTF
Expression levels without a GTF
Merging experimental replicates (cuffcompare)
Differential expression analysis (cuffdiff)
23. Annotation of RNASeq Data
De novo assembled reads (contigs)
Reads mapped to reference, assembled
Map back to genome (BLAT)
Train for gene prediction
Junction/novel transcripts/splice variants
Expression profiling
Differential expression analysis
CNV
26. Differences from other expression sequencing methods
EST: Low throughput, expensive, NOT quantitative.
SAGE, CAGE, MPSS: High throughput, digital gene expression levels
Expensive
Sanger sequencing methods
Only a portion of the transcript is analyzed
Isoforms are indistinguishable
27. Advantages:
Little or no background noise.
Sensitive to isoform discovery.
Both low and highly expressed genes can be quantified.
Highly reproducible.
28. Transcripts discovered/corrected
10,000 new transcription start sites discovered in Rhesus macaque (Liu et al., NAR 2010).
602 transcriptionally active regions and numerous introns in Candida albicans (Bruno et al., 2010, Genome Research).
96% of the genes were corrected in Laccaria bicolor (Larsen et al., PLoS One 2010).
16,923 regions in mouse (Mortazavi et al., 2008).
3,724 novel isoforms (Trapnell et al., 2010).
29. Bioinformatics Challenges
Store, retrieve and analyze large amounts of data
Matching of reads to multiple locations
Short reads with higher copy number and long reads representing less expressed genes.
30. References:
Wilhelm J. Ansorge. Next-generation DNA sequencing techniques. New Biotechnology, Volume 25, Issue 4, April 2009, Pages 195-203.
Zhong Wang, Mark Gerstein, and Michael Snyder. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 January; 10(1): 57–63.
Peter E. Larsen et al. Using Deep RNA Sequencing for the Structural Annotation of the Laccaria Bicolor Mycorrhizal Transcriptome. PLoS One. 2010; 5(7): e9780.
Wang et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. NAR, 2010.
Denoeud et al. Annotating genomes with massive-scale RNA sequencing. Genome Biology, 2008.
Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. doi:10.1038/nbt.1621
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. doi:10.1093/bioinformatics/btp120
Mortazavi et al. Nature Methods, May 2008.
Editor's Notes
An overview of the MapSplice pipeline. The algorithm contains two phases: tag alignment (Step 1–Step 4) and splice inference (Step 5–Step 6). In the ‘tag alignment' phase, candidate alignments of the mRNA tags to the reference genome are determined. In the ‘splice inference' phase, splice junctions that appear in one or more tag alignments are analyzed to determine a splice significance score based on the quality and diversity of alignments that include the splice. Ambiguous candidate alignments are resolved by selecting the alignment with the overall highest quality match and highest confidence splice junctions.
CAGE: cap analysis of gene expression. MPSS: massively parallel signature sequencing. SAGE: serial analysis of gene expression.