The document provides information about RNA-seq analysis using R and Bioconductor. It begins with an introduction to the BCBB branch and its services assisting researchers with bioinformatics and computational projects. The document then discusses RNA-seq, R, and Bioconductor individually before explaining how they can be used together for RNA-seq analysis. Step-by-step tutorials and resources are provided for differential expression analysis and other tasks using R packages like DESeq2.
1. Date
Maarten Leerkes PhD
Genome Analysis Specialist
Bioinformatics and Computational Biosciences Branch
Office of Cyber Infrastructure and Computational Biology
RNA-seq with R-bioconductor
Part 1.
2. BCBB: A Branch Devoted to Bioinformatics and
Computational Biosciences
§ Researchers’ time is increasingly important
§ BCBB saves our collaborators time and effort
§ Researchers speed projects to completion using
BCBB consultation and development services
§ No need to hire extra post docs or use external
consultants or developers
4. Contact BCBB…
§ NIH Users: Access a menu of BCBB services on
the NIAID Intranet:
• http://bioinformatics.niaid.nih.gov/
§ Outside of NIH –
• search “BCBB” on the NIAID Public Internet Page:
www.niaid.nih.gov
– or – use this direct link
§ Email us at:
• ScienceApps@niaid.nih.gov
5. Seminar Follow-Up Site
§ For access to past recordings, handouts, and slides, visit this site from the
NIH network: http://collab.niaid.nih.gov/sites/research/SIG/Bioinformatics/
1. Select a Subject Matter
2. Select a Topic
View:
• Seminar Details
• Handout and Reference Docs
• Relevant Links
• Seminar Recording Links
Recommended Browsers:
• IE for Windows
• Safari for Mac (Firefox on a Mac is incompatible with NIH Authentication technology)
Login:
• If prompted to log in, use “NIH” in front of your username
8. What is R
§ R is a programming language and software
environment for statistical computing and graphics.
The R language is widely used among statisticians
and data miners for developing statistical software
and data analysis.
9. What is R
§ R is an implementation of the S programming
language combined with lexical scoping semantics
inspired by Scheme. S was created by John
Chambers while at Bell Labs. There are some
important differences, but much of the code written for
S runs unaltered.
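The lexical scoping mentioned above can be illustrated with a small closure. This is a minimal sketch; the names make_counter, counter_a and counter_b are made up for illustration and are not part of the original slides.

```r
# make_counter() returns a function that remembers the variable `count`
# from the environment in which that function was created.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates `count` in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

first  <- counter_a()   # first call on counter_a
second <- counter_a()   # second call sees the updated `count`
other  <- counter_b()   # counter_b keeps its own, independent `count`
```

Each call to make_counter() creates a fresh environment, so the two counters do not share state.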
10. What is R
§ R is a GNU project. The source code for the R
software environment is written primarily in C, Fortran,
and R. R is freely available under the GNU General
Public License, and pre-compiled binary versions are
provided for various operating systems. R uses a
command line interface; there are also several
graphical front-ends for it.
16. What is RNAseq
§ RNA-seq (RNA Sequencing), also called Whole
Transcriptome Shotgun Sequencing (WTSS), is a
technology that uses the capabilities of next-
generation sequencing to reveal a snapshot of
RNA presence and quantity from a genome at a
given moment in time.
17. Topics
§ What is R
§ What is Bioconductor
§ What is RNAseq
§ Comes together in: RNA-seq with R-bioconductor
18. Different kinds of objects in R
§ Objects.
§ The following data objects exist in R:
§ vectors
§ lists
§ arrays
§ matrices
§ tables
§ data frames
§ Some of these are more important than others. And
there are more.
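As a quick orientation, each of the object types listed above can be created in base R as follows. The values and variable names are invented for illustration only.

```r
v   <- c(1.5, 2.0, 3.5)                      # vector: elements of one type
l   <- list(name = "geneA", counts = v)      # list: elements of mixed types
a   <- array(1:24, dim = c(2, 3, 4))         # array: n-dimensional, one type
m   <- matrix(1:6, nrow = 2, ncol = 3)       # matrix: two-dimensional array
tab <- table(c("ctrl", "treated", "ctrl"))   # table: counts per category
df  <- data.frame(sample = c("s1", "s2"),    # data frame: columns of equal
                  reads  = c(100L, 250L))    # length, possibly mixed types

# str() prints a compact summary of any object's structure.
str(df)
```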
21. A data frame is used for storing data
tables. It is a list of vectors of equal length.
§ A data frame is a table, or two-dimensional array-like
structure, in which each column contains
measurements on one variable, and each row
contains one case. As we shall see, a "case" is not
necessarily the same as an experimental subject or
unit, although they are often the same.
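A small sketch of the description above, using a made-up expression table (gene names and counts are hypothetical): each column is one variable, each row is one case, and the data frame itself is a list of equal-length columns.

```r
expr <- data.frame(
  gene    = c("geneA", "geneB", "geneC"),
  control = c(10, 200, 35),
  treated = c(40, 190, 5)
)

is_a_list   <- is.list(expr)   # a data frame is a list of columns
col_lengths <- lengths(expr)   # every column has the same length
one_column  <- expr$control    # one variable (a vector)
one_row     <- expr[2, ]       # one case (a one-row data frame)
```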
22. Combine list of data frames into single data frame, add
column with list index: list of vectors of equal length.
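One possible way to do this in base R is sketched below. The list runs and the column name list_index are assumptions made for the example; the slide does not say which approach was used in the demo.

```r
# Three small data frames with the same columns (made-up values).
runs <- list(
  data.frame(gene = c("geneA", "geneB"), count = c(10, 20)),
  data.frame(gene = c("geneA", "geneC"), count = c(15, 5)),
  data.frame(gene = "geneB", count = 30)
)

# Tag each data frame with its position in the list, then stack them.
tagged   <- Map(function(df, i) cbind(df, list_index = i),
                runs, seq_along(runs))
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

The list_index column records which element of the list each row came from, so the original grouping can be recovered after stacking.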
24. RNA-seq with R
Demo: easyRNASeq
source("c:/windows/myname/rna_seq_tutorial.R")
source("/vol/maarten/rna_seq_tutorial2.R")
http://bioscholar.com/genomics/bioconductor-packages-analysis-rna-seq-data/
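For readers who have not used source() before, here is a minimal, self-contained sketch of what it does. The script contents and the names tutorial_counts and tutorial_total are invented for the example; the actual tutorial scripts are not reproduced here.

```r
# Write a tiny script to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "tutorial_counts <- c(geneA = 10, geneB = 20)",
  "tutorial_total  <- sum(tutorial_counts)"
), script_path)

# source() reads the file and evaluates its contents; by default the
# objects it creates end up in the global environment.
source(script_path)
```

Note that R function names are case-sensitive (source, not Source), and that Windows paths inside R strings need forward slashes or doubled backslashes.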
39. Numerous possible analysis strategies
§ There is no one ‘correct’ way to analyze RNA-seq data
§ Two major branches
• Direct alignment of reads (spliced or unspliced) to genome or transcriptome
• Assembly of reads followed by alignment*
*Assembly is the only option when working with a creature with no genome sequence;
alignment of contigs may be to ESTs, cDNAs, etc.
Image from Haas & Zody, 2010
45. RNA sequencing: abundance comparisons between two or more conditions / phenotypes
Samples of interest: Condition 1 (normal tissue), Condition 2 (diseased tissue)
Isolate RNAs
Generate cDNA, fragment, size select, add linkers
Sequence ends
100s of millions of paired reads
10s of billions bases of sequence
Map to genome, transcriptome, and predicted exon junctions
Downstream analysis
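To make the idea of an abundance comparison concrete, the sketch below scales a made-up count matrix by library size and computes a log2 fold change between the two conditions. The counts, the counts-per-million scaling and the pseudocount of 1 are assumptions chosen for illustration; this is not a statistical test and not the method used in the demo.

```r
# Toy read counts: rows are genes, columns are conditions.
counts <- matrix(
  c(100, 400,
    500, 500,
     50,  10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("normal", "diseased"))
)

# Scale each column to counts per million to account for sequencing depth.
lib_size <- colSums(counts)                 # 650 and 910 reads
cpm      <- t(t(counts) / lib_size) * 1e6

# log2 fold change, diseased versus normal; the +1 avoids division by zero.
log2fc <- log2((cpm[, "diseased"] + 1) / (cpm[, "normal"] + 1))
```

Although geneB has the same raw count in both samples, its share of the library differs, which is why depth has to be accounted for before comparing conditions.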
51. Common analysis goals of RNA-Seq analysis (what can you ask of the data?)
§ Gene expression and differential expression
§ Alternative expression analysis
§ Transcript discovery and annotation
§ Allele specific expression
• Relating to SNPs or mutations
§ Mutation discovery
§ Fusion detection
§ RNA editing
52. Back to the demo
§ Introduction to RNA sequencing
§ Rationale for RNA sequencing (versus DNA sequencing)
§ Hands on tutorial
53. RNA-seq with R
Demo: easyRNASeq
source("c:/windows/myname/rna_seq_tutorial.R")
source("/vol/maarten/rna_seq_tutorial2.R")
http://bioscholar.com/genomics/bioconductor-packages-analysis-rna-seq-data/
55. DESeq and DESeq2
§ Method based on the negative binomial distribution,
with variance and mean linked by local regression
§ DESeq2:
§ No demo scripts available yet:
§ http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf
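The link between variance and mean in a negative binomial model can be seen in a small base R simulation. In the usual parameterization the variance is mu + alpha * mu^2, where alpha is the dispersion. The values of mu and alpha below are arbitrary assumptions, and the moment estimate is only a toy; it is not the estimation procedure used by DESeq or DESeq2.

```r
set.seed(1)
mu    <- 100   # assumed mean count for one gene
alpha <- 0.2   # assumed dispersion

# In rnbinom's mu parameterization, `size` equals 1 / alpha.
x <- rnbinom(10000, mu = mu, size = 1 / alpha)

expected_var <- mu + alpha * mu^2                  # variance implied by the model
observed_var <- var(x)                             # variance of simulated counts
alpha_hat    <- (var(x) - mean(x)) / mean(x)^2     # moment estimate of alpha
```

The simulated variance is far larger than the mean, which is the overdispersion that a Poisson model (variance equal to mean) cannot capture.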
80. Outline
§ Introduction to RNA sequencing
§ Rationale for RNA sequencing (versus DNA sequencing)
§ Hands on tutorial
§ http://swcarpentry.github.io/r-novice-inflammation/
§ http://swcarpentry.github.io/r-novice-inflammation/02-func-R.html
§ http://www.bioconductor.org/help/workflows/
§ http://www.bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html
§ http://www.bioconductor.org/help/workflows/rnaseqGene/
81. About Bioconductor
High-throughput sequence analysis with R and Bioconductor:
http://www.bioconductor.org/help/course-materials/2013/useR2013/Bioconductor-tutorial.pdf
http://bioconductor.org/packages/2.13/data/experiment/vignettes/RnaSeqTutorial/inst/doc/RnaSeqTutorial.pdf
Also helpful: http://www.bioconductor.org/help/course-materials/2002/Summer02Course/Labs/basics.pdf