This workshop will address critical issues related to Transcriptomics data:
Processing raw Next Generation Sequencing (NGS) data:
1. Next Generation Sequencing data preprocessing:
Trimming technical sequences
Removing PCR duplicates
2. RNA-seq based quantification of expression levels:
Conventional pipelines (looking at known transcripts)
Identification of novel isoforms
Analysis of Expression Data Using Machine Learning:
3. Unsupervised analysis of expression data:
Principal Component Analysis
Clustering
4. Supervised analysis:
Differential expression analysis
Classification, gene signature construction
5. Gene set enrichment analysis
The workshop will include hands-on exercises utilizing public domain datasets:
breast cancer cell lines transcriptomic profiles (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110),
patient-derived xenograft (PDX) mouse model of tumor and stroma transcriptomic profiles (http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=8014&path[]=23533), and
processed data from The Cancer Genome Atlas samples (https://cancergenome.nih.gov/).
Team: The workshops are designed by researchers at the Tauber Bioinformatics Research Center at the University of Haifa, Israel, in collaboration with academic centers across the US. Technical support for the workshops is provided by the Pine Biotech team. https://edu.t-bio.info/a-critical-approach-to-transcriptomic-data-analysis/
Pine Biotech conducts monthly informational workshops on the topics related to high-throughput data analysis, interpretation and integration. The workshops highlight our research tools and educational resources developed with collaborators in the US and across the world.
Excited to share our vision for bioinformatics education available for students and researchers that want to apply advanced multi-omics integration and machine learning to large biomedical datasets. Practice and learn from real-life projects.
Louisiana Biomedical Research Network - Fall 2020 Bioinformatics Program Ove...Elia Brodsky
Overview of the Omics Logic Program for Bioinformatics Training conducted by Pine Biotech for the Louisiana Biomedical Research Network and the graduate students at LSU.
A collaborative model for bioinformatics education: combining biologically i...Elia Brodsky
Presented at the 6th Annual LA Conference on Computational Biology & Bioinformatics
Authors:
Kimberlee Mix*, Patricia Dorn*, Donald Hauber*, Scott McDermott**, Ryan Harvey** , Jack LeBien***, Sahil Sethi***, Julia Panov***, Avi Titievsky****, Elia Brodsky***
Departments of Biological Sciences*, Mathematics and Computer Science**, Loyola University New Orleans, 6363 St Charles Avenue, New Orleans, LA 70118
Pine Biotech, Inc***, 1441 Canal St. New Orleans, LA 70112
Tauber Bioinformatics Research Center****, University of Haifa Multi Purpose Building Room 225A Mount Carmel, Haifa 3498838 ISRAEL
Despite the growing impact of bioinformatics in the biological science community, integration of an on-site bioinformatics curriculum is cost-prohibitive for many universities due to the necessary infrastructure and computational resources. Furthermore, many programs prioritize the technical aspects of bioinformatics over the biological concepts and logic of analyses, thus limiting the emphasis on critical thinking, problem solving, and in-depth inquiry. To address the gap in bioinformatics education and train students to approach complex biomedical problems, we present a new model for curriculum development that combines our unique online learning environment with traditional pedagogical approaches delivered through academic partnerships. The T-BioInfo platform (https://t-bio.info) allows users to combine computational analysis modules into pipelines to develop solutions for 'omics data and machine learning problems. State-of-the-art tools for analysis, integration, and visualization of data are offered through a user-friendly interface. In parallel, online educational modules provide a theoretical framework for the analysis methods and experimental techniques. This model for bioinformatics training was implemented at Loyola University New Orleans, a liberal arts institution, for the first time in January 2018. Twelve undergraduate students and five faculty members participated in a new one-semester bioinformatics course. After completing a core set of online modules and pipelines, students conducted team research projects on topics such as patient-derived xenograft (PDX) models, immune responses in cancer, and precision medicine. Gains in critical thinking and problem-solving skills were observed, and participants were enthusiastic about engaging in bioinformatics research. In conclusion, our collaborative model for bioinformatics education combines best practices in online and in-class learning with a powerful computational platform.
This model could be implemented in undergraduate and graduate curricula to enhance research, build partnerships with industry, and strengthen the scientific workforce.
The OmicsLogic Genomics Program provides in-depth understanding of bioinformatics methods we will cover in the upcoming 2019 session: https://edu.t-bio.info/organizations/omicslogic-genomics-training-program/
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic Data Analysis
1.
2. Special thank you to:
Dr. Vladimir Galatenko, Chief Scientist at the Tauber Bioinformatics Research Center. His work focuses on issues related to Big Data analysis and, in particular, on the integration of multi-omics datasets. A special research interest of Dr. Galatenko is feature selection, which is vital for the efficient development of clinical test systems.
Julia Panov, Ph.D. student involved in a number of neuroscience research projects and an experienced bioinformatics user. She relies on the T-BioInfo platform for regular processing and integration of omics data, collaborating with the TBRC research group on platform development.
Dr. Javeed Iqbal, UNMC
3. Biological Examples and Reference Data Sets:
• “Modeling precision treatment of breast cancer”, Daemen et al.
(https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110),
• “Whole transcriptome profiling of patient-derived xenograft models as a tool to identify both tumor and stromal specific biomarkers”, Bradford et al.
(http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=8014&path[]=23533), and
• Processed data from The Cancer Genome Atlas samples (https://cancergenome.nih.gov/).
4. Processing of NGS data:
1. Next Generation Sequencing data pre-processing:
• Trimming technical sequences
• Removing PCR duplicates
2. RNA-seq based quantification of expression levels:
• Conventional pipelines (looking at known transcripts)
• Identification of novel isoforms
28. RNA-seq: per-sample processing
Preprocessing:
• Adapter removal plus additional trimming
• Removing PCR duplicates
Mapping
• Mapping to the set of known transcripts
• Mapping to the genome (with potential identification of novel transcripts)
• Combined strategy
Quantification of expression levels
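As a rough illustration of the two preprocessing steps listed above, the following Python sketch trims a 3' adapter from each read and then collapses reads with identical sequences. The adapter string, the function names, and the exact-match duplicate rule are assumptions made for illustration only; real pipelines usually rely on dedicated tools, and duplicates are commonly detected from mapping positions rather than from raw sequences.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter (complete, or partial at the read end) from a read."""
    # Complete adapter found anywhere in the read: cut from its start.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Partial adapter at the 3' end: try the longest adapter prefix first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def remove_exact_duplicates(reads):
    """Keep the first occurrence of each distinct sequence, preserving order."""
    seen = set()
    unique = []
    for r in reads:
        if r not in seen:
            seen.add(r)
            unique.append(r)
    return unique


ADAPTER = "AGATCGGAAGAGC"  # assumed adapter sequence, for illustration only
raw_reads = [
    "ACGTACGTTTGAAGATCGGAAGAGCTT",  # complete adapter inside the read
    "ACGTACGTTTGAAGATCG",           # partial adapter at the 3' end
    "GGGTTTCCCAAA",                 # no adapter
    "GGGTTTCCCAAA",                 # exact duplicate of the previous read
]
trimmed = [trim_adapter(r, ADAPTER) for r in raw_reads]
deduplicated = remove_exact_duplicates(trimmed)
print(trimmed)
print(deduplicated)
```

Note that the first two reads become identical after trimming and are therefore collapsed. Whether such reads are PCR artifacts or natural duplicates cannot be decided from the sequence alone.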
29. RNA-seq: Comments
PCR duplicate removal should be used with caution to avoid removing natural
duplicates. Valuable links:
http://www.cureffi.org/2012/12/11/how-pcr-duplicates-arise-in-next-generation-sequencing/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4965708/ - DNA-seq and variant calling
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4597324/ - RNA-seq, ChIP-seq data
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3871669/ - trimming
32. RNA-seq: expression level quantification
Standard measures
• read counts (raw, expected)
• FPKM – fragments per kilobase per million mapped reads:
FPKM_g = (number of reads mapped to gene g) /
((total number of mapped reads, in millions) x (gene length, in kilobases))
• TPM – transcripts per million:
For one sample, TPM_g = C x FPKM_g, where C is chosen so that the sum of all
TPM_g is one million. Note that the constant C differs from sample to sample.
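The two measures above can be sketched in a few lines of Python. The gene names, counts, and lengths below are hypothetical, and the sketch assumes that the total number of mapped reads equals the sum of the listed counts, which is a simplification.

```python
def fpkm(counts, lengths_bp):
    """FPKM per gene: count / (total mapped reads in millions * gene length in kb)."""
    total_millions = sum(counts.values()) / 1e6
    return {
        g: counts[g] / (total_millions * (lengths_bp[g] / 1e3))
        for g in counts
    }


def tpm_from_fpkm(fpkm_values):
    """Rescale FPKM by a sample-specific constant C so that the values sum to 1e6."""
    c = 1e6 / sum(fpkm_values.values())
    return {g: c * v for g, v in fpkm_values.items()}


# Hypothetical counts and gene lengths (in base pairs) for a single sample.
counts = {"geneA": 500, "geneB": 1500, "geneC": 3000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

sample_fpkm = fpkm(counts, lengths_bp)
sample_tpm = tpm_from_fpkm(sample_fpkm)
for gene in counts:
    print(gene, round(sample_fpkm[gene], 1), round(sample_tpm[gene], 1))
```

Because C is recomputed for every sample, TPM values always sum to one million within a sample, whereas the sum of FPKM values varies between samples.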
33. RNA-seq: expression level quantification
Alternative definition of TPM:
(number of reads mapped to the gene x mean read length x 10^6) /
(gene length x T),
where T is the sum over all genes of
(number of reads mapped to the gene x mean read length) /
gene length
Each term here represents the number of sampled transcripts corresponding to a gene, and T estimates the
total number of sampled transcripts (molecules). Thus, TPM is the estimate of the number of transcripts
corresponding to a gene in every million transcripts.
Details: Wagner G.P., Kin K., Lynch V.J. (Theory Biosci., 2012) https://www.ncbi.nlm.nih.gov/pubmed/22872506
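A minimal sketch of this alternative definition follows, with hypothetical counts and gene lengths. The function name is an assumption for illustration. The example also shows that the mean read length cancels out, so the result does not depend on it.

```python
def tpm_direct(counts, lengths_bp, mean_read_length):
    """TPM as (count * read length * 1e6) / (gene length * T)."""
    # Per-gene estimate of the number of sampled transcripts.
    per_gene = {g: counts[g] * mean_read_length / lengths_bp[g] for g in counts}
    # T estimates the total number of sampled transcripts.
    t = sum(per_gene.values())
    return {g: v * 1e6 / t for g, v in per_gene.items()}


# Hypothetical counts and gene lengths (in base pairs) for a single sample.
counts = {"geneA": 500, "geneB": 1500, "geneC": 3000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

tpm_100 = tpm_direct(counts, lengths_bp, mean_read_length=100)
tpm_150 = tpm_direct(counts, lengths_bp, mean_read_length=150)
for gene in counts:
    print(gene, round(tpm_100[gene], 1), round(tpm_150[gene], 1))
```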
34. RNA-seq: expression level quantification
Linear scale vs Log-scale
Relative differences are biologically more meaningful than absolute ones.
Computations are simplified if log-scaling is performed:
Log-scaled measure = log2 (linear-scale measure + shift)
For relatively large values, a difference of 1 in log-scale is a 2x difference in linear
scale; a difference of 3 in log-scale is an 8x difference in linear scale, etc.; a difference
of -1 in log-scale is a 2x difference in linear scale, but in the opposite direction.
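The effect of the shift can be checked numerically. In the sketch below, the shift value of 1 and the expression values are assumptions chosen for illustration. The 2x and 8x rules hold closely for large values and only approximately for small ones.

```python
import math


def log_scale(value, shift=1.0):
    """Log-scaled measure = log2(linear-scale measure + shift)."""
    return math.log2(value + shift)


# Large values: the shift is negligible, so the 2x and 8x rules hold closely.
diff_2x_large = log_scale(2000.0) - log_scale(1000.0)
diff_8x_large = log_scale(8000.0) - log_scale(1000.0)

# Small values: the shift dominates, so a 2x change gives a log difference below 1.
diff_2x_small = log_scale(2.0) - log_scale(1.0)

print(round(diff_2x_large, 3), round(diff_8x_large, 3), round(diff_2x_small, 3))
```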
73. Unsupervised analysis: K-means, 15 genes
“The SUM52PE cell line was derived from a pleural effusion and was found to be
negative for ER and PR expression, however the original primary tumor from this
patient was positive for both hormone receptors”.
Chavez KJ, Garimella SV, Lipkowitz S. Triple negative breast cancer cell lines: one tool in the
search for better treatment of triple negative breast cancer. Breast Dis. 2010; 32(1-2):35-48.
Ethier SP, Kokeny KE, Ridings JW, Dilts CA. erbB family receptor expression and growth regulation
in a newly isolated human breast cancer cell line. Cancer Res. 1996; 56(4): 899-907.
84. Differential expression analysis
Quantities related to the degree of differential
expression:
• Difference between mean expression levels – fold
change (please pay attention to the scale);
• Statistical significance – p-value, adjusted p-value
(e.g., FDR);
• Expression level magnitude (treat low-expressed
genes with caution in the analysis).
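Two of the quantities above can be illustrated with a short Python sketch: a fold change computed on log2-scaled data, and adjusted p-values obtained with the Benjamini-Hochberg procedure, which is one common way to control FDR. The expression values and raw p-values are hypothetical, and the function names are assumptions for illustration. Computing the raw p-values themselves (for example, with a t-test) is outside the scope of this sketch.

```python
def log2_fold_change(group_a, group_b):
    """Difference of mean log2-scale expression levels (group B minus group A)."""
    return sum(group_b) / len(group_b) - sum(group_a) / len(group_a)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log2-scaled expression of one gene in two groups of samples.
group_a = [5.0, 5.2, 4.8]
group_b = [7.0, 7.1, 6.9]
lfc = log2_fold_change(group_a, group_b)
print(round(lfc, 3), "log2 units, i.e. about", round(2 ** lfc, 2), "x in linear scale")

# Hypothetical raw p-values for four genes.
raw_p = [0.001, 0.04, 0.03, 0.5]
print([round(p, 4) for p in benjamini_hochberg(raw_p)])
```

The fold change here is a difference because the data are already log-scaled; on linear-scale data the corresponding quantity would be a ratio of means.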
87. Gene set / pathway enrichment analysis
Possible options:
• Use only lists (thresholding required): one of the standard
tools here is The Database for Annotation, Visualization and
Integrated Discovery – DAVID
(https://david.ncifcrf.gov/home.jsp, https://david-d.ncifcrf.gov/).
• Take into consideration degrees of differential expression;
• Additionally take into consideration pathway topology.
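The first, list-based option can be illustrated with a simple over-representation test based on the hypergeometric distribution. This is a generic sketch rather than the exact statistic used by DAVID, and the gene counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn from n_universe genes,
    of which n_in_set belong to the gene set (hypergeometric upper tail)."""
    upper = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 genes in total, a pathway of 100 genes,
# 200 genes passing the differential-expression threshold, 10 of them in the pathway.
p_example = enrichment_p_value(20000, 100, 200, 10)
print(p_example)
```

With these numbers only about one pathway gene would be expected in the list by chance, so an overlap of 10 yields a very small p-value. When many gene sets are tested, the resulting p-values should also be adjusted for multiple testing.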
Welcome to our first workshop of this kind – we are constantly experimenting, so hopefully this experiment will be successful. Our goal is to share with you several important concepts around Next Generation Sequencing Analysis techniques, specifically how to process, analyze and annotate gene expression data.
Before we start, I would like to say a special thank you to Dr. Javeed Iqbal, whom I am sure you all know from the University of Nebraska Medical Center. He has been a tremendous help organizing the venue and sharing updates about the workshop with many of you. Also, let me introduce our speakers today: Dr. Vladimir Galatenko, the chief scientist at the Tauber Bioinformatics Research Center, and Julia Panov, a Ph.D. student who regularly relies on the T-BioInfo platform in her research.
In this workshop, we will use oncology-related public-domain datasets derived from cell lines and animal models and, if we have time, will touch on TCGA data. These projects were prepared as examples for this workshop; however, one of our goals is to identify key topics of interest for future workshops and the online courses we are developing. We would be happy to speak with you afterwards about topics, pathologies, or other types of data that interest you.
We will cover important topics about Next Generation Sequencing data: pre-processing and quantification of expression levels.