This document discusses various techniques for normalizing gene expression data from microarray experiments, including total intensity normalization, normalization using regression techniques, and normalization using ratio statistics. It also covers calculating distances between gene expression profiles using measures like Euclidean distance and Pearson correlation coefficient. Finally, it examines clustering analysis methods like hierarchical and non-hierarchical clustering that can group genes or samples based on similar expression patterns.
A micro-array is a tool for analyzing gene expression that consists of a small membrane or glass slide containing samples of many genes arranged in a regular pattern.
This was made by me while I was doing my Masters. I have made a few animations; I hope they make understanding better.
The content was compiled by searching the internet and referencing books. I do not claim any content in this presentation except the animations made on the subject.
Microarray and its application
1. Made by - SHRUTI DABRAL
Measuring dissimilarity of expression patterns
8. NORMALIZATION
There are three widely used techniques that can be used to normalize gene-expression data from a single array hybridization. All of these assume that all (or most) of the genes in the array, some subset of genes, or a set of exogenous controls that have been 'spiked' into the RNA before labelling, should have an average expression ratio equal to one. The normalization factor is then used to adjust the data to compensate for experimental variability and to 'balance' the fluorescence signals from the two samples being compared.
Total intensity normalization
Total intensity normalization relies on the assumption that the quantity of initial mRNA is the same for both labelled samples. Furthermore, one assumes that some genes are upregulated in the query sample relative to the control and that others are downregulated. For the hundreds or thousands of genes in the array, these changes should balance out, so that the total quantity of RNA hybridizing to the array from each sample is the same. Under this assumption, a normalization factor can be calculated and used to re-scale the intensity for each gene in the array.
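A minimal sketch of this idea, using made-up intensities for six array elements: the normalization factor is simply the ratio of the summed intensities of the two channels, and one channel is re-scaled by that factor.

```python
import numpy as np

# Hypothetical two-channel intensities for 6 array elements (illustrative values).
red = np.array([150.0, 80.0, 1200.0, 40.0, 300.0, 95.0])    # query sample
green = np.array([100.0, 90.0, 1000.0, 60.0, 250.0, 70.0])  # control sample

# Normalization factor: ratio of total intensities across all genes.
k = red.sum() / green.sum()

# Re-scale the control channel so the total hybridized signal is balanced
# between the two samples.
green_norm = green * k

print(red.sum(), green_norm.sum())  # the two totals now match
```

After re-scaling, per-gene ratios red/green_norm are no longer biased by a global difference in labelled RNA between the two channels.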
9. Normalization using regression techniques
For mRNA derived from closely related samples, a significant fraction of the assayed genes would be expected to be expressed at similar levels. Normalization of these data is equivalent to calculating the best-fit slope using regression techniques and adjusting the intensities so that the calculated slope is one. In many experiments the intensities are nonlinear, and local regression techniques, such as LOWESS (LOcally WEighted Scatterplot Smoothing) regression, are more suitable.
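A sketch of the linear case, on simulated log-intensities (the slope of 1.15 and the noise level are made-up values): fit the best-fit line, then adjust one channel so that the recalculated slope is one. For the nonlinear case one would fit LOWESS instead (e.g. `statsmodels.nonparametric.smoothers_lowess.lowess`) and subtract the fitted curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated log2 intensities for two closely related samples: most genes are
# expressed at similar levels, plus a channel-specific bias (slope != 1).
log_green = rng.uniform(4, 14, size=500)
log_red = 1.15 * log_green + 0.3 + rng.normal(0, 0.2, size=500)

# Best-fit slope and intercept by least-squares regression.
slope, intercept = np.polyfit(log_green, log_red, 1)

# Adjust the red channel so the regression of adjusted red on green has slope ~1.
log_red_adj = (log_red - intercept) / slope

adj_slope, adj_intercept = np.polyfit(log_green, log_red_adj, 1)
print(round(adj_slope, 3))  # close to 1 after normalization
```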
10. Normalization using ratio statistics
A third normalization option is a method based on the ratio statistics described by Chen et al. They assume that, although individual genes might be up- or downregulated, in closely related cells the total quantity of RNA produced is approximately the same for essential genes, such as 'housekeeping' genes. Using this assumption, they develop an approximate probability density for the ratio Tk = Rk/Gk (where Rk and Gk are, respectively, the measured red and green intensities for the kth array element). They then describe how this can be used in an iterative process that normalizes the mean expression ratio to one and calculates confidence limits that can be used to identify differentially expressed genes.
12. A: Gene expression matrix that contains rows representing genes and columns representing particular conditions. Each cell contains a value, given in arbitrary units, that reflects the expression level of a gene under the corresponding condition. B: Condition C4 is used as a reference, and all other conditions are normalized with respect to C4 to obtain expression ratios. C: In this table all expression ratios were converted into log2 (expression ratio) values. This representation has the advantage of treating up-regulation and down-regulation on comparable scales. D: Discrete values for the elements in Table 1.C. Genes with log2 (expression ratio) values greater than 1 were changed to 1, genes with values less than –1 were changed to –1, and any value between –1 and 1 was changed to 0.
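The B–D transformations above can be sketched on a toy matrix (the expression values are made up for illustration):

```python
import numpy as np

# Toy expression matrix: rows = genes, columns = conditions C1..C4.
expr = np.array([
    [8.0, 4.0, 2.0, 2.0],   # gene 1: upregulated in C1
    [1.0, 2.0, 4.0, 4.0],   # gene 2: downregulated in C1
    [3.0, 3.0, 3.0, 3.0],   # gene 3: unchanged
])

# B: use condition C4 (last column) as the reference to obtain expression ratios.
ratios = expr / expr[:, [3]]

# C: log2 ratios put up- and down-regulation on comparable scales
# (2-fold up -> +1, 2-fold down -> -1).
log_ratios = np.log2(ratios)

# D: discretize: > 1 -> 1, < -1 -> -1, otherwise 0.
discrete = np.where(log_ratios > 1, 1, np.where(log_ratios < -1, -1, 0))
print(discrete)
```

For gene 1 the log2 ratios against C4 are [2, 1, 0, 0], so only the 4-fold change in C1 survives discretization.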
18. Distance measures
Analysis of gene expression data is primarily based on comparison of gene expression profiles or sample expression profiles. In order to compare expression profiles, we need a measure to quantify how similar or dissimilar the objects being considered are. A variety of distance measures can be used to calculate similarity in expression profiles; these are discussed below.
19. Euclidean distance
Euclidean distance is one of the common distance measures used to calculate similarity between expression profiles. The Euclidean distance between two vectors of dimension 2, say A = [a1, a2] and B = [b1, b2], can be calculated as:
D(A,B) = √((a1 − b1)² + (a2 − b2)²)
In other words, the Euclidean distance between two genes is the square root of the sum of the squares of the distances between the values in each condition (dimension).
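The same formula, written out for profiles of any number of conditions (the two profiles below are made-up examples):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length:
    the square root of the sum of squared per-condition differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical 2-condition profiles.
A = [1.0, 2.0]
B = [4.0, 6.0]
print(euclidean(A, B))  # sqrt(3^2 + 4^2) = 5.0
```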
20. Pearson correlation coefficient
One of the most commonly used metrics to measure similarity between expression profiles is the Pearson correlation coefficient (PCC) (Eisen et al. 1998). Given the expression ratios for two genes under three conditions, A = [a1, a2, a3] and B = [b1, b2, b3], PCC can be computed as follows:
Step 1: Compute the mean of A and B
Step 2: 'Mean centre' the expression profiles
Step 3: Calculate PCC as the cosine of the angle between the mean-centred profiles
23. The reason why we 'mean centre' the expression profiles is to make sure that we compare the shapes of the expression profiles and not their magnitudes. Mean centring maintains the shape of a profile but changes its magnitude, as shown in the figure. A PCC value of 1 essentially means that the two genes have similar expression profiles, and a value of –1 means that the two genes have exactly opposite expression profiles. A value of 0 means that no relationship can be inferred between the expression profiles of the genes. In practice, PCC values range from –1 to +1. A PCC value ≥ 0.7 suggests that the genes behave similarly, and a PCC value ≤ –0.7 suggests that the genes have opposite behaviour. The value of 0.7 is an arbitrary cut-off, and in real cases this value can be chosen depending on the dataset used.
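The three steps above can be sketched directly (the example profiles are made up): mean-centre both profiles, then take the cosine of the angle between them. Note how profiles with the same shape but different magnitudes still give a PCC of 1.

```python
import numpy as np

def pcc(a, b):
    """Pearson correlation as the cosine of the angle between mean-centred profiles."""
    a = np.asarray(a, float) - np.mean(a)   # steps 1-2: mean-centre
    b = np.asarray(b, float) - np.mean(b)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # step 3: cosine

# Same shape, 10x different magnitude: PCC = 1.
print(round(pcc([1, 2, 3], [10, 20, 30]), 6))   # 1.0
# Exactly opposite shapes: PCC = -1.
print(round(pcc([1, 2, 3], [3, 2, 1]), 6))      # -1.0
```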
29. CLUSTER ANALYSIS
The term 'cluster analysis' actually encompasses several different classification algorithms that can be used to develop taxonomies (typically as part of exploratory data analysis). Note that in this classification, the higher the level of aggregation, the less similar the members of the respective class are.
31. Clustering methods can be hierarchical (grouping objects into clusters and specifying relationships among objects in a cluster, resembling a phylogenetic tree) or non-hierarchical (grouping into clusters without specifying relationships between objects in a cluster), as schematically represented in Figure 5. Remember, an object may refer to a gene or a sample, and a cluster refers to a set of objects that behave in a similar manner.
32. Hierarchical methods
• Hierarchical clustering methods produce a tree or dendrogram.
• They avoid specifying how many clusters are appropriate by providing a partition for each k obtained from cutting the tree at some level.
• The tree can be built in two distinct ways:
– bottom-up: agglomerative clustering;
– top-down: divisive clustering.
34. • Single-linkage clustering
The distance between two clusters, i and j, is calculated as the minimum distance between a member of cluster i and a member of cluster j. Consequently, this technique is also referred to as the minimum, or nearest-neighbour, method. This method tends to produce clusters that are 'loose', because clusters can be joined if any two members are close together. In particular, this method often results in 'chaining', or the sequential addition of single samples to an existing cluster. This produces trees with many long, single-addition branches representing clusters that have grown by accretion.
35. • Complete-linkage clustering
Complete-linkage clustering is also known as the maximum or furthest-neighbour method. The distance between two clusters is calculated as the greatest distance between members of the relevant clusters. Not surprisingly, this method tends to produce very compact clusters of elements, and the clusters are often very similar in size.
• Average-linkage clustering
The distance between clusters is calculated using average values. There are, in fact, various methods for calculating averages. The most common is the unweighted pair-group method average (UPGMA). The average distance is calculated from the distances between each point in a cluster and all points in another cluster. The two clusters with the lowest average distance are joined together to form a new cluster.
36. Centroid-linkage clustering
In centroid-linkage clustering, an average expression profile (called a centroid) is calculated in two steps. First, the mean in each dimension of the expression profiles is calculated for all objects in a cluster. Then, the distance between two clusters is measured as the distance between the average expression profiles (centroids) of the two clusters.
37. Distances between clusters used for hierarchical clustering
• Calculation of the distance between two clusters is based on the pairwise distances between members of the clusters:
– Mean-link: average of pairwise dissimilarities.
– Single-link: minimum of pairwise dissimilarities.
– Complete-link: maximum of pairwise dissimilarities.
– Distance between centroids.
• Complete linkage gives preference to compact/spherical clusters.
• Single linkage can produce long, stretched clusters.
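The linkage choices above can be tried side by side on toy data, assuming SciPy is available; the gene profiles below are made-up log-ratios for three well-separated groups:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy log-ratio profiles for 6 genes under 3 conditions (illustrative values):
# two upregulated, two downregulated, two unchanged.
profiles = np.array([
    [ 2.0,  2.1,  1.9],
    [ 2.2,  2.0,  2.1],
    [-1.0, -1.2, -0.9],
    [-1.1, -1.0, -1.1],
    [ 0.1,  0.0, -0.1],
    [ 0.0,  0.2,  0.1],
])

# Agglomerative clustering; method can be 'single' (minimum), 'complete'
# (maximum), 'average' (UPGMA) or 'centroid'.
Z = linkage(profiles, method="average", metric="euclidean")

# Cut the dendrogram to obtain k = 3 flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Because the three groups are well separated, all four linkage methods recover the same partition here; they diverge on data with elongated or overlapping clusters.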
38. Hierarchical divisive clustering
Hierarchical divisive clustering is the opposite of the agglomerative method: the entire set of objects is considered as a single cluster and is broken down into two or more clusters whose members have similar expression profiles. After this is done, each cluster is considered separately and the divisive process is repeated iteratively until all objects have been separated into single objects. The division of objects into clusters at each iterative step may be decided by principal component analysis, which determines a vector that separates the given objects. This method is less popular than agglomerative clustering, but has successfully been used in the analysis of gene expression data by Alon et al. (1999).
Non-hierarchical clustering
One of the major criticisms of hierarchical clustering is that there is no compelling evidence that a hierarchical structure best suits grouping of the expression profiles. An alternative is non-hierarchical clustering, which requires predetermination of the number of clusters. Non-hierarchical clustering then groups existing objects into these predefined clusters rather than organizing them into a hierarchical structure.
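k-means is the classic example of such a method: the number of clusters k is fixed in advance, and objects are assigned to clusters without any hierarchy. A minimal sketch on made-up two-condition profiles (this toy implementation omits the convergence checks and multiple restarts a production version would use):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means: group objects into k predefined clusters."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k randomly chosen objects.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each object to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean profile of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Four made-up gene profiles forming two obvious groups.
X = np.array([[2.0, 2.0], [2.1, 1.9], [-1.0, -1.0], [-1.1, -0.9]])
labels = kmeans(X, k=2)
print(labels)
```

Unlike a dendrogram, the output is only the final partition; nothing is said about how the clusters relate to each other.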
47. RELATING EXPRESSION DATA TO OTHER BIOLOGICAL INFORMATION
• Predicting binding sites
• Predicting protein interactions and protein functions
• Predicting functionally conserved modules
• 'Reverse-engineering' of gene regulatory networks