SlideShare a Scribd company logo
Made by - SHRUTI DABRAL
Measuring dissimilarity of
expression pattern
NORMALIZATION
There are three widely used techniques that can be used to normalize
gene-expression data from a single array hybridization. All of these
assume that all (or most) of the genes in the array, some subset of
genes, or a set of exogenous controls that have been ‘spiked’ into the
RNA before labelling, should have an average expression ratio equal
to one. The normalization factor is then used to adjust the data to
compensate for experimental variability and to ‘balance’ the
fluorescence signals from the two samples being compared.
 Total intensity normalization
Total intensity normalization data relies on the assumption that the
quantity of initial mRNA is the same for both labeled samples.
Furthermore, one assumes that some genes are upregulated in the
query sample relative to the control and that others are
downregulated. For the hundreds or thousands of genes in the array,
these changes should balance out so that the total quantity of RNA
hybridizing to the array from each sample is the same. Under this
assumption, a normalization factor can be calculated and used to re-
scale the intensity for each gene in the array.
 Normalization using regression techniques
For mRNA derived from closely related samples,
a significant fraction of the assayed genes would
be expected to be expressed at similar levels.
Normalization of these data is equivalent to
calculating the best-fit slope using regression
techniques and adjusting the intensities so that
the calculated slope is one. In many experiments,
the intensities are nonlinear, and local regression
techniques are more suitable, such as LOWESS
(LOcally WEighted Scatterplot Smoothing)
regression.
 Normalization using ratio statistics
A third normalization option is a method based on
the ratio statistics described by Chen et al.20. They
assume that although individual genes might be up-
or downregulated, in closely related cells, the total
quantity of RNA produced is approximately the same
for essential genes, such as ‘housekeeping genes’.
Using this assumption, they develop an approximate
probability density for the ratio Tk = Rk /Gk (where
Rk and Gk are, respectively, the measured red and
green intensities for the kth array element). They then
describe how this can be used in an iterative process
that normalizes the mean expression ratio to one and
calculates confidence limits that can be used to
identify differentially expressed genes.
 Gene expression matrix that contains rows representing
genes and columns representing particular conditions.
Each cell contains a value, given in arbitrary units, that refl
ects the expression level of a gene under a corresponding
condition. B: Condition C4 is used as a reference and all
other conditions are normalized with respect to C4 to
obtain expression ratios. C: In this table all expression
ratios were converted into the log2 (expression ratio)
values. This representation has an advantage of treating
up-regulation and down-regulation on comparable scales.
D: Discrete values for the elements in Table 1.C. Genes
with log2 (expression ratio) values greater than 1 were
changed to 1, genes with values less than –1 were
changed to –1. Any value between –1 and 1 was changed
to 0.
 Distance measures
Analysis of gene expression data is primarily
based on comparison of gene expression profiles
or sample expression profiles. In order to
compare expression profiles, we need a measure
to quantify how similar or dissimilar are the
objects that are being considered. A variety of
distance measures can be used to calculate
similarity in expression profiles and these are
discussed below.
 Euclidean distance
Euclidean distance is one of the common distance
measures used to calculate similarity between
expression profiles. The Euclidean distance
between two vectors of dimension 2, say A=[a1, a2]
and B=[b1, b2] can be calculated as:
D(A,B)=
In other words, the Euclidean distance between two
genes is the square root of the sum of the squares
of the distances between the values in each
condition (dimension).
(a1-b1)(a1-b1) + (a2-b2)(a2-b2)
 Pearson correlation coefficient =One of the
most commonly used metrics to measure
similarity between expression profiles is the
Pearson correlation coeffi cient (PCC) (Eisen et
al. 1998). Given the expression ratios for two
genes under three conditions A=[a1, a2, a3] and
B=[b1, b2, b3], PCC can be computed as follows:
Step1: Compute mean of A and B
Step2: “Mean centre” expression profiles
Step3: Calculate PCC as the cosine of the angle
between the mean-centred profi les
Step2: “Mean centre” expression
profiles
Step3: Calculate PCC as the cosine of the angle between
the mean-centred profiles
 The reason why we “mean centre” the expression
profiles is to make sure that we compare shapes of
the expression profiles and not their magnitude.
Mean centring maintains the shape of the profile,
but it changes the magnitude of the profile as shown
in . A PCC value of 1 essentially means that the
two genes have similar expression profiles and
a value of –1 means that the two genes have
exactly opposite expression profiles. A value of
0 means that no relationship can be inferred
between the expression profiles of genes. In
reality, PCC values range from –1 to +1. A PCC
value ≥ 0.7 suggests that the genes behave similarly
and a PCC value ≤ -0.7 suggests that the genes
have opposite behavior. The value of 0.7 is an
arbitrary cut-off, and in real cases this value can be
chosen depending on the dataset used.
 CLUSTER ANALYSIS
The term ‘cluster analysis’ actually encompasses
several different classification algorithms that can
be used to develop taxonomies (typically as part
of exploratory data analysis). Note that in this
classification, the higher the level of aggregation,
the less similar are members in the respective
class.
 Clustering methods can be hierarchical
(grouping objects into clusters and specifying
relationships among objects in a cluster,
resembling a phylogenetic tree) or non-
hierarchical (grouping into clusters without
specifying relationships between objects in a
cluster) as schematically represented in Figure 5.
Remember, an object may refer to a gene or a
sample, and a cluster refers to a set of objects
that behave in a similar manner.
Hierarchical methods
• Hierarchical clustering methods produce a tree or
dendrogram.
• They avoid specifying how many clusters are
appropriate
by providing a partition for each k obtained from
cutting
the tree at some level.
• The tree can be built in two distinct ways
– bottom-up: agglomerative clustering;
– top-down: divisive clustering.
 • Single-linkage clustering
The distance between two clusters, i and j, is
calculated as the minimum distance between a
member of cluster i and a member of cluster j.
Consequently, this technique is also referred to as
the minimum, or nearestneighbour, method. This
method tends to produce clusters that are ‘loose’
because clusters can be joined if any two
members are close together. In particular, this
method often results in ‘chaining’, or the
sequential addition of single samples to an
existing cluster. This produces trees with many
long, single-addition branches representing
clusters that have grown by accretion.
• Complete-linkage clustering.
Complete-linkage clustering is also known as the
maximum or furthest-neighbour method. The distance
between two clusters is calculated as the greatest
distance between members of the relevant clusters. Not
surprisingly, this method tends to produce very compact
clusters of elements and the clusters are often very
similar in size.
• Average-linkage clustering
The distance between clusters is calculated using
average values. There are, in fact, various methods for
calculating averages. The most common is the
unweighted pair-group method average (UPGMA). The
average distance is calculated from the distance between
each point in a cluster and all other points in another
cluster. The two clusters with the lowest average distance
are joined together to form a new cluster. Related
 Centroid linkage clustering
In centroid linkage clustering, an average
expression profile (called a centroid) is calculated
in two steps. First, the mean in each dimension of
the expression profiles is calculated for all objects
in a cluster. Then, distance between the clusters
is measured as the distance between the average
expression profiles of the two clusters.
Distances between clusters used for
hierarchical clustering
• Calculation of the distance between two clusters
is based on the pairwise distances between
members of the clusters
– Mean-link: average of pairwise dissimilarities
– Single-link: minimum of pairwise dissimilarities.
– Complete-link: maximum& of pairwise
dissimilarities.
– Distance between centroids
• Complete linkage gives preference to
compact/spherical
clusters.
• Single linkage can produce long stretched clusters
 Hierarchical clustering divisive
Hierarchical divisive clustering is the opposite of the
agglomerative method, where the entire set of objects is
considered as a single cluster and is broken down into two or
more clusters that have similar expression profiles. After this is
done, each cluster is considered separately and the divisive
process is repeated iteratively until all objects have been
separated into single objects. The division of objects into
clusters on each iterative step may be decided upon by
principal component analysis which determines a vector that
separates given objects. This method is less popular than
agglomerative clustering, but has successfully been used in
the analysis of gene expression data by Alon et al. (1999).
Non-hierarchical clustering One of the major criticisms of
hierarchical clustering is that there is no compelling evidence
that a hierarchical structure best suits grouping of the
expression profi les. An alternative to this method is a non-
hierarchical clustering, which requires predetermination of the
number of clusters. Non-hierarchical clustering then groups
existing objects into these predefi ned clusters rather than
organizing them into a hierarchical structure.
Example using R
 RELATING EXPRESSION DATA TO OTHER
BIOLOGICAL INFORMATION
 Predicting binding sites
 Predicting protein interactions and protein
functions
 Predicting functionally conserved modules
 ‘Reverse-engineering’ of gene regulatory
networks
Study material
 http://www.mrc-
lmb.cam.ac.uk/genomes/madanm/microarray/cha
pter-final.pdf
 http://www.dbbm.fiocruz.br/class/Lecture/d17/arra
y_2/NatureReviews_JQ.pdf

More Related Content

What's hot

Ppt snp detection
Ppt snp detectionPpt snp detection
Ppt snp detection
ManishaJaglan1
 
GWAS
GWASGWAS
SNP ppt.pptx
SNP ppt.pptxSNP ppt.pptx
SNP ppt.pptx
Dr Alok Bharti
 
Single nucleotide polymorphisms (sn ps), haplotypes,
Single nucleotide polymorphisms (sn ps), haplotypes,Single nucleotide polymorphisms (sn ps), haplotypes,
Single nucleotide polymorphisms (sn ps), haplotypes,Karan Veer Singh
 
Genome wide association studies seminar
Genome wide association studies seminarGenome wide association studies seminar
Genome wide association studies seminarVarsha Gayatonde
 
Single Nucleotide Polymorphism
Single Nucleotide PolymorphismSingle Nucleotide Polymorphism
Single Nucleotide Polymorphism
FazeehaAmjad
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
mikaelhuss
 
next generation sequemcing
next generation sequemcingnext generation sequemcing
next generation sequemcing
Shahzebkhan135
 
Genotyping by Sequencing
Genotyping by SequencingGenotyping by Sequencing
Genotyping by Sequencing
Senthil Natesan
 
Single Nucleotide Polymorphism
Single Nucleotide PolymorphismSingle Nucleotide Polymorphism
Single Nucleotide Polymorphism
Deepender Kumar
 
Genome wide Association studies.pptx
Genome wide Association studies.pptxGenome wide Association studies.pptx
Genome wide Association studies.pptx
AkshitaAwasthi3
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
ajay301
 
RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5
BITS
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
James McInerney
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
AGRF_Ltd
 
SNP Detection Methods and applications
SNP Detection Methods and applications SNP Detection Methods and applications
SNP Detection Methods and applications
Aneela Rafiq
 
Seminar presentation on snp (mulualem & janvier)
Seminar presentation on snp (mulualem & janvier)Seminar presentation on snp (mulualem & janvier)
Seminar presentation on snp (mulualem & janvier)Mulualem Teshome
 
Next generation sequencing
Next  generation  sequencingNext  generation  sequencing
Next generation sequencing
Nidhi Singh
 

What's hot (20)

Ppt snp detection
Ppt snp detectionPpt snp detection
Ppt snp detection
 
GWAS
GWASGWAS
GWAS
 
SNP ppt.pptx
SNP ppt.pptxSNP ppt.pptx
SNP ppt.pptx
 
Single nucleotide polymorphisms (sn ps), haplotypes,
Single nucleotide polymorphisms (sn ps), haplotypes,Single nucleotide polymorphisms (sn ps), haplotypes,
Single nucleotide polymorphisms (sn ps), haplotypes,
 
Genome wide association studies seminar
Genome wide association studies seminarGenome wide association studies seminar
Genome wide association studies seminar
 
Single Nucleotide Polymorphism
Single Nucleotide PolymorphismSingle Nucleotide Polymorphism
Single Nucleotide Polymorphism
 
Tetra Arm PCR
Tetra Arm PCRTetra Arm PCR
Tetra Arm PCR
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
 
next generation sequemcing
next generation sequemcingnext generation sequemcing
next generation sequemcing
 
Genotyping by Sequencing
Genotyping by SequencingGenotyping by Sequencing
Genotyping by Sequencing
 
Single Nucleotide Polymorphism
Single Nucleotide PolymorphismSingle Nucleotide Polymorphism
Single Nucleotide Polymorphism
 
Molecular marker
Molecular markerMolecular marker
Molecular marker
 
Genome wide Association studies.pptx
Genome wide Association studies.pptxGenome wide Association studies.pptx
Genome wide Association studies.pptx
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
SNP Detection Methods and applications
SNP Detection Methods and applications SNP Detection Methods and applications
SNP Detection Methods and applications
 
Seminar presentation on snp (mulualem & janvier)
Seminar presentation on snp (mulualem & janvier)Seminar presentation on snp (mulualem & janvier)
Seminar presentation on snp (mulualem & janvier)
 
Next generation sequencing
Next  generation  sequencingNext  generation  sequencing
Next generation sequencing
 

Viewers also liked

Rnaseq forgenefinding
Rnaseq forgenefindingRnaseq forgenefinding
Rnaseq forgenefinding
Sucheta Tripathy
 
Genomica - Microarreglos de DNA
Genomica - Microarreglos de DNAGenomica - Microarreglos de DNA
Genomica - Microarreglos de DNAUlises Urzua
 
Applications of microarray
Applications of microarrayApplications of microarray
Applications of microarray
prateek kumar
 
prediction methods for ORF
prediction methods for ORFprediction methods for ORF
prediction methods for ORF
karamveer prajapat
 
Translating Cancer Genomes and Transcriptomes for Precision Oncology
Translating Cancer Genomes and Transcriptomes for Precision Oncology Translating Cancer Genomes and Transcriptomes for Precision Oncology
Translating Cancer Genomes and Transcriptomes for Precision Oncology
Wafaa Mowlabaccus
 
DNA Fingerprinting
DNA FingerprintingDNA Fingerprinting
DNA Fingerprinting
biomedicz
 
Microarrays;application
Microarrays;applicationMicroarrays;application
Microarrays;applicationFyzah Bashir
 
NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR
Warren Kibbe
 
DNA microarray
DNA microarrayDNA microarray
DNA microarray
S Rasouli
 
NCI Cancer Imaging Program - Cancer Research Data Ecosystem
NCI Cancer Imaging Program - Cancer Research Data EcosystemNCI Cancer Imaging Program - Cancer Research Data Ecosystem
NCI Cancer Imaging Program - Cancer Research Data Ecosystem
Warren Kibbe
 
Dna finger printing
Dna finger printingDna finger printing
Dna finger printing
kaniganti sirisha
 
Microarray CGH
Microarray CGHMicroarray CGH
Microarray CGH
Pinal Chaudhari
 
DNA microarray
DNA microarrayDNA microarray
DNA microarray
Mahe'ndra Reddy
 
Microarray
MicroarrayMicroarray
Microarray
ruchibioinfo
 
Dna microarray (dna chips)
Dna microarray (dna chips)Dna microarray (dna chips)
Dna microarray (dna chips)Rachana Tiwari
 
DNA microarray final ppt.
DNA microarray final ppt.DNA microarray final ppt.
DNA microarray final ppt.
Aashish Patel
 

Viewers also liked (18)

Rnaseq forgenefinding
Rnaseq forgenefindingRnaseq forgenefinding
Rnaseq forgenefinding
 
Genomica - Microarreglos de DNA
Genomica - Microarreglos de DNAGenomica - Microarreglos de DNA
Genomica - Microarreglos de DNA
 
Applications of microarray
Applications of microarrayApplications of microarray
Applications of microarray
 
Dna microarray application in vp research mehran
Dna microarray application in vp research  mehranDna microarray application in vp research  mehran
Dna microarray application in vp research mehran
 
prediction methods for ORF
prediction methods for ORFprediction methods for ORF
prediction methods for ORF
 
Translating Cancer Genomes and Transcriptomes for Precision Oncology
Translating Cancer Genomes and Transcriptomes for Precision Oncology Translating Cancer Genomes and Transcriptomes for Precision Oncology
Translating Cancer Genomes and Transcriptomes for Precision Oncology
 
DNA Fingerprinting
DNA FingerprintingDNA Fingerprinting
DNA Fingerprinting
 
Microarrays;application
Microarrays;applicationMicroarrays;application
Microarrays;application
 
NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR NCI Cancer Genomics, Open Science and PMI: FAIR
NCI Cancer Genomics, Open Science and PMI: FAIR
 
DNA microarray
DNA microarrayDNA microarray
DNA microarray
 
NCI Cancer Imaging Program - Cancer Research Data Ecosystem
NCI Cancer Imaging Program - Cancer Research Data EcosystemNCI Cancer Imaging Program - Cancer Research Data Ecosystem
NCI Cancer Imaging Program - Cancer Research Data Ecosystem
 
Dna finger printing
Dna finger printingDna finger printing
Dna finger printing
 
Microarray CGH
Microarray CGHMicroarray CGH
Microarray CGH
 
DNA microarray
DNA microarrayDNA microarray
DNA microarray
 
Microarray
MicroarrayMicroarray
Microarray
 
MICROARRAY
MICROARRAYMICROARRAY
MICROARRAY
 
Dna microarray (dna chips)
Dna microarray (dna chips)Dna microarray (dna chips)
Dna microarray (dna chips)
 
DNA microarray final ppt.
DNA microarray final ppt.DNA microarray final ppt.
DNA microarray final ppt.
 

Similar to Microarray and its application

Assessing the compactness and isolation of individual clusters
Assessing the compactness and isolation of individual clustersAssessing the compactness and isolation of individual clusters
Assessing the compactness and isolation of individual clustersperfj
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
Avijit Famous
 
Literature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesLiterature Survey On Clustering Techniques
Literature Survey On Clustering Techniques
IOSR Journals
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
Baivab Nag
 
Ch 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdfCh 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdf
YaseenRashid4
 
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptxQTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
PABOLU TEJASREE
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
Valerii Klymchuk
 
Identification of Differentially Expressed Genes by unsupervised Learning Method
Identification of Differentially Expressed Genes by unsupervised Learning MethodIdentification of Differentially Expressed Genes by unsupervised Learning Method
Identification of Differentially Expressed Genes by unsupervised Learning Method
praveena06
 
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForestWisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForestSheing Jing Ng
 
Survey and Evaluation of Methods for Tissue Classification
Survey and Evaluation of Methods for Tissue ClassificationSurvey and Evaluation of Methods for Tissue Classification
Survey and Evaluation of Methods for Tissue Classificationperfj
 
phylogenetictreeanditsconstructionandphylogenyof-191208102256.pdf
phylogenetictreeanditsconstructionandphylogenyof-191208102256.pdfphylogenetictreeanditsconstructionandphylogenyof-191208102256.pdf
phylogenetictreeanditsconstructionandphylogenyof-191208102256.pdf
alizain9604
 
Phylogenetic tree and its construction and phylogeny of
Phylogenetic tree and its construction and phylogeny ofPhylogenetic tree and its construction and phylogeny of
Phylogenetic tree and its construction and phylogeny of
bhavnesthakur
 
Bioinformatics Analysis of Metabolomics Data
Bioinformatics Analysis of Metabolomics DataBioinformatics Analysis of Metabolomics Data
Bioinformatics Analysis of Metabolomics Data
Creative Proteomics
 
Metabolomics Bioinformatics Analysis.pdf
Metabolomics Bioinformatics Analysis.pdfMetabolomics Bioinformatics Analysis.pdf
Metabolomics Bioinformatics Analysis.pdf
Creative Proteomics
 
s.analysis
s.analysiss.analysis
s.analysis
kavi ...
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerDennis Sweitzer
 
Analysis of gene expression
Analysis of gene expressionAnalysis of gene expression
Analysis of gene expression
university of education,Lahore
 
Gene Array Analyzer
Gene Array AnalyzerGene Array Analyzer
Gene Array Analyzer
university of education,Lahore
 

Similar to Microarray and its application (20)

Assessing the compactness and isolation of individual clusters
Assessing the compactness and isolation of individual clustersAssessing the compactness and isolation of individual clusters
Assessing the compactness and isolation of individual clusters
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Literature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesLiterature Survey On Clustering Techniques
Literature Survey On Clustering Techniques
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
 
Ch 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdfCh 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdf
 
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptxQTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
 
Sequence alignment belgaum
Sequence alignment belgaumSequence alignment belgaum
Sequence alignment belgaum
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
 
Devin Petersohn Poster
Devin Petersohn PosterDevin Petersohn Poster
Devin Petersohn Poster
 
Identification of Differentially Expressed Genes by unsupervised Learning Method
Identification of Differentially Expressed Genes by unsupervised Learning MethodIdentification of Differentially Expressed Genes by unsupervised Learning Method
Identification of Differentially Expressed Genes by unsupervised Learning Method
 
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForestWisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
 
Survey and Evaluation of Methods for Tissue Classification
Survey and Evaluation of Methods for Tissue ClassificationSurvey and Evaluation of Methods for Tissue Classification
Survey and Evaluation of Methods for Tissue Classification
 
phylogenetictreeanditsconstructionandphylogenyof-191208102256.pdf
phylogenetictreeanditsconstructionandphylogenyof-191208102256.pdfphylogenetictreeanditsconstructionandphylogenyof-191208102256.pdf
phylogenetictreeanditsconstructionandphylogenyof-191208102256.pdf
 
Phylogenetic tree and its construction and phylogeny of
Phylogenetic tree and its construction and phylogeny ofPhylogenetic tree and its construction and phylogeny of
Phylogenetic tree and its construction and phylogeny of
 
Bioinformatics Analysis of Metabolomics Data
Bioinformatics Analysis of Metabolomics DataBioinformatics Analysis of Metabolomics Data
Bioinformatics Analysis of Metabolomics Data
 
Metabolomics Bioinformatics Analysis.pdf
Metabolomics Bioinformatics Analysis.pdfMetabolomics Bioinformatics Analysis.pdf
Metabolomics Bioinformatics Analysis.pdf
 
s.analysis
s.analysiss.analysis
s.analysis
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzer
 
Analysis of gene expression
Analysis of gene expressionAnalysis of gene expression
Analysis of gene expression
 
Gene Array Analyzer
Gene Array AnalyzerGene Array Analyzer
Gene Array Analyzer
 

More from prateek kumar

docking
docking docking
docking
prateek kumar
 
RAPD, AFLP AND RFLP ANALYSIS
RAPD, AFLP AND RFLP ANALYSISRAPD, AFLP AND RFLP ANALYSIS
RAPD, AFLP AND RFLP ANALYSIS
prateek kumar
 
Bhageerath h
Bhageerath  h Bhageerath  h
Bhageerath h
prateek kumar
 
DNA sequencing
DNA sequencingDNA sequencing
DNA sequencing
prateek kumar
 
Genomic variation
Genomic variationGenomic variation
Genomic variation
prateek kumar
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
prateek kumar
 
2 d gel analysis
2 d gel analysis2 d gel analysis
2 d gel analysis
prateek kumar
 
2 d gel analysis
2 d gel analysis2 d gel analysis
2 d gel analysis
prateek kumar
 

More from prateek kumar (8)

docking
docking docking
docking
 
RAPD, AFLP AND RFLP ANALYSIS
RAPD, AFLP AND RFLP ANALYSISRAPD, AFLP AND RFLP ANALYSIS
RAPD, AFLP AND RFLP ANALYSIS
 
Bhageerath h
Bhageerath  h Bhageerath  h
Bhageerath h
 
DNA sequencing
DNA sequencingDNA sequencing
DNA sequencing
 
Genomic variation
Genomic variationGenomic variation
Genomic variation
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
2 d gel analysis
2 d gel analysis2 d gel analysis
2 d gel analysis
 
2 d gel analysis
2 d gel analysis2 d gel analysis
2 d gel analysis
 

Recently uploaded

CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
PedroFerreira53928
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 

Recently uploaded (20)

CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 

Microarray and its application

  • 1. Made by - SHRUTI DABRAL Measuring dissimilarity of expression pattern
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. NORMALIZATION There are three widely used techniques that can be used to normalize gene-expression data from a single array hybridization. All of these assume that all (or most) of the genes in the array, some subset of genes, or a set of exogenous controls that have been ‘spiked’ into the RNA before labelling, should have an average expression ratio equal to one. The normalization factor is then used to adjust the data to compensate for experimental variability and to ‘balance’ the fluorescence signals from the two samples being compared.  Total intensity normalization Total intensity normalization data relies on the assumption that the quantity of initial mRNA is the same for both labeled samples. Furthermore, one assumes that some genes are upregulated in the query sample relative to the control and that others are downregulated. For the hundreds or thousands of genes in the array, these changes should balance out so that the total quantity of RNA hybridizing to the array from each sample is the same. Under this assumption, a normalization factor can be calculated and used to re- scale the intensity for each gene in the array.
  • 9.  Normalization using regression techniques For mRNA derived from closely related samples, a significant fraction of the assayed genes would be expected to be expressed at similar levels. Normalization of these data is equivalent to calculating the best-fit slope using regression techniques and adjusting the intensities so that the calculated slope is one. In many experiments, the intensities are nonlinear, and local regression techniques are more suitable, such as LOWESS (LOcally WEighted Scatterplot Smoothing) regression.
  • 10.  Normalization using ratio statistics A third normalization option is a method based on the ratio statistics described by Chen et al.20. They assume that although individual genes might be up- or downregulated, in closely related cells, the total quantity of RNA produced is approximately the same for essential genes, such as ‘housekeeping genes’. Using this assumption, they develop an approximate probability density for the ratio Tk = Rk /Gk (where Rk and Gk are, respectively, the measured red and green intensities for the kth array element). They then describe how this can be used in an iterative process that normalizes the mean expression ratio to one and calculates confidence limits that can be used to identify differentially expressed genes.
  • 11.
  • 12.  Gene expression matrix that contains rows representing genes and columns representing particular conditions. Each cell contains a value, given in arbitrary units, that refl ects the expression level of a gene under a corresponding condition. B: Condition C4 is used as a reference and all other conditions are normalized with respect to C4 to obtain expression ratios. C: In this table all expression ratios were converted into the log2 (expression ratio) values. This representation has an advantage of treating up-regulation and down-regulation on comparable scales. D: Discrete values for the elements in Table 1.C. Genes with log2 (expression ratio) values greater than 1 were changed to 1, genes with values less than –1 were changed to –1. Any value between –1 and 1 was changed to 0.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.  Distance measures Analysis of gene expression data is primarily based on comparison of gene expression profiles or sample expression profiles. In order to compare expression profiles, we need a measure to quantify how similar or dissimilar are the objects that are being considered. A variety of distance measures can be used to calculate similarity in expression profiles and these are discussed below.
  • 19.  Euclidean distance Euclidean distance is one of the common distance measures used to calculate similarity between expression profiles. The Euclidean distance between two vectors of dimension 2, say A=[a1, a2] and B=[b1, b2] can be calculated as: D(A,B)= In other words, the Euclidean distance between two genes is the square root of the sum of the squares of the distances between the values in each condition (dimension). (a1-b1)(a1-b1) + (a2-b2)(a2-b2)
  • 20.  Pearson correlation coefficient =One of the most commonly used metrics to measure similarity between expression profiles is the Pearson correlation coeffi cient (PCC) (Eisen et al. 1998). Given the expression ratios for two genes under three conditions A=[a1, a2, a3] and B=[b1, b2, b3], PCC can be computed as follows: Step1: Compute mean of A and B Step2: “Mean centre” expression profiles Step3: Calculate PCC as the cosine of the angle between the mean-centred profi les
  • 21. Step2: “Mean centre” expression profiles
  • 22. Step3: Calculate PCC as the cosine of the angle between the mean-centred profiles
  • 23.  The reason why we “mean centre” the expression profiles is to make sure that we compare shapes of the expression profiles and not their magnitude. Mean centring maintains the shape of the profile, but it changes the magnitude of the profile as shown in . A PCC value of 1 essentially means that the two genes have similar expression profiles and a value of –1 means that the two genes have exactly opposite expression profiles. A value of 0 means that no relationship can be inferred between the expression profiles of genes. In reality, PCC values range from –1 to +1. A PCC value ≥ 0.7 suggests that the genes behave similarly and a PCC value ≤ -0.7 suggests that the genes have opposite behavior. The value of 0.7 is an arbitrary cut-off, and in real cases this value can be chosen depending on the dataset used.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.  CLUSTER ANALYSIS The term ‘cluster analysis’ actually encompasses several different classification algorithms that can be used to develop taxonomies (typically as part of exploratory data analysis). Note that in this classification, the higher the level of aggregation, the less similar are members in the respective class.
  • 30.
  • 31.  Clustering methods can be hierarchical (grouping objects into clusters and specifying relationships among objects in a cluster, resembling a phylogenetic tree) or non- hierarchical (grouping into clusters without specifying relationships between objects in a cluster) as schematically represented in Figure 5. Remember, an object may refer to a gene or a sample, and a cluster refers to a set of objects that behave in a similar manner.
  • 32. Hierarchical methods • Hierarchical clustering methods produce a tree or dendrogram. • They avoid specifying how many clusters are appropriate by providing a partition for each k obtained from cutting the tree at some level. • The tree can be built in two distinct ways – bottom-up: agglomerative clustering; – top-down: divisive clustering.
  • 33.
  • 34.  • Single-linkage clustering The distance between two clusters, i and j, is calculated as the minimum distance between a member of cluster i and a member of cluster j. Consequently, this technique is also referred to as the minimum, or nearestneighbour, method. This method tends to produce clusters that are ‘loose’ because clusters can be joined if any two members are close together. In particular, this method often results in ‘chaining’, or the sequential addition of single samples to an existing cluster. This produces trees with many long, single-addition branches representing clusters that have grown by accretion.
  • 35. • Complete-linkage clustering. Complete-linkage clustering is also known as the maximum or furthest-neighbour method. The distance between two clusters is calculated as the greatest distance between members of the relevant clusters. Not surprisingly, this method tends to produce very compact clusters of elements and the clusters are often very similar in size. • Average-linkage clustering The distance between clusters is calculated using average values. There are, in fact, various methods for calculating averages. The most common is the unweighted pair-group method average (UPGMA). The average distance is calculated from the distance between each point in a cluster and all other points in another cluster. The two clusters with the lowest average distance are joined together to form a new cluster. Related
  • 36.  Centroid linkage clustering In centroid linkage clustering, an average expression profile (called a centroid) is calculated in two steps. First, the mean in each dimension of the expression profiles is calculated for all objects in a cluster. Then, distance between the clusters is measured as the distance between the average expression profiles of the two clusters.
  • 37. Distances between clusters used for hierarchical clustering • Calculation of the distance between two clusters is based on the pairwise distances between members of the clusters – Mean-link: average of pairwise dissimilarities – Single-link: minimum of pairwise dissimilarities. – Complete-link: maximum& of pairwise dissimilarities. – Distance between centroids • Complete linkage gives preference to compact/spherical clusters. • Single linkage can produce long stretched clusters
  • 38.  Hierarchical clustering divisive Hierarchical divisive clustering is the opposite of the agglomerative method, where the entire set of objects is considered as a single cluster and is broken down into two or more clusters that have similar expression profiles. After this is done, each cluster is considered separately and the divisive process is repeated iteratively until all objects have been separated into single objects. The division of objects into clusters on each iterative step may be decided upon by principal component analysis which determines a vector that separates given objects. This method is less popular than agglomerative clustering, but has successfully been used in the analysis of gene expression data by Alon et al. (1999). Non-hierarchical clustering One of the major criticisms of hierarchical clustering is that there is no compelling evidence that a hierarchical structure best suits grouping of the expression profi les. An alternative to this method is a non- hierarchical clustering, which requires predetermination of the number of clusters. Non-hierarchical clustering then groups existing objects into these predefi ned clusters rather than organizing them into a hierarchical structure.
  • 39.
  • 40.
  • 41.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.  RELATING EXPRESSION DATA TO OTHER BIOLOGICAL INFORMATION  Predicting binding sites  Predicting protein interactions and protein functions  Predicting functionally conserved modules  ‘Reverse-engineering’ of gene regulatory networks
  • 48. Study material  http://www.mrc- lmb.cam.ac.uk/genomes/madanm/microarray/cha pter-final.pdf  http://www.dbbm.fiocruz.br/class/Lecture/d17/arra y_2/NatureReviews_JQ.pdf