GEO DATASET GDS4145: Subcutaneous Interferon-beta-1b treatment in
relapsing-remitting multiple sclerosis (U133 A): peripheral mononuclear blood cells
GPL96 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL96): [HG-U133A]
Affymetrix Human Genome U133A Array
Primary Ref: Goertsches RH, Hecker M, Koczan D, Serrano-Fernandez P et al. Long-term
genome-wide blood RNA expression profiles yield novel molecular response candidates for
IFN-beta-1b treatment in relapsing remitting MS. Pharmacogenomics 2010 Feb;11(2):147-61.
PMID: 20136355
By Boshika Tara
Introduction/Background
The data come from 25 patients with relapsing-remitting multiple
sclerosis whose transcriptional profiles were followed longitudinally
over 2 years of rIFN-beta administration.
After therapy initiation, the authors identified 42 (day 2), 175 (month
1), 103 (month 12) and 108 (month 24) differentially expressed genes.
For this analysis I chose three timepoints, after the 12 month IFN-beta
injection_chipB. I did this to simplify the dataset I was working with,
and because this was the midpoint of the two-year study.
Distribution of expression levels
The first task was to see how the expression levels are distributed in
this dataset.
- The figure on the left is the distribution of all expression levels
within this timepoint.
- The figure on the right covers only probes whose expression is greater
than 10,000.
- Since the dataset is set to dataset_channel_count=1, it seems that
the data for this single channel are normalized; in other words, they
are absolute measurements of mRNA abundance.
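A minimal Python sketch of how such a distribution might be summarized, assuming the expression values for one sample are available as a plain list. The values below are invented for illustration and are not taken from GDS4145; the function names are hypothetical. It bins values by log2 intensity and separately selects probes above 10,000, the threshold used for the right-hand figure.

```python
import math
from collections import Counter


def log2_histogram(values):
    """Count values per integer log2 bin; bin k holds values in [2**k, 2**(k+1))."""
    return Counter(int(math.floor(math.log2(v))) for v in values if v > 0)


def above_threshold(values, threshold=10_000):
    """Return only the values strictly greater than the threshold."""
    return [v for v in values if v > threshold]


# Hypothetical expression values for one sample (illustrative only).
expression = [35.0, 120.5, 800.0, 950.0, 4200.0, 12500.0, 31000.0]

hist = log2_histogram(expression)
high = above_threshold(expression)

for bin_index in sorted(hist):
    print(f"2^{bin_index} to 2^{bin_index + 1}: {hist[bin_index]} probes")
print("Probes above 10,000:", high)
```

Binning on a log scale is one common choice for microarray intensities because a few probes are orders of magnitude brighter than the rest; a linear histogram would work the same way with a different bin function.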
Correlational matrix
Next I created a correlation matrix to
see how each sample's expression
profile correlates with the others, and to
look for duplicates and outliers.
Each sample has a Pearson's correlation
of 1 with itself, hence the 0 down the
diagonal.
The samples' expression profiles are
strongly correlated with one another,
with an average of around 0.97.
The big exception is in the middle of the
graph (the bright red line).
GSM601870 has the lowest correlation,
0.87, so I removed it from my dataset.
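The outlier check described on this slide can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code behind the slides: the sample names and values are made up, and flagging the sample with the lowest mean correlation to the others is one simple rule among several possible ones.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(samples):
    """Nested dict of pairwise Pearson correlations between samples."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


def mean_correlation_with_others(matrix):
    """Average correlation of each sample with every other sample."""
    return {
        a: sum(r for b, r in row.items() if b != a) / (len(row) - 1)
        for a, row in matrix.items()
    }


# Hypothetical expression profiles (same probes in the same order per sample).
samples = {
    "sample_A": [10.0, 200.0, 3000.0, 50.0, 700.0],
    "sample_B": [12.0, 210.0, 2900.0, 55.0, 690.0],
    "sample_C": [9.0, 190.0, 3100.0, 45.0, 720.0],
    "sample_D": [900.0, 150.0, 1000.0, 600.0, 100.0],  # deliberately different
}

matrix = correlation_matrix(samples)
means = mean_correlation_with_others(matrix)
outlier = min(means, key=means.get)
print("Lowest mean correlation:", outlier)
```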
Histogram of Pearson's correlation between different samples
High correlation between samples is to
be expected. I also checked which
genes were most highly expressed. These
genes were very highly expressed: ACTB,
RPL34, RPL19, RPL11. ACTB is beta actin,
which explains why its levels are high:
all cells need beta actin. The others
all appear to be ribosomal genes.
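One simple way to find the most highly expressed genes is to average each gene across samples and sort. The sketch below assumes a hypothetical mapping from gene symbol to per-sample values; the numbers are invented and only the gene symbols come from the slides.

```python
from statistics import mean


def top_expressed(expression_by_gene, n=3):
    """Return the n gene symbols with the highest mean expression."""
    averages = {gene: mean(values) for gene, values in expression_by_gene.items()}
    return sorted(averages, key=averages.get, reverse=True)[:n]


# Hypothetical per-sample expression values (illustrative only).
expression_by_gene = {
    "ACTB": [21000.0, 19500.0, 22000.0],
    "RPL19": [15000.0, 16000.0, 15500.0],
    "PRNP": [600.0, 450.0, 900.0],
    "PRND": [40.0, 120.0, 15.0],
}

top = top_expressed(expression_by_gene, n=2)
print(top)
```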
Expression levels of Housekeeping gene
I decided to look at the
expression levels of
GAPDH, a housekeeping
gene, to see how
consistent they were
across samples. The graph on
the left shows the absolute
levels of GAPDH, which
are unevenly
distributed, so I also
created a histogram of
GAPDH's rank relative
to the other genes.
GAPDH is very close to
being the highest ranked
gene in every sample,
which makes sense.
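The contrast between absolute level and rank can be illustrated with a short Python sketch. The per-sample dictionaries and values are hypothetical; rank 1 here means the most highly expressed gene in that sample.

```python
def rank_in_sample(sample, gene):
    """Rank of a gene within one sample; 1 means the highest expression."""
    value = sample[gene]
    return 1 + sum(1 for other in sample.values() if other > value)


# Hypothetical samples mapping gene symbol to expression level.
sample_1 = {"ACTB": 12000.0, "GAPDH": 9000.0, "PRNP": 800.0, "PRND": 40.0}
sample_2 = {"ACTB": 21000.0, "GAPDH": 15000.0, "PRNP": 300.0, "PRND": 90.0}

gapdh_levels = [s["GAPDH"] for s in (sample_1, sample_2)]
gapdh_ranks = [rank_in_sample(s, "GAPDH") for s in (sample_1, sample_2)]

print("Absolute GAPDH levels:", gapdh_levels)
print("GAPDH ranks:", gapdh_ranks)
```

In this toy data the absolute GAPDH level differs between the two samples while its rank stays the same, which is the pattern the slide describes for a housekeeping gene.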
Expression levels of PRND and PRNP
I then looked at the
expression levels of
non-housekeeping genes.
By contrast, a
non-housekeeping gene might
be more variable in both
absolute level and rank.
I chose PRNP and PRND.
Both genes play a
major role in the central
nervous system and are
known to play a role
in neurodegenerative
disorders (REF: Wikigenes).
Expression levels of PRND and PRNP
The absolute levels
of both PRNP and PRND
vary substantially. In terms
of rank relative to
other genes, PRNP is
more consistently
expressed than
PRND, whose rank is all
over the place. PRNP
also has higher
expression levels than
PRND, which makes sense.
Principal Component Analysis
I chose a random subset of this dataset to run
my PCA on. I ran PCA to see what
systematic variation there is in this dataset.
The plot shows the first two components: PC1 vs PC2.
The most interesting aspect of this graph is that
PC1 does little more than separate the GSM601870
sample, the outlier from the correlation
matrix, from the rest. This reconfirms that the sample is not
quite right and should be excluded from the data
set.
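To make the idea concrete, here is a small Python sketch that finds only the first principal component by power iteration and reports which sample has the most extreme PC1 score. It is not the method used for the slides: the data are invented, the sample names are hypothetical, and a real analysis would normally use a library PCA that returns all components.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_principal_component(rows, iterations=200):
    """Return (direction, scores, explained) for PC1 via power iteration on X^T X."""
    x = center_columns(rows)
    p = len(x[0])
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        w = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(p)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    total = sum(val * val for row in x for val in row)
    explained = sum(s * s for s in scores) / total
    return v, scores, explained


# Hypothetical log-scale expression: rows are samples, columns are genes.
names = ["s1", "s2", "s3", "s4"]
rows = [
    [5.0, 8.0, 11.0],
    [5.1, 8.1, 10.9],
    [4.9, 7.9, 11.1],
    [9.0, 6.0, 7.0],  # deliberately different sample
]

direction, scores, explained = first_principal_component(rows)
outlier = names[max(range(len(scores)), key=lambda i: abs(scores[i]))]
print("Most extreme sample on PC1:", outlier)
print("Fraction of variance on PC1:", round(explained, 3))
```

The sign of a principal component is arbitrary, so the sketch compares absolute scores rather than relying on the outlier landing on a particular side of the axis.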
Variance by PC
Overall, the PCA
reconfirms
the single outlier in
the sample set. The rest of
the samples are
very similar to each
other, hence the
inter-sample correlation of
~0.97 seen in the
correlation matrix.
This also shows huge
stratification in
expression levels
between different
genes, as seen in
the exponential
distribution plotted
earlier.
Comparison with mRNA-seq data
I decided to compare my dataset to mRNA-seq data
from Human BodyMap 2.0. It is important to see whether
the data relate to data
gathered using a different technology.
Since the microarray data were prepared from human
blood, I combined them with the blood FPKMs from
Human BodyMap.
Despite some outliers, there is a visible correlation
between each gene's average expression level in the
microarray data and its level in the Human BodyMap
2.0 mRNA-seq data.
Overall there is a reasonably strong correlation
between a gene's average level in the microarray data
and its FPKM in the mRNA-seq data: Pearson's
correlation of r = 0.73 and Spearman's rank
correlation of rho = 0.83.
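The two statistics can be computed as in the following Python sketch, where Spearman's rho is Pearson's correlation applied to ranks. The paired values are invented for illustration and are not the GDS4145 or Human BodyMap numbers; the variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-gene values: mean microarray level and RNA-seq FPKM.
microarray_mean = [50.0, 400.0, 1500.0, 9000.0, 20000.0]
rnaseq_fpkm = [0.5, 3.0, 40.0, 35.0, 900.0]

r = pearson(microarray_mean, rnaseq_fpkm)
rho = spearman(microarray_mean, rnaseq_fpkm)
print("Pearson r:", round(r, 3))
print("Spearman rho:", round(rho, 3))
```

Because the rank-based statistic ignores the size of the gaps between values, it is less affected by a few very highly expressed genes, which may be one reason the slide reports a higher Spearman than Pearson value.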
DAVID Analysis: RPL11, ACTB, RPL34 and RPL19 (highly expressed genes)
Analysis
Overall, it seems that ribosomal genes are much more highly
expressed than other genes. Another set of genes that seem to
be upregulated belongs to the tyrosine kinase family, such as
DDR1, a receptor tyrosine kinase known to play a role in cell growth
and communication (ref: http://www.genecards.org/cgi-bin/carddisp.pl?gene=DDR1).
Another interesting finding is the upregulation of ribosomal protein
S6 in MS and its downregulation after interferon treatment.
It seems logical to see an uptick in the regulation of some of the ribosomal
genes, as most of the patients in this study seem not to respond fully
to the interferon treatment.
