3. Microarray Dataset
Expression data from type 2 diabetic and non-diabetic isolated human islets of Langerhans.
The islets of Langerhans are the regions of the pancreas that contain its endocrine (i.e., hormone-
producing) cells.
GEO DataSets accession ID: 3882[uid]. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE25724
The platform is the Affymetrix Human Genome U133A Array.
Expression profiling by array across 9 age, 2 disease-state, and 2 gender sets.
The aim is to evaluate differences in the transcriptome of type 2 diabetic human islets
compared with non-diabetic islet samples.
Samples: 7 non-diabetic islets and 6 type 2 diabetic islets.
The organism studied is Homo sapiens.
4. Publication
Motivation
Phosphoinositide 3-kinases (PI3Ks) are critical regulators of pancreatic β cell mass and survival, whereas
their involvement in insulin secretion is more controversial. Furthermore, of the different PI3Ks, the
class II isoforms were detected in β cells, although their role is still not well understood.
Here the authors aim to show that down-regulation of the class II PI3K isoform PI3K-C2α specifically impairs
insulin granule exocytosis.
Result
Their data indicate that the mRNA for PI3K-C2α may be down-regulated in islets of
Langerhans from type 2 diabetic individuals compared with non-diabetic individuals.
Their results reveal a critical role for PI3K-C2α in β cells and suggest that down-regulation of PI3K-C2α
may be a feature of type 2 diabetes.
5. Preprocessing
Intensity filtering is applied.
Intensities below the minimum value of 10 are thresholded to 10.
Quantile normalization is applied to the data before the analysis.
It reduces systematic (non-biological) effects.
This allows comparisons across different chips.
Gene exclusion criteria:
A gene is excluded if fewer than 20% of its expression values show at least a 1.5-fold change in
either direction from the gene's median value.
Log-intensity variation is calculated for each gene, and genes whose p-values are greater than 0.05 are excluded.
At the end of preprocessing, 6931 of the 22280 genes pass the filters.
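The thresholding, quantile normalization, and fold-change filter described above can be sketched in a few lines of dependency-free Python. This is only an illustrative sketch, not the tool actually used for the analysis: the function names, the toy matrix, and the tie-handling rule are assumptions, and the variance-based p-value filter is not reproduced here.

```python
from statistics import median

def floor_intensities(matrix, minimum=10.0):
    """Raise every intensity below `minimum` up to `minimum`."""
    return [[max(value, minimum) for value in row] for row in matrix]

def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution.

    `matrix` is a list of gene rows; each row holds one value per sample.
    Ties are broken by order of appearance (a simplifying assumption).
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    sorted_columns = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    rank_means = [sum(col[r] for col in sorted_columns) / n_samples
                  for r in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_index in enumerate(order):
            normalized[gene_index][j] = rank_means[rank]
    return normalized

def passes_fold_change_filter(row, fold=1.5, min_fraction=0.20):
    """Keep a gene if at least `min_fraction` of its values differ from
    the gene's median by at least `fold` in either direction."""
    med = median(row)
    changed = sum(1 for v in row if v >= med * fold or v <= med / fold)
    return changed / len(row) >= min_fraction

# Made-up intensities (3 genes x 4 samples), for illustration only.
raw = [
    [5.0, 12.0, 8.0, 200.0],
    [100.0, 110.0, 95.0, 105.0],
    [50.0, 400.0, 60.0, 45.0],
]
floored = floor_intensities(raw)
normalized = quantile_normalize(floored)
kept = [row for row in normalized if passes_fold_change_filter(row)]
```

After normalization every sample column contains the same set of values, only assigned to different genes, which is what makes the chips comparable.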
6. Hypothesis and Investigations
Is there any correlation between the expression levels of cases and controls?
Identify downregulated and upregulated genes.
Is the expression of a gene in one condition (cases) different from its expression in another
condition (controls)?
Find the differentially expressed genes.
8. Pairwise Correlation Plot
[GSM631755 - GSM631761]: controls
[GSM631762 - GSM631767]: cases
There are high positive correlations within
groups in terms of expression levels.
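A pairwise correlation matrix of this kind can be computed as below. This is a minimal sketch assuming Pearson correlation (the slide does not state which coefficient was used); the sample names and expression values are made up and are not taken from GSE25724.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(samples):
    """Map every (sample_a, sample_b) pair to its correlation.

    `samples` maps a sample id to its expression values, with genes
    listed in the same order for every sample.
    """
    ids = list(samples)
    return {(a, b): pearson(samples[a], samples[b]) for a in ids for b in ids}

# Hypothetical toy profiles (5 genes per sample), for illustration only.
toy = {
    "control_1": [8.1, 5.2, 9.7, 6.3, 7.4],
    "control_2": [8.0, 5.5, 9.5, 6.1, 7.6],
    "case_1": [6.2, 7.9, 6.8, 8.8, 5.1],
    "case_2": [6.0, 8.1, 6.5, 9.0, 5.3],
}
corr = correlation_matrix(toy)
within_controls = corr[("control_1", "control_2")]
between_groups = corr[("control_1", "case_1")]
```

In this toy example the within-group coefficient is much higher than the between-group one, which is the pattern the plot is meant to reveal.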
10. Clustering
Combine the most similar samples into
agglomerative clusters and build a tree of genes.
[GSM631755 - GSM631761]: controls
[GSM631762 - GSM631767]: cases
The first-level split occurs at a high
correlation coefficient (0.80).
In the right branch, all cases cluster together,
but GSM631758 is a control. However, it
differs from the others, since it sits alone on its
own sub-branch. There is no need to exclude the sample.
In the left branch, all samples are controls.
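The agglomerative procedure can be sketched as follows. The distance (1 minus Pearson correlation) and the average-linkage rule are assumptions, since the slide does not state which were used, and the sample names and values are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def agglomerate(samples):
    """Average-linkage agglomerative clustering on 1 - Pearson r.

    Returns the merge history as (cluster_a, cluster_b, distance)
    tuples, where each cluster is a tuple of sample ids.
    """
    ids = list(samples)
    dist = {frozenset((a, b)): 1.0 - pearson(samples[a], samples[b])
            for i, a in enumerate(ids) for b in ids[i + 1:]}

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    clusters = [(s,) for s in ids]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical toy profiles (5 genes per sample), for illustration only.
toy = {
    "control_1": [8.1, 5.2, 9.7, 6.3, 7.4],
    "control_2": [8.0, 5.5, 9.5, 6.1, 7.6],
    "case_1": [6.2, 7.9, 6.8, 8.8, 5.1],
    "case_2": [6.0, 8.1, 6.5, 9.0, 5.3],
}
merges = agglomerate(toy)
```

The last entry of `merges` corresponds to the first-level split of the tree: the two clusters that are joined last are the ones separated first when the dendrogram is read from the root.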
11. Class Comparison
SAM
Target proportion of FDR: 0.01
Number of permutations: 120
Percentile: 95%
There are 153 significant genes differentially
expressed between cases and controls.
All of them are downregulated.
INDEPENDENT SAMPLES T-TEST
Max. proportion of FDR: 0.01
Confidence level: 95%
There are 14 genes differentially expressed
between cases and controls.
9 genes are downregulated.
5 genes are upregulated.
4 genes are common to both methods.
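The general idea behind both comparisons (a per-gene two-group statistic, a significance estimate, and FDR control) can be sketched as below. This is a simplified stand-in and not SAM itself: it assumes a pooled-variance t statistic, an exhaustive permutation p-value, and the Benjamini-Hochberg adjustment, none of which is confirmed by the slides. All gene names and values are made up.

```python
from itertools import combinations
from math import copysign, sqrt

def t_statistic(group_a, group_b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / na, sum(group_b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    diff = mean_a - mean_b
    if pooled_var == 0:
        # Degenerate case: no within-group spread at all.
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / sqrt(pooled_var * (1 / na + 1 / nb))

def permutation_p_value(cases, controls):
    """Two-sided p-value from every possible relabeling of the samples."""
    values = cases + controls
    observed = abs(t_statistic(cases, controls))
    indices = range(len(values))
    hits = total = 0
    for chosen in combinations(indices, len(cases)):
        chosen_set = set(chosen)
        a = [values[i] for i in chosen]
        b = [values[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(t_statistic(a, b)) >= observed - 1e-12:
            hits += 1
    return hits / total

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical log-expression values for two genes, for illustration only.
genes = {
    "gene_down": ([5.1, 5.3, 4.9, 5.0], [8.0, 8.2, 7.9, 8.1]),
    "gene_flat": ([6.0, 7.1, 6.5, 6.8], [6.2, 6.9, 6.6, 7.0]),
}
p_values = {g: permutation_p_value(ca, co) for g, (ca, co) in genes.items()}
adjusted = benjamini_hochberg(list(p_values.values()))
# A negative statistic means lower expression in cases (downregulated).
direction = {g: t_statistic(ca, co) for g, (ca, co) in genes.items()}
```

With 7 controls and 6 cases there are 1716 possible relabelings per gene, so exhaustive enumeration is feasible; sampling a fixed number of random permutations, as the 120 permutations quoted above suggest, is the usual choice for larger groups.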
13. Pathway Analysis
After determining a list of genes involved in a given biological process, the next step is to map
these genes to known pathways.
The 153 downregulated genes derived from SAM were input to DAVID, and 3 significant pathways were found.
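Pathway over-representation is commonly scored with a hypergeometric (one-sided Fisher) test, sketched below. This is a generic illustration and not DAVID's own scoring, which may differ. The gene-list size (153) and the background size (6931) come from the slides, while the pathway size and the overlap are hypothetical numbers chosen for the example.

```python
from math import comb

def hypergeometric_p_value(population, pathway_size, list_size, overlap):
    """P(X >= overlap) when `list_size` genes are drawn without
    replacement from `population` genes, of which `pathway_size`
    belong to the pathway."""
    upper = min(pathway_size, list_size)
    total = comb(population, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# 6931 background genes and 153 listed genes are taken from the slides;
# a 40-gene pathway with 6 listed members is a hypothetical example.
p_example = hypergeometric_p_value(
    population=6931, pathway_size=40, list_size=153, overlap=6
)
```

A small p-value means the gene list contains more pathway members than would be expected by chance; when many pathways are tested, the p-values should also be corrected for multiple testing.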