From the sample tissue, RNA is extracted and chopped into millions of small pieces called fragments. These fragments are around 400 bp long. The RNA fragments are then converted to cDNA, and the cDNA fragments are sequenced, or read, from both ends, giving so-called paired-end reads. These reads are then mapped to a reference genome using the TopHat package.
The next step is to measure the expression of each gene and transcript using a measure called FPKM. For example, here is a gene with three exons, and this gene has two transcripts, or isoforms.
I should remind you that some genes do not produce just a single protein. Instead, a single gene can produce different forms of a protein through so-called alternative splicing, which produces different versions of RNA from the same gene.
For example, this gene has three exons: transcript one is made from the first and third exons, and transcript two from exons 1 and 2. The small dots here are the mapped cDNA fragments, so for each transcript we can obtain the number of fragments that map to it. The abundance of an RNA is proportional to the number of fragments mapped to it. FPKM, fragments per kilobase of transcript per million mapped fragments, is the number of fragments divided by the transcript length in kilobases and by the total number of mapped fragments in millions.
Once the FPKM has been obtained for each transcript, a gene-level FPKM can be obtained by summing the FPKM values of all transcripts of the gene.
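To make the arithmetic concrete, here is a minimal sketch in Python of the transcript-level FPKM and the gene-level sum. The fragment counts, transcript lengths, and total number of mapped fragments are made-up numbers for illustration. The sketch also assumes each fragment has already been assigned to one transcript, which in practice is done by the quantification software.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # Dividing by (length_bp / 1e3) and (total / 1e6) equals multiplying by 1e9.
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical example: one gene with two transcripts (isoforms).
transcripts = {
    "transcript_1": {"fragments": 200, "length_bp": 2000},
    "transcript_2": {"fragments": 50, "length_bp": 1000},
}
total_mapped = 10_000_000  # assumed total number of mapped fragments in the sample

transcript_fpkm = {
    name: fpkm(t["fragments"], t["length_bp"], total_mapped)
    for name, t in transcripts.items()
}

# Gene-level FPKM is the sum over all transcripts of the gene.
gene_fpkm = sum(transcript_fpkm.values())

print(transcript_fpkm)  # {'transcript_1': 10.0, 'transcript_2': 5.0}
print(gene_fpkm)        # 15.0
```

Note that transcript 1 has four times as many fragments as transcript 2 but only twice the FPKM, because it is twice as long.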
For the analysis, we split the data into two sets: a discovery set and a validation set. In the discovery set, we start by classifying the samples into molecular subtypes with a k-nearest-neighbor classifier, using Swedish microarray data as the training set. Since gene expression is measured differently on the two platforms, we normalized the two data sets together using median and variance normalization before classification. Then we select the genes that classify the samples best.
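The idea of normalizing across platforms and then classifying can be sketched roughly as follows. This is only one plausible reading of median and variance normalization: within each platform, each gene is centered on its median and scaled to unit variance. The exact procedure used in the analysis may differ. The expression values, the subtype labels, and the choice of k = 3 are all hypothetical, and the gene selection step is left out.

```python
import math
import statistics
from collections import Counter


def normalize_gene(values):
    """Center one gene on its median and scale it to unit variance."""
    med = statistics.median(values)
    sd = statistics.pstdev(values)
    return [(v - med) / sd if sd > 0 else 0.0 for v in values]


def normalize_platform(samples):
    """Normalize every gene within one platform.

    samples: list of per-sample expression vectors, same gene order in each.
    """
    genes = list(zip(*samples))                  # gene-major view
    normalized = [normalize_gene(g) for g in genes]
    return [list(s) for s in zip(*normalized)]   # back to sample-major


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to the query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical microarray training data (2 genes) with known subtypes.
microarray = [[8.0, 3.0], [8.5, 3.2], [4.0, 7.0], [4.2, 7.5]]
labels = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]

# Hypothetical RNA-seq data for the same 2 genes, on a different scale.
rnaseq = [[2.0, 0.1], [0.2, 1.9], [1.8, 0.3], [0.1, 2.2]]

train = normalize_platform(microarray)
test = normalize_platform(rnaseq)

predicted = [knn_predict(train, labels, sample) for sample in test]
print(predicted)
```

After normalization, both platforms are on the same relative scale, so distances between a microarray sample and an RNA-seq sample become meaningful.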
Once the subtypes are obtained, the next step is to find subtype-specific transcripts using the transcript-level FPKM. Here, to find transcripts that are up- or down-regulated in a specific subtype, robust contrast tests were performed.