Medi sapiens egfr_example_gene_report_2011-10-28_version4
IST Online® MediSapiens
Example Gene Report for EGFR

The downloadable gene report package includes several files with information on the selected gene. There are nine plots in PDF format and two data tables in txt format (easily opened, for example, in Excel).

This example report demonstrates the plot types and the contents of the data tables in detail. All the data examples used in this report concern the gene EGFR (ENSG00000146648).

WWW.MEDISAPIENS.COM
Dotplot

The dotplot is a body-wide expression profile of the gene across different kinds of tissues and cell lines. The y-axis shows the expression level of the gene. The x-axis contains all available samples in an anatomically and pathologically ordered fashion, so each dot represents the expression of the gene in one sample. The plot is divided into segments, separated by vertical lines. Starting from the left, the segments are healthy tissues, malignant tissues, other diseases, healthy tissue cell lines and cancer cell lines. Within these segments the samples are categorized according to the anatomical system of origin (marked with colored bars below the plot), and further into specific anatomies or cancer types.

Samples whose expression differs from the norm within segments are additionally colored (legend at the top left corner of each segment). A tissue type qualifies for coloring if its expression level is 1 standard deviation higher than the average expression of all tissues of the same type (healthy, cancer, or other disease), or if the 90th percentile of expression in the tissue is equal to or higher than the 75th percentile plus 2 times the interquartile range of the same type. However, no anatomy or cancer type is colored if there are fewer than ten datapoints per tissue type.

In this example plot (Image 1) the most notable observation is the higher expression of EGFR in both lung cancers and gliomas of the nervous system compared to the corresponding healthy tissues. Head and neck cancers, mesotheliomas and kidney cancers also show elevated expression. EGFR overexpression in several cancer types has been confirmed in numerous publications (for example: Santarpia M et al., 2011, PMID: 21951562; Bronte G et al., 2011, PMID: 21622099; Mazzoleni S et al., 2010, PMID: 20858720). The plot also suggests that EGFR might be tissue-specifically expressed in bladder, as expression is high in both normal and malignant bladder tissues.

Image 1: Dotplot for EGFR
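The coloring rule described for the dotplot can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MediSapiens implementation: the function names, the linear-interpolation percentile, the use of the population standard deviation, and the reading of "expression level" as the tissue mean are all assumptions.

```python
from statistics import mean, pstdev


def percentile(values, p):
    """Percentile (p in 0..100) using linear interpolation (an assumed method)."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def is_colored(tissue_values, segment_values, min_points=10):
    """Return True if a tissue type would be highlighted within its segment.

    tissue_values:  expression values of one anatomy or cancer type
    segment_values: expression values of all samples of the same type
                    (healthy, cancer, or other disease)
    """
    # No coloring with fewer than ten datapoints.
    if len(tissue_values) < min_points:
        return False

    # Rule 1 (assumed reading): tissue mean is at least 1 SD above the segment mean.
    rule_mean = mean(tissue_values) >= mean(segment_values) + pstdev(segment_values)

    # Rule 2: tissue 90th percentile >= segment 75th percentile + 2 * segment IQR.
    q1 = percentile(segment_values, 25)
    q3 = percentile(segment_values, 75)
    rule_tail = percentile(tissue_values, 90) >= q3 + 2 * (q3 - q1)

    return rule_mean or rule_tail
```

With a segment of values 1 to 100, a tissue of ten samples all at 90 would be colored by the first rule, while ten samples all at 50 would not be colored by either rule.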
Tissue boxplot

The tissue boxplot is a standard box-whisker plot that visualizes the gene's expression in healthy and cancer tissues. All tissues and cancer types with at least five samples are shown. Green boxes indicate healthy tissues, red boxes indicate cancers. The boxes are arranged so that healthy tissues are separate from cancers, and within these groups similar tissues or cancers are placed next to each other. The overall ordering mirrors that of the dotplot.

The bottom of the box is the 25th percentile of the data, the top of the box is the 75th percentile, and the horizontal line is the median. The whiskers extend to 1.5 times the interquartile range from the edges of the box, and any data points beyond this are considered outliers, marked by hollow circles.

The first thing to catch attention in our example boxplot (Image 2) is the very high expression of EGFR in placenta compared to all other healthy tissues. Several cancerous tissues exhibit high expression, including head and neck cancers, mesotheliomas, gliomas, kidney cancers and bladder cancers, as seen in the dotplot. The boxplot also reveals several outliers in lung adenocarcinoma, renal cancers and bladder transitional cell carcinoma, which might indicate different patient subpopulations within these cancers.

Image 2: Tissue boxplot for EGFR
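As a rough sketch of the box-whisker statistics described above, the following computes the box edges, whisker ends and outliers for one tissue. The function names and the percentile method are assumptions, and so is the common convention of ending each whisker at the most extreme data point that still lies within 1.5 times the interquartile range of the box.

```python
def percentile(values, p):
    """Percentile (p in 0..100) using linear interpolation (an assumed method)."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def box_stats(values, min_samples=5):
    """Return box-whisker statistics, or None if the tissue has too few samples."""
    if len(values) < min_samples:
        return None  # tissues with fewer than five samples are not shown

    q1 = percentile(values, 25)
    q3 = percentile(values, 75)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    # q1 and q3 always lie between the fences, so `inside` is never empty.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,                      # bottom of the box
        "median": percentile(values, 50),
        "q3": q3,                      # top of the box
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }
```

For the values 1 to 9 plus a single value of 100, this sketch reports 100 as the only outlier and ends the upper whisker at 9.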
Cell line plots

The cell line plots show the expression of the gene in various cell lines. The x-axis indicates the expression of the gene. In the first plot (Image 3), the y-axis lists the cell lines grouped into different tissue types, so that one dot corresponds to one sample, as in the dotplot. The second plot (Image 4) contains only the highest-expressing 20 % of the cell lines from the first plot. Some cell lines might be represented by multiple dots if several samples are available. The plot can be used for obtaining an overview of the gene's expression in different cell lines, or for comparing in vivo and in vitro expression of tissue samples. The names of the lower-expressing 80 % of cell lines are also available in the file containing the expression values that is included in the downloadable gene report.

The plot in Image 3 shows cell lines reaching high EGFR expression values especially in respiratory system, gastrointestinal system, oral cavity, kidney, cervix, central nervous system, breast and bladder. Many of these are consistent with the observations from the dotplot and the boxplot. The plot in Image 4 allows for a more detailed examination and, for example, selection of individual cell lines for experiments. In Image 4 it is also easy to see if different samples from the same cell line vary greatly in their expression, or to quickly spot if certain cell lines of interest are among the top 20 % of samples by expression.

Image 3: Cell line plot for EGFR
Image 4: Cell line plot for EGFR with the top 20 % expression
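The selection behind the second cell line plot can be approximated as below. This is a minimal sketch that assumes the cut is made per sample (so a cell line with several samples may appear more than once) and that the cutoff count is rounded up; the report does not state either detail, and the function name is hypothetical.

```python
def top_percent(samples, percent=20):
    """Return the highest-expressing `percent` % of (cell_line, value) samples."""
    ranked = sorted(samples, key=lambda s: s[1], reverse=True)
    # Integer ceiling of len * percent / 100, avoiding floating-point rounding.
    keep = (len(ranked) * percent + 99) // 100
    return ranked[:keep]
```

Called on ten samples, the function keeps the two with the highest expression values, in descending order.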
Phenoplot

Phenoplots show the expression values of the selected gene in cancer samples against several types of clinical data, within each of the cancer datasets. These datasets are groups of preselected samples from a particular cancer that have some interesting clinical data associated with them. The gene's expression in these samples is shown on the y-axis, and the x-axis contains the clinical data. Each sample is represented by a single datapoint. Each type of clinical data has a separate segment within the phenoplot, with distinct values as their own columns.

In addition to the single datapoints, the expression values are also presented as red box-whisker plots similar to those in the tissue boxplot. The boxes represent the samples with distinct phenotypic values in the clinical data segments. No box is shown if there are fewer than five distinct phenotypic values. The bottom of the box is the 25th percentile of the data, the top of the box is the 75th percentile, and the horizontal line is the median. The whiskers extend to 1.5 times the interquartile range from the edges of the box.

Currently there are five different cancer datasets available for the phenoplot, all included in the gene report. The clinical data associated with each of the datasets vary, but contain all available data that is relevant to the particular cancer type.

Image 5: EGFR phenoplot with breast cancer dataset
The breast cancer phenoplot contains the following clinical parameters: grade and Elston grade, T and N stages, patient relapse status, and subdivision of the samples into the molecular breast cancer subtypes. In the example breast cancer phenoplot (Image 5) some differences in EGFR expression can be seen between samples in the different molecular subtypes: the basal subtype includes the highest-expressing samples, whereas the luminal subtypes only contain samples with very low expression. The phenoplot also suggests there might be a correlation between higher EGFR expression and higher Elston grade, as samples with Elston grade 3 show noticeably higher expression than samples with Elston grade 1. These observations could be used as guidelines in the planning of further research.

The phenoplots of the other four cancer datasets are shown below. The clinical parameters included in them are the following:

Colorectal cancer: stage, Dukes stage, grade, TNM stages, microsatellite stability / microsatellite instability status, and necrosis percentage in the sample.
Lung cancer: grade, stage, TNM stages, history of tobacco use, years of tobacco use, family history of cancer, and subdivision of the samples into different tumor types.
Glioblastoma: necrosis status of the sample, and subdivision of the samples into different molecular subtypes.
Ovarian cancer: grade, stage, TNM stages, mutation statuses in the BRAF, ERBB2 and KRAS genes, and subdivision of the samples into different tumor types.

Image 6: Colorectal cancer phenoplot
Image 7: Lung cancer phenoplot
Image 8: Glioblastoma phenoplot
Data table with expression values

The data table contains all samples where an expression value for the selected gene is available. In addition, other information for the samples is included. A few rows from an example table are shown in Image 10. The columns of the data table are described below. Some of the columns only apply to certain types of samples. Missing values are marked as "NA".

Image 10: An example data table

sample_origin: The tissue type or disease where the sample was taken from
sample_anatomy: The anatomical location of the sample; in healthy tissues the value is the same as in sample_origin; in cancer samples refers to the site of the primary tumor
in_vitro_vs_in_vivo: Designates if the sample originates from in vivo or a cell line
experiment_type: Indicates if the sample is from healthy tissue (Healthy), cancer tissue (Malignant), or some other diseased tissue (Other disease)
icd10: The ICD-10 disease classification code
icdo: The ICD-O-3 oncology classification morphology code
cellline_name: Cell line name
age_year: Age of the patient in years
sex: Gender of the patient
race: Ethnicity of the patient
t_stage: T stage of the cancer
n_stage: N stage of the cancer
m_stage: M stage of the cancer
grade: Grade of the cancer
metastatic: Designates the metastatic status of the sample; primary means the sample is from a primary tumor; metastatic means the sample is from a primary tumor that has metastasized; and metastasis means the sample is from a metastasis
metastasis_site: The anatomical site of the metastasized tumor. The primary tumor's location is given in the sample anatomy column.
vital_status: Designates if the patient was alive or dead at the end of the experiment. In dead patients it is specified if death was caused by the disease in question ("Dead, caused by disease"), or if the patient died of other causes.
survival_day: The survival time of the patient in the experiment, in days
expression_value: The sample's expression value
internal_sample_id: The id used for the patient in the MediSapiens database
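A table with these columns could be read outside Excel along the following lines. This is a hedged sketch: the report only says the file is in txt format, so the tab delimiter and the header row are assumptions, and the three demo rows are invented placeholders rather than real data from the report.

```python
import csv
import io


def read_expression_table(handle, delimiter="\t"):
    """Parse the data table, mapping "NA" to None and expression values to float."""
    rows = []
    for raw in csv.DictReader(handle, delimiter=delimiter):
        row = {key: (None if value == "NA" else value) for key, value in raw.items()}
        if row.get("expression_value") is not None:
            row["expression_value"] = float(row["expression_value"])
        rows.append(row)
    return rows


# Invented placeholder rows, for illustration only (not real EGFR data).
demo = io.StringIO(
    "sample_origin\texperiment_type\tcellline_name\texpression_value\n"
    "tissue_a\tHealthy\tNA\t100.0\n"
    "tissue_b\tMalignant\tNA\t250.5\n"
    "tissue_c\tMalignant\tdemo_line\tNA\n"
)
rows = read_expression_table(demo)

# Example filter: malignant samples that have an expression value.
malignant = [
    r for r in rows
    if r["experiment_type"] == "Malignant" and r["expression_value"] is not None
]
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle, and the delimiter adjusted if the export turns out not to be tab-separated.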
Correlation list

The correlation list contains calculations of the Pearson correlation coefficients for the selected gene and all other genes in the database. The calculations were done using both original expression values (straight correlation) and log2 transformed values (log2 transformed correlation). The log2 correlation compares relative changes rather than the absolute change. This lessens the effect of large numbers on the correlation value.

An example of the correlation list is shown in Image 11. The columns of the correlation list are described below. In the p value columns "0" is interpreted as smaller than 10e-16.

Image 11: Rows from a correlation list

ensg_id: The Ensembl id of the gene
hugo_id: The HUGO name of the gene
r: The correlation coefficient in the straight calculation
p_value: The p value for the straight correlation
n: The number of samples in the calculations
r_log: The correlation coefficient in the log2 transformed calculation
p_value_log: The p value for the log2 correlation
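The difference between the r and r_log columns can be illustrated with a small sketch. It assumes strictly positive expression values and applies log2 directly; whether the actual pipeline adds a pseudocount or filters values is not stated in the report, and the function names are hypothetical. P values are not computed here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def straight_and_log2(xs, ys):
    """Return (r, r_log): correlation on raw values and on log2 values.

    Assumes every value is > 0, since log2 is undefined otherwise.
    """
    r = pearson(xs, ys)
    r_log = pearson([math.log2(x) for x in xs], [math.log2(y) for y in ys])
    return r, r_log
```

When one sample has a very large value in both genes, the straight correlation is dominated by that sample, while the log2 correlation gives more weight to the pattern among the remaining samples.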