tranSMART Community Meeting 5-7 Nov 13 - Session 5: Advancing tranSMART Analytical Capabilities with Knowledge Content

Sirimon Ocharoen, Thomson Reuters
To analyze data in tranSMART effectively, a biological analysis/knowledge-based approach is needed. Through a case study, we will demonstrate how systems biology content can be integrated into tranSMART to enable functional analysis and biological interpretation. We will also share our experience and user feedback from various projects.

Published in: Health & Medicine, Technology

  1. ADVANCING TRANSMART ANALYTICAL CAPABILITIES WITH KNOWLEDGE CONTENT. tranSMART Community Meeting. Sirimon O’Charoen, sirimon.ocharoen@thomsonreuters.com. November 6th, 2013
  2. 4 EXAMPLES OF tranSMART USE CASES
     • Use case 1: Leveraging public datasets
     • Use case 2: Finding information on variant and mutation
     • Use case 3: Biological interpretation
     • Use case 4: Implementing classification model
     • Feedback from tranSMART users
  3. USE CASE 1: Leveraging Public Datasets
  4. Up-regulated in an asthma study. Where else is the IL-33 gene significantly expressed?
  5. What other genes are significantly expressed in ulcerative colitis?
  6. REG1A
  7. SLC6A14
  8. MICROARRAY REPOSITORY: PROCESSING PROCEDURE
     A. Search for datasets in public databases and data loading
     B. Quality control (QC) testing of raw assays (filtering out unsuitable or defective assays)
     C. GCRMA processing of QC-approved assays
     D. Assay annotation:
        i. Assignment of experimental metadata values to the assays
        ii. Assignment of experimental assay groups and their comparisons
     E. Statistical analysis of defined comparisons:
        i. Differential expression testing
        ii. Calculation of fold changes
        iii. Calculation of functional descriptors
     Optional: cutoffs
  9. MICROARRAY REPOSITORY: QC PROCEDURE
     • Datasets undergo rigorous quality control during processing
     • An assay is removed from the dataset if it is identified as an outlier by the majority of QC metrics
     • Users are able to see which tests the datasets passed or failed
  10. MICROARRAY REPOSITORY: ANNOTATION PROCEDURE. Additional manual annotation of datasets increases the granularity and number of groups and comparisons. METACORE
  11. USE CASE 2: Finding Information on Variant and Mutation
  12. How does 17p13 deletion correlate with thalidomide response in chronic lymphocytic leukemia (CLL) patients? WBC reduction at day 7: no aberration vs. 17p13 deletion
  13. What other diseases or drugs is 17p13 deletion associated with?
  14. GENE VARIANT RECORD
  15. GENE VARIANT ASSOCIATIONS
  16. GENE VARIANT API: Significance of genotype-phenotype relationships across the translational pipeline
     IDENTIFY ACTIONABLE GENE VARIANTS
     A. Establish variant significance
     B. Characterize the variant
     C. Assess the utility of the variant:
        • Understanding disease mechanism
        • Treatment and response
     DISEASE RECORD (VARIANT, DISEASE): disease profiling, diagnosis, prognosis, screening, risk
     RESPONSE RECORD (VARIANT, DRUG, DISEASE): predicting efficacy/toxicity, monitoring efficacy/toxicity, selection for therapy, resistance
     MANUALLY CURATED CONTENT FROM A RANGE OF SOURCES
     • DISCOVERY: HTP studies, candidate studies
     • VALIDATION: preclinical in vitro and animal studies, clinical studies in patient segments
     • APPLICATION: FDA approvals, clinical guidelines
  17. GENE VARIANT API: PROCEDURE
     SOURCES
     • Conference abstracts, patents, peer-reviewed journal articles, clinical trial registries, clinical guidelines, and authority approval documents (e.g., FDA)
     SOURCE SELECTION VS. REJECTION
     • Retrospective selection or prospective screening for frontfile. Items are screened by a text-mining tool to identify and remove items that have no relevance to the Gene Variant API.
     • All articles not removed are sent to manual selection by trained annotators who follow the policy.
     SELECTION CRITERIA BY THE ANALYST
     • A clear study design and valuable results are required by the analyst. The item must satisfy the requirements of evidence-based medicine in order to be taken into consideration.
     • Statistics and/or a statement by the author on the variant's effect on health are required. If both components are absent, the item is rejected.
  18. USE CASE 3: Biological Interpretation
  19. INVASIVE BREAST CANCER STUDY: Find predictors for treatment response
  20. MARKER SELECTION WORKFLOW: RCB 0/I vs. RCB III
  21. ENRICHMENT ANALYSIS: Significant signaling pathways enriched with differentially expressed genes in Responders vs. Non-responders
     Pathways:
     • Estrogen/Progesterone signaling
     • Cell cycle regulation
     • DNA damage repair
     • Epigenetic regulation of gene expression
     Genes:
     • NFIB – nuclear factor IB, a potential biomarker in breast neoplasms
     • STK24 – induction of apoptosis
     • ESR1 – Estrogen receptor 1
     • PGR – Progesterone receptor
     • CDKN2A – Cyclin-dependent kinase inhibitor 2A
     • Geminin – DNA replication inhibitor
     • MCM2/5 – a regulatory subunit inhibiting the helicase complex
     • CDC45L – Cell cycle control protein
     • MSH6 and MSH2 – MutS homologues, proteins involved in DNA repair
     • RAD50 – DNA repair protein (homologous recombination-dependent repair)
     • DNMT1 – DNA methylation enzyme
     • EZH2 – Histone methylation enzyme
     • HDAC2 – Histone deacetylation enzyme
  22. PATHWAY MAPS: Brca1 and Brca2 in breast cancer; Cell cycle: Start of DNA replication in early S phase
  23. PATHWAY MAPS: PR action in breast cancer (stimulation of cell growth and proliferation); Epigenetic alterations in ovarian cancer
  24. USE CASE 4: Implementing Classification Model
  25. PATIENT STRATIFICATION MODEL
  26. IMPLEMENTING MODEL IN tranSMART: Associated clinical phenotype; Sample; Molecular Subtype; Model; Biomarkers; Stratification rule; Mechanism; Drug target; Pathways; Standard tranSMART; Additional functionality in tranSMART; MetaCore; Cytoscape; Subnetworks. Applicable for both One Mind and Orion projects
  27. SYSTEMS BIOLOGY TOOLS: Network/Pathways based Approach; Statistical Approach
     • METABASE: The most comprehensive data available
     • SYSTEMS BIOLOGY TOOL LIBRARY: State of the art methods
     • OMICs data + other data types including clinical response
     • Drug Targets
     • Drug Repositioning
     • Drug Combinations
     • Biomarker Identification
     • Prognostic Biomarkers
     • Predictive Biomarkers
     • Biological Mechanism Reconstruction
  28. EXAMPLES OF NETWORK APPROACHES: Subnetworks (Chuang et al., 2012); Systems Biology (Zhang et al., 2011); Probabilistic Inference (Su et al., 2009); Pathway Activity (Lee et al., 2007); Pathway Based (Kim et al., 2012); RRFE (Johannes et al., 2010)
  29. FEEDBACK FROM tranSMART USERS
  30. STATISTICAL ANALYSIS QUESTIONS
     • How to control the Type 1 error rate?
       – Testing set vs. validation set
     • How to perform longitudinal analysis?
       – Regression models
     • How to identify covariates? Which variable has the highest correlation with the outcome?
       – Multivariate analysis
     • Other analysis methods/workflows
  31. SCIENTIFIC QUESTIONS
     • How to set a QC framework around uploaded data? A community-developed QC standard?
     • How to do cross-study analysis (more easily)?
     • How to do cross-species analysis?
     • How can the community report these (and bugs)?
  32. FEATURE WISH LIST
     • Multiple improvements to R advanced workflows
     • Sending results (gene list, patient subsets) from an advanced workflow back to summary statistics
     • Saving a workflow (history/output)
     • Using gene expression data to create subsets
     • Viewing specific subject records
     • Adding data types (e.g., date, longitudinal measurement)
     • Improving exported tables
     • and many more
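
The QC rule on slide 9 (an assay is removed when the majority of QC metrics identify it as an outlier) can be sketched in Python. This is a minimal illustration and not the repository's actual implementation: the metric names, the `flag_outliers` helper, and the choice of a strict majority (more than half) are all assumptions made for the example.

```python
from typing import Dict, List


def flag_outliers(qc_results: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return IDs of assays flagged as outliers by a strict majority of metrics.

    qc_results maps an assay ID to {metric name: True if that metric
    marks the assay as an outlier}. Metric names are hypothetical.
    """
    removed = []
    for assay_id, flags in qc_results.items():
        n_flagged = sum(flags.values())  # True counts as 1
        if n_flagged > len(flags) / 2:   # strict majority (assumed rule)
            removed.append(assay_id)
    return removed


# Hypothetical QC results for two assays and three metrics.
qc_results = {
    "assay_01": {"metric_a": False, "metric_b": False, "metric_c": True},
    "assay_02": {"metric_a": True, "metric_b": True, "metric_c": False},
}
removed = flag_outliers(qc_results)
print(removed)
```

Keeping the per-metric flags alongside the final decision also supports the third point on that slide: users can see which tests each assay passed or failed.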

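
Step E on slide 8 (differential expression testing and calculation of fold changes for a defined comparison) can be illustrated for a single gene. This is a sketch under stated assumptions, not the repository's pipeline: it assumes expression values are already on a log2 scale, it uses Welch's t-statistic as a stand-in for whatever test the repository applies, and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def log2_fold_change(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Difference of group means; valid as log2 fold change for log2-scale input."""
    return mean(group_a) - mean(group_b)


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t-statistic (unequal variances) for two independent groups."""
    se = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical log2 expression values for one gene in two assay groups.
case = [8.0, 9.0, 10.0]
control = [6.0, 7.0, 8.0]

lfc = log2_fold_change(case, control)
t_stat = welch_t(case, control)
print(f"log2 fold change: {lfc:.2f}, linear fold change: {2 ** lfc:.2f}")
print(f"Welch t-statistic: {t_stat:.3f}")
```

The optional cutoffs mentioned on the slide would then be applied to values such as these. Converting the t-statistic to a p-value requires a t-distribution, which is omitted here to keep the sketch dependency-free.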