
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker Discovery

tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART’s Application to Clinical Biomarker Discovery Studies in Sanofi
Sherry Cao, Sanofi
This presentation will discuss challenges we are encountering in clinical biomarker discovery
study and how we are using tranSMART to help to address them.

Published in: Health & Medicine, Technology

tranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker Discovery

  1. TM4P: Translational Medicine for Patients
     TM Data Hub Project: Implementation of a Translational Medicine Data Integration Platform
     tranSMART Community Meeting, Developer Stream, Nov 06, 2013
     Charlotte Raillère (tranSMART Expert), Claire Virenque (Project Manager)
  2. Content of the presentation
     ● Update on Sanofi latest achievements
       1. IT security assessment of tranSMART
       2. Improvement of SNP (subject level) data loading
     ● Update on work-in-progress
       3. New release under development (‘RC2’)
       4. tranSMART x MongoDB integration
     tranSMART Community Meeting – Nov 06, 2013
  3. Context – tranSMART at Sanofi
     ● Pilot experience with tranSMART from September 2011 to June 2012
       ● Evaluate tranSMART capabilities to support clinical biomarker research
     ● Implementation project launched in September 2012
       ● Identify the tranSMART improvements of highest value for Sanofi
       ● Implement tranSMART improvements through two successive tranSMART Release Candidates (RC)
         • RC1 has been available since March 2013; code base available on GitHub
         • RC2 build is in progress
         • RC2 is expected to move into production in Q2 2014
     ● Working version of tranSMART available for our early adopter business units
       ● Objective: meet their ongoing needs related to translational research data integration
       ● Support for data curation & loading is also provided
  4. tranSMART IT security assessment: Feedback
     Special thanks to Vincent Rossetto and the IS Security Team!
  5. Part 1 – Scope and Context
     ● Objective of Security Risk Assessment: Protect R&D information
       ● Mission of R&D IS Security team: control and assess the risks on R&D information assets
     ● Risk assessment methodology
       ● ‘Ethical’ hacking (penetration testing)
         • From vulnerability scans to exploitation
         • Using free tools (Nessus, BackTrack, Metasploit, SharePoint Perl script)
         • With neither an account on Sanofi systems nor a Sanofi standard workstation
       ● Without an access account, try to gain high level access (admin account, sensitive data)
     ● Risk Classification: Four grades
       ● From ‘High’: Risk with important consequences on Sanofi activities; can happen or be caused easily
       ● To ‘Negligible’: Risk with minor consequences; requires expert knowledge or favorable context
     ● Recommendations: Remediation Action Plan
       ● With prioritization of the recommendations
  6. Part 2 – tranSMART risk assessment results
     ● tranSMART strength overview
       ● No trivial system accounts found. No default database accounts found.
       ● Web servers are running under low privileges. User authentication cannot be bypassed.
         • Authentication through Sanofi’s Active Directory
     ● Main risks identified
       ● Credential disclosure (database, Tomcat, JBoss…)
       ● Session hijacking
       ● Privilege elevation
       ● Application malevolence (XSS)
     ● Impact
       ● Sensitive data disclosure
       ● Technical information disclosure
       ● Identity usurpation
  7. Part 3 – Application Security weaknesses
     ● XSS attack: Certain parameters (tags) are prone to stored cross-site scripting attacks.
       ● This vulnerability can be exploited to take control of another administrator’s browser or, more probably, to conduct phishing or viral spreading attacks
       ● Admin session hijacking
       ● XSS alert:
         • <script>alert(String.fromCharCode(88, 83, 83, 32, 97, 116, 116, 97, 99, 107, 32, 105, 110, 32, 112, 114, 111, 103, 114, 101, 115, 115))</script>
     ● Privilege escalation: Basic users can access some administrative features
       ● The following URLs must not be accessible to users with a standard account:
         • /transmart/secureObjectAccess/manageAccess
         • /transmart/secureObjectAccess/manageAccessBySecObj
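The two weaknesses on this slide can be illustrated with a minimal Python sketch. It is not tranSMART code (tranSMART is a Java/Grails application); the function names, the `ROLE_ADMIN` role name and the HTML wrapper are assumptions made for illustration. It shows output escaping of a stored tag value and a server-side role check on the two URLs listed above.

```python
import html

# Character codes from the payload quoted on the slide.
CODES = [88, 83, 83, 32, 97, 116, 116, 97, 99, 107, 32, 105, 110, 32,
         112, 114, 111, 103, 114, 101, 115, 115]
PAYLOAD = "<script>alert(String.fromCharCode(%s))</script>" % ", ".join(map(str, CODES))

# URLs named on the slide as reserved for administrators.
ADMIN_ONLY_PATHS = (
    "/transmart/secureObjectAccess/manageAccess",
    "/transmart/secureObjectAccess/manageAccessBySecObj",
)


def decode_payload_message(codes):
    """Return the text that String.fromCharCode would produce for these codes."""
    return "".join(chr(c) for c in codes)


def render_tag(value):
    """Escape a stored tag value before placing it in HTML (hypothetical helper)."""
    return '<span class="tag">' + html.escape(value, quote=True) + "</span>"


def is_allowed(path, roles):
    """Server-side check: admin-only paths require the (assumed) ROLE_ADMIN role."""
    if path.startswith(ADMIN_ONLY_PATHS):
        return "ROLE_ADMIN" in roles
    return True


if __name__ == "__main__":
    print(decode_payload_message(CODES))
    print(render_tag(PAYLOAD))
    print(is_allowed(ADMIN_ONLY_PATHS[0], {"ROLE_USER"}))
```

Escaping at output time means a tag stored with markup is displayed as text rather than executed, and the role check is enforced on the server rather than by hiding links in the UI.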
  8. Part 4 – Recommendations and good practices
     ● Use good development practices to avoid XSS attacks and privilege escalation
       ● Based on development standards such as OWASP
     ● Ensure compliance of application accounts with the company’s password policy
       ● LDAP authentication using AD (preferred)
       ● Or set up a specific application password policy (password complexity, password expiration, time out…)
     ● Encrypt tranSMART authentication (https)
       ● Avoid sniffing attacks and credential disclosure
     ● Avoid default or weak accounts
       ● Administrative consoles (JBoss, Tomcat, Axis2) must have complex and secret passwords
         • Risk: Exploit vulnerability to access admin areas and compromise the application (crafted application
         • Consequence: Can impact the application availability or the data confidentiality & integrity.
       ● Database accounts (DBA, application) must have complex and secret passwords
         • Risk: Exploit vulnerability to access the Web application database
         • Consequence: Can impact the data confidentiality & integrity
     ● Sensitize users on security topics
       ● Lock workstation or log off from tranSMART session to avoid unauthorized access
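As an illustration of the "specific application password policy" option above, here is a small Python sketch of a complexity check. The slides do not state Sanofi's actual policy; the minimum length and the four character classes below are assumptions chosen only to make the example concrete.

```python
import re

MIN_LENGTH = 12  # hypothetical threshold, not taken from the slides


def password_policy_violations(password, min_length=MIN_LENGTH):
    """Return a list of the (assumed) policy rules that the password breaks."""
    problems = []
    if len(password) < min_length:
        problems.append("too short")
    if not re.search(r"[a-z]", password):
        problems.append("no lowercase letter")
    if not re.search(r"[A-Z]", password):
        problems.append("no uppercase letter")
    if not re.search(r"\d", password):
        problems.append("no digit")
    if not re.search(r"[^A-Za-z0-9]", password):
        problems.append("no symbol")
    return problems


if __name__ == "__main__":
    # A default-style console password fails several rules at once.
    print(password_policy_violations("tomcat"))
    print(password_policy_violations("Corr3ct-Horse-Battery"))
```

A check like this covers complexity only; expiration and session time-out, also named on the slide, would need to be handled by the account store and the application server.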
  9. Loading of SNP data: Latest achievements
  10. Loading of SNP genotyping data
      ● Modification of loader.jar (from tranSMART-ETL repository)
        ● Correction of errors
        ● Faster loading
          • Some inserts replaced by batch inserts
          • Parameters modified to insert/select data
        ● Fewer constraints on file format
          • Columns from the annotation file can be described in property files
          • New class to load SNP data from Illumina platform
      ● Loading of three studies with SNP data from Illumina platform (> 1 million SNPs)
        ● 4 patients → 40 minutes
        ● 30 patients → 5 hours
        ● 1500 patients → 80 hours
        ● Estimation (on-going)
      ● Integration of SNP loading in ICE (tranSMART Curation & Loading Tool) done
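The "inserts replaced by batch inserts" change can be sketched as follows. This is not the loader.jar code (that loader is Java, where the comparable JDBC mechanism is `addBatch`/`executeBatch`); it is a minimal Python illustration using the standard `sqlite3` module, and the table name, columns and batch size are assumptions.

```python
import sqlite3

# Hypothetical table layout, for illustration only.
INSERT_SQL = "INSERT INTO snp_calls (patient_id, snp_id, genotype) VALUES (?, ?, ?)"


def make_db():
    """Create an in-memory database with the illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE snp_calls (patient_id TEXT, snp_id TEXT, genotype TEXT)")
    return conn


def load_row_by_row(conn, rows):
    """Baseline: one INSERT statement call per genotype call."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
    conn.commit()


def load_in_batches(conn, rows, batch_size=1000):
    """Batched variant: hand rows to the driver in groups of batch_size."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(INSERT_SQL, batch)
            batch.clear()
    if batch:  # flush the final, partially filled batch
        conn.executemany(INSERT_SQL, batch)
    conn.commit()


def fake_calls(n_patients, n_snps):
    """Generate synthetic genotype calls (made-up data, not from the studies)."""
    for p in range(n_patients):
        for s in range(n_snps):
            yield (f"P{p}", f"rs{s}", "AB")


if __name__ == "__main__":
    conn = make_db()
    load_in_batches(conn, fake_calls(4, 5000))
    print(conn.execute("SELECT COUNT(*) FROM snp_calls").fetchone()[0])
```

Both functions load the same rows; the batched one reduces per-statement overhead, which matters when each patient contributes more than a million SNP calls.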
  11. New tranSMART release under development (‘RC2’): Improvements – New features
  12. tranSMART RC2 – Scope outline
      ● Accommodate new data types
        ● miRNA data (qPCR and microarray)
        ● Proteomic data (RBM data, mass spec data)
        ● Metabolomic data
        ● RNA sequencing data
      ● Accommodate serial data (time courses, dose responses, etc.)
      ● Enable sequential loading of data for a study
      ● Enhance critical current analytics
        ● Box Plot, Line Graph, Correlation Analysis, Grid View
        ● Plus adaptation of analytics to new data types
      ● Enhance data export features
      Status: Developments in progress. Partnership with Cognizant and The Hyve. Completion of RC2 developments planned for January 2014. Developments will be contributed back to the community.
      Further details on RC2 enhancements are given in the RC2 requirements slides.
  13. tranSMART RC2 – Key points
      ● RC2 is built ‘on top of’ Sanofi RC1 release
        ● ETL: impact of changes = high (Kettle scripts converted into Groovy, new ETL pipelines, mapping files modified)
        ● Data model: impact = high (creation of new tables for new data types, etc.)
        ● UI: impact = low
      ● Our goal is to converge towards the GPL version
        ● RC1 was merged with ‘Core DB’ & ‘Core API’ enhancements (from GPL1.1)
          • Start of the modularization of tranSMART
        ● New data types are implemented in a modular fashion.
          • This should help the future merging of RC2 with the open source code base
      Guiding principles:
        • Limit deviation from the open source code base
        • Do not duplicate efforts
        • Maximally benefit from public tranSMART development efforts
        • Contribute back all developments to the community
  14. tranSMART x MongoDB integration: Objective and timeline
  15. MongoDB integration with tranSMART (1/2)
      ● MongoDB is a NoSQL document-oriented database
      ● Main need for tranSMART: Physical storage of unstructured data (i.e., files)
        ● Any files that are uploaded and visible through the Browse tab of the Sanofi RC1 (raw data files, study related documentation such as clinical protocol, etc.)
        ● Currently, files are stored on the tranSMART app server, which has limited storage capacity.
        → Objective: Move storage of unstructured data from the tranSMART server to a MongoDB database
      ● Why MongoDB?
        ● Ability to store huge volumes of unstructured files
        ● Horizontal scalability
        ● Easy installation process
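A rough sketch of what storing an uploaded file in MongoDB could look like, using GridFS from the `pymongo` package. The slides do not say how the Sanofi integration stores files, so the use of GridFS, the metadata fields, and the function names are all assumptions. The `gridfs` import is kept inside `store_file` so that the metadata helper can be run without a MongoDB driver or server.

```python
import hashlib


def build_file_metadata(filename, data, study_id):
    """Describe an uploaded file (hypothetical fields, for illustration)."""
    return {
        "filename": filename,
        "study_id": study_id,
        "length": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def store_file(db, filename, data, study_id):
    """Store file bytes in GridFS and return the new file id.

    `db` is expected to be a pymongo Database object (assumption).
    """
    import gridfs  # shipped with the pymongo package

    meta = build_file_metadata(filename, data, study_id)
    fs = gridfs.GridFS(db)
    return fs.put(
        data,
        filename=meta["filename"],
        metadata={"study_id": meta["study_id"], "sha256": meta["sha256"]},
    )


if __name__ == "__main__":
    # Runs without MongoDB: only the metadata helper is exercised here.
    print(build_file_metadata("clinical_protocol.pdf", b"example bytes", "STUDY_1"))
```

With a running server, `db` could be obtained as `pymongo.MongoClient("mongodb://localhost:27017")["transmart_files"]`, where both the URI and the database name are placeholders.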
  16. MongoDB integration with tranSMART (2/2)
      ● Timelines
        ● Integration with Sanofi RC2 release (backend + UI): Q4-2013
        ● Testing in Q1-2014
  17. Conclusion
      Any questions? Thank you!
      Acknowledgement: Sherry Cao, Jike Cui, Angelo DeCristofano, Christophe Gibault, Lars Greiffenberg, Manfred Hendlich, Rainer Kappes, Adam Palermo, Annick Peleraux, David Peyruc, Charlotte Raillère, Vincent Rossetto, Claire Virenque
      Making a difference in Healthcare with Information Technologies.
  18. Additional slides
  19. tranSMART RC1 – Summary
      ● Released in March 2013
        ● Code base available on GitHub
      ● Main improvements delivered in tranSMART RC1:
        ● Topic 1: Data Management
          • Ability to organize data within a hierarchical structure (Program/Study/Assay) with new tagging capabilities
          • Synonym management for several dictionaries (e.g. compounds, genes, diseases)
          • New capabilities for posting, searching and exporting files
          • New functionality to load gene expression analysis results
          • Better support for time points/series
          • Improvement of tranSMART curation and loading tool & pipelines
        ● Topic 2: tranSMART User Interface
          • Simplification of tranSMART UI:
            – All searching functionalities centralized
            – Synchronization of the browser and analysis modules
        ● Topic 3: Data Searching and Analysis
          • Improvement of data searching capabilities:
            – Integrated search / filter for querying any data available (levels 1 to 4)
            – More search / filter criteria
          • Implementation of standard analytics from GPL1.0
  20. RC1 – New organization of tranSMART UI
      ● Two main tabs – synchronized with each other:
        ● Global view of all the data available: from level 1 data (uncurated/raw files) to levels 3-4 data (analysis results, findings)
          • Navigate within Programs > Studies > Assays, Analysis and File Folders (see next slide)
          • Search data using dictionaries
          • Create new Programs > Studies > Assays and Files Folders, and annotate (tag) them
          • Export files
          • Visualize gene expression analysis results
        ● Run analysis on subject-level data (former Dataset Explorer)
          • Browse level 2 (processed) data, incl. clinical / preclinical / molecular data, etc.
          • Search subject-level data
          • Select data subsets (cohorts)
          • Run basic statistical and genomic analyses on those subsets (standard features from tranSMART v1.0)
          • Export data subsets
  21. tranSMART RC2 – Requirements (1/2)
      Columns: Area, Req #, Requirement, Sprint #
      ● Data loading / ETL pipelines
        • [Req 1 | Sprint 2] Optimize the clinical ETL pipeline to accelerate loading time for large clinical studies
        • [Req 2 | Sprint 4] Enable incremental loading of data for a given study
        • [Req 3 | Sprint 2] Enable loading of ‘serial’ high and low dimensional data (time course, dose response, different sampling conditions, etc.)
        • [Req 4 | Sprint 1] Improve samples handling
        • [Req 5 | Sprint 3] Enable loading of RBM subject-level data as high dimensional data
        • [Req 6 | Sprint 2] Enable loading of microarray miRNA subject-level data as high dimensional data
        • [Req 7 | Sprint 2] Enable loading of qPCR miRNA subject-level data
        • [Req 8 | Sprint 3] Enable loading of mass spec proteomic subject-level data as high dimensional data
        • [Req 9 | Sprint 3] Enable loading of metabolomic subject-level data as high dimensional data
        • [Req 10 | Done] Improve SNP subject-level data loading, in particular accelerate loading time
        • [Req 11 | Sprint 2] Enable loading of RNA sequencing subject-level data (gene-level expression quantification)
        • [Req 12 | Done] Optimize the management of annotation files for omic data
      ● Security
        • [Req 13 | Done] Set up user authentication through the company’s Active Directory
        • [Req 14 | Sprint 1] Implement security rules and user permissions in Browse tab (RC1 feature)
      ● Analytics – Advanced Workflows
        • [Req 15 | Sprint 2] Allow better analysis of ‘serial’ high and low dimensional data using existing analytics
        • [Req 16 | Sprint 4] Improve the Line Graph analytics:
          – Enable Line Graph to use high dimensional data
          – Better handle x axis
          – Add option to plot individual data in addition to group means or medians
        • [Req 17 | Sprint 2] Improve sub categorization of high dimensional data (tissue, time points, etc.) in the high dimensional data node selection screen in Advanced Workflows (linked to req #3)
        • [Req 18 | Sprint 4] Improve the Boxplot analytics: make individual box plots for each variable when dragging multiple nodes in field ‘Dependent Variable’, and present output in table format
        • [Req 19 | Sprint 4] Improve the Correlation Analysis analytics
  22. tranSMART RC2 – Requirements (2/2)
      Columns: Area, Req #, Requirement, Sprint #
      ● Analytics – Advanced Workflows
        • [Req 20 | Sprint 3] Allow analysis of RBM data using existing analytics for high dimensional data
        • [Req 21 | Sprint 2] Allow analysis of microarray miRNA data using existing analytics
        • [Req 22 | Sprint 2] Allow analysis of qPCR miRNA and mRNA data using existing analytics
        • [Req 23 | Sprint 3] Allow analysis of mass spectrometry subject-level data using existing analytics
        • [Req 24 | Sprint 3] Allow analysis of metabolomic subject-level data using existing analytics
        • [Req 25 | Sprint 2] Allow analysis of RNA sequencing data using existing analytics
      ● Analytics – Grid View
        • [Req 26 | Sprint 3] Improve Grid View
          – Enable categorical variables in a single column
          – Enable column deletion, row or column selection
          – Enable export of selection
          – Automatically include variables used in Advanced Workflows
        • [Req 27 | Sprint 1] Display sample ID related to patient ID in Grid View
      ● Tagging, Gene sign., UI, Export, Search
        • [Req 28 | Sprint 2+4] Improve export of data
          – Improve performance (response time) when exporting large data volumes
          – Add advanced filters to allow users to limit the exported data to a subset of clinical fields, genes…
          – Add ability to better categorize the data available for a study (clinical, gene expression, etc.)
          – Harmonize with Grid View export capabilities
        • [Req 29 | Sprint 1] Add ability to preview a file in browser (IE8 and Firefox)
        • [Req 30 | Sprint 2] Add dictionaries for miRNA, proteins, metabolites
        • [Req 31 | Done] In Gene Signature/List tab, add gene symbols (linked to req #12)
        • [Req 32 | Sprint 2] Improve consistency and synchronization of data trees in Browse (Program Explorer panel) and in Analyze (Navigate Terms panel)
        • [Req 33 | Done] Secure file indexing
        • [Req 34 | Sprint 3] After running a free text search in Browse tab, when clicking on bold items in Program Explorer panel, highlight in right hand side Browse panel:
          – String found in metadata (including in file names)
          – Files containing that string
  23. Risk Assessment methodology
