"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...Dataconomy Media
"Spark, DeepLearning and Life Sciences, Systems Biology in the Big Data age" Dev Lakhani, Founder of Batch Insights
YouTube Link: https://www.youtube.com/watch?v=z6aTv0ZKndQ
Watch more from Data Natives 2015 here: http://bit.ly/1OVkK2J
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2016: http://bit.ly/1WMJAqS
About the author:
Dev Lakhani has a background in Software Engineering and Computational Statistics and is a founder of Batch Insights, a Big Data consultancy that has worked on numerous Big Data architectures and data science projects in Tier 1 banking, global telecoms, retail, media and fashion. Dev has been actively working with the Hadoop infrastructure since its inception and is currently researching and contributing to the Apache Spark and Tachyon communities.
VariantSpark: a library for genomics, by Lynn Langit (Data Con LA)
VariantSpark is a library for scalable genomic analysis that can process large genomic datasets containing millions of variants and thousands of samples. It uses machine learning techniques like k-means clustering and random forests for unsupervised and supervised analysis. VariantSpark can analyze whole-genome datasets faster than other methods and scales to process 100% of genomic data. It also integrates with cloud platforms like AWS and Databricks, making its capabilities easy to access and demo through Jupyter notebooks.
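The unsupervised step the abstract mentions (k-means over genotype calls) can be sketched in miniature. VariantSpark runs this on Spark over millions of variants; the toy version below is plain Python, and the sample genotypes (encoded 0/1/2 copies of the alternate allele), sample count, and cluster count are all invented for illustration.

```python
# Toy illustration (not VariantSpark itself): k-means clustering of samples
# by genotype, the kind of unsupervised analysis the abstract describes.
# Genotypes are encoded 0/1/2 (copies of the alternate allele); the data
# below is invented for the example.
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each sample to its nearest centre (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Recompute each centre as the mean of its assigned samples.
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Two obvious groups of samples across 4 variants.
samples = [[0, 0, 2, 2], [0, 1, 2, 2], [2, 2, 0, 0], [2, 2, 0, 1]]
centers, clusters = kmeans(samples, k=2)
print([len(c) for c in clusters])  # two clusters of two samples each
```

At scale the same assign/recompute loop is what a distributed k-means parallelizes: assignment is embarrassingly parallel per sample, and centre updates are a per-cluster aggregation.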
Architecture of ContentMine Components (contentmine.org), by Peter Murray-Rust
This is the evolving architecture of ContentMine (contentmine.org). It includes an overview (slide 2) showing getpapers, quickscrape, norma and ami.
The key container is the CTree, and the architecture shows where components are added to it or transformed.
These slides are dated and may be out of date with respect to the code. Some diagrams are autogenerated from *.dot files.
Please use http://discuss.contentmine.org/c/software as the main source of up-to-date info. Feel free to ask questions, offer help, critique, etc.
All software is open source (BSD, Apache 2.0).
The document describes the Cassava Genome Hub, which provides big genomic data management and analysis resources for cassava. It discusses how the hub handles big data through its architecture and tools. The hub stores terabytes of cassava genomic, transcriptomic and other omics data. It provides tools like JBrowse, SNiPlay, GIGWA and Galaxy to enable visualization, exploration and analysis of the large datasets.
Proteomics and the "big data" trend: challenges and new possibilities (Talk ...), by Juan Antonio Vizcaino
The document discusses the challenges and opportunities of big data in proteomics. It describes how proteomics data volumes are growing rapidly due to technological advances, creating both computational challenges for data analysis and opportunities to reuse large amounts of public data. The PRIDE Archive at EBI stores over 4,000 proteomics datasets and provides tools like PRIDE Inspector to help analyze and validate large datasets. However, challenges remain around data standardization, metadata completeness, and the need for greater computational infrastructure and expertise to fully leverage the large amounts of shared proteomics data.
Data analysis & integration challenges in genomics, by Mikael Huss
Presentation given at the Genomics Today and Tomorrow event in Uppsala, Sweden, 19 March 2015. (http://connectuppsala.se/events/genomics-today-and-tomorrow/) Topics include APIs, "querying by data set", machine learning.
What is Reproducibility? The R* brouhaha (and how Research Objects can help), by Carole Goble
Presented at the 1st International Workshop on Reproducible Open Science @ TPDL, 9 September 2016, Hannover, Germany.
http://repscience2016.research-infrastructures.eu/
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...Dataconomy Media
"Spark, DeepLearning and Life Sciences, Systems Biology in the Big Data age" Dev Lakhani, Founder of Batch Insights
YouTube Link: https://www.youtube.com/watch?v=z6aTv0ZKndQ
Watch more from Data Natives 2015 here: http://bit.ly/1OVkK2J
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2016: http://bit.ly/1WMJAqS
About the author:
Dev Lakhani has a background in Software Engineering and Computational Statistics and is a founder of Batch Insights, a Big Data consultancy that has worked on numerous Big Data architectures and data science projects in Tier 1 banking, global telecoms, retail, media and fashion. Dev has been actively working with the Hadoop infrastructure since it’s inception and is currently researching and contributing to the Apache Spark and Tachyon community.
VariantSpark a library for genomics by Lynn LangitData Con LA
VariantSpark is a library for scalable genomic analysis that can process large genomic datasets containing millions of variants and thousands of samples. It uses machine learning techniques like k-means clustering and random forests for unsupervised and supervised analysis. VariantSpark can analyze whole genome datasets faster than other methods and scale to process 100% of genomic data. It also integrates with cloud platforms like AWS and Databricks for easy access and demo of its capabilities through Jupyter notebooks.
Architecture of ContentMine Components contentmine.orgpetermurrayrust
This is the evolving architecture of ContentMine (contentmine.org) architecture. It includes an overview ( slide #2, ) showing getpapers, quickscrape, norma and ami.
The key container is the CTree and the architecture shows where components are added or transformed to this.
These slides are dated and may be out-of-date wrt code. Some diagrams are autogenerated from *.dot files.
Please use http://discuss.contentmine.org/c/software as the main source of up-to-date info. Feel free to ask questions, offer help, critique, etc.
All s/w is Open (BSD, Apache2)
The document describes the Cassava Genome Hub, which provides big genomic data management and analysis resources for cassava. It discusses how the hub handles big data through its architecture and tools. The hub stores terabytes of cassava genomic, transcriptomic and other omics data. It provides tools like JBrowse, SNiPlay, GIGWA and Galaxy to enable visualization, exploration and analysis of the large datasets.
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...Juan Antonio Vizcaino
The document discusses the challenges and opportunities of big data in proteomics. It describes how proteomics data volumes are growing rapidly due to technological advances, creating both computational challenges for data analysis and opportunities to reuse large amounts of public data. The PRIDE Archive at EBI stores over 4,000 proteomics datasets and provides tools like PRIDE Inspector to help analyze and validate large datasets. However, challenges remain around data standardization, metadata completeness, and the need for greater computational infrastructure and expertise to fully leverage the large amounts of shared proteomics data.
Data analysis & integration challenges in genomicsmikaelhuss
Presentation given at the Genomics Today and Tomorrow event in Uppsala, Sweden, 19 March 2015. (http://connectuppsala.se/events/genomics-today-and-tomorrow/) Topics include APIs, "querying by data set", machine learning.
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
presented at 1st First International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
http://repscience2016.research-infrastructures.eu/
Presentation from the "Demystifying Big Data" Technical Conference (Universidad de La Laguna, Spain, June 2014).
Biomedical sciences rely on massive data sets. With machines capable of generating large amounts of data at low cost, science has entered the 'Big Data' era, making computational infrastructures essential to maintain, transfer and analyze all this information.
Spark Summit Europe: Share and analyse genomic data at scale, by Andy Petrella
Share and analyse genomic data at scale with Spark, Adam, Tachyon & the Spark Notebook
- Sharp intro to genomics data
- What are the challenges
- Distributed machine learning to the rescue
- Projects: distributed teams
- Research: long process
- Towards maximum share for efficiency
Reproducibility of model-based results: standards, infrastructure, and recogn... (FAIRDOM)
Written and presented by Dagmar Waltemath (University of Rostock) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Improving the Management of Computational Models -- Invited talk at the EBI, by Martin Scharm
Improving the Management of Computational Models:
storage – retrieval & ranking – version control
More information and slides to download at http://sems.uni-rostock.de/2013/12/martin-visits-the-ebi/
The document discusses the ISA infrastructure, which provides a generic format for experimental description and data exchange. The ISA infrastructure aims to support bio-scientists from experimental design to data publication. It does this through developing community standards, open source software tools, and engaging communities. The infrastructure provides a common framework to describe experiments in a way that allows data to flow between different systems and communities.
This document discusses challenges with the current scientific publishing system and proposes a vision for next generation scientific publishing (NGSP). Some key problems include retractions due to misconduct, lack of reproducibility, and non-reusable data and methods. NGSP would feature transparent and computable data and methods, open annotation of narratives and objects, and no restrictions on text mining or remixing. It would move information more quickly and allow verification through an open, service-oriented system without walled gardens. Taking NGSP forward will require collaboration across stakeholders in research communications.
Annotopia open annotation services platform, by Tim Clark
Annotopia is an open-access, open-source, open annotation services platform developed for scientific annotation of documents and datasets on the web using the W3C Open Annotation model http://www.openannotation.org/spec/core/.
Using Annotopia, virtually any client application including lightweight web clients, can create, selectively share, and access annotation of web documents and data. This can be done regardless of the ownership of the base objects being annotated.
Annotopia supports unstructured, semi-structured and fully-structured (semantic) annotation; manual and automated (textmining) annotation; permissions, groups, and sharing. It also provides access to specialized vocabulary and text analytics services.
Annotopia is an open source platform licensed under Apache 2.0.
The document discusses using microformats as an alternative to more complex semantic web standards to integrate existing biological web resources. It proposes hAction, a microformat for biology, that could hook together disparate biological resources more simply than existing options. A demo is shown as a proof of concept that microformats may provide a way to share biological data across the web without large overheads.
Being Reproducible: SSBSS Summer School 2017, by Carole Goble
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transfer between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns about credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
Presentation from Strata-Hadoop 2015 (http://strataconf.com/big-data-conference-ny-2015/public/schedule/speaker/197575) -- a brief introduction to genomics followed by an overview of approaches to bioinformatics coding using Spark. Pretty high-level.
Aspects of Reproducibility in Earth Science, by Raul Palma
The document discusses aspects of reproducibility in earth science research within the European Virtual Environment for Research - Earth Science Themes (EVEREST) project. The key objectives of EVEREST are to establish an e-infrastructure to facilitate collaborative earth science research through shared data, models, and workflows. Research Objects (ROs) will be used to capture and share workflows, processes, and results to help ensure reproducibility and preservation of earth science research. An example RO is described for mapping volcano deformation using satellite imagery and other data sources. Issues around reproducibility related to data access, software dependencies, and manual intervention in workflows are also discussed.
The document summarizes updates and new features in the latest release (Araport11) of the Arabidopsis Information Portal (Araport). Key points include:
1) Araport assumed responsibility for the Arabidopsis thaliana Col-0 genome sequence and annotation.
2) The Araport11 release incorporates 113 RNA-seq datasets and contributions from NCBI, UniProt, and Arabidopsis researchers; structural and functional annotation were performed.
3) Araport provides a "one-stop shop" for Arabidopsis data including updated gene models, protein coding genes, transcripts, community curation tools, and over 70 tracks of data in JBrowse.
This document summarizes a presentation on analyzing microbial communities using QIIME (Quantitative Insights Into Microbial Ecology). It discusses how to [1] summarize taxonomy from an OTU table, [2] calculate beta diversity using UniFrac to compare communities, and [3] visualize diversity through emperor plots and networks. Additional analysis techniques like sampling design and network analysis are also briefly covered.
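The beta-diversity step in the QIIME summary can be illustrated in a self-contained way. UniFrac needs a phylogenetic tree, so this sketch instead computes Bray-Curtis dissimilarity, a tree-free beta-diversity metric QIIME also supports; the OTU table, sample names, and counts are invented for the example.

```python
# Illustration only: beta diversity from a toy OTU table. QIIME's UniFrac
# needs a phylogenetic tree, so this sketch uses Bray-Curtis dissimilarity,
# a tree-free beta-diversity metric. All counts below are invented.

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples of OTU counts (0..1)."""
    shared = sum(min(a, b) for a, b in zip(u, v))
    total = sum(u) + sum(v)
    return 1 - 2 * shared / total

# Rows = samples, columns = OTUs (counts per operational taxonomic unit).
otu_table = {
    "gut_1":  [10, 4, 0, 6],
    "gut_2":  [8, 5, 1, 6],
    "soil_1": [0, 1, 12, 2],
}

for a, b in [("gut_1", "gut_2"), ("gut_1", "soil_1")]:
    print(a, b, round(bray_curtis(otu_table[a], otu_table[b]), 3))
# The two gut samples are close (0.1); gut vs soil is far more dissimilar.
```

A full beta-diversity matrix of such pairwise values is what then feeds ordination plots like the Emperor visualizations mentioned above.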
Araport is a one-stop community platform for Arabidopsis thaliana data integration, sharing, and analysis. It contains gene reports, expression data, sequences, variants, and community-contributed data tracks and modules. Key features include the ThaleMine gene search and analysis tool, JBrowse genome browser with over 100 tracks, and regularly updated Araport11 genome annotation. The platform is built and maintained by a collaboration between academic institutions and is intended to support open data sharing across the Arabidopsis research community.
FAIRPORT: domain-specific metadata using W3C DCAT & SKOS with ontology views, by Tim Clark
FAIRPORT is an international project to develop a lightweight interoperability architecture for biomedical - and potentially other - data repositories.
This slide deck is a presentation to the FAIRPORT technical team. It describes a proposed model for supporting domain-specific search metadata using a common schema model across all repositories.
The proposal makes use of the following existing technologies, with minor extensions:
- the W3C DCAT model for dataset description
- the W3C SKOS knowledge organization system
- the OWL 2 Web Ontology Language
- the Dublin Core vocabulary
- the NCBO BioPortal biomedical ontologies collection
GBIF-Norway status for the 6th European GBIF nodes meeting, April 2014, by Dag Endresen
Slides prepared for the 6th European GBIF nodes meeting in Brussels. At the meeting these slides were replaced by a live online demo of the tools. Topics include citizen science transcription of specimen labels, persistent identifiers and custom collection portals. All slides are CC-BY.
This document provides a summary of the Scalable Genome Analysis with ADAM project. ADAM is an open-source, high-performance, distributed platform for genomic analysis that defines a data schema, data layout on disk, and programming interface for distributed processing of genomic data using Spark and Scala. The goal of ADAM is to integrate across terabyte and petabyte-scale datasets to enable the discovery of low frequency genetic variants linked to traits and diseases.
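A minimal sketch of the kind of cohort-wide query ADAM's schema is built for. ADAM itself is Spark/Scala over a Parquet-backed schema; the toy below is plain Python, and the record layout, genotype coding (0/1/2 copies of the alternate allele), cohort, and 5% cutoff are all invented for illustration.

```python
# Sketch of the kind of query ADAM's design enables: compute alternate-allele
# frequency per variant across a cohort and keep the low-frequency variants.
# (ADAM itself runs on Spark/Scala; records and the cutoff here are invented.)
from dataclasses import dataclass
from typing import List

@dataclass
class Variant:
    variant_id: str
    genotypes: List[int]  # one entry per sample, value in {0, 1, 2}

def allele_frequency(v: Variant) -> float:
    """Alternate-allele frequency: alt copies over total chromosomes."""
    return sum(v.genotypes) / (2 * len(v.genotypes))

def rare_variants(variants, max_freq=0.05):
    """Keep polymorphic variants whose alternate allele is at most max_freq."""
    return [v for v in variants if 0 < allele_frequency(v) <= max_freq]

cohort = [
    Variant("chr1:1012", [0] * 19 + [1]),  # freq 1/40 = 0.025 -> rare
    Variant("chr1:2044", [1] * 20),        # freq 20/40 = 0.5  -> common
    Variant("chr2:3310", [0] * 20),        # freq 0 -> monomorphic, dropped
]
print([v.variant_id for v in rare_variants(cohort)])  # ['chr1:1012']
```

In a distributed setting this is a per-variant map (frequency) followed by a filter, which is why a columnar, variant-oriented layout pays off at terabyte scale.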
eXframe: a Semantic Web Platform for Genomics Experiments, by Tim Clark
slides from talk given at Bio-ontologies 2013, Berlin DE, 20 July 2013
Emily Merrill*, Stephane Corlosquet*, Paolo Ciccarese†*, Tim Clark*†‡, Sudeshna Das†*
* Massachusetts General Hospital
† Harvard Medical School
‡ School of Computer Science, University of Manchester
eXframe: A Semantic Web Platform for Genomic Experiments, by Tim Clark
eXframe is a reusable framework for creating online repositories of genomics experiments. It uses Drupal to structure annotations of experiments, biomaterials, and assays. eXframe automatically publishes this data as RDF and provides a SPARQL endpoint. The first instance is the Stem Cell Commons, which deeply annotates experiments, organisms, tissues, and more using ontologies. It allows flexible querying of the data via SPARQL and integration with other endpoints. eXframe creates both public and private RDF stores to selectively share experimental data with researchers.
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat... (Araport)
The PMR database is a community resource for the deposition and analysis of metabolomics data and related transcriptomics data. PMR currently houses metabolomics data from over 25 species of eukaryotes. In this talk, we introduce PMR's RESTful web APIs for data sharing and demonstrate their applications in research, using Araport to provide Arabidopsis metabolomics data.
TranSMART: How open source software revolutionizes drug discovery through cro..., by Kees van Bochove
Presentation about the use of open source software in pharmaceutical companies at Global Discovery & Development Innovation Summit (GDDIS) in Princeton, NY, fall 2013.
Open Source Collaboration in Drug Discovery in Pharma, by Kees van Bochove
How pre-competitive collaboration in the pharmaceutical sector through open source platforms enables joint innovation of academics, pharma, SMEs and non-profits.
Presentation from the "Demystifying Big Data" Technical Conference (Universidad de La Laguna, Spain, June 2014).
Biomedical sciences rely on massive data sets. By using machines capable of generating large amounts of data with low cost, science has entered the 'Big Data' era, making computational infrastructures essential to maintain, transfer and analyze all this information.
Spark Summit Europe: Share and analyse genomic data at scaleAndy Petrella
Share and analyse genomic data
at scale with Spark, Adam, Tachyon & the Spark Notebook
Sharp intro to Genomics data
What are the Challenges
Distributed Machine Learning to the rescue
Projects: Distributed teams
Research: Long process
Towards Maximum Share for efficiency
Reproducibility of model-based results: standards, infrastructure, and recogn...FAIRDOM
Written and presented by Dagmar Waltemath (University of Rostock) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Improving the Management of Computational Models -- Invited talk at the EBIMartin Scharm
Improving the Management of Computational Models:
storage – retrieval & ranking – version control
More information and slides to download at http://sems.uni-rostock.de/2013/12/martin-visits-the-ebi/
The document discusses the ISA infrastructure, which provides a generic format for experimental description and data exchange. The ISA infrastructure aims to support bio-scientists from experimental design to data publication. It does this through developing community standards, open source software tools, and engaging communities. The infrastructure provides a common framework to describe experiments in a way that allows data to flow between different systems and communities.
This document discusses challenges with the current scientific publishing system and proposes a vision for next generation scientific publishing (NGSP). Some key problems include retractions due to misconduct, lack of reproducibility, and non-reusable data and methods. NGSP would feature transparent and computable data and methods, open annotation of narratives and objects, and no restrictions on text mining or remixing. It would move information more quickly and allow verification through an open, service-oriented system without walled gardens. Taking NGSP forward will require collaboration across stakeholders in research communications.
Annotopia open annotation services platformTim Clark
Annotopia is an open-access, open-source, open annotation services platform developed for scientific annotation of documents and datasets on the web using the W3C Open Annotation model http://www.openannotation.org/spec/core/.
Using Annotopia, virtually any client application including lightweight web clients, can create, selectively share, and access annotation of web documents and data. This can be done regardless of the ownership of the base objects being annotated.
Annotopia supports unstructured, semi-structured and fully-structured (semantic) annotation; manual and automated (textmining) annotation; permissions, groups, and sharing. It also provides access to specialized vocabulary and text analytics services.
Annotopia is an open source platform licensed under Apache 2.0.
The document discusses using microformats as an alternative to more complex semantic web standards to integrate existing biological web resources. It proposes hAction, a microformat for biology, that could hook together disparate biological resources more simply than existing options. A demo is shown as a proof of concept that microformats may provide a way to share biological data across the web without large overheads.
Being Reproducible: SSBSS Summer School 2017Carole Goble
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is a R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
Presentation from Strata-Hadoop 2015 (http://strataconf.com/big-data-conference-ny-2015/public/schedule/speaker/197575) -- a brief introduction to genomics followed by an overview of approaches to bioinformatics coding using Spark. Pretty high-level.
Aspects of Reproducibility in Earth ScienceRaul Palma
The document discusses aspects of reproducibility in earth science research within the European Virtual Environment for Research - Earth Science Themes (EVEREST) project. The key objectives of EVEREST are to establish an e-infrastructure to facilitate collaborative earth science research through shared data, models, and workflows. Research Objects (ROs) will be used to capture and share workflows, processes, and results to help ensure reproducibility and preservation of earth science research. An example RO is described for mapping volcano deformation using satellite imagery and other data sources. Issues around reproducibility related to data access, software dependencies, and manual intervention in workflows are also discussed.
The document summarizes updates and new features in the latest release (Araport11) of the Arabidopsis Information Portal (Araport). Key points include:
1) Araport assumed responsibility for the Arabidopsis thaliana Col-0 genome sequence and annotation.
2) The Araport11 release incorporates 113 RNA-seq datasets, contributions from NCBI, UniProt, and Arabidopsis researchers. Structural and functional annotation were performed.
3) Araport provides a "one-stop shop" for Arabidopsis data including updated gene models, protein coding genes, transcripts, community curation tools, and over 70 tracks of data in JBrowse.
This document summarizes a presentation on analyzing microbial communities using QIIME (Quantitative Insights Into Microbial Ecology). It discusses how to [1] summarize taxonomy from an OTU table, [2] calculate beta diversity using UniFrac to compare communities, and [3] visualize diversity through emperor plots and networks. Additional analysis techniques like sampling design and network analysis are also briefly covered.
Araport is a one-stop community platform for Arabidopsis thaliana data integration, sharing, and analysis. It contains gene reports, expression data, sequences, variants, and community-contributed data tracks and modules. Key features include the ThaleMine gene search and analysis tool, JBrowse genome browser with over 100 tracks, and regularly updated Araport11 genome annotation. The platform is built and maintained by a collaboration between academic institutions and is intended to support open data sharing across the Arabidopsis research community.
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsTim Clark
FAIRPORT is an international project to develop a lightweight interoperability architecture for biomedical - and potentially other - data repositories.
This slide deck is a presentation to the FAIRPORT technical team. It describes a proposed model for supporting domain-specific search metadata using a common schema model across all repositories.
The proposal makes use of the following existing technologies, with minor extensions:
- the W3C DCAT model for dataset description
- the W3C SKOS knowledge organization system
- OWL2 Ontology Language
- Dublin Core Vocabulary
- NCBO Bioportal biomedical ontologies collection
GBIF-Norway status for the 6th European GBIF nodes meeting April 2014Dag Endresen
Slides prepared for the 6th European GBIF nodes meeting in Brussels. At the meeting these slides was replaced by a live online demo of these tools. Topics include citizen science transcription of specimen labels, persistent identifiers and custom collection portals. All slides are CC-by.
This document provides a summary of the Scalable Genome Analysis with ADAM project. ADAM is an open-source, high-performance, distributed platform for genomic analysis that defines a data schema, data layout on disk, and programming interface for distributed processing of genomic data using Spark and Scala. The goal of ADAM is to integrate across terabyte and petabyte-scale datasets to enable the discovery of low frequency genetic variants linked to traits and diseases.
exFrame: a Semantic Web Platform for Genomics ExperimentsTim Clark
slides from talk given at Bio-ontologies 2013, Berlin DE, 20 July 2013
Emily Merrill*, Stephane Corlosquet*, Paolo Ciccarese†*, Tim Clark*†‡, Sudeshna Das†*
* Massachusetts General Hospital
† Harvard Medical School
‡ School of Computer Science, University of Manchester
eXframe: A Semantic Web Platform for Genomic ExperimentsTim Clark
eXframe is a reusable framework for creating online repositories of genomics experiments. It uses Drupal to structure annotations of experiments, biomaterials, and assays. eXframe automatically publishes this data as RDF and provides a SPARQL endpoint. The first instance is the Stem Cell Commons, which deeply annotates experiments, organisms, tissues, and more using ontologies. It allows flexible querying of the data via SPARQL and integration with other endpoints. eXframe creates both public and private RDF stores to selectively share experimental data with researchers.
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...Araport
PMR database is a community resource for deposition and analysis of metabolomics data and related transcriptomics data. PMR currently houses metabolomics data from over 25 species of eukaryotes. In this talk, we introduce PMRs RESTful web APIs for data sharing, and demonstrate its applications in research using Araport to provide Arabidopsis metabolomics data.
TranSMART: How open source software revolutionizes drug discovery through cro...keesvb
Presentation about the use of open source software in pharmaceutical companies at Global Discovery & Development Innovation Summit (GDDIS) in Princeton, NY, fall 2013.
Open Source Collaboration in Drug Discovery in PharmaKees van Bochove
How pre-competitive collaboration in the pharmaceutical sector through open source platforms enables joint innovation of academics, pharma, SMEs and non-profits.
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ..., by Bonnie Hurwitz
The document discusses extending the iPlant cyberinfrastructure to support microbes in addition to plants. It provides an overview of iPlant, including its funding from NSF, collaborations, resources like data storage and computing platforms, and applications for analysis. Future plans are outlined to build tools and streamline workflows for metagenomics and enable high-throughput computing for microbial data.
The pulse of cloud computing with bioinformatics as an example, by Enis Afgan
The document discusses how cloud computing can enable large-scale genomic analysis by providing on-demand access to computational resources and petabytes of reference data. It describes how tools like Galaxy and CloudMan allow researchers to perform genomic analysis in the cloud through a web browser by automating the provisioning and configuration of cloud resources. This approach makes genomic research more accessible and enables the elastic scaling of analysis as needed.
The Institut de Biologie Paris Seine provides bioinformatics support and expertise using the Galaxy platform. They assist users with computational analyses of sequencing data and ensure transparency and reproducibility. They also provide training in Galaxy usage, conduct research in RNA biology and epigenetics, and follow advances in software and methods. The institute has expertise in programming languages, version control systems, and virtualization/container technologies. They have supported many Next Generation Sequencing analysis projects and developed publicly available tools.
Cool Informatics Tools and Services for Biomedical Research, by David Ruau
This document provides an overview of bioinformatics tools and services for analyzing big data in biomedical research. It discusses traditional bioinformatics tools, analyzing genomic data from microarrays and next-generation sequencing both with and without code, interpreting results using protein interaction networks and pathways, tools for data storage, cleaning and visualization, and making research reproducible. Galaxy, R, and programming are presented as useful for automated, reproducible analysis of large genomic datasets.
This document discusses the challenges and opportunities biology faces with increasing data generation. It outlines four key points:
1) Research approaches for analyzing infinite genomic data streams, such as digital normalization which compresses data while retaining information.
2) The need for usable software and decentralized infrastructure to perform real-time, streaming data analysis.
3) The importance of open science and reproducibility given most researchers cannot replicate their own computational analyses.
4) The lack of data analysis training in biology and efforts at UC Davis to address this through workshops and community building.
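The digital normalization mentioned in point 1 can be sketched in a few lines of Python. This is a simplified illustration of the general technique (discard a read when the median abundance of its k-mers, counted so far, already meets a coverage cutoff), not the authors' actual implementation; the k-mer size and cutoff are arbitrary toy values.

```python
from collections import Counter

def kmers(read, k=4):
    """Return all overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def digital_normalization(reads, k=4, cutoff=2):
    """Keep a read only if the median abundance of its k-mers,
    as counted over the reads kept so far, is below the cutoff.
    Redundant reads are discarded while novel sequence is retained."""
    counts = Counter()
    kept = []
    for read in reads:
        km = kmers(read, k)
        if not km:
            continue
        abundances = sorted(counts[x] for x in km)
        median = abundances[len(abundances) // 2]
        if median < cutoff:
            kept.append(read)
            counts.update(km)
    return kept
```

On a stream of five identical reads plus one novel read, only the first two copies and the novel read survive: the data shrinks while the distinct sequence content is preserved.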
IDB-Cloud Providing Bioinformatics Services on Cloud - stratuslab
A presentation of IDB (Infrastructure Distributed for Biology) using StratusLab technology by Christophe Blanchet and Clément Gauthey at Lille, France, May 2013.
The suite of free software tools created within the OpenCB (Open Computational Biology – https://github.com/opencb) initiative makes it possible to efficiently manage large genomic databases.
These tools are not yet widely used, since their adoption involves quite a steep learning curve owing to the complexity of the software stack, but they can be very cost-effective for hospitals, research institutions and similar organizations.
The objective of the talk is to show the potential of the OpenCB suite, how to start using it, and the advantages for end users. BioDec is currently deploying a large OpenCGA installation for the Genetic Unit of one of the main Italian hospitals, where data on the order of hundreds of TBs will be managed and analyzed by bioinformaticians.
Scaling People, Not Just Systems, to Take On Big Data Challenges - Matthew Vaughn
Here, I describe how the Texas Advanced Computing Center has shifted its focus from traditional modeling and simulation towards fully embracing big data analytics performed by users with diverse technical backgrounds.
Collaborations in the Extreme: The rise of open code development in the scientific community - Kelle Cruz
Video: https://www.simonsfoundation.org/event/collaborations-in-the-extreme-the-rise-of-open-code-development-in-the-scientific-community/
The internet is changing the scientific landscape by fostering international, interdisciplinary and collaborative software development. More than ever before, software is a crucial component of any scientific result. The ability to easily share code is reshaping expectations about reproducibility -- a fundamental tenet of the scientific process. In this lecture, Kelle Cruz will briefly provide the backstory of how these shifts have come about, describe some of the most impactful open source projects, and discuss efforts currently underway aimed at ensuring these community-led projects are sustainable and receive support.
Data-intensive applications on cloud computing resources: Applications in lif... - Ola Spjuth
Presentation at the de.NBI 2017 symposium “The Future Development of Bioinformatics in Germany and Europe” held at the Center for Interdisciplinary Research (ZiF) of Bielefeld University, October 23-25, 2017.
https://www.denbi.de/symposium2017
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA, European Open Science Agenda - BigData_Europe
Slides for keynote talk at the Big Data Europe workshop nr 3 on 11.9.2017 in Amsterdam co-located with SEMANTiCS2017 conference by Ron Dekker, Director CESSDA: European Open Science Agenda: where we are and where we are going?
- The Broad Institute is a non-profit biomedical research institute founded in 2004 with 50 core faculty members from Harvard and MIT and over 1000 research personnel.
- It focuses on specific disease areas through various programs and initiatives and technological innovation through several platforms.
- In order to take advantage of cloud technologies, organizations need to fundamentally change how they engage with technology and technologists to collaborate effectively in this new environment.
Provenance for Data Munging Environments - Paul Groth
Data munging is a crucial task across domains ranging from drug discovery and policy studies to data science. Indeed, it has been reported that data munging accounts for 60% of the time spent in data analysis. Because data munging involves a wide variety of tasks using data from multiple sources, it often becomes difficult to understand how a cleaned dataset was actually produced (i.e. its provenance). In this talk, I discuss our recent work on tracking data provenance within desktop systems, which addresses problems of efficient and fine-grained capture. I also describe our work on scalable provenance tracking within a triple store/graph database that supports messy web data. Finally, I briefly touch on whether we will move from ad hoc data munging approaches to more declarative knowledge representation languages such as Probabilistic Soft Logic.
Presented at Information Sciences Institute - August 13, 2015
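The fine-grained capture discussed in the abstract can be illustrated with a minimal, hypothetical provenance log in Python: each munging step records a content fingerprint of its input and output, so a cleaned dataset can be traced back through the operations that produced it. All names here are illustrative sketches, not the system described in the talk.

```python
import hashlib
import json
import time

def fingerprint(obj):
    """Content hash used to identify a dataset version."""
    payload = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

class ProvenanceLog:
    """Records, for every munging step, which input produced which output."""
    def __init__(self):
        self.records = []

    def apply(self, name, func, data):
        result = func(data)
        self.records.append({
            "operation": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
            "timestamp": time.time(),
        })
        return result

# Two toy munging steps: drop rows with missing values, then cast types.
log = ProvenanceLog()
raw = [{"age": "42"}, {"age": None}, {"age": "17"}]
cleaned = log.apply("drop_missing",
                    lambda rows: [r for r in rows if r["age"] is not None], raw)
typed = log.apply("cast_age_int",
                  lambda rows: [{"age": int(r["age"])} for r in rows], cleaned)
```

Because the output fingerprint of one step equals the input fingerprint of the next, the records chain together into a lineage for the final dataset.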
This document discusses using cloud computing for bioinformatics. It begins by defining cloud computing and describing its key characteristics like on-demand access to computing resources and rapid elasticity. It then discusses different cloud delivery models like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The document provides examples of public cloud providers for each delivery model. It also introduces tools like CloudBridge that help make applications cloud-independent and CloudLaunch, a portal for deploying cloud-enabled bioinformatics applications. Finally, it briefly discusses how these tools and cloud resources can help improve bioinformatics workflows by providing scalable infrastructure for processing large genomic datasets.
This document provides an overview and agenda for a GenomeSpace workshop. It introduces GenomeSpace as an online community for sharing diverse computational genomics tools. The document reviews several popular tools integrated with GenomeSpace, including Cytoscape, Galaxy, Genomica, and GenePattern. It also outlines basic recipes for using GenomeSpace, such as uploading data, launching tools, and transferring data between tools. The workshop aims to demonstrate the GenomeSpace user interface and provide hands-on experience with key tools and integrative analysis workflows.
Big data solution for NGS data analysis - Yun Lung Li
This document outlines a presentation on big data solutions for NGS data analysis using software containerization and distributed analytics. It discusses Docker containerization and its use at Atgenomix to simplify cluster environments. It also covers NGS genome analysis techniques like read mapping, variant calling, and using Spark and Hadoop for data parallelization. Elasticsearch is introduced for distributed, RESTful search and analytics of variant data.
This document summarizes bioinformatics tools that can be used for analysis of high-throughput sequencing data for molecular diagnostics. It discusses databases for virulence factors and antimicrobial resistance as well as tools for assembly, annotation, pan-genome analysis, visualization, and commercial solutions. The presentation emphasizes that there is no single best tool and different approaches are needed for different questions. Collaboration with other researchers is recommended.
Similar to tranSMART Community Meeting 5-7 Nov 13 - Session 1: Chilly-Mazarin Meeting Objectives (20)
tranSMART Community Meeting 5-7 Nov 13 - Session 3: The TraIT user stories fo... - David Peyruc
This document provides an overview of the TraIT project and existing demonstrators using tranSMART. It discusses the TraIT roadmap and user stories being implemented at the Netherlands Cancer Institute. Key points include:
- TraIT aims to support translational research through integrated data and tools across clinical, imaging, biobanking and experimental domains.
- Existing demonstrators using tranSMART include DeCoDe (colorectal cancer) and PCMM (prostate cancer).
- The roadmap involves enhancing tranSMART functionality based on user needs and integrating additional data sources.
- At NKI, tranSMART will provide an integrated research data warehouse with clinical and research data from various sources and departments.
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Characterization of the cell phenotypes involved in metastasis - David Peyruc
Characterization of the cell phenotypes involved in metastasis: Using tranSMART to enable high-throughput heterogeneous data integration and analysis
Brian Athey, University of Michigan
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Advancing tranSMART Analytical Capabilities with Knowledge Content - David Peyruc
Sirimon Ocharoen, Thomson Reuters
To analyze data in tranSMART effectively, a biological knowledge-based approach is needed. Through a case study, we will demonstrate how systems biology content can be integrated into tranSMART to enable functional analysis and biological interpretation. We will also share our experience and user feedback from various projects.
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons Learned in Academic and Life Science Settings - David Peyruc
Dan Housman, Recombinant by Deloitte
The Recombinant by Deloitte team has worked with organizations such as Kimmel Cancer Center as a model to adapt existing mature i2b2 implementations to meet business and scientific needs. Other organizations are increasingly focused on how to use cloud and high-performance computing models to achieve different performance levels. Advanced initiatives are progressing to link commercial tools such as Qlikview to explore tranSMART data and to close key gaps in scientific pipelines. Dan will present recent lessons learned, new capabilities, and some of the impact on the path forward for future tranSMART updates.
tranSMART Community Meeting 5-7 Nov 13 - Session 5: EMIF (European Medical Information Framework) - David Peyruc
The document discusses the European Medical Information Framework (EMIF) project. EMIF aims to create a platform and framework to integrate patient-level health data from across Europe to enable new research insights. Specifically, EMIF is developing tools and standards to pool data from various sources on over 48 million subjects from 7 EU countries. This will support research on predictors of metabolic diseases and Alzheimer's disease. EMIF is using the tranSMART platform to load clinical trial data and cohorts on over 33,000 subjects for analysis. The goal is for EMIF to become a trusted European hub for healthcare data to optimize clinical research.
tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Project MS Repository Dataset as a Case Study - David Peyruc
Stephen Wicks, Rancho Biosciences
The Accelerated Cure Project for Multiple Sclerosis is a non-profit focused on accelerating research for a cure for MS. One of their major projects over the last decade has been the generation of the ACP Repository, a collection of biological samples and associated clinical data from approximately 3200 case or control participants. More than 75 studies are underway or have been completed, in both industry and academic settings, using samples from the ACP Repository. Rancho BioSciences has partnered with ACP through Orion Bionetworks to curate and load these datasets and associated clinical CRFs into tranSMART. In this talk, we will describe the rich ACP dataset and discuss our experiences in preparing the data for analysis in tranSMART.
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,... - David Peyruc
The document discusses the development of new plugins for the TranSMART platform to add genomic visualization capabilities. It describes requirements like adding an HTML5 genome browser and supporting visualization of genomic variants and copy number variation data. It then details the process of consulting the community to choose the Dalliance genome browser and MyDAS backend, and extending the core API to support these plugins. The plugins were implemented and added to TranSMART to provide the new genomic visualization features.
The document outlines the key roles and values of a foundation to support the TRANSMART platform including:
- Stimulating awareness of project activities, functionalities, and data standards through communications
- Coordinating data curation and identifying opportunities for collaboration or common interest data sets
- Providing an app store for translational research plugins with various pricing models
- Ensuring quality, education, and training
It proposes establishing working groups and hiring a full-time community manager to address issues like lack of data transparency, siloed development, and ineffective project communications. The manager would facilitate engagement, updates, and synergies across stakeholders.
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tranSMART - David Peyruc
The document summarizes Pfizer's use of the tranSMART platform for various genomics and clinical data analyses including genome-wide association studies (GWAS), supporting exploratory data types like metabolomics and FACS data, and large collaborative efforts like the Alzheimer's Disease Neuroimaging Initiative (ADNI) and Parkinson's Progression Markers Initiative (PPMI) datasets. It also discusses analytical integration with Genedata Expressionist and plans for future enhancements to tranSMART like improved GWAS support and additional genotype data. Contributors to these efforts are acknowledged.
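Since several of these efforts revolve around VCF-formatted genotype data, a minimal sketch of parsing one VCF data record may be helpful. It follows the standard VCF column layout (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); it is a simplified illustration, not tranSMART's actual loader.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into a variant record."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    # INFO is a semicolon-separated list of key=value pairs or bare flags.
    info_map = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_map[key] = value if value else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),          # ALT may list several alleles
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_map,
    }
```

A real loader would also handle the header lines (starting with `#`), per-sample genotype columns, and the `.` missing-value convention for every field.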
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Mind for Research Data Exchange Portal - David Peyruc
Jeff Grethe, One Mind for Research
One Mind for Research (http://1mind4research.org) is an independent, non-partisan, nonprofit organization dedicated to curing the diseases of the brain and eliminating the stigma and discrimination associated with mental illness and brain injuries. tranSMART will be a core application within the One Mind Brain Data Exchange Portal, scheduled to launch publicly in 2014. Traumatic Brain Injury (TBI) affects an estimated 10 million people worldwide, and tranSMART is one of the core applications within the portal used by researchers who are looking to improve diagnostics and discover more effective treatments for patients suffering from CNS- and TBI-related diseases.
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehouse for Translational Medicine at Takeda Pharmaceuticals International - David Peyruc
Dave Marberg, Takeda
We have used the tranSMART platform to construct a warehouse containing data from several Takeda clinical trials, proprietary preclinical drug activity studies, 1600 Gene Expression Omnibus studies, and data from TCGA, CCLE, and other sources. All gene expression data has been globally normalized. We extended the tranSMART platform with a set of R function calls to enable cross-study queries and analysis via the rich toolset available in R. The utility of the data warehouse is exemplified by a study in which we built a predictive model for drug sensitivities. The model was trained on gene expression and IC50 data from cell lines and was found to correctly predict drug activity in oncology indications.
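The cross-study queries described above were implemented as R function calls against the warehouse. As a language-neutral illustration, here is a hypothetical Python sketch of what such a query does conceptually: collect comparable values for one gene across studies, assuming each study maps samples to expression profiles and that values are already globally normalized. The data structures and names are invented for this sketch.

```python
def cross_study_expression(studies, gene):
    """Collect normalized expression values for one gene across studies.
    `studies` maps study name -> {sample -> {gene -> value}}; values are
    assumed globally normalized, so they are comparable across studies."""
    rows = []
    for study_name, samples in studies.items():
        for sample, profile in samples.items():
            if gene in profile:
                rows.append((study_name, sample, profile[gene]))
    return rows

# Toy warehouse: one clinical trial plus one public GEO-style study.
studies = {
    "trial_A": {"s1": {"TP53": 2.1, "EGFR": 0.4}, "s2": {"TP53": 1.8}},
    "GEO_GSE_X": {"g1": {"TP53": 2.4}},
}
```

The result is a flat table of (study, sample, value) rows, which is the shape a downstream model, such as the drug-sensitivity predictor mentioned above, would train on.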
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART’s Application to Clinical Biomarker Discovery Studies in Sanofi - David Peyruc
Sherry Cao, Sanofi
This presentation will discuss challenges we are encountering in clinical biomarker discovery studies and how we are using tranSMART to help address them.
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMART - David Peyruc
Dave King gave a presentation on November 6th, 2013 about interactive visualization with tranSMART. It explained that tranSMART allows for modular and abstracted visualization through application programming interfaces, improving connectivity, and concluded that these features of tranSMART are important for interactive data visualization.
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a Translational Research Community around the tranSMART Platform - David Peyruc
Keith Elliston, tranSMART Foundation
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cats - David Peyruc
This document discusses managing open source communities and projects. It notes that open source communities involve not just developers but also users, installers, documentation writers, and support staff. Contributions come from new code, bug fixes, documentation, training materials, and feature requests. Projects need coordination, communication through mailing lists and meetings, and quality assurance through testing. Both incentives like acknowledging contributions and treats like involvement opportunities help encourage participation and "herd the cats" of an open source community.
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When - David Peyruc
Massimo Brignoli, MongoDB Inc
The presentation will illustrate what MongoDB is, the advantages of the document based approach and some of the use cases where MongoDB is a perfect fit.
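The document-based approach the presentation advocates can be shown with a small sketch: one self-contained document holds what a relational schema would spread over several joined tables. The `matches` helper below is only an in-memory stand-in for a MongoDB `find()` filter on top-level keys, not the real driver API; field names are invented for illustration.

```python
# One self-contained document replaces several joined relational rows:
# the patient, the diagnosis, and the visit history all live together.
patient = {
    "_id": "P-001",
    "diagnosis": "T2D",
    "visits": [
        {"date": "2013-01-10", "hba1c": 7.9},
        {"date": "2013-06-02", "hba1c": 7.1},
    ],
}

def matches(doc, query):
    """Tiny in-memory stand-in for a find() filter on top-level keys."""
    return all(doc.get(k) == v for k, v in query.items())
```

In a real deployment the same shape of query-by-example dict would be passed to the database driver; the point here is only that reads and writes operate on whole documents rather than reassembling rows.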
3. • Microarray data analysis support
• Load public microarray data from GEO
• Store and retrieve saved analyses
• Search on gene name, disease name etc.
• Genomic variants and VCF support
• Load TCGA studies we have access to
• Load 1000 Genomes data
9. Not Invented Here Syndrome
Image from Rob Hooft, CTO Netherlands Bioinformatics Centre
http://nothinkingbeyondthispoint.blogspot.nl/2011/11/decision-tree-for-scientific.html
13. Phenotype Database
Written in Grails, supports several types of omics data, provides data integration and visualization, and has R, Groovy and PHP APIs.
Sounds familiar?
http://phenotypefoundation.org
16. So far…
• TranSMART has huge business potential. It’s no silver bullet though.
• Scientists sometimes have trouble reusing each other’s work, especially when it comes to open source software.
21. Governance of the R community
Brian Ripley: “The R Project is governed by a self-perpetuating oligarchy, a group with a lot of power. R was principally developed for the benefit of the core team.”
As cited on http://blog.revolutionanalytics.com/2011/08/brian-ripley-onthe-r-development-process.html
23. Galaxy is the most widely used open source bioinformatics web interface AFAIK.
Probably in no small amount thanks to their continuous dedication to improving the UI.
But there’s something else.
25. • An open source CMS (Content Management System) written in Python, nowadays backing thousands of production-grade websites
• Started by 2 developers in 2000, now an active open source project with hundreds of active developers
• In 2004, the Plone Foundation was formed to formalize IP and secure the future of Plone
• The Plone Collective has hundreds of plugins
27. What do all these success stories have in common?
Bioconductor Packages
Galaxy Toolshed
Plone Collective
Drupal Modules
30. TranSMART Contributions - Pharma
• Janssen
– Initial version of tranSMART
– Genomics viewer using IGV and GenePattern
– Faceted search interface (results browsing)
• Millennium
– Loading TCGA and many GEO studies
– R interface for interacting with data directly in R
– Several R analyses available directly in the GUI
31. TranSMART Contributions - Pharma
• Sanofi
– Cleaner user interface
– Added metadata layer for all concepts
– Study/Program categorization & file management
• Pfizer
– GWAS upload (VCF), data storage and analysis
– Enhanced data export capabilities
33. This is a mess.
Another reason why we need that core.
34. Start the Core: I2B2 Refactoring
1. I2B2 was integrated with tranSMART, but the I2B2 API abstractions were leaked all over the place in the tranSMART application.
2. We agreed in the London meeting that all parties would set some time apart for working on the core.
3. Combined, it made sense to start working on the clinical data API, properly using the I2B2 API where possible, and re-implement all I2B2 functionality in a new ‘core-db’ plugin.
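The abstraction leak described in point 1, and the core-db fix in point 3, can be sketched as a thin adapter layer: callers talk only to a core API, and only the backend knows about i2b2 details. The class and method names here are hypothetical illustrations, not the actual tranSMART core API.

```python
class ClinicalDataResource:
    """Hypothetical core API: callers see concepts and observations,
    never the backing i2b2 web-service details."""

    def __init__(self, backend):
        self._backend = backend  # the only object that knows about i2b2

    def observations_for(self, concept_path):
        # Translate the core-level concept path into whatever the backend
        # understands, and normalize the raw result into plain records.
        raw = self._backend.query(concept_path)
        return [{"patient": patient, "value": value} for patient, value in raw]


class FakeI2b2Backend:
    """Stand-in for the i2b2 CRC cell; a real backend would issue
    XML web-service calls here instead of returning canned data."""

    def query(self, path):
        return [("patient-1", 7.9), ("patient-2", 7.1)]
```

Because callers depend only on `ClinicalDataResource`, the i2b2 backend can be versioned, swapped, or re-implemented (as core-db did) without touching application code, and the fake backend makes the layer trivially testable.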
35. The first version of core integration was completed in mid-April.
By then, all web service calls to what formerly was an outdated version of the I2B2 Ontology and CRC cells were handled by the newly implemented core-db plugin.
Also, a set of tests was written in the process and API documentation was generated.
36. In the long run, I believe forming a good distributed working group on the core API is a more important deliverable of this workshop than crunching out a stable 1.1 version.
That’s how we write that history.
39. TranSMART’s Strong Points
• Powerful, ready to go user interface for common analyses (survival analysis, gene expression heatmaps etc.)
• Leverages the i2b2 data model for clinical data and offers a unified view over different studies
• Uses a lot of good open source technology under the hood (Grails, R, SOLR, Pentaho), leveraging existing community developments
40. TranSMART Building Blocks
• R: open source statistics package with CRAN,
an active repository in which many algorithms
and statistical packages are published
• Grails: a rapid application development
framework in Groovy leveraging Java
technology such as Hibernate, Spring, Quartz
• I2b2: domain specific open source package for
storing and querying clinical data
• GenePattern, maybe soon: Galaxy, KNIME?
41. TranSMART’s Weaknesses
• Large monolithic codebase with little modularization beyond the standard Grails MVC setup
• Code quality is problematic, especially the JavaScript
• Test coverage is low: no functional / web tests and few unit and integration tests
• No clear internal APIs, only a service level that does the plumbing
• I2b2 integration violates i2b2 abstractions
42. tranSMART Plans
• Use a clearly modularized architecture with separation of clinical, high dimensional, search and metadata storage; workflow execution engines and knowledge repository
• Define clear APIs and rewrite current implementations with good test coverage
• Use the i2b2 data model, re-harmonize with the latest i2b2 APIs, and don’t use i2b2 binaries directly
• Separate analysis definitions and abstract from the workflow execution engine
http://prezi.com/t6twshyctdsk/transmart-core-refactoring
44. Further reading
• Description of core API efforts: http://thehyve.nl/rewiring-transmart
• In-depth description of the i2b2 refactoring: http://thehyve.nl/inital-work-on-transmarts-core
• Overview of the tranSMART Core API so far: http://thehyve.github.io/transmart-core-api/
• Example continuous integration test suite (of core-db): https://ci.ctmmtrait.nl/browse/TMCOREDB-JOB1-51/test