tranSMART Community Meeting 5-7 Nov 13 - Session 1: Chilly-Mazarin Meeting Objectives
Sherry Cao and Keith Elliston

  1. TranSMART Core: From tool to ecosystem. Kees van Bochove, tranSMART Workshop Amsterdam, June 17, 2013
  2. Today, we have a chance to write history.
  3. • Microarray data analysis support • Load public microarray data from GEO • Store and retrieve saved analyses • Search on gene name, disease name etc. • Genomic variants and VCF support • Load TCGA studies we have access to • Load 1000 Genomes data $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
  4. There has to be a better way.
  5. Costs $0! No-brainer! Ehm… wait a minute…
  6. Let’s have a look at how these scientists in academia are doing. They love to collaborate, right?!
  7. In 2003… (Ancient history; before Facebook)
  8. Yet Another ‘New’ Web-based Solution for the Management of Microarray Data?!
  9. Not Invented Here Syndrome. Image from Rob Hooft, CTO Netherlands Bioinformatics Centre: http://nothinkingbeyondthispoint.blogspot.nl/2011/11/decision-tree-for-scientific.html
  10. What about all these great FP6, FP7, IMI, … projects?
  11. Source code of major projects is readily available on GitHub.
  12. But… I’m afraid it’s still up to you and me to put the pieces together.
  13. Phenotype Database: written in Grails, supports several types of omics data, provides data integration and visualization, and has R, Groovy and PHP APIs. Sounds familiar? http://phenotypefoundation.org
  14. share • reuse • specialize
  15. Writing good software is hard.
  16. So far… • TranSMART has a huge business potential. It’s no silver bullet, though. • Scientists sometimes have trouble reusing each other’s work, especially when it comes to open source software.
  17. Do they? Time to look at some success stories.
  18. R and Bioconductor: Who doesn’t love R?
  19. The website looks as if it dates from the Stone Age. Must be those LaTeX-loving physicists.
  20. Very active community, and… lots of packages.
  21. Governance of the R community. Brian Ripley: “The R Project is governed by a self-perpetuating oligarchy, a group with a lot of power. R was principally developed for the benefit of the core team.” As cited on http://blog.revolutionanalytics.com/2011/08/brian-ripley-onthe-r-development-process.html
  22. Galaxy
  23. Galaxy is the most widely used open source bioinformatics web interface AFAIK. Probably in no small part thanks to their continuous dedication to improving the UI. But there’s something else.
  24. Galaxy Toolshed
  25. Plone • An open source CMS (Content Management System) written in Python, nowadays backing thousands of production-grade websites • Started by 2 developers in 2000, now an active open source project with hundreds of active developers • In 2004, the Plone Foundation was formed to formalize IP and secure the future of Plone • Plone Collective has hundreds of plugins
  26. What do all these success stories have in common? Bioconductor Packages • Galaxy Toolshed • Plone Collective • Drupal Modules
  27. Lessons for tranSMART: TranSMART needs a marketplace and a thriving community to survive. To get to a functioning marketplace, we need a well-designed core.
  28. There is also another reason.
  29. TranSMART Contributions - Pharma • Janssen – Initial version of tranSMART – Genomics viewer using IGV and GenePattern – Faceted search interface (results browsing) • Millennium – Loading TCGA and many GEO studies – R interface for interacting with data directly in R – Several R analyses available directly in the GUI
  30. TranSMART Contributions - Pharma • Sanofi – Cleaner user interface – Added metadata layer for all concepts – Study/Program categorization & file management • Pfizer – GWAS upload (VCF), data storage and analysis – Enhanced data export capabilities
  31. This is a mess. Another reason why we need that core.
  32. Start the Core: i2b2 Refactoring 1. i2b2 was integrated with tranSMART, but the i2b2 API abstractions were leaked all over the place in the tranSMART application. 2. We agreed at the London meeting that all parties would set aside some time to work on the core. 3. Combined, it made sense to start working on the clinical data API, properly using the i2b2 API where possible, and to re-implement all i2b2 functionality in a new ‘core-db’ plugin.
  33. The first version of the core integration was completed in mid-April. By then, all web service calls that formerly went to an outdated version of the i2b2 Ontology and CRC cells were handled by the newly implemented core-db plugin. Also, a set of tests was written in the process and API documentation was generated.
  34. In the long run, I believe forming a good distributed working group on the core API is a more important deliverable of this workshop than crunching out a stable 1.1 version. That’s how we write that history.
  35. Current tranSMART Architecture. Kees van Bochove - The Hyve
  36. TranSMART’s Strong Points • Powerful, ready-to-go user interface for common analyses (survival analysis, gene expression heatmaps etc.) • Leverages the i2b2 data model for clinical data and offers a unified view over different studies • Uses a lot of good open source technology under the hood (Grails, R, Solr, Pentaho), leveraging existing community developments
  37. TranSMART Building Blocks • R: open source statistics package with CRAN, an active repository in which many algorithms and statistical packages are published • Grails: a rapid application development framework in Groovy leveraging Java technology such as Hibernate, Spring, Quartz • i2b2: domain-specific open source package for storing and querying clinical data • GenePattern; maybe soon: Galaxy, KNIME?
  38. TranSMART’s Weaknesses • Large monolithic codebase with little modularization beyond the standard Grails MVC setup • Code quality is problematic, especially JavaScript • Test coverage is low: no functional/web tests and few unit and integration tests • No clear internal APIs, only a service level that does the plumbing • i2b2 integration violates i2b2 abstractions
  39. tranSMART Plans • Use a clearly modularized architecture with separation of clinical, high-dimensional, search and metadata storage; workflow execution engines and knowledge repository • Define a clear API and rewrite current implementations with good test coverage • Use the i2b2 data model, re-harmonize with the latest i2b2 APIs, and don’t use i2b2 binaries directly • Separate analysis definitions and abstract them from the workflow execution engine http://prezi.com/t6twshyctdsk/transmart-core-refactoring
  40. Target tranSMART Architecture. Kees van Bochove - The Hyve
  41. Further reading • Description of core API efforts: http://thehyve.nl/rewiring-transmart • In-depth description of i2b2 refactoring: http://thehyve.nl/inital-work-on-transmarts-core • Overview of tranSMART Core API so far: http://thehyve.github.io/transmart-core-api/ • Example of continuous integration test suite (of core-db): https://ci.ctmmtrait.nl/browse/TMCOREDB-JOB1-51/test
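The core refactoring described in slides 32, 38 and 39 has application code talking to a narrow, tested API instead of reaching into i2b2 internals. It can be illustrated with a minimal sketch. Everything in it is hypothetical. The names `Observation`, `ClinicalDataApi`, `InMemoryClinicalData` and `count_patients` are invented for illustration and do not correspond to the real transmart-core-api. Python is used only for brevity; tranSMART itself is built on Grails/Groovy.

```python
# Hypothetical sketch: none of these names come from the real tranSMART code base.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One clinical fact: a patient, an i2b2-style concept path and a value."""
    patient_id: str
    concept_path: str
    value: str


class ClinicalDataApi(ABC):
    """Narrow core API: callers never see the underlying i2b2 tables or cells."""

    @abstractmethod
    def observations(self, concept_path: str) -> list[Observation]:
        """Return all observations at or below the given concept path."""


class InMemoryClinicalData(ClinicalDataApi):
    """Test double; a production implementation would query the i2b2 data model."""

    def __init__(self, facts: list[Observation]) -> None:
        self._facts = list(facts)

    def observations(self, concept_path: str) -> list[Observation]:
        # A prefix match selects a concept node and everything beneath it.
        return [f for f in self._facts if f.concept_path.startswith(concept_path)]


def count_patients(api: ClinicalDataApi, concept_path: str) -> int:
    """Application logic depends only on the API, so backends are swappable."""
    return len({o.patient_id for o in api.observations(concept_path)})


data = InMemoryClinicalData([
    Observation("P1", "\\Demographics\\Age\\", "42"),
    Observation("P2", "\\Demographics\\Age\\", "57"),
    Observation("P1", "\\Demographics\\Sex\\", "F"),
])
print(count_patients(data, "\\Demographics\\"))       # 2
print(count_patients(data, "\\Demographics\\Sex\\"))  # 1
```

`count_patients` only knows about `ClinicalDataApi`. It can therefore be unit-tested against the in-memory double, which is the kind of test coverage slide 39 calls for. An i2b2-backed implementation can be developed and tested separately behind the same interface.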