27th of October 2016
Piotr Zakrzewski – The Hyve
TranSMART Pro 17.1 project
Technical Overview
What does 17.1 mean for future
development?
Improved ease of development
● Clean up of repositories (single repo)
● One step build
● Dependencies update
● REST API improvements
● Consolidation and extension of the star schema to
better fit tranSMART and new data types
● Documentation
Repository Structure
Before you can deploy it here ...
... you need all of these ... and these ...
[Diagram of the repositories involved: core-api, core-db, rest-api, R modules, transmart data, legacy db]
16.2:
- TranSMART 16.2 spans 10 core repositories
- Building & testing tranSMART requires a special
setup (that resides in yet another repository)
17.1:
- Single repository with all core components needed to build a working tranSMART WAR file
Versioning of Artifacts
16.2:
- Most components are versioned as
SNAPSHOTs
- core-api, core-db, rest-api, transmartApp and
all other core components need to match strictly
in revision in order to work
17.1:
- Single repository: all changes to different
components come in a single PR
Build Process
16.2:
- tranSMART 16.2 (Grails 2) uses Gant scripts for building
- git-repo is used for fetching all repositories
- A custom Groovy script (dependency manager) is needed for the dev setup
17.1:
- Gradle build system (comes with Grails 3)
- One-step build (including database setup)
- Just git clone && ./gradlew build
Test Setup
16.2:
- Custom script matching branches during a Travis run
- Different ways to run tests locally and on Travis
- No reliable way to run tests for all components
- Tested on an H2 in-memory database
17.1:
- ./gradlew test both locally and on Travis
- Tested against Oracle and Postgres
Gradle
- Default build system for Grails 3.x
- Very versatile build system
- Also very popular (gained momentum due to adoption by Android)
- Especially suitable for multi-project, multi-language builds like tranSMART
Java 7 to Java 8
tranSMART is still running on Java 7, which has reached its end of life: it has not been supported, even with security updates, since April 2015.
Groovy 2.4 and Grails 3
- Java 8 supports invokedynamic, which should increase the performance of many dynamic Groovy calls
- Many workarounds for bugs in old Grails and Hibernate versions are no longer necessary
- The upgrade allowed us to adopt a better build system: Gradle
REST-API versioning
● TranSMART REST-api is used in production
● Several clients and third-party apps
● But development needs to continue …
- 17.1 introduces REST API versioning
- Versioning is done at the URL level
- GET /studies becomes GET /v1/studies
- Only a minor impact on existing clients (the base URL configuration changes to include the version)
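The client-side change can be sketched as follows. This is a minimal illustration, not tranSMART client code: the host name and the helper `versioned_url` are hypothetical, and only the mapping from /studies to /v1/studies comes from the slide.

```python
from urllib.parse import urlsplit, urlunsplit


def versioned_url(base_url: str, path: str, version: str = "v1") -> str:
    """Build a request URL with the API version as a path segment.

    Hypothetical helper: only the /studies -> /v1/studies mapping is taken
    from the slide; everything else is an assumption for illustration.
    """
    parts = urlsplit(base_url)
    # Join base path, version and resource path with exactly one slash each.
    segments = [s.strip("/") for s in (parts.path, version, path)]
    full_path = "/" + "/".join(s for s in segments if s)
    return urlunsplit((parts.scheme, parts.netloc, full_path, "", ""))


# Hypothetical host; an existing client only changes how its base URL is built.
print(versioned_url("https://transmart.example.org", "/studies"))
```

Keeping the version in one place (the base URL) means the resource paths used throughout a client stay unchanged.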
Current REST-API documentation
OpenAPI (previously Swagger)
What does 17.1 mean for future
development?
Improved ease of development
● Clean up of repositories (single repo)
● One step build
● Dependencies update
● Rest api improvements
● Consolidation and extension of the star schema to
better fit tranSMART and new data types
● Documentation
21
Db schema as of now (16.2)
Some facts about the current schema:
- A study exists only as string IDs sprinkled around the star schema (there is no table for study)
- Concepts and patients belong to a single study (they cannot be shared)
- Each patient-concept combination yields a single observation
Db schema of 17.1
Most important consequences of the 17.1 changes:
- Concepts and patients can be shared between studies
- More straightforward cross-trial comparison (trial-visit dimension) and support for longitudinal data (start date)
- Much redundancy and many inconsistencies removed
Hypercube
- Introduction of longitudinal data requires a whole different approach
- Modifiers are used to store the time point; both relative and absolute time points are allowed
- Each observation effectively has an additional dimension (hence the hypercube)
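The idea of an extra dimension per observation can be sketched in a few lines. This is an illustration under stated assumptions, not the 17.1 schema or API: the field names, the sample values and the helpers `slice_by` and `series` are hypothetical, and the time point is modelled as a plain field rather than as a modifier.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    # Dimension names are illustrative assumptions, not the 17.1 schema.
    patient: str
    concept: str
    time_point: Optional[str]  # relative ("week 2") or absolute ("2016-10-27")
    value: float


observations = [
    Observation("patient-1", "heart rate", "baseline", 72.0),
    Observation("patient-1", "heart rate", "week 2", 68.0),
    Observation("patient-2", "heart rate", "baseline", 80.0),
]


def slice_by(rows, **fixed):
    """Return the rows whose dimensions match all the fixed values."""
    return [r for r in rows if all(getattr(r, k) == v for k, v in fixed.items())]


def series(rows, patient, concept):
    """Collect one patient's values for a concept along the time dimension."""
    matching = slice_by(rows, patient=patient, concept=concept)
    return {r.time_point: r.value for r in matching}


print(series(observations, "patient-1", "heart rate"))
```

With the time point as its own dimension, the same patient-concept pair can carry several observations, which a patient-concept key alone cannot express.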
How to query a hypercube?
Impact on backwards compatibility
- The old UI will work only with old data; new data (especially longitudinal) will not be supported
- The old UI will not make use of the new cross-trial functionality
- A migration path will be provided between 16.2 and 17.1
- The new UI, however, will support longitudinal data and other features
Documentation
- One of the project deliverables is documentation of the database schema
- REST API documented with OpenAPI
- Documentation is part of the git repository
Conclusion
Aside from many new features, 17.1 is also a major clean-up that will make future development easier
Backup slides
Arvados Keep
Performance Benchmarks
- Goal: safeguarding the performance of the REST API
- Implemented as a Gradle task (single command)
- Should help developers spot performance regressions after new changes
- A reference setup on Amazon will be available to make benchmarks comparable
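The regression check behind such a benchmark can be sketched as below. This is a concept illustration only: the actual benchmarks are a Gradle task, and the function names, the 20% tolerance and the stand-in workload are assumptions made for the example.

```python
import statistics
import time


def median_runtime(func, repeats=5):
    """Return the median wall-clock time of func() over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def is_regression(measured, baseline, tolerance=0.2):
    """Flag a run that is more than `tolerance` (here 20%) slower than baseline."""
    return measured > baseline * (1 + tolerance)


def fake_request():
    # Stand-in for a REST call; a real benchmark would hit the API instead.
    sum(range(10_000))


# The baseline (in seconds) would come from the reference setup.
measured = median_runtime(fake_request)
print(is_regression(measured, baseline=1.0))
```

Comparing against a baseline recorded on a fixed reference setup is what makes results from different runs comparable.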
Other changes
- Support for multiple observations per concept-patient combination
- Categorical variables are no longer loaded per value (e.g. the variable Treated being two variables: yes and no)
- Several new tables to accommodate a new HDD data type (RNAseq measurement per transcript), and a table to store generic links to external resources (files)
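The change for categorical variables can be illustrated with a small sketch. The data layout and the helper `merge_per_value_variables` are hypothetical and only mirror the Treated example above; they do not reflect the actual loading code.

```python
def merge_per_value_variables(per_value, variable):
    """Collapse per-value variables into one categorical value per patient.

    `per_value` maps a category (e.g. "yes") to the set of patients for
    which that category was loaded as a separate variable.
    """
    merged = {}
    for category, patients in per_value.items():
        for patient in patients:
            if patient in merged:
                raise ValueError(f"{patient} has two values for {variable}")
            merged[patient] = category
    return merged


# Old style: the variable Treated is split into two variables, yes and no.
old_style = {"yes": {"patient-1", "patient-3"}, "no": {"patient-2"}}

# New style: one categorical variable Treated with a value per patient.
new_style = merge_per_value_variables(old_style, "Treated")
print(sorted(new_style.items()))
```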
