tranSMART Community Meeting 5-7 Nov 13, Session 3
transmart-data
Management of tranSMART’s Environment

Gustavo Lopes
The Hyve B.V.

November 6, 2013

Gustavo Lopes (The Hyve B.V.)

transmart-data

November 6, 2013

1 / 22
Outline

1. Problems
   Reproducibility
   Version Control
   Automation
   Why?!
2. tranSMART Foundation’s Version
3. Solution: transmart-data
   General Description
   Configuration
   Database Schema Management
   Seed Data
   ETL
   RModules Analyses
   Rserve
   Solr
   transmartApp Configuration
   Limitations

Typical Branch Distribution

Grails Code
transmartApp (without full repo history, always with wrong ancestry information ⇒ merging quite difficult)
RModules (if you’re lucky), but analysis definitions in the DB not provided

Database
SQL scripts on top of the GPL 1.0 dump or later. Probably insufficient/won’t apply
Stored procedures for ETL. Overlapping definitions with yours, but no history ⇒ merging quite difficult
Manual fixups always required (even if just permissions/synonyms)

Typical Branch Distribution (II)

ETL
High variability in strategies
Instructions/sample data rarely provided
Kettle scripts are problematic

Solr/Rserve/Configuration
Solr schemas/dataimport.xml perpetually forgotten
Likewise for information on R packages
Sample configuration rarely provided

Version Control

Version control is used ONLY for Grails code...
But often squashed and with wrong ancestor information.
Forget about the database, Solr, and most of the ETL.

Result
Merges are very difficult
Changes cannot easily be tracked
The reasons behind changes are unknown
Regressions are introduced silently (no conflicts to flag them)
Collaboration is based on e-mail attachments

Automation
Even with all the pieces...
Setting up a new branch takes days; weeks for non-basic functionality
No reproducibility in the process!

Result
Devs driven away from a fully local environment (too much work)
Robust environment for CI passed over (too much work)
Bugs cannot be reliably reproduced (see also: no consistent usage of VCS)
Time wasted on deployment-specific mistakes/inconsistencies
Why?!

The “source code” for a work means the preferred form of the work for making modifications to it.
(GPL v3, section 1)

Is everyone holding back “source code”?
More likely explanation: no appropriate tooling is being used.
(Image: Guillaume Duchenne, public domain)

Situation for tranSMART 1.1
The situation is much better!
Some problems remain, though.

The Good
Creating/populating the DB is easy
Most stuff is versioned
CI for builds
Image available
Public issue tracking

The Bad
No Oracle support
Changes to DB scripts/seed data are ad hoc (lax structure)
No mechanism to support/compare schemas with other branches
R analyses are JSON blobs in TSVs
No VCS for the Solr or Rserve/image setup
Setting up Solr/Rserve is time-consuming
Populating the DB with sample data is still time-consuming
Config changes required for dev

Description of transmart-data

We developed transmart-data to address most of these problems.
transmart-data is a set of scripts for managing tranSMART’s environment, together with certain application data (e.g. Solr schemas, DDL, seed data) that the scripts use and sometimes generate.
It has a Makefile-based interface.

transmart-data: Purposes

Purposes of transmart-data:
1. Allow setting up a complete dev environment quickly (< 30 min)
2. Bring versioning to the database schema and Solr files
3. Set up the Solr runtime
4. Invoke ETL pipelines
5. Set up Rserve

Target audience: programmers

transmart-data: Non-purposes

Non-purposes of transmart-data:
1. Setting up a production environment (some components can be used)
2. Serving new users evaluating tranSMART (use a pre-built image instead)
3. Building transmartApp or its plugin dependencies (build them yourself or use artifacts from Bamboo/Nexus)

Configuration
Environment-variable-based configuration:
cp vars.sample vars
vim vars   # edit file
source vars

PGHOST=/tmp
PGPORT=5432
PGDATABASE=transmart
PGUSER=$USER
PGPASSWORD=
TABLESPACES=$HOME/pg/tablespaces/
PGSQL_BIN=$HOME/pg/bin/
ORAHOST=localhost
ORAPORT=1521
ORASID=orcl
ORAUSER="sys as sysdba"
ORAPASSWORD=mypassword
ORACLE_MANAGE_TABLESPACES=0
# continues...
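Because the configuration is just a file of shell assignments that gets `source`d, other tools can read the same file. A minimal Python sketch, assuming the file contains only simple `KEY=VALUE` lines, comments and bare `export` lines; `parse_vars` is a hypothetical helper, not part of transmart-data:

```python
import os
import shlex
from string import Template


def parse_vars(text, env=None):
    """Parse simple KEY=VALUE shell assignments, expanding $VAR references.

    Only a small subset of sh syntax is handled: assignments, shlex-style
    quoting, '#' comments and $VAR / ${VAR} expansion. Quotes are removed
    before expansion, so '$X' inside single quotes is still expanded here,
    unlike in a real shell.
    """
    base = dict(os.environ if env is None else env)
    parsed = {}
    for line in text.splitlines():
        for token in shlex.split(line, comments=True):
            if "=" not in token:
                continue  # e.g. the 'export' keyword or bare variable names
            key, _, value = token.partition("=")
            # Later assignments may refer to earlier ones or to the environment
            scope = {**base, **parsed}
            parsed[key] = Template(value).safe_substitute(scope)
    return parsed
```

All values come back as strings, so a caller would convert e.g. `int(parse_vars(text)["PGPORT"])` itself.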

Database Schema Management
Support for Oracle and Postgres

Postgres
Uses pg_dump(all)
Parses the dump files
#Dump
make -C postgres/ddl dump
make -C postgres/ddl/GLOBAL extensions.sql roles.sql
#Load
make -C postgres/ddl load

Oracle
Queries dba_* tables
Dumps DDL w/ DBMS_METADATA
#Dump
make -C oracle/ddl dump
#Load
make oracle

Seed Data
Only Postgres for now
#Dump
#Tables to dump in postgres/data/<schema>_lst
make -C postgres/data dump
make -C postgres/common minimize_diffs
#Load
make -C postgres/data load
#Load DDL and data
make postgres

Only for basic data that needs no ETL!
Pretty fast (DDL + data loaded in 10 s)
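The point of `minimize_diffs` is that re-dumping seed data should produce small, reviewable VCS diffs. One way to get that effect (not necessarily what the target actually does) is to keep surviving rows in the order they had in the committed file and append only genuinely new rows. A minimal sketch; `minimize_diff` and `minimize_file_diff` are hypothetical helpers and assume one row per line:

```python
from collections import Counter


def minimize_diff(old_rows, new_rows):
    """Order new_rows so that a line diff against old_rows stays small.

    Rows that also occur in old_rows keep their old relative order; rows that
    are new in this dump are appended at the end in sorted order, so repeated
    dumps of unchanged data produce identical files.
    """
    remaining = Counter(new_rows)
    ordered = []
    for row in old_rows:
        if remaining[row] > 0:  # row is still present in the new dump
            ordered.append(row)
            remaining[row] -= 1
    return ordered + sorted(remaining.elements())


def minimize_file_diff(old_text, new_text):
    """Apply minimize_diff to two tab-separated dumps, one row per line."""
    rows = minimize_diff(old_text.splitlines(), new_text.splitlines())
    return "".join(row + "\n" for row in rows)
```

Since every row shared by both dumps stays in its old relative order, the resulting diff consists only of the rows actually removed or added.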

ETL (I)
Unified interface for ETL

Prepare dataset
1. Prepare ETL-specific source files
2. Prepare a file with ETL-specific params
3. Upload dataset to CDN (optional)
For each new ETL pipeline, support must be added

Load dataset
make -C samples/{oracle,postgres} load_<type>_<study_id>
#Example:
make -C samples/postgres load_clinical_GSE8581

Everything is automated!
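Because every pipeline is reached through the same target naming scheme, loads are easy to script. A hypothetical Python wrapper, assuming targets are named `load_<type>_<study_id>` as on the slide and that study ids contain no underscores; the helper names are illustrative only:

```python
import subprocess

DATABASES = ("oracle", "postgres")


def load_target(data_type, study_id):
    """Build a target name such as 'load_clinical_GSE8581'."""
    return f"load_{data_type}_{study_id}"


def parse_load_target(target):
    """Split 'load_<type>_<study_id>' into (type, study_id).

    Splits at the last underscore, so the type may contain underscores
    but the study id may not.
    """
    if not target.startswith("load_"):
        raise ValueError(f"not a load target: {target!r}")
    data_type, sep, study_id = target[len("load_"):].rpartition("_")
    if not sep or not data_type or not study_id:
        raise ValueError(f"malformed load target: {target!r}")
    return data_type, study_id


def make_command(database, data_type, study_id):
    """Argument vector for: make -C samples/<database> load_<type>_<study_id>."""
    if database not in DATABASES:
        raise ValueError(f"unknown database: {database!r}")
    return ["make", "-C", f"samples/{database}", load_target(data_type, study_id)]


def run_load(database, data_type, study_id):
    """Run the load from a transmart-data checkout; raises on failure."""
    subprocess.run(make_command(database, data_type, study_id), check=True)
```

`run_load` expects to be started from the repository root after `source vars`, just like the plain `make` invocation.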

ETL (II)
Show TM_CZ logs:
$ make -C samples/postgres showdblog
make: Entering directory `/home/gustavo/repos/transmart-data/samples/postgres'
groovy -cp postgresql-9.2-1003.jdbc4.jar ../common/dump_audit.groovy postgres `tput cols`
Procedure       | Description                      | Stat |   Recs | Date                 | Time spent
----------------+----------------------------------+------+--------+----------------------+-----------
alysis_data.kjb | GSE8581                          | DONE |      1 | 2013-10-15 13:23:22. | 0.0
.load_ext_files | Drop null samples rows           | Done |      0 | 2013-10-15 13:23:23. | 0.450529
.load_ext_files | Drop null cohorts rows           | Done |      0 | 2013-10-15 13:23:23. | 0.043125
.load_ext_files | Drop null analysis rows          | Done |      0 | 2013-10-15 13:23:23. | 0.066097
.load_ext_files | Read analysis file               | Done |      1 | 2013-10-15 13:23:23. | 0.048055
.load_ext_files | Read cohort file                 | Done |      3 | 2013-10-15 13:23:23. | 0.085535
.load_ext_files | Read samples file                | Done |     57 | 2013-10-15 13:23:23. | 0.049993
.load_ext_files | Write rwg_cohorts_ext            | Done |      3 | 2013-10-15 13:23:23. | 0.099452
.load_ext_files | Write rwg_analysis_ext           | Done |      1 | 2013-10-15 13:23:23. | 0.047331
.load_ext_files | Write rwg_samples_ext            | Done |     57 | 2013-10-15 13:23:23. | 0.044567
.load_ext_files | Read analysis data file          | Done | 436898 | 2013-10-15 13:23:27. | 3.911089
.load_ext_files | Drop null analysis_data rows     | Done | 382223 | 2013-10-15 13:23:27. | 0.067765
.load_ext_files | Write rwg_analysis_data_ext      | Done |  54675 | 2013-10-15 13:23:28. | 1.332746
IMPORT_FROM_EXT | Start FUNCTION                   | Done |      0 | 2013-10-15 13:23:29. | 0.117319
IMPORT_FROM_EXT | Delete existing records from TM_ | Done |      0 | 2013-10-15 13:23:29. | 0.035825
IMPORT_FROM_EXT | Delete existing records from TM_ | Done |      0 | 2013-10-15 13:23:29. | 6.26E-4
IMPORT_FROM_EXT | Delete existing records from TM_ | Done |      0 | 2013-10-15 13:23:29. | 4.84E-4
IMPORT_FROM_EXT | Insert records from TM_LZ.Rwg_An | Done |      1 | 2013-10-15 13:23:29. | 0.001079
IMPORT_FROM_EXT | Update bio_assay_analysis_id on  | Done |      0 | 2013-10-15 13:23:29. | 0.030793
IMPORT_FROM_EXT | Insert records from TM_LZ.Rwg_Co | Done |      3 | 2013-10-15 13:23:29. | 8.28E-4
... (continues)

Errors are also shown (if any)
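The script receives the terminal width (`tput cols`), and the cut-off names in this output (`alysis_data.kjb`, `TM_LZ.Rwg_An`) appear to come from fitting the table into it. A Python sketch of that kind of width-limited layout; `format_log` is hypothetical, only mimics the idea and is not `dump_audit.groovy` itself (it simply cuts cells on the right):

```python
import shutil

HEADERS = ("Procedure", "Description", "Stat", "Recs", "Date", "Time spent")


def format_log(rows, width=None):
    """Render 6-field audit rows as a ' | '-separated table at most `width` wide.

    While the table is too wide, the currently widest column is shortened
    by one character and its cells are truncated on the right.
    """
    if width is None:
        width = shutil.get_terminal_size().columns  # plays the role of `tput cols`
    table = [HEADERS] + [tuple(str(cell) for cell in row) for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(HEADERS))]
    sep = " | "
    overhead = len(sep) * (len(widths) - 1)
    while sum(widths) + overhead > width and max(widths) > 1:
        widths[widths.index(max(widths))] -= 1
    lines = []
    for row in table:
        cells = (cell[:w].ljust(w) for cell, w in zip(row, widths))
        lines.append(sep.join(cells).rstrip())
    return "\n".join(lines)
```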
RModules Analyses (tsApp-DB)
Situation in transmartApp-DB:
update searchapp.plugin_module
set params = '{"id":"survivalAnalysis","converter":{"R":["source(''||PLUGINSCRIPTDIRECTORY||Common/dataBuilders.R'')",
"source(''||PLUGINSCRIPTDIRECTORY||Common/ExtractConcepts.R'')","source(''||PLUGINSCRIPTDIRECTORY||Common/collapsingData.R'')",
"source(''||PLUGINSCRIPTDIRECTORY||Common/BinData.R'')","source(''||PLUGINSCRIPTDIRECTORY||Survival/BuildSurvivalData.R'')",
"\tSurvivalData.build(\n\tinput.dataFile = ''||TEMPFOLDERDIRECTORY||Clinical/clinical.i2b2trans'',\n\tconcept.time = ''||TIME||'',
\n\tconcept.category = ''||CATEGORY||'',\n\tconcept.eventYes = ''||EVENTYES||'',\n\tbinning.enabled = ''||BINNING||'',
\n\tbinning.bins = ''||NUMBERBINS||'',\n\tbinning.type = ''||BINNINGTYPE||'',\n\tbinning.manual = ''||BINNINGMANUAL||'',
\n\tbinning.binrangestring = ''||BINNINGRANGESTRING||'',\n\tbinning.variabletype = ''||BINNINGVARIABLETYPE||'',
\n\tinput.gexFile = ''||TEMPFOLDERDIRECTORY||mRNA/Processed_Data/mRNA.trans'',\n\tinput.snpFile = ''||TEMPFOLDERDIRECTORY||SNP/snp.trans'',
\n\tconcept.category.type = ''||TYPEDEP||'',\n\tgenes.category = ''||GENESDEP||'',\n\tgenes.category.aggregate = ''||AGGREGATEDEP||'',
\n\tsample.category = ''||SAMPLEDEP||'',\n\ttime.category = ''||TIMEPOINTSDEP||'',\n\tsnptype.category = ''||SNPTYPEDEP||'')\n\t"]},
"name":"Survival Analysis","dataFileInputMapping":{"CLINICAL.TXT":"TRUE","SNP.TXT":"snpData","MRNA_DETAILED.TXT":"mrnaData"},
"dataTypes":{"subset1":["CLINICAL.TXT"]},"pivotData":false,"view":"SurvivalAnalysis",
"processor":{"R":["source(''||PLUGINSCRIPTDIRECTORY||Survival/CoxRegressionLoader.r'')","CoxRegression.loader(input.filename=''outputfile'')",
"source(''||PLUGINSCRIPTDIRECTORY||Survival/SurvivalCurveLoader.r'')","SurvivalCurve.loader(input.filename=''outputfile'',concept.time=''||TIME||'')"]},
"renderer":{"GSP":"/survivalAnalysis/survivalAnalysisOutput"},... (goes on)'
where module_name = 'pgsurvivalAnalysis';

Not very nice...
RModules Analyses (transmart-data)
In transmart-data:
One file per analysis
Files can be generated from DB data
Sanely formatted
But we really want to remove this from the DB!
array (
'id' => 'heatmap',
'name' => 'Heatmap',
'dataTypes' =>
array (
'subset1' =>
array (
0 => 'CLINICAL.TXT',
),
),
'dataFileInputMapping' =>
array (
'CLINICAL.TXT' => 'FALSE',
'SNP.TXT' => 'snpData',
'MRNA_DETAILED.TXT' => 'TRUE',
),
'pivotData' => false,
...
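transmart-data stores each analysis as a PHP array; the same "one sanely formatted file per analysis, generated from DB data" idea can be sketched in Python with pretty-printed JSON instead. The rows are assumed to be `(module_name, params)` pairs as selected from `searchapp.plugin_module`; `export_analyses` and `write_analyses` are hypothetical helpers:

```python
import json
from pathlib import Path


def export_analyses(rows):
    """Turn (module_name, params_json) rows into {filename: formatted text}."""
    files = {}
    for module_name, params in rows:
        data = json.loads(params)
        # indent + sort_keys give a stable layout, so regenerated files diff cleanly
        files[f"{module_name}.json"] = json.dumps(data, indent=2, sort_keys=True) + "\n"
    return files


def write_analyses(rows, directory):
    """Write one file per analysis into `directory`."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in export_analyses(rows).items():
        (out / name).write_text(text, encoding="utf-8")
```

Sorting keys trades the original key order for determinism, which is what matters when the files live in version control.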

Rserve
Targets for Rserve:
Download/build R
Install R packages
Start Rserve
Install System V init script for Rserve
Likewise for systemd

cd R
make -j8 bin/root/R
# some packages don't support concurrent builds
make install_packages
make start_Rserve
make start_Rserve.dbg
TRANSMART_USER=tomcat7 sudo -E make install_rserve_init
TRANSMART_USER=tomcat7 sudo -E make install_rserve_unit
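After `make start_Rserve` (or installing the init script or unit) it is handy to check that Rserve is actually answering. A minimal Python sketch, assuming Rserve listens on its default port 6311 on localhost; `rserve_alive` is a hypothetical helper that only checks the greeting Rserve sends on connect:

```python
import socket

RSERVE_DEFAULT_PORT = 6311


def rserve_alive(host="127.0.0.1", port=RSERVE_DEFAULT_PORT, timeout=2.0):
    """Return True if something on host:port greets us like Rserve does.

    Rserve sends a 32-byte identification string right after a client
    connects; its first four bytes are b"Rsrv".
    """
    ident = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            while len(ident) < 32:
                chunk = sock.recv(32 - len(ident))
                if not chunk:  # peer closed before sending 32 bytes
                    break
                ident += chunk
    except OSError:  # refused, unreachable, timed out, ...
        return False
    return len(ident) == 32 and ident.startswith(b"Rsrv")
```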

Solr
Solr (4.5.0) automatically downloaded and configured
Solr cores automatically created
The user only needs to provide a schema file and dataconfig.xml
# set up & start Solr (PostgreSQL)
make start
# just configure
make solr_home
make <core>_full_import
make <core>_delta_import
make clean_cores
ORACLE=1 make start
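The `<core>_full_import` and `<core>_delta_import` targets presumably end up issuing Solr DataImportHandler commands. A Python sketch of triggering them directly, assuming Solr on its default port 8983 and the handler registered at `/dataimport` in each core's `solrconfig.xml`; `dih_url`, `trigger_import` and the core name used in the example are hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

DIH_COMMANDS = ("full-import", "delta-import", "status")


def dih_url(core, command, base="http://localhost:8983/solr"):
    """URL of a DataImportHandler command for one Solr core."""
    if command not in DIH_COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    return f"{base.rstrip('/')}/{core}/dataimport?" + urlencode({"command": command})


def trigger_import(core, full=True, base="http://localhost:8983/solr"):
    """Start a full or delta import and return Solr's raw response body."""
    url = dih_url(core, "full-import" if full else "delta-import", base)
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

The import runs asynchronously inside Solr, so a caller would poll `dih_url(core, "status")` to see when it has finished.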

transmartApp Configuration

Out-of-tree config management:
Targets for installing files
Zero configuration for dev!
Customization allowed without touching the target files
Only supports our branches
But a lot of configuration should be in-tree instead!

# install everything
# previous files are backed up
make install
# just one file:
make install_Config.groovy
make install_BuildConfig.groovy
make install_DataSource.groovy
# customizations in:
# Config-extra.php
# BuildConfig.groovy (limited)
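The install targets themselves are Makefile rules; the Python sketch below only illustrates the "previous files are backed up" behaviour. `install_with_backup` and its timestamped `.bak` naming are hypothetical, not what transmart-data necessarily does:

```python
import filecmp
import shutil
import time
from pathlib import Path


def install_with_backup(src, dest):
    """Copy src over dest, first backing up an existing dest that differs.

    Returns the path of the backup copy, or None if no backup was needed.
    """
    src, dest = Path(src), Path(dest)
    backup = None
    if dest.exists() and not filecmp.cmp(src, dest, shallow=False):
        stamp = time.strftime("%Y%m%d%H%M%S")
        backup = dest.with_name(f"{dest.name}.{stamp}.bak")
        n = 1
        while backup.exists():  # never overwrite an earlier backup
            backup = dest.with_name(f"{dest.name}.{stamp}.{n}.bak")
            n += 1
        shutil.copy2(dest, backup)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return backup
```

Skipping the backup when the files are identical keeps repeated `make install` runs from piling up useless copies.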

Current Limitations

DB upgrades not handled
Only a few ETL pipelines supported
Oracle support lags behind PostgreSQL
Tooling shares a repository with application data
(Image: Joost J. Bakker, CC BY 2.0)


 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: eTRIKS - Science Driven D...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: eTRIKS - Science Driven D...tranSMART Community Meeting 5-7 Nov 13 - Session 5: eTRIKS - Science Driven D...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: eTRIKS - Science Driven D...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: EMIF (European Medical In...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: EMIF (European Medical In...tranSMART Community Meeting 5-7 Nov 13 - Session 5: EMIF (European Medical In...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: EMIF (European Medical In...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Proj...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Proj...tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Proj...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Proj...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Modularization (Plug‐Ins,...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 4: tranSMART Foundation (tF)...
tranSMART Community Meeting 5-7 Nov 13 - Session 4: tranSMART Foundation (tF)...tranSMART Community Meeting 5-7 Nov 13 - Session 4: tranSMART Foundation (tF)...
tranSMART Community Meeting 5-7 Nov 13 - Session 4: tranSMART Foundation (tF)...
 
Community
CommunityCommunity
Community
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Pfizer’s Recent Use of tr...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Min...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Min...tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Min...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART and the One Min...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart’s application t...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMART
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMARTtranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMART
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Simulation in tranSMART
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker Discovery
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker DiscoverytranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker Discovery
tranSMART Community Meeting 5-7 Nov 13 - Session 3: Clinical Biomarker Discovery
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a TR Community...
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a TR Community...tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a TR Community...
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Developing a TR Community...
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding CattranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Creating a Comprehensive ...
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Creating a Comprehensive ...tranSMART Community Meeting 5-7 Nov 13 - Session 2: Creating a Comprehensive ...
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Creating a Comprehensive ...
 

Recently uploaded

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 

Recently uploaded (20)

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 

tranSMART Community Meeting 5-7 Nov 13 - Session 3: transmart-data

  • 1. transmart-data Management of tranSMART’s Environment Gustavo Lopes The Hyve B.V. November 6, 2013 Gustavo Lopes (The Hyve B.V.) transmart-data November 6, 2013 1 / 22
  • 2. Outline
    1 Problems: Reproducibility; Version Control; Automation; Why?!; tranSMART Foundation’s Version
    2 Solution: transmart-data: General Description; Configuration; Database Schema Management; Seed Data; ETL; RModules Analyses; Rserve; Solr; transmartApp Configuration
    3 Limitations
  • 3. Typical Branch Distribution
    Grails code:
    - transmartApp (without full repo history, always with wrong ancestry information ⇒ merging quite difficult)
    - RModules (if you’re lucky), but analysis definitions in the DB not provided
    Database:
    - SQL scripts on top of the GPL 1.0 dump or later. Probably insufficient/won’t apply
    - Stored procedures for ETL. Overlapping definitions with yours, but no history ⇒ merging quite difficult
    - Manual fixups always required (even if just permissions/synonyms)
  • 4. Typical Branch Distribution (II)
    ETL:
    - High variability in strategies
    - Instructions/sample data rarely provided
    - Kettle scripts are problematic
    Solr/Rserve/Configuration:
    - Solr schemas/dataimport.xml perpetually forgotten
    - Likewise for information on R packages
    - Sample configuration rarely provided
  • 5. Version Control
    Version control is used ONLY for Grails code...
    ...but often squashed and with wrong ancestry information.
    Forget about the database, Solr, and most of the ETL.
    Result:
    - Merges are very difficult
    - Changes cannot easily be tracked
    - The reasons behind changes are unknown
    - Regressions are introduced silently (no conflicts to flag them)
    - Collaboration is based on e-mail attachments
  • 6. Automation
    Even with all the pieces...
    Setting up a new branch takes days; weeks for non-basic functionality.
    No reproducibility in the process!
    Result:
    - Devs are driven away from a fully local environment (too much work)
    - A robust environment for CI is passed over (too much work)
    - Bugs cannot be reliably reproduced (see also: no consistent usage of VCS)
    - Time is wasted on deployment-specific mistakes/inconsistencies
  • 7. Why?!
    “The ‘source code’ for a work means the preferred form of the work for making modifications to it.” (GPL v3, section 1)
    Is everyone holding back “source code”?
    More likely explanation: no appropriate tooling is being used.
    Image: Guillaume Duchenne (public domain)
  • 8. Situation for tranSMART 1.1
    The situation is much better! Some problems remain, though.
    The Good:
    - Creating/populating the DB is easy
    - Most stuff is versioned
    - CI for builds
    - Image available
    - Public issue tracking
    The Bad:
    - No Oracle support
    - Changes to DB scripts/seed data are ad hoc (lax structure)
    - No mechanism to support/compare schemas with other branches
    - R analyses are JSON blobs in TSVs
    - No VCS for Solr or Rserve/images’ setup
    - Setting up Solr/Rserve is time-consuming
    - Populating the DB with sample data is still time-consuming
    - Config changes required for dev
  • 9. Description of transmart-data
    We developed transmart-data to address most of these problems. transmart-data is a set of scripts for managing tranSMART’s environment and certain application data (e.g. Solr schemas, DDL, seed data) that the scripts use and sometimes generate. It has a Makefile-based interface.
  • 10. transmart-data: Purposes
    1 Allow setting up a complete dev environment quickly (< 30 min)
    2 Bring versioning to the database schema and Solr files
    3 Set up the Solr runtime
    4 Invoke ETL pipelines
    5 Set up Rserve
    Target audience: programmers
  • 11. transmart-data: Non-purposes
    1 Setting up a production environment (some components can be used)
    2 New users evaluating tranSMART (use a pre-built image)
    3 Building transmartApp or its plugin dependencies (build them yourself or use artifacts from Bamboo/Nexus)
  • 12. Configuration
    Environment-variable-based configuration:
      cp vars.sample vars
      vim vars  # edit file
      source vars

      PGHOST=/tmp
      PGPORT=5432
      PGDATABASE=transmart
      PGUSER=$USER
      PGPASSWORD=
      TABLESPACES=$HOME/pg/tablespaces/
      PGSQL_BIN=$HOME/pg/bin/
      ORAHOST=localhost
      ORAPORT=1521
      ORASID=orcl
      ORAUSER="sys as sysdba"
      ORAPASSWORD=mypassword
      ORACLE_MANAGE_TABLESPACES=0
      # continues...
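
Because the configuration lives in plain shell variables, tools other than bash can read the same vars file. A minimal sketch, assuming the file contains only simple KEY=VALUE assignments; `parse_vars` is a hypothetical helper, not part of transmart-data, and it ignores bash subtleties such as single-quote semantics:

```python
import shlex
from string import Template


def parse_vars(text, base_env):
    """Hypothetical helper: read KEY=VALUE lines roughly as `source vars` would.

    Quotes are resolved with shlex; $NAME / ${NAME} references are expanded
    against base_env plus variables defined earlier in the same file.
    Lines that are not a single assignment (commands, comments) are skipped.
    """
    env = dict(base_env)
    result = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if len(tokens) != 1 or "=" not in tokens[0]:
            continue  # blank line, comment, or a command such as `vim vars`
        key, _, raw = tokens[0].partition("=")
        value = Template(raw).safe_substitute(env)  # unknown $VARS left as-is
        env[key] = value
        result[key] = value
    return result
```

For example, `parse_vars(open("vars").read(), os.environ)` would yield the PostgreSQL and Oracle settings as a dictionary.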
  • 13. Database Schema Management
    Support for Oracle and Postgres.
    Postgres: uses pg_dump(all); parses the dump files.
      # Dump
      make -C postgres/ddl dump
      make -C postgres/ddl/GLOBAL extensions.sql roles.sql
      # Load
      make -C postgres/ddl load
    Oracle: queries dba_* tables; dumps DDL with DBMS_METADATA.
      # Dump
      make -C oracle/ddl dump
      # Load
      make oracle
  • 14. Seed Data
    Only Postgres for now.
      # Dump
      # Tables to dump in postgres/data/<schema>_lst
      make -C postgres/data dump
      make -C postgres/common minimize_diffs
      # Load
      make -C postgres/data load
      # Load DDL and data
      make postgres
    Only for basic stuff with no ETL! Pretty fast (DDL + data loaded in 10 s).
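
The seed-data dump is driven by per-schema table lists. A minimal sketch, assuming each `<schema>_lst` file lists one table name per line (with optional `#` comments): it turns such a list into PostgreSQL `COPY ... TO STDOUT` commands and sorts the resulting rows so that re-dumping unchanged data yields an empty diff. Sorting is one plausible reading of "minimize diffs", not a description of what the `minimize_diffs` target actually does; both function names are hypothetical.

```python
def copy_statements(schema, lst_text):
    """Hypothetical: turn a table list (one name per line) into COPY commands."""
    tables = []
    for line in lst_text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if name:
            tables.append(name)
    return [f"COPY {schema}.{name} TO STDOUT;" for name in tables]


def sort_copy_rows(copy_text):
    """Hypothetical: sort COPY text-format rows for stable, diff-friendly dumps.

    In COPY's text format every row is a single line (embedded newlines are
    written as the escape sequence \\n), so sorting lines never splits a row.
    """
    rows = copy_text.splitlines()
    return "".join(row + "\n" for row in sorted(rows))
```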
  • 15. ETL (I)
    Unified interface for ETL.
    Prepare dataset:
    1 Prepare ETL-specific source files
    2 Prepare a file with ETL-specific params
    3 Upload dataset to CDN (optional)
    For each new ETL pipeline, support must be added.
    Load dataset:
      make -C samples/{oracle,postgres} load_<type>_<study_id>
      # Example:
      make -C samples/postgres load_clinical_GSE8581
    Everything is automated!
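
The load targets follow a single naming pattern, `load_<type>_<study_id>`, under `samples/<db>`. A minimal sketch of a wrapper that builds that command, assuming only the `clinical` type is registered (the registry and `load_command` are hypothetical; they just mirror the point that every new pipeline needs explicit support):

```python
SUPPORTED_DBS = {"oracle", "postgres"}

# Hypothetical registry: an ETL type is only accepted once support for it
# has been added explicitly; "clinical" is the type used in the slide example.
SUPPORTED_TYPES = {"clinical"}


def load_command(db, data_type, study_id):
    """Build argv for `make -C samples/<db> load_<type>_<study_id>`."""
    if db not in SUPPORTED_DBS:
        raise ValueError(f"unsupported database: {db}")
    if data_type not in SUPPORTED_TYPES:
        raise ValueError(f"no ETL support for type: {data_type}")
    return ["make", "-C", f"samples/{db}", f"load_{data_type}_{study_id}"]
```

The resulting list can be passed directly to `subprocess.run(..., check=True)`.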
  • 16. ETL (II)
    Show TM CZ logs:
      $ make -C samples/postgres showdblog
      make: Entering directory `/home/gustavo/repos/transmart-data/samples/postgres'
      groovy -cp postgresql-9.2-1003.jdbc4.jar ../common/dump_audit.groovy postgres `tput cols`
      Procedure       | Description                      | Stat | Recs   | Date                 | Time spent
      ------------------------------------------------------------------------------------------------------
      alysis_data.kjb | GSE8581                          | DONE | 1      | 2013-10-15 13:23:22. | 0.0
      .load_ext_files | Drop null samples rows           | Done | 0      | 2013-10-15 13:23:23. | 0.450529
      .load_ext_files | Drop null cohorts rows           | Done | 0      | 2013-10-15 13:23:23. | 0.043125
      .load_ext_files | Drop null analysis rows          | Done | 0      | 2013-10-15 13:23:23. | 0.066097
      .load_ext_files | Read analysis file               | Done | 1      | 2013-10-15 13:23:23. | 0.048055
      .load_ext_files | Read cohort file                 | Done | 3      | 2013-10-15 13:23:23. | 0.085535
      .load_ext_files | Read samples file                | Done | 57     | 2013-10-15 13:23:23. | 0.049993
      .load_ext_files | Write rwg_cohorts_ext            | Done | 3      | 2013-10-15 13:23:23. | 0.099452
      .load_ext_files | Write rwg_analysis_ext           | Done | 1      | 2013-10-15 13:23:23. | 0.047331
      .load_ext_files | Write rwg_samples_ext            | Done | 57     | 2013-10-15 13:23:23. | 0.044567
      .load_ext_files | Read analysis data file          | Done | 436898 | 2013-10-15 13:23:27. | 3.911089
      .load_ext_files | Drop null analysis_data rows     | Done | 382223 | 2013-10-15 13:23:27. | 0.067765
      .load_ext_files | Write rwg_analysis_data_ext      | Done | 54675  | 2013-10-15 13:23:28. | 1.332746
      IMPORT_FROM_EXT | Start FUNCTION                   | Done | 0      | 2013-10-15 13:23:29. | 0.117319
      IMPORT_FROM_EXT | Delete existing records from TM_ | Done | 0      | 2013-10-15 13:23:29. | 0.035825
      IMPORT_FROM_EXT | Delete existing records from TM_ | Done | 0      | 2013-10-15 13:23:29. | 6.26E-4
      IMPORT_FROM_EXT | Delete existing records from TM_ | Done | 0      | 2013-10-15 13:23:29. | 4.84E-4
      IMPORT_FROM_EXT | Insert records from TM_LZ.Rwg_An | Done | 1      | 2013-10-15 13:23:29. | 0.001079
      IMPORT_FROM_EXT | Update bio_assay_analysis_id on  | Done | 0      | 2013-10-15 13:23:29. | 0.030793
      IMPORT_FROM_EXT | Insert records from TM_LZ.Rwg_Co | Done | 3      | 2013-10-15 13:23:29. | 8.28E-4
      ... (continues)
    Errors are also shown (if any).
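
The `tput cols` argument passed to dump_audit.groovy suggests that the log table is fitted to the terminal width. A minimal Python sketch of that idea; `format_log` is hypothetical, and the Groovy script's own truncation rules (which appear to shorten individual columns) are not reproduced here. This version simply cuts whole lines at the given width:

```python
def format_log(rows, width):
    """Render audit rows as a '|'-separated table cut to `width` columns.

    rows: iterables of (procedure, description, status, records, date, seconds).
    """
    header = ("Procedure", "Description", "Stat", "Recs", "Date", "Time spent")
    table = [header] + [tuple(str(cell) for cell in row) for row in rows]
    # each column is as wide as its longest cell (header included)
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    full_width = sum(widths) + 3 * (len(widths) - 1)  # " | " separators
    lines = []
    for i, row in enumerate(table):
        line = " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        lines.append(line[:width])
        if i == 0:
            lines.append("-" * min(width, full_width))  # rule under the header
    return "\n".join(lines)
```

In a real terminal, `shutil.get_terminal_size().columns` can supply the width that `tput cols` provides on the slide.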
  • 17. RModules Analyses (tsApp-DB)
    Situation in transmartApp-DB:
      update searchapp.plugin_module set params = '{"id":"survivalAnalysis",
      "converter":{"R":["source(''||PLUGINSCRIPTDIRECTORY||Common/dataBuilders.R'')",
      "source(''||PLUGINSCRIPTDIRECTORY||Common/ExtractConcepts.R'')",
      "source(''||PLUGINSCRIPTDIRECTORY||Common/collapsingData.R'')",
      "source(''||PLUGINSCRIPTDIRECTORY||Common/BinData.R'')",
      "source(''||PLUGINSCRIPTDIRECTORY||Survival/BuildSurvivalData.R'')",
      "\tSurvivalData.build(\n
      \tinput.dataFile = ''||TEMPFOLDERDIRECTORY||Clinical/clinical.i2b2trans'',\n
      \tconcept.time = ''||TIME||'',\n
      \tconcept.category = ''||CATEGORY||'',\n
      \tconcept.eventYes = ''||EVENTYES||'',\n
      \tbinning.enabled = ''||BINNING||'',\n
      \tbinning.bins = ''||NUMBERBINS||'',\n
      \tbinning.type = ''||BINNINGTYPE||'',\n
      \tbinning.manual = ''||BINNINGMANUAL||'',\n
      \tbinning.binrangestring = ''||BINNINGRANGESTRING||'',\n
      \tbinning.variabletype = ''||BINNINGVARIABLETYPE||'',\n
      \tinput.gexFile = ''||TEMPFOLDERDIRECTORY||mRNA/Processed_Data/mRNA.trans'',\n
      \tinput.snpFile = ''||TEMPFOLDERDIRECTORY||SNP/snp.trans'',\n
      \tconcept.category.type = ''||TYPEDEP||'',\n
      \tgenes.category = ''||GENESDEP||'',\n
      \tgenes.category.aggregate = ''||AGGREGATEDEP||'',\n
      \tsample.category = ''||SAMPLEDEP||'',\n
      \ttime.category = ''||TIMEPOINTSDEP||'',\n
      \tsnptype.category = ''||SNPTYPEDEP||'')\n
      \t"]},
      "name":"Survival Analysis",
      "dataFileInputMapping":{"CLINICAL.TXT":"TRUE","SNP.TXT":"snpData","MRNA_DETAILED.TXT":"mrnaData"},
      "dataTypes":{"subset1":["CLINICAL.TXT"]},
      "pivotData":false,
      "view":"SurvivalAnalysis",
      "processor":{"R":["source(''||PLUGINSCRIPTDIRECTORY||Survival/CoxRegressionLoader.r'')",
      "CoxRegression.loader(input.filename=''outputfile'')",
      "source(''||PLUGINSCRIPTDIRECTORY||Survival/SurvivalCurveLoader.r'')",
      "SurvivalCurve.loader(input.filename=''outputfile'',concept.time=''||TIME||'')"]},
      "renderer":{"GSP":"/survivalAnalysis/survivalAnalysisOutput"},... (goes on)'
      where module_name = 'pgsurvivalAnalysis';
    Not very nice...
  • 18. RModules Analyses (transmart-data)
    In transmart-data:
    - One file per analysis
    - Files can be generated from DB data
    - Sanely formatted
    But we really want to remove this from the DB!
      array (
        'id' => 'heatmap',
        'name' => 'Heatmap',
        'dataTypes' => array (
          'subset1' => array (
            0 => 'CLINICAL.TXT',
          ),
        ),
        'dataFileInputMapping' => array (
          'CLINICAL.TXT' => 'FALSE',
          'SNP.TXT' => 'snpData',
          'MRNA_DETAILED.TXT' => 'TRUE',
        ),
        'pivotData' => false,
        ...
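
Generating one sanely formatted file per analysis from the DB rows can be sketched as follows. transmart-data itself writes PHP arrays like the one above; this sketch swaps in JSON output for illustration and assumes the input is `(module_name, params_json)` pairs already read from `searchapp.plugin_module` (`analyses_to_files` is a hypothetical name):

```python
import json


def analyses_to_files(rows):
    """Hypothetical: map (module_name, params_json) rows to {filename: text}.

    Sorted keys and a fixed indent give stable output, so regenerating the
    files from an unchanged database produces no diff.
    """
    files = {}
    for module_name, params in rows:
        data = json.loads(params)
        files[f"{module_name}.json"] = json.dumps(data, indent=2, sort_keys=True) + "\n"
    return files
```

Because the output is deterministic, the generated files can be committed and compared across branches, which the single-line blob in the database does not allow.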
  • 19. Rserve
    Targets for Rserve:
    - Download/build R
    - Install R packages
    - Start Rserve
    - Install a System V init script for Rserve
    - Likewise for systemd
      cd R
      make -j8 bin/root/R
      # some packages don't support concurrent builds
      make install_packages
      make start_Rserve
      make start_Rserve.dbg
      TRANSMART_USER=tomcat7 sudo -E make install_rserve_init
      TRANSMART_USER=tomcat7 sudo -E make install_rserve_unit
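
After starting Rserve, a quick check that it is actually listening can save debugging time. Rserve's QAP1 protocol sends a 32-byte ID string beginning with `Rsrv` as soon as a client connects, and 6311 is Rserve's default port. A minimal sketch under those assumptions; `is_rserve` is a hypothetical helper, not one of the transmart-data targets:

```python
import socket


def is_rserve(host="localhost", port=6311, timeout=2.0):
    """Hypothetical check: does the server at host:port greet us like Rserve?

    Reads up to the 32-byte QAP1 ID string and checks its "Rsrv" prefix.
    Connection errors and timeouts count as "not Rserve".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            greeting = b""
            while len(greeting) < 32:
                chunk = sock.recv(32 - len(greeting))
                if not chunk:
                    break  # server closed the connection early
                greeting += chunk
    except OSError:
        return False
    return greeting.startswith(b"Rsrv")
```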
  • 20. Solr
    - Solr (4.5.0) automatically downloaded and configured
    - Solr cores automatically created
    - User only needs to create a schema file and dataconfig.xml
      # setup & solr (psql)
      make start
      # just configure
      make solr_home
      make <core>_full_import
      make <core>_delta_import
      make clean_cores
      ORACLE=1 make start
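
The `<core>_full_import` and `<core>_delta_import` targets presumably trigger Solr's DataImportHandler, which is driven by the `full-import` and `delta-import` commands. A minimal sketch of building such a request URL, assuming the handler is registered at `/dataimport` in each core and Solr listens on its default port 8983; `dataimport_url` and the core name in the usage line are hypothetical:

```python
from urllib.parse import urlencode


def dataimport_url(core, command, base="http://localhost:8983/solr"):
    """Hypothetical: URL asking the DataImportHandler of `core` to run `command`."""
    if command not in ("full-import", "delta-import", "status"):
        raise ValueError(f"unexpected DIH command: {command}")
    return f"{base}/{core}/dataimport?{urlencode({'command': command})}"
```

For example, `urllib.request.urlopen(dataimport_url("mycore", "delta-import"))` would start an incremental import on a hypothetical core named `mycore`.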
  • 21. transmartApp Configuration
    Out-of-tree config management:
    - Targets for installing files
    - Zero configuration for dev!
    - Customization allowed without touching the target files
    - Only supports our branches
    But a lot of configuration should be in-tree instead!
      # install everything
      # previous files are backed up
      make install
      # just one file:
      make install_Config.groovy
      make install_BuildConfig.groovy
      make install_DataSource.groovy
      # customizations in:
      # Config-extra.php
      # BuildConfig.groovy (limited)
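
The install targets back up previous files before overwriting them. A minimal sketch of that behaviour, assuming a timestamped `.bak` suffix (the naming scheme, the destination directory and `install_with_backup` itself are hypothetical, not what the Makefile does):

```python
import shutil
import time
from pathlib import Path


def install_with_backup(src, dest_dir):
    """Hypothetical: copy src into dest_dir, moving any existing file aside.

    Returns the backup path, or None if there was nothing to back up.
    """
    src, dest_dir = Path(src), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    backup = None
    if target.exists():
        stamp = time.strftime("%Y%m%d%H%M%S")
        backup = target.with_name(f"{target.name}.{stamp}.bak")
        n = 1
        while backup.exists():  # avoid clobbering a backup from the same second
            backup = target.with_name(f"{target.name}.{stamp}.{n}.bak")
            n += 1
        target.rename(backup)
    shutil.copy2(src, target)  # copy2 also preserves timestamps
    return backup
```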
  • 22. Current Limitations
    - DB upgrades not handled
    - Only a few ETL pipelines supported
    - Oracle support lags behind PostgreSQL
    - Tooling shares a repository with application data
    Image: © Joost J. Bakker, CC BY 2.0