Best practice reference architecture for
data standardization and curation
Dr. Michael Engels, OSTHUS
BioIT World, Boston – April 21st 2015
Slide 2
Agenda
OSTHUS – Who we are
Painpoints
Reference architecture
Use cases
Benefits
Slide 3
Who we are
Slide 4
Who we are
Cutting edge in R&D
Global partner
Independent
Digital Lab Informatics
Innovation
Active network
Open collaboration
Customer orientation
Trust
Slide 5
Who we are
Focus on value
Concepts and methodology
Approach & commitment
Slide 6
Agenda
OSTHUS – Who we are
Painpoints
Reference architecture
Use cases
Benefits
Slide 7
Life science data
Scientific data are:
• Valuable assets to NGOs, academia, and industry
• Domain/context specific
• Interpretable only by experts
Scientific data are subject to continuous change:
• Growth
• Formats, standards, and technology
• Concept extensions
• Context changes
Slide 8
Change of concepts
Phenomenology-based concept
Gene-based concept
Pharmacology example: Ion channels taxonomy
Slide 9
Painpoints
Data standardization, data curation, master data management, data migration, ...
• Are complex endeavors
• Are labor- and alignment-intensive
• Need expert input (technical and scientific)
• Are highly iterative
• Are difficult to frame in timelines or costs
How to address this challenge?
Slide 10
Agenda
OSTHUS – Who we are
Painpoints
Reference architecture
Use cases
Benefits
Slide 11
Reference architecture
[Diagram: reference architecture for data migration. Labels recovered from the figure:]
Phases: I, II, III, IV, ...
Functions: Manage Dictionary; Manage Curation runs; Manage Results; Analysis
Data flow: Data Source(s); Source Copy; Working area; Copy of target; Target; Extraction & Loading; Property Mapping; Glossary and Vocabulary; Transformation
Dictionary items: Data Concept; Data Source; Glossary; Vocabulary; Annotation Rules; Mapping Rules; Transformation Rules
Curation run items: Run Configuration; Data partitioning; Data Processing; Filtering; Monitoring & Audit
Result items: Logs & Observ.; Exceptions; Comments; Dashboard
Analysis items: Calculate Properties; Data Comparison; Visual Analytics; Tag Data; List Management
Other labels: CDC; SQL to Load; Audit Trails
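
The slide names mapping rules, transformation rules, a source copy, exceptions, and audit logging, but does not show how they interact. The following is a minimal sketch of one possible reading, not the OSTHUS implementation: a curation run that applies a vocabulary mapping and a list of transformation rules to a copy of the source, collects exceptions instead of aborting, and writes an audit log. Every name here (CurationRun, assay_type, micromolar_to_nanomolar, the sample records) is a hypothetical illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration only: none of these names come from the slides.


@dataclass
class CurationRun:
    vocabulary: dict[str, str]                     # source term -> controlled term
    transformations: list[Callable[[dict], dict]]  # applied in order
    audit_log: list[str] = field(default_factory=list)
    exceptions: list[tuple[int, str]] = field(default_factory=list)

    def process(self, source_copy: list[dict]) -> list[dict]:
        """Curate each record; failures are recorded, not propagated."""
        target = []
        for index, record in enumerate(source_copy):
            try:
                curated = dict(record)  # never edit the source copy in place
                term = curated["assay_type"].strip().lower()
                curated["assay_type"] = self.vocabulary[term]  # mapping rule
                for rule in self.transformations:              # transformation rules
                    curated = rule(curated)
                target.append(curated)
                self.audit_log.append(f"record {index}: ok")
            except (KeyError, ValueError) as exc:
                self.exceptions.append((index, repr(exc)))
                self.audit_log.append(f"record {index}: exception")
        return target


def micromolar_to_nanomolar(record: dict) -> dict:
    """Example transformation rule: normalize concentrations to nM."""
    out = dict(record)
    if out.get("unit") == "uM":
        out["value"] = float(out["value"]) * 1000.0
        out["unit"] = "nM"
    return out


run = CurationRun(
    vocabulary={"binding": "Binding assay"},
    transformations=[micromolar_to_nanomolar],
)
source_copy = [
    {"assay_type": "Binding", "value": "0.5", "unit": "uM"},
    {"assay_type": "unknown", "value": "3", "unit": "nM"},
]
target = run.process(source_copy)
```

In this sketch the second record has no vocabulary entry, so it lands in `run.exceptions` for expert review while the first record is loaded into `target`. Keeping the rules as data, separate from the run, is what would let scientific experts maintain the dictionary while technical experts maintain the run configuration.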
Slide 12
Agenda
OSTHUS – Who we are
Painpoints
Reference architecture
Use cases
Benefits
Slide 13
Use case 1
Chemical cartridge/structure migration
Accord Mol2000
#1: racemic
#1
Big Bang
Slide 14
Use case 2
Data integration – DWH
Continuous Growth
Slide 15
Agenda
OSTHUS – Who we are
Painpoints
Reference architecture
Use cases
Benefits
Slide 16
Benefits
• Modular setup
• All functions available within one integrated framework
• Separate components for technical and scientific experts alike
• Data curation as part of a process, not as individual data editing
• Easy to use
• Configurable toolbox tailored to any program
• Integrated visual / comparative analysis between source and target data
• Reduction of technical issues
• Error propagation contained, rollbacks possible
• Focus on data, not on technology
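
Two of the benefits listed above, comparative analysis between source and target data and contained error propagation with rollback, can be illustrated with a short sketch. It assumes records held as plain dictionaries keyed by an identifier; the function names and the sample compound records are hypothetical and are not taken from the presentation.

```python
import copy

# Hypothetical illustration only: names and data are not from the slides.


def compare(source: dict[str, dict], target: dict[str, dict]) -> dict[str, list]:
    """Report keys missing from the target and fields whose values differ."""
    report = {"missing": [], "changed": []}
    for key, src in source.items():
        if key not in target:
            report["missing"].append(key)
            continue
        for name, value in src.items():
            if target[key].get(name) != value:
                report["changed"].append((key, name, value, target[key].get(name)))
    return report


def run_with_rollback(target: dict[str, dict], step) -> tuple[dict[str, dict], bool]:
    """Apply a curation step to a working copy; keep the original on failure."""
    working = copy.deepcopy(target)
    try:
        step(working)
    except Exception:
        return target, False  # rollback: the working copy is discarded
    return working, True


source = {"C-1": {"stereo": "racemic"}, "C-2": {"stereo": "R"}}
target = {"C-1": {"stereo": "unknown"}}
report = compare(source, target)


def failing_step(data: dict[str, dict]) -> None:
    data["C-1"]["stereo"] = "racemic"
    raise ValueError("bad record")  # simulated failure after a partial edit


def good_step(data: dict[str, dict]) -> None:
    data["C-1"]["stereo"] = "racemic"


rolled_back, ok_failed = run_with_rollback(target, failing_step)
committed, ok_good = run_with_rollback(target, good_step)
```

Because each step works on a deep copy, a failure partway through leaves `target` untouched, which is one simple way to contain error propagation. A production system would more likely rely on database transactions or versioned working areas.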
Slide 17
Questions?
For more information:
Visit us at Booth # 451
or at Poster # 47
