3. problem areas
domain expertise
Public & proprietary datasets are spread
across many catalogs, not all online, so
finding the right dataset is time-consuming
privacy & compliance
Datasets have different owners, so
complying with multiple licenses, attribution
& reporting terms is an ongoing burden
data engineering
Formatting, optimizing and loading data into
your big data or data science platform of
choice requires substantial effort
data quality
Each dataset has different errors, missing
values, outliers, gaps, flurries, biases, typos –
requiring substantial manual effort to clean
data evolution
Datasets are updated on different schedules,
creating an operational burden to keep them
up to date
data integration
Datasets from different sources attach different
meanings & assumptions to similarly named
concepts, making joins semantically wrong
4. DataOps defined
domain expertise
◦ Find or create the right datasets
◦ Enrich by experts or by joining data sources
◦ Access to proprietary or hard-to-find data
privacy & compliance
◦ Data license compliance support
◦ Ongoing reporting, attribution, disclosures
◦ Data monetization & secure licensing
data engineering
◦ Format data optimally for target platform
◦ Automatically load the data & metadata
◦ Auto-update governance & lineage catalogs
data quality
◦ Automated cleansing & validation rules
◦ Curated and auto-validated metadata
◦ Quality scores and beyond: Find outliers,
gaps, flurries, biases & provenance issues
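To make the idea of automated validation rules and quality scores concrete, here is a minimal Python sketch. The field names (`patient_id`, `weight_kg`, `visit_date`), the thresholds and the scoring formula are hypothetical illustrations, not the actual rule sets described in this deck.

```python
from datetime import date

# Hypothetical rules: each returns an error message, or None if the record passes.
def require_fields(record, fields=("patient_id", "weight_kg", "visit_date")):
    missing = [f for f in fields if record.get(f) in (None, "")]
    return f"missing values: {', '.join(missing)}" if missing else None

def weight_in_range(record, low=0.5, high=500.0):
    weight = record.get("weight_kg")
    if weight is None:
        return None  # missing values are reported by require_fields
    if not isinstance(weight, (int, float)):
        return f"weight_kg is not numeric: {weight!r}"
    return None if low <= weight <= high else f"weight_kg out of range: {weight}"

def visit_date_is_iso(record):
    value = record.get("visit_date")
    if not value:
        return None  # missing values are reported by require_fields
    try:
        date.fromisoformat(value)
    except (TypeError, ValueError):
        return f"visit_date is not an ISO date: {value!r}"
    return None

RULES = [require_fields, weight_in_range, visit_date_is_iso]

def validate(record):
    """Run every rule and collect the error messages."""
    return [msg for rule in RULES if (msg := rule(record)) is not None]

def quality_score(records):
    """Fraction of records that pass all rules (0.0 for an empty input)."""
    if not records:
        return 0.0
    return sum(1 for r in records if not validate(r)) / len(records)
```

Keeping each rule as a small function that returns a message makes it easy to add rules for units, currencies or locations without touching the scoring logic.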
data evolution
◦ Track changes in all source data
◦ Deliver clean, versioned updates daily
◦ Support overwrite-on-update
◦ Mappings for terminology changes
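One way to picture change tracking and terminology mappings is the small Python sketch below. It assumes each dataset version is a list of dictionaries with a unique key; the function names `diff_versions` and `apply_term_mapping` and the sample codes are hypothetical.

```python
def diff_versions(old_rows, new_rows, key="id"):
    """Compare two snapshots of a dataset, keyed by `key`."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

def apply_term_mapping(rows, mapping, field):
    """Rewrite retired terms in `field` to their current equivalents."""
    return [{**row, field: mapping.get(row[field], row[field])} for row in rows]

# Hypothetical snapshots of the same source on two different days.
v1 = [{"id": 1, "code": "A"}, {"id": 2, "code": "B"}]
v2 = [{"id": 2, "code": "B2"}, {"id": 3, "code": "C"}]

changes = diff_versions(v1, v2)
renamed = apply_term_mapping(v1, {"A": "A1"}, "code")
```

A diff like this is what lets an update be delivered as a versioned change set rather than a silent overwrite.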
data integration
◦ Semantic inter-operability
◦ Unified type system & constraints
◦ Unified metadata specification
◦ Automated mappings to data platforms
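As a rough illustration of a unified type system with constraints, the Python sketch below maps a weight field from two sources that use different names and units onto one constrained type. The `Weight` type, the source schemas and the field names are assumptions made for this example only.

```python
from dataclasses import dataclass

# Conversion factors to kilograms (1 lb = 0.45359237 kg by definition).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

@dataclass(frozen=True)
class Weight:
    """Unified representation: always kilograms, always positive."""
    kilograms: float

    def __post_init__(self):
        if self.kilograms <= 0:
            raise ValueError(f"weight must be positive, got {self.kilograms}")

# Hypothetical per-source schemas: (field name in the source, unit it uses).
SCHEMA_A = {"weight": ("weight", "lb")}
SCHEMA_B = {"weight": ("wt", "kg")}

def to_unified(record, source_schema):
    """Map a source record's weight onto the unified Weight type."""
    field, unit = source_schema["weight"]
    return Weight(kilograms=float(record[field]) * TO_KG[unit])

a = to_unified({"weight": 154}, SCHEMA_A)
b = to_unified({"wt": "70"}, SCHEMA_B)
```

Once both sources are expressed in the same type, a join on weight compares like with like instead of mixing pounds and kilograms.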
5. how it works
tell us what you’re building
We have clinicians & data science experts who
speak your language. Just explain your goal,
your platform and what help you need.
we’ll prepare the data
We will research, curate, clean, license,
format, load, update & document all the
datasets your project requires.
so you can go build it
Focus on data science, and leave data
operations to us. We take care of updates,
integration, compliance and support you.
I want to predict patients at risk
for chronic kidney disease
I want to automatically generate
ICD-10 codes from clinical notes
I want to auto-recommend diets
that match patients’ treatment plan
I want to monitor and alert on
shifts in drug pricing & shortages
6. Turnkey data
Ready to load
Continuously tested with the latest big data
& data science platforms for one-step load
Optimized
Up to 100x speedup on Hadoop & Spark
clusters thanks to Parquet serialization
Inter-operable
It’s as if all your health data came from one
clean source. Somewhere, pigs are flying.
Always up to date
Get updates as they happen, without
worrying about broken schemas or identifiers.
7. Quality data
Clean
30+ automated validation rule sets run
on the data and metadata to ensure a
correct, complete and consistent representation
– including units, currencies, locations,
timestamps, dates and missing values
Problem specific
Data provenance (sampling method,
data collection methodology, publisher,
conflicts of interest, freshness, gaps)
documented and verified by a domain
expert against your project
Compliant
Know in advance you have the right
data license for your business model,
geographic target and team. Remain
proactively compliant with reporting,
audits, attribution and privacy terms.
8. domain expertise
88% of our team hold an MSc or MA
36% of our team hold an MD or PhD
Pharma
Clinical
Revenue Cycle
Public Health
Cyber