2. CREST Health Analytics Platform
• The CREST platform is a one-stop solution for health data storage and analysis.
• It provides a comprehensive, flexible, and scalable ecosystem of frameworks.
• It supports capturing, processing, analysing, and visualising large volumes of
health data that are too complex for traditional data-processing applications.
3. Use cases and scenarios
Scenario: In emergency health situations such as a pandemic or flooding, there is a strong
need for predictive analytics to forecast the demand for medical supplies.
Solution: Health analytics with the CREST platform to provide
• Predictive analysis using outbreak patterns and other historical data
• Monitoring of cases – case numbers and patients' health
• Recommendations on resources for healthcare facilities
5. CREST Infrastructure and data management
• Automated infrastructure deployment (Infrastructure as Code, IaC)
• Network configuration
• Software installation
• Benchmarking experimentation testbed
• Run-time patching recovery
• Data storage and management
• Big data storage solutions cluster configuration
6. Use cases and scenarios
• Determining energy efficiency of various data workloads for low-powered devices
• Measuring performance and resource usage for various data distribution flows
• Modelling effects of node mobility under different networking scenarios
• Automated comparison of multiple data storage and processing solutions
• Detecting and recovering from broken run-time patches
12. Supply Chain Provenance of ML-based Software
Nguyen Khoi Tran, M. Ali Babar, Mingyu Guo
CREST – The University of Adelaide, Australia
13. Scenario: Distributed ML DevOps
Participants: Client/Operator, Dataset Admin, Model Developer, Model Verifier, Auditor
1. I have an idea for an ML application
2. I hire a company to collect data for me
3. I gather the data
4. I outsource labeling to Amazon Mechanical Turk
5. I pass the data to the appointed developers
6. I develop the ML model
7. I test and verify the model
8. I return the model to the client
9. I seek third-party validation
14. Scenario: Distributed ML DevOps goes wrong
The same participants (Client/Operator, Dataset Admin, Model Developer, Model Verifier,
Auditor) and workflow steps 1–9 as above, but now with threats along the way:
• Deliberate mislabeling (poisoning)
• Dataset tampering
• Vulnerability in ML frameworks
• Model swapping
• Cover-up
• Not enough information
15. How to capture and preserve the records of “who did what” to ML assets
(a.k.a. workflow provenance information) in a distributed ML workflow environment?
16. Existing Approach: A Centralised Platform
All participants (Client/Operator, Dataset Admin, Model Developer, Model Verifier,
Auditor) read and write a shared Provenance Database, which holds:
• Dataset provenance (e.g., data sheets)
• Asset provenance: model development records and model testing records (e.g., model cards)
• Auditing results
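As a hedged illustration of what such a provenance database might hold, the sketch below models the record types named on this slide (data sheet, development records, model card) as Python dataclasses. All field names are illustrative assumptions, not an actual ProML schema.

```python
# Hypothetical record types for a centralised provenance database.
# Only the artefact kinds (data sheet, model card, audit results) come
# from the slide; every field name here is an assumption.
from dataclasses import dataclass, asdict


@dataclass
class DatasetProvenance:
    """Data-sheet-style record created by the Dataset Admin."""
    dataset_id: str
    version: str
    collected_by: str
    labelled_by: str


@dataclass
class ModelProvenance:
    """Development and testing records, roughly a model card."""
    model_id: str
    dataset: DatasetProvenance
    developer: str
    test_f1: float
    audit_result: str = "pending"  # filled in later by the Auditor


record = ModelProvenance(
    model_id="ML2",
    dataset=DatasetProvenance("D1", "v1", "acme-collectors", "mturk"),
    developer="dev-team-a",
    test_f1=0.92,
)
print(asdict(record)["dataset"]["labelled_by"])  # mturk
```

Note that `asdict` flattens the nested dataset record, which is convenient when serialising such records for storage.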
18. Decentralized Software Platform
• Each participant (Dataset Admin, Model Developer, Model Verifier, Model Operator,
Auditor) runs a ProML node; the nodes synchronise via provenance update broadcasts.
• Inside a ProML node – provenance capturing: user interface, CLI client, capturing
library, blockchain wallet, and signer; provenance querying: query interface and
blockchain client; service providers/clients: storage provider (IPFS client) and
blockchain provider.
• Content distribution network: stores datasets and models.
• Blockchain: stores dataset provenance, model provenance, and the provenance update process.
Design principles:
• P1: If you use provenance, you control it – you manage and store it.
• P2: Use your existing tools; keep information flow within your organisation.
• P3: Embed provenance records in the blockchain for security; embed the provenance
update process in smart contracts for resilience.
19. User-driven Provenance Capturing
(Design principle P2: use your existing tools; keep information flow within your organisation)
1. Develop: the Model Developer writes an ML training script / notebook
2. Embed: calls to the Logging API are embedded in the script
3. The running script sends the provenance message pmi to the local ProML node
4a. The node submits the payload to the storage provider; 4b. the provider returns a CID
5. The node crafts the transaction tx_pmi
6. The Signer signs tx_pmi
7. The node submits tx_pmi to the blockchain provider
8. The blockchain validates and inserts tx_pmi

Exemplary logging API:
Function            Parameters
selectData()        datasetID, datasetVersion, datasetMetadata (columnInfo, labelInfo)
preprocessData()    processedDataset, datasetMetadata (columnInfo, labelInfo)
engineerFeatures()  featureList, featureSelectAlg (algConfigs)
train()             classifierInfo (type, library, version, hyperparameters), model
evaluate()          trainingSetRatio, F1, acc, trainingDuration
validate()          F1, acc, recall, precision, Matthew, MSE, Fowlkes
deploy()            model, deploymentInfo
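To make the exemplary logging API above concrete, here is a minimal stand-in: each call records a “who did what” provenance event. The `ProMLLogger` class and its in-memory buffering are illustrative assumptions; only the function names and parameters come from the table.

```python
# Stand-in for the exemplary logging API. The class name and the
# in-memory event buffer are assumptions for illustration; a real
# ProML node would forward each event as a provenance message.
class ProMLLogger:
    def __init__(self):
        self.events = []

    def _log(self, action, **params):
        self.events.append({"action": action, **params})

    def select_data(self, dataset_id, dataset_version, **metadata):
        self._log("selectData", datasetID=dataset_id,
                  datasetVersion=dataset_version, **metadata)

    def train(self, classifier_info):
        self._log("train", classifierInfo=classifier_info)

    def evaluate(self, f1, acc, training_duration):
        self._log("evaluate", F1=f1, acc=acc,
                  trainingDuration=training_duration)


# Inside a training script or notebook, the calls are embedded at the
# exact spots the developer wants captured (step 2, "Embed"):
log = ProMLLogger()
log.select_data("D1", "v2", columnInfo=["age", "bp"], labelInfo="risk")
log.train({"type": "RandomForest", "library": "sklearn",
           "version": "1.4", "hyperparameters": {"n_estimators": 100}})
log.evaluate(f1=0.91, acc=0.93, training_duration="42s")
print(len(log.events))  # 3
```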
20. Sample Data
Asset lifecycle: Initial State → Registered → Selected Dataset → Pre-processed Dataset →
Engineered Feature Sets → Trained Model → Evaluated Model → Validated Model → Deployed
Sample provenance transactions (Ropsten test network):
• ML2-1, Select Dataset:
0x1e440b6842ab9efd56ff995a7cc43d08a1f75ece170226c25aeacfc1946a1c66
• ML2-3, Feature engineering:
0x0e75eb311c8f4d0a89948a701729e3696d6da33bec5a7e6403543c4d676ea380
• ML2-4, Training:
0xd816547ccc817d8cd3b28a56a84e8f2bd960ab3c648e6425bee2eade363e2501
• ML2-5, Evaluation:
0x4a30e2905f6f774d02f80a366566492698dacd47ca2a90ff55bfd56c1f910cbc
• ML2-6, Validation:
0xfb67bbc4e7391ca8711d6fa9f06a688a774329deef05e732c49749b3d44657fa
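The lifecycle on this slide can be sketched as a simple state machine. How transitions are actually enforced is an assumption here; in ProML the check would live in the on-chain provenance update process, not in local Python.

```python
# Sketch of the asset lifecycle from the slide as a linear state
# machine. The enforcement logic is an illustrative assumption.
LIFECYCLE = [
    "Initial State", "Registered", "Selected Dataset",
    "Pre-processed Dataset", "Engineered Feature Sets",
    "Trained Model", "Evaluated Model", "Validated Model", "Deployed",
]


def advance(state: str, update: str) -> str:
    """Accept an update only if it moves the asset to the next state."""
    i = LIFECYCLE.index(state)
    if i + 1 < len(LIFECYCLE) and LIFECYCLE[i + 1] == update:
        return update
    raise ValueError(f"illegal transition {state!r} -> {update!r}")


# Replay a full, valid history of provenance updates:
state = "Initial State"
for step in LIFECYCLE[1:]:
    state = advance(state, step)
print(state)  # Deployed
```

An out-of-order update, such as deploying an unvalidated model, would raise an error under this check.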
Good morning everyone, I'm Triet, a research fellow in CREST.
And today I'd like to share with you an overview of our research on software security.
To start with, I believe that you all can recognize most if not all the icons on the screen here.
In fact, it's hard to imagine a day in which we don't use any of these software apps. At least for me, I don't even remember how many emails I've sent and received this week.
Nowadays, software exists everywhere and has become a part of our daily life. They've drastically changed the way we live, work, and interact.
And of course, software is also widely used for the healthcare domain. A notable and recent example would be the contact tracing app, or specifically CovidSafe app we have here in Australia.
And if you think about these apps, these aren't just "software", but they're actually "AI powered software". For example, Google products use advanced AI recommender systems to show us the next video to watch on YouTube.
In the context of healthcare, and specifically COVID-19, AI has, as far as I know, been utilised to predict hot-spot locations or the number of new positive cases in the coming week, helping governments come up with suitable preventive measures early.
As we can see, these software apps and technologies are very useful for us, but they also contain security risks that can lead to catastrophic consequences.
For example, last year, you may have heard about the Log4J vulnerability, which took the entire Internet by storm back then.
This vulnerability could be exploited to affect millions of systems around the world, and it's estimated that billions of dollars will be lost to the cyber attacks caused by this one vulnerability alone.
And this is just one example among the thousands of critical vulnerabilities discovered every single year. So you can see how much damage vulnerabilities can cause if we don't prevent and address them in time.
And the vision of our research is to prevent such dangerous vulnerabilities. Specifically, we aim to develop tools and techniques and distill practices to provide early information and warnings about software vulnerabilities for both expert and non-expert users.
Our research mainly leverages various data sources to develop high-performing and robust AI/data-driven techniques that automate, and give insights into, the whole vulnerability lifecycle: detecting vulnerabilities early, assessing their probability of exploitation and their impacts, and recommending how developers should plan and prioritise mitigation and fixing. Currently, our research targets both traditional and contemporary AI-based systems, as well as the supporting infrastructure for these systems.
For AI systems, we've focused on phishing detection, for example systems that filter out spam emails designed to trick users into clicking malicious links and giving up their personal data.
It's worth noting that so far, we have not analysed vulnerabilities in healthcare apps. So, I believe that can be one area of collaboration that we can explore in the meeting today.
And that's all about our current software security research. Thank you.
Can we leverage decentralised technology to solve this problem?
Let's see how these principles manifest in the platform. According to the first design principle, we structure the ProML platform as a collection of peer nodes, called ProML nodes. All participants who collaborate on an ML model can deploy a ProML node within their organisation. Each ProML node acts as a representative of a participant. All of the interconnected ProML nodes have equal rights and responsibility to access, update, and secure ML provenance information.
ProML nodes are synchronised with each other using a blockchain protocol. The ProML nodes themselves can act as full nodes to form a blockchain network. Alternatively, the framework can rely on a remote blockchain network.
ProML nodes are also gateways for participants to interact with the provenance information. Through APIs and command-line interfaces, participants' existing ML toolsets submit and query provenance information; no new tools are necessary. It should also be noted that these exchanges of information happen within organisational boundaries, thus fulfilling design principle P2.
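To illustrate the query side of this gateway role, here is a hedged sketch of how a participant's tooling might ask its local ProML node for an asset's provenance trail. The in-memory store, the record shape, and the function name are all assumptions for illustration.

```python
# Hypothetical query interface of a local ProML node. In reality the
# records are replicated via the blockchain; a dict stands in here.
import json

NODE_STORE = {
    "ML2": [
        {"action": "selectData", "by": "dataset-admin"},
        {"action": "train", "by": "model-developer"},
    ],
}


def query_provenance(asset_id: str) -> str:
    """What a CLI call like `proml query <asset>` might return."""
    return json.dumps(NODE_STORE.get(asset_id, []), indent=2)


print(query_provenance("ML2"))
```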
ProML also supports a peer-to-peer content distribution network as an alternative venue for storing and distributing models and datasets.
Let's take a closer look at how provenance information is captured.
The process is initiated by a workflow participant, such as a model developer. They embed function calls to a Logging API provided by their local ProML node.
When the training script or notebook runs, the function calls will happen at the exact spot specified by the model developer and submit the requested provenance information to the local ProML node.
If a participant chooses to, the ProML node can offload the payload part of the submitted provenance information, such as a dataset or the binary of a model, and replace the payload with its corresponding hash. This process is called offloading.
After offloading, the ProML node transforms the submitted provenance information into a blockchain transaction and signs it on behalf of the participant.
Finally, the ProML node submits the transaction to the blockchain via its local blockchain client. Once the transaction has been mined, the new information becomes available to all workflow participants.
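The capturing steps just described (offload, craft, sign, submit) can be sketched end to end as follows. SHA-256 stands in for an IPFS content identifier and HMAC for the wallet's transaction signature; both are simplifying assumptions, not ProML's actual cryptography.

```python
# End-to-end sketch of provenance capturing. sha256 stands in for an
# IPFS CID and HMAC for the signer's ECDSA signature -- assumptions
# made so the example stays self-contained.
import hashlib
import hmac
import json

storage = {}  # stand-in for the content distribution network


def offload(payload: bytes) -> str:
    """Store a heavy payload and return its content address."""
    cid = hashlib.sha256(payload).hexdigest()
    storage[cid] = payload
    return cid


def craft_and_sign(record: dict, signing_key: bytes) -> dict:
    """Craft a transaction from a provenance record and sign it."""
    body = json.dumps(record, sort_keys=True).encode()
    sig = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    return {"record": record, "signature": sig}


model_binary = b"...serialized model weights..."
record = {"action": "train", "payload_cid": offload(model_binary)}
tx = craft_and_sign(record, signing_key=b"participant-wallet-key")

# The payload stays retrievable by its hash after offloading:
assert storage[record["payload_cid"]] == model_binary
print(tx["signature"][:8])
```

Because the record embeds only the payload's hash, anyone can later verify that a retrieved dataset or model matches what was signed.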
Here are some sample transactions on the Ropsten network that capture key provenance updates; the hex strings are the hashes of the corresponding blockchain transactions.
For instance, the training record ML2-4 contains the information required by researchers in the case project, such as hyperparameters, type and version of the utilised ML training library.