CREST Health Analytics Platform
• The CREST platform is a one-stop solution for health data storage and analysis.
• It has a comprehensive, flexible, and scalable ecosystem of frameworks.
• It supports capturing, processing, analysing, and visualising large volumes of health data that are too complex for traditional data-processing applications.
Use cases and scenarios
Scenario: In emergency health situations such as a pandemic or flooding, predictive analytics is needed to estimate the medical supplies required.
Solution: Health analytics with the CREST platform provides
• Predictive analysis using outbreak patterns and other historical data
• Monitoring of cases: number of cases and patients' health
• Recommendations on resources for healthcare facilities
CREST Infrastructure and data management
• Automated infrastructure deployment (IaaC)
• Network configuration
• Software installation
• Benchmarking experimentation testbed
• Run-time patching recovery
• Data storage and management
• Big data storage solutions cluster configuration
Use cases and scenarios
• Determining energy efficiency of various data workloads for low-powered devices
• Measuring performance and resource usage for various data distribution flows
• Modelling effects of node mobility under different networking scenarios
• Automated comparison of multiple data storage and processing solutions
• Detecting and recovering from broken run-time patches
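As a rough illustration of the "automated comparison of multiple data storage and processing solutions" use case, the sketch below times the same put/get workload against two stand-in backends. DictStore, SqliteStore, and benchmark are hypothetical names chosen for this example, and the two backends are simple placeholders, not the big data storage solutions the CREST testbed targets.

```python
import sqlite3
import time


class DictStore:
    """Stand-in backend 1: a plain in-memory dictionary."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class SqliteStore:
    """Stand-in backend 2: an in-memory SQLite key-value table."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        self._db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

    def get(self, key):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0]


def benchmark(store, n=1000):
    """Run the same write-then-read workload and return elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        store.put(f"key{i}", f"value{i}")
    for i in range(n):
        assert store.get(f"key{i}") == f"value{i}"
    return time.perf_counter() - start


# Compare all backends under the identical workload, fastest first.
results = {type(s).__name__: benchmark(s) for s in (DictStore(), SqliteStore())}
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s")
```

Keeping the workload in one function and the backends behind the same put/get interface is what makes the comparison automatic: adding another solution only requires another small adapter class.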
CREST - Software Security Research
Software is Everywhere
Software/AI is Everywhere
Supply Chain Provenance of ML-based Software
Nguyen Khoi Tran, M. Ali Babar, Mingyu Guo;
CREST – The University of Adelaide, Australia
Scenario: Distributed ML DevOps
Roles: Auditor, Model Verifier, Model Developer, Client / Operator, Dataset Admin
1. I have an idea for an ML application
2. I hire a company to collect data for me
3. I gather the data
4. I outsource labeling to Amazon Mechanical Turk
5. I pass the data to the appointed developers
6. I develop the ML model
7. I test and verify the model
8. I return the model to the client
9. I seek third-party validation
Scenario: Distributed ML DevOps goes wrong
The same roles and steps 1 to 9 as in the previous scenario, now with the following threats:
• Deliberate mislabeling (poisoning)
• Dataset tampering
• Vulnerability in ML frameworks
• Model swapping
• Cover-up
• Not enough information
How to capture and preserve the records of “who did what” to ML assets (a.k.a. workflow provenance information) in a distributed ML workflow environment?
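To make "who did what" concrete, here is a minimal sketch of what a single workflow provenance record might look like. The class name ProvenanceRecord, its fields, and the sample values are assumptions made for illustration; the slides do not specify a record format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """One 'who did what' entry; all field names are illustrative assumptions."""

    actor: str      # who: e.g. "Model Developer"
    action: str     # did what: e.g. "train"
    asset_id: str   # to which ML asset (dataset or model)
    details: dict   # free-form metadata about the action

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so that equal records hash identically.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Placeholder values, not data from the presentation.
record = ProvenanceRecord(
    actor="Model Developer",
    action="train",
    asset_id="model-001",
    details={"library": "example-lib", "version": "1.0"},
)
print(record.digest())
```

A fixed-length digest of the record is a convenient handle for referring to it elsewhere, since any change to the record changes the digest.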
Existing Approach: A Centralised Platform
Roles: Auditor, Model Verifier, Model Developer, Client / Operator, Dataset Admin
Provenance Database
Records shown:
• Dataset Provenance (e.g., data sheet)
• Auditing results
• Asset Provenance
• Model development records
• Model testing records (e.g., model card)
Problem Summary
Problem:
• Security: Authenticity, Integrity, Non-repudiation
• Resilience: Availability and fault tolerance
• Decentralization: Disintermediation, User-driven, Control information flow
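The integrity requirement can be illustrated with a hash-chained log, in which each entry commits to the one before it so that later edits become detectable. This is a generic sketch, not the platform's implementation; GENESIS, entry_hash, append, and verify are hypothetical names. It covers only tamper evidence: authenticity and non-repudiation would additionally need digital signatures, which are not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, chaining it to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"actor": "Dataset Admin", "action": "selectData"})
append(log, {"actor": "Model Developer", "action": "train"})
print(verify(log))  # True: the log is intact

log[0]["record"]["actor"] = "Someone Else"  # simulate a cover-up
print(verify(log))  # False: the stored hash no longer matches
```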
Decentralized Software Platform
Roles: Auditor, Model Verifier, Model Developer, Model Operator, Dataset Admin
Five ProML Nodes, linked by Provenance Update Broadcasts
ProML Node
Diagram labels: Provider Clients Service, IPFS Client, Storage Provider, Provenance Capturing, Blockchain Wallet, Signer, Content Distribution Network, Dataset, Model, Blockchain, Dataset Provenance, Provenance Update Process, Model Provenance, User Interface, CLI Client, Capturing Library, Query Interface, Blockchain Client, Blockchain Provider, Provenance Querying
P1: If you use provenance, you control it, and you manage and store it.
P2: Use your existing tools. Keep info flow within your organisation.
P3: Embed provenance records in blockchain for security. Embed provenance update process in smart contracts for resilience.
User-driven Provenance Capturing
P2: Use your existing tools. Keep info flow within your organisation.
Diagram labels: ProML Node, Provider Clients Service, IPFS Client, Provenance Capturing, Blockchain Wallet, Signer, Content Distribution Network, Dataset, Model, Blockchain, Dataset Provenance, Provenance Update Process, Model Provenance, User Interface, Capturing Library, Blockchain Client, Model Developer, Storage Provider, Blockchain Provider
Steps:
1. Develop: Model Developer, ML Training Script / Notebook
2. Embed: Calls to Logging API
3. Send pm_i
4a. Submit payload (Storage Provider)
4b. Return CID
5. Craft tx_{pm_i}
6. Sign tx_{pm_i} (Signer)
7. Submit tx_{pm_i}
8. Validate and Insert tx_{pm_i} (Blockchain Provider)
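The capture steps above (send the record, store the payload and get a CID back, craft, sign, submit, validate and insert) can be walked through with in-memory stand-ins. Everything here is an assumption for illustration: FakeStorageProvider and FakeLedger are hypothetical classes, a SHA-256 digest stands in for a real IPFS CID, and an HMAC with a shared key stands in for a blockchain wallet's asymmetric signature.

```python
import hashlib
import hmac
import json


def sign(tx: dict, key: bytes) -> str:
    # Step 6 stand-in: HMAC replaces a wallet's asymmetric signature.
    message = json.dumps(tx, sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class FakeStorageProvider:
    """In-memory stand-in for a content-addressed store (not a real IPFS client)."""

    def __init__(self):
        self.blobs = {}

    def submit(self, payload: bytes) -> str:
        # Steps 4a/4b: store the payload and return a content identifier.
        cid = hashlib.sha256(payload).hexdigest()
        self.blobs[cid] = payload
        return cid


class FakeLedger:
    """In-memory stand-in for the blockchain provider."""

    def __init__(self, key: bytes):
        self._key = key
        self.transactions = []

    def submit(self, tx: dict, signature: str) -> bool:
        # Step 8: validate the signature, then insert the transaction.
        if not hmac.compare_digest(signature, sign(tx, self._key)):
            return False
        self.transactions.append(tx)
        return True


def capture(record: dict, storage, ledger, key: bytes) -> bool:
    payload = json.dumps(record, sort_keys=True).encode("utf-8")  # step 3
    cid = storage.submit(payload)                                  # steps 4a, 4b
    tx = {"action": record["action"], "cid": cid}                  # step 5
    signature = sign(tx, key)                                      # step 6
    return ledger.submit(tx, signature)                            # steps 7, 8


key = b"demo-key"  # placeholder secret for the sketch
storage = FakeStorageProvider()
ledger = FakeLedger(key)
ok = capture({"action": "train", "model": "model-001"}, storage, ledger, key)
print(ok, len(ledger.transactions))
```

Note that only the small transaction (action plus CID) reaches the ledger, while the full payload stays with the storage provider; that split is the point of returning a CID in step 4b.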
Exemplary logging API
Function            Parameters
selectData()        datasetID, datasetVersion, datasetMetadata: columnInfo, labelInfo
preprocessData()    processedDataset, datasetMetadata: columnInfo, labelInfo
engineerFeatures()  featureList, featureSelectAlg: algConfigs
train()             classifierInfo: type, library, version, hyperparameters; model
evaluate()          trainingSetRatio, F1, acc, trainingDuration
validate()          F1, acc, recall, precision, Matthew, MSE, Fowlkes
deploy()            model, deploymentInfo
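A capturing library exposing functions like those listed above might look roughly like the following. Only the function and parameter names come from the listing; the ProvenanceLogger class, its in-memory event list, and all sample values are assumptions, and only three of the seven functions are sketched.

```python
from typing import Any, Dict, List


class ProvenanceLogger:
    """Hypothetical capturing library that records each logging call in memory."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def _log(self, function: str, **params: Any) -> None:
        self.events.append({"function": function, "params": params})

    def selectData(self, datasetID, datasetVersion, datasetMetadata):
        self._log("selectData", datasetID=datasetID,
                  datasetVersion=datasetVersion, datasetMetadata=datasetMetadata)

    def train(self, classifierInfo, model):
        # Only a string identifier is recorded here, not the model object itself.
        self._log("train", classifierInfo=classifierInfo, model=str(model))

    def evaluate(self, trainingSetRatio, F1, acc, trainingDuration):
        self._log("evaluate", trainingSetRatio=trainingSetRatio, F1=F1,
                  acc=acc, trainingDuration=trainingDuration)


# Calls as they might be embedded in a training script; values are placeholders.
logger = ProvenanceLogger()
logger.selectData("ds-1", "v1", {"columnInfo": ["age"], "labelInfo": ["outcome"]})
logger.train({"type": "tree", "library": "example-lib", "version": "1.0",
              "hyperparameters": {"depth": 3}}, model="model-001")
logger.evaluate(trainingSetRatio=0.8, F1=0.9, acc=0.92, trainingDuration=12.5)
print([e["function"] for e in logger.events])
```

Because the calls sit next to the existing training code, the developer keeps using their own script or notebook and the provenance events are collected as a side effect.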
Sample Data
Initial State Registered
Selected Dataset
Pre-processed Dataset
Engineered Feature Sets
Trained Model
Evaluated Model
Validated Model
Deployed
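The state labels above suggest a lifecycle that a provenance update process could enforce. The sketch below assumes, without confirmation from the slides, that the lifecycle starts at "Registered" and moves strictly in the listed order, one state at a time; STATES, is_valid_transition, and AssetLifecycle are hypothetical names.

```python
# Assumed linear order of states; the real process may allow other transitions.
STATES = [
    "Registered",
    "Selected Dataset",
    "Pre-processed Dataset",
    "Engineered Feature Sets",
    "Trained Model",
    "Evaluated Model",
    "Validated Model",
    "Deployed",
]


def is_valid_transition(current: str, proposed: str) -> bool:
    """Accept only a move to the immediately following state."""
    i = STATES.index(current)
    return i + 1 < len(STATES) and STATES[i + 1] == proposed


class AssetLifecycle:
    """Tracks the current state and rejects out-of-order updates."""

    def __init__(self) -> None:
        self.state = STATES[0]

    def advance(self, proposed: str) -> None:
        if not is_valid_transition(self.state, proposed):
            raise ValueError(f"cannot go from {self.state!r} to {proposed!r}")
        self.state = proposed


lifecycle = AssetLifecycle()
lifecycle.advance("Selected Dataset")
try:
    lifecycle.advance("Deployed")  # skips several stages
except ValueError as error:
    print("rejected:", error)
```

Rejecting skipped stages is one way an update process could surface problems such as a model being deployed without recorded evaluation or validation.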
ML2-4: Training: 0xd816547ccc817d8cd3b28a56a84e8f2bd960ab3c648e6425bee2eade363e2501
ML2-3: Feature engineering: 0x0e75eb311c8f4d0a89948a701729e3696d6da33bec5a7e6403543c4d676ea380
ML2-6: Validation: 0xfb67bbc4e7391ca8711d6fa9f06a688a774329deef05e732c49749b3d44657fa
ML2-5: Evaluation: 0x4a30e2905f6f774d02f80a366566492698dacd47ca2a90ff55bfd56c1f910cbc
ML2-1: Select Dataset: 0x1e440b6842ab9efd56ff995a7cc43d08a1f75ece170226c25aeacfc1946a1c66
Sample Transaction (ML2-4: Training)

CREST Overview

  • 2. CREST Health Analytics Platform • The CREST platform is a one-stop solution for health data storage and analysis. • It has a comprehensive, flexible, and scalable ecosystem of frameworks. • It allows capturing, processing, analysis, and visualisation of large volumes of health data that are too complex for traditional data-processing applications.
  • 3. Use cases and scenarios Scenario: In emergency health situations such as a pandemic or flooding, predictive analytics is needed to estimate the required medical supplies. Solution: Health analytics with the CREST platform: • Predictive analysis using outbreak patterns and other historical data • Monitoring of cases – numbers of cases and patients' health • Recommendations on resources for healthcare facilities
  • 5. CREST Infrastructure and data management • Automated infrastructure deployment (IaaC) • Network configuration • Software installation • Benchmarking experimentation testbed • Run-time patching recovery • Data storage and management • Big data storage solutions cluster configuration
  • 6. Use cases and scenarios • Determining energy efficiency of various data workloads for low-powered devices • Measuring performance and resource usage for various data distribution flows • Modelling effects of node mobility under different networking scenarios • Automated comparison of multiple data storage and processing solutions • Detecting and recovering from broken run-time patches
  • 12. Supply Chain Provenance of ML-based Software. Nguyen Khoi Tran, M. Ali Babar, Mingyu Guo; CREST – The University of Adelaide, Australia
  • 13. Scenario: Distributed ML DevOps
      Roles: Auditor, Model Verifier, Model Developer, Client / Operator, Dataset Admin
      Steps:
      1. I have an idea for an ML application
      2. I hire a company to collect data for me
      3. I gather the data
      4. I outsource labeling to Amazon Mechanical Turk
      5. I pass the data to the appointed developers
      6. I develop the ML model
      7. I test and verify the model
      8. I return the model to the client
      9. I seek third-party validation
  • 14. Scenario: Distributed ML DevOps goes wrong
      Roles: Auditor, Model Verifier, Model Developer, Client / Operator, Dataset Admin
      Steps:
      1. I have an idea for an ML application
      2. I hire a company to collect data for me
      3. I gather the data
      4. I outsource labeling to Amazon Mechanical Turk
      5. I pass the data to the appointed developers
      6. I develop the ML model
      7. I test and verify the model
      8. I return the model to the client
      9. I seek third-party validation
      What can go wrong: Deliberate mislabeling (poisoning); Dataset tampering; Vulnerability in ML frameworks; Model swapping; Cover-up; Not enough information
  • 15. How to capture and preserve the records of “who did what” to ML assets (a.k.a., workflow provenance information) in a distributed ML workflow environment?
  • 16. Existing Approach: A Centralised Platform
      Roles: Auditor, Model Verifier, Model Developer, Client / Operator, Dataset Admin
      Provenance Database: Dataset Provenance (e.g., data sheet); Auditing results; Asset Provenance; Model development records; Model testing records (e.g., model card)
  • 17. Problem Summary
      Security: Authenticity, Integrity, Non-repudiation
      Resilience: Availability and fault tolerance
      Decentralization: Disintermediation
      User-driven: Control information flow
  • 18. Decentralized Software Platform
      Participants, each with a ProML Node: Auditor, Model Verifier, Model Developer, Model Operator, Dataset Admin
      Between nodes: Provenance Update Broadcasts
      ProML Node diagram labels: Provider; Clients; Service; IPFS Client; Storage Provider; Provenance Capturing; Blockchain Wallet; Signer; Content Distribution Network; Dataset; Model; Blockchain; Dataset Provenance; Provenance Update Process; Model Provenance; User Interface; CLI Client; Capturing Library; Query Interface; Blockchain Client; Blockchain Provider; Provenance Querying
      Design principles:
      P1. If you use provenance … you control it … you manage and store it
      P2. Use your existing tools. Keep info flow within your organisation
      P3. Embed provenance records in blockchain for security. Embed provenance update process in smart contracts for resilience
  • 19. User-driven Provenance Capturing
      Design principle P2: Use your existing tools. Keep info flow within your organisation
      ProML Node diagram labels: Provider; Clients; Service; IPFS Client; Provenance Capturing; Blockchain Wallet; Content Distribution Network; Dataset; Model; Blockchain; Dataset Provenance; Provenance Update Process; Model Provenance; User Interface; Capturing Library; Blockchain Client
      Other diagram labels: Model Developer; ML Training Script / Notebook; Calls to Logging API; Storage Provider; Signer; Blockchain Provider
      Steps:
      1. Develop
      2. Embed
      3. Send 𝑝𝑚𝑖
      4a. Submit payload
      4b. Return CID
      5. Craft tx𝑝𝑚𝑖
      6. Sign tx𝑝𝑚𝑖
      7. Submit tx𝑝𝑚𝑖
      8. Validate and Insert tx𝑝𝑚𝑖
      Exemplary logging API (Function: Parameters):
      • selectData(): datasetID, datasetVersion, datasetMetadata (columnInfo, labelInfo)
      • preprocessData(): processedDataset, datasetMetadata (columnInfo, labelInfo)
      • engineerFeatures(): featureList, featureSelectAlg (algConfigs)
      • train(): classifierInfo (type, library, version, hyperparameters), model
      • evaluate(): trainingSetRatio, F1, acc, trainingDuration
      • validate(): F1, acc, recall, precision, Matthew, MSE, Fowlkes
      • deploy(): model, deploymentInfo
  • 20. Sample Data
      Workflow states: Initial State; Registered; Selected Dataset; Pre-processed Dataset; Engineered Feature Sets; Trained Model; Evaluated Model; Validated Model; Deployed
      Sample transactions:
      • ML2-1: Select Dataset: 0x1e440b6842ab9efd56ff995a7cc43d08a1f75ece170226c25aeacfc1946a1c66
      • ML2-3: Feature engineering: 0x0e75eb311c8f4d0a89948a701729e3696d6da33bec5a7e6403543c4d676ea380
      • ML2-4: Training: 0xd816547ccc817d8cd3b28a56a84e8f2bd960ab3c648e6425bee2eade363e2501
      • ML2-5: Evaluation: 0x4a30e2905f6f774d02f80a366566492698dacd47ca2a90ff55bfd56c1f910cbc
      • ML2-6: Validation: 0xfb67bbc4e7391ca8711d6fa9f06a688a774329deef05e732c49749b3d44657fa
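
To make the exemplary logging API on slide 19 more concrete, here is a minimal Python sketch of how calls to such an API might be embedded in a training script. The `ProvenanceLogger` class, its in-memory `records` list, and all argument values are hypothetical placeholders; only the function names and parameter names come from the slide, and the real ProML capturing library may look quite different.

```python
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProvenanceLogger:
    """Hypothetical in-memory stand-in for a ProML node's logging API."""
    actor: str
    records: list = field(default_factory=list)

    def _log(self, step: str, **params: Any) -> dict:
        # Each call appends one provenance record describing a workflow step.
        record = {"actor": self.actor, "step": step,
                  "params": params, "timestamp": time.time()}
        self.records.append(record)
        return record

    # Method and parameter names follow the slide; bodies are assumptions.
    def selectData(self, datasetID, datasetVersion, datasetMetadata):
        return self._log("selectData", datasetID=datasetID,
                         datasetVersion=datasetVersion,
                         datasetMetadata=datasetMetadata)

    def train(self, classifierInfo, model):
        return self._log("train", classifierInfo=classifierInfo, model=model)

    def evaluate(self, trainingSetRatio, F1, acc, trainingDuration):
        return self._log("evaluate", trainingSetRatio=trainingSetRatio,
                         F1=F1, acc=acc, trainingDuration=trainingDuration)


# Calls placed at the points in a training script chosen by the developer.
# All values below are placeholders, not results from the case project.
logger = ProvenanceLogger(actor="model-developer")
logger.selectData("demo-dataset", "1.0",
                  {"columnInfo": ["age", "label"], "labelInfo": ["0", "1"]})
logger.train({"type": "RandomForest", "library": "scikit-learn",
              "version": "1.4", "hyperparameters": {"n_estimators": 100}},
             model="model-v1")
logger.evaluate(trainingSetRatio=0.8, F1=0.9, acc=0.92, trainingDuration=12.5)
```

Keeping the logger as a thin wrapper means an existing script only gains a few extra lines, which is in the spirit of the "use your existing tools" principle (P2).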

Editor's Notes

  1. Good morning everyone, I'm Triet, a research fellow in CREST. And today I'd like to share with you an overview of our research on software security.
  2. To start with, I believe that you all can recognize most if not all the icons on the screen here. Actually, it's even hard to imagine that we don't use any of these software apps in a day. At least, for me, I don't even remember how many emails I've sent and received this week. Nowadays, software exists everywhere and has become a part of our daily life. They've drastically changed the way we live, work, and interact. And of course, software is also widely used for the healthcare domain. A notable and recent example would be the contact tracing app, or specifically CovidSafe app we have here in Australia.
  3. And if you think about these apps, these aren't just "software", but they're actually "AI powered software". For example, Google products use advanced AI recommender systems to show us the next video to watch on YouTube. In the context of healthcare and specifically COVID-19, as far as I know, AI has been utilised to predict the hot-spot locations or the number of new positive cases next week to help government come up with suitable preventive measures early. As we can see, these software apps and technologies are very useful for us, but they also contain security risks that can lead to catastrophic consequences.
  4. For example, last year, you may have heard about the Log4J vulnerability, which took the entire Internet by storm back then. This vulnerability can be exploited and affect millions of systems around the world. And it's estimated that billions of dollars will be lost in the resulting cyber attacks caused by just this one vulnerability. And this is just one example among the thousands of critical vulnerabilities discovered every single year. So you can see how much damage vulnerabilities can cause if we don't prevent and address them in time.
  5. And the vision of our research is to prevent such dangerous vulnerabilities. Specifically, we aim to develop tools and techniques and distill practices to provide early information and warnings about software vulnerabilities for both expert and non-expert users. Our research mainly leverages various data sources to develop high-performing and robust AI/data-driven techniques to automate and give insights into the whole vulnerability lifecycle, ranging from detecting these vulnerabilities early, to assessing them in terms of their probability of exploitation and impacts, and then giving recommendations to developers to plan and prioritise the mitigation and fixing. And currently, our research targets both traditional and contemporary AI-based systems as well as the supporting infrastructure for these systems. For AI systems, we've focused on phishing detection systems, for example, systems preventing the spam emails that can trick users into clicking malicious links and losing their personal data. It's worth noting that so far, we have not analysed vulnerabilities in healthcare apps. So, I believe that can be one area of collaboration that we can explore in the meeting today. And that's all about our current software security research. Thank you.
  6. Can we leverage decentralised technology to solve this problem?
  7. Let's see how these principles manifest in the platform. According to the first design principle, we structure the ProML platform as a collection of peer nodes, called ProML nodes. All participants who collaborate on an ML model can deploy a ProML node within their organisation. Each ProML node acts as a representative of a participant. All of the interconnected ProML nodes have equal rights and responsibility to access, update, and secure ML provenance information. ProML nodes are synchronised with each other using a blockchain protocol. The ProML nodes themselves can act as full nodes to form a blockchain network. Alternatively, the framework can rely on a remote blockchain network. ProML nodes are also gateways for participants to interact with the provenance information. Through APIs and command line interfaces, the ML toolsets of participants submit and query provenance information. No new tools are necessary. It should also be noted that these exchanges of information happen within organisational boundaries, thus fulfilling design principle 2. ProML also supports a peer-to-peer content distribution network as an alternative venue for storing and distributing models and datasets.
  8. Let's take a closer look at how provenance information is captured. The process is initiated by a workflow participant, such as a model developer. They embed function calls to a Logging API provided by their local ProML node. When the training script or notebook runs, the function calls happen at the exact spots specified by the model developer and submit the requested provenance information to the local ProML node. If a participant chooses to, the ProML node can move the payload part of the submitted provenance information, such as a dataset or the binary of a model, to a storage provider and replace the payload with its corresponding hash. This process is called offloading. After offloading, the ProML node transforms the submitted provenance information into a blockchain transaction and signs it on behalf of the participant. Finally, the ProML node submits the transaction to the blockchain via its local blockchain client. After the blockchain mining process is completed, the new information is available to all workflow participants.
  9. Here are some sample transactions on the Ropsten network that capture some key provenance updates. The hex strings are the hashes of blockchain transactions on the Ropsten network.
  10. For instance, the training record ML2-4 contains the information required by researchers in the case project, such as hyperparameters and the type and version of the ML training library used.
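
The capture steps described in note 8 (offload the payload, craft a transaction, sign it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `offload`, `craft_transaction`, and `sign` are hypothetical, a plain dictionary stands in for the storage provider, a SHA-256 hex digest stands in for an IPFS CID, and HMAC-SHA256 stands in for the ECDSA signature that a real blockchain wallet would produce.

```python
import hashlib
import hmac
import json


def offload(record: dict, store: dict) -> dict:
    """Move the raw payload into `store` and keep only its hash in the record."""
    payload = record.get("payload")
    if payload is None:
        return dict(record)
    digest = hashlib.sha256(payload).hexdigest()  # stand-in for an IPFS CID
    store[digest] = payload                       # stand-in storage provider
    slim = {k: v for k, v in record.items() if k != "payload"}
    slim["payloadHash"] = digest
    return slim


def craft_transaction(record: dict, sender: str) -> bytes:
    """Serialise the record deterministically as the transaction body."""
    return json.dumps({"from": sender, "data": record}, sort_keys=True).encode()


def sign(tx: bytes, secret_key: bytes) -> str:
    """Assumption: HMAC-SHA256 replaces the wallet's real ECDSA signature."""
    return hmac.new(secret_key, tx, hashlib.sha256).hexdigest()


# Placeholder record, payload, and key, for illustration only.
store = {}
record = {"step": "train", "actor": "model-developer",
          "payload": b"model-binary-bytes"}
slim = offload(record, store)
tx = craft_transaction(slim, sender="model-developer")
signature = sign(tx, b"demo-key")
```

Because only the hash travels in the transaction, anyone holding the original payload can recompute the digest and check it against the on-chain record, while the large dataset or model binary stays off the blockchain.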