These slides use concepts from my (Jeff Funk's) course, Analyzing Hi-Tech Opportunities, to analyze how Big Data is becoming economically feasible for health care. They describe how the costs of sensors, data processing, data storage, and data analysis are falling; how new and better forms of storage and algorithms are being implemented; and what this means for sustainable health care. Together, these changes are enabling a move toward personalized health care.
1. BIG DATA
DIGITAL HEALTH
REVOLUTION
Alex A0135681
Henri A0135487
Zheng A0121892
Pham A0095804
Yin A0119974
Kavitha A0110143
For information on other technologies, see http://www.slideshare.net/Funk98/presentations
21. SENSORS IN THE FUTURE - BioMEMS and Microsystems
● Decreasing size
● Better and smaller communication chips and algorithms
● Micro-supercapacitors
● These advances will facilitate the arrival of new implantable chips
● Allows for unobtrusive, personalized medicine
● Allows for more tailored treatment
● Will require more data analysis and more processing power
23. The storage medium used now receives more focus than the quantity of storage used; storage is no longer one-size-fits-all. The "data deluge" is fundamentally changing the way that storage is approached.
HARDWARE
Introduction
24. Key characteristics of big data infrastructure:
● Provide real-time or near-real-time responses
● Handle huge data volumes that grow rapidly
● High processing/IOPS performance
● Very large capacity
HARDWARE
What's Key to Efficient Data Processing?
25. KEY DIFFERENTIATOR
● Big data is largely unstructured.
● Unstructured data is immutable: it is written once and read many times, not updated in place.
● Traditional file systems have built-in functions to handle inserts and updates.
● For immutable data, this machinery creates a lot of overhead in performance, in the I/Os required to access data, and in the ability to scale.
HARDWARE
WHY DO WE NEED A DIFFERENT APPROACH?
Fig. Annual growth of unstructured data
26. ● Objects live in one large, scalable pool of storage
● Stores metadata - information about the object
● An object ID is stored to locate the data
● Objects are immutable
● No file-system hierarchy (see the toy sketch below)
Products:
● Scality's RING architecture
● Dell DX
● EMC's Atmos
HARDWARE
OBJECT STORAGE - Choice of Storage
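A minimal toy model of the object-storage idea above, assuming nothing about any vendor's API: the `ObjectStore` class, its methods, and the metadata fields are all illustrative, not the interface of Scality RING, Dell DX, or EMC Atmos.

```python
import hashlib
import json

class ObjectStore:
    """Toy flat object store: immutable blobs located by an object ID."""

    def __init__(self):
        self._objects = {}  # one large flat pool; no directory hierarchy

    def put(self, data: bytes, metadata: dict) -> str:
        # Content-derived object ID; because objects are immutable,
        # re-putting identical bytes is a harmless no-op.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = (data, json.dumps(metadata))
        return object_id

    def get(self, object_id: str):
        data, meta = self._objects[object_id]
        return data, json.loads(meta)

store = ObjectStore()
oid = store.put(b"...scan bytes...", {"patient": "anon-001", "modality": "CT"})
data, meta = store.get(oid)  # located by ID alone: no path, no update-in-place
```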
28. ● Access times: SSDs exhibit virtually no access time.
● Random I/O performance: an SSD delivers at least 6,000 IO/s, 15 times faster than an HDD (~400 IO/s) (see the arithmetic below).
● Reliability: SSDs are 4-10 times more reliable.
HARDWARE
Storage Medium: Solid-State Drive (SSD) or Hard Disk Drive (HDD)?
SSD
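A quick back-of-the-envelope check of the slide's IOPS figures; the one-million-read workload is a hypothetical chosen only to make the difference tangible.

```python
ssd_iops = 6_000   # slide: "at least 6,000 IO/s"
hdd_iops = 400     # slide: "~400 IO/s"

print(ssd_iops / hdd_iops)             # 15.0 -> the quoted 15x advantage

reads = 1_000_000                      # hypothetical random-read workload
print(reads / ssd_iops / 60, "min")    # ~2.8 minutes on SSD
print(reads / hdd_iops / 60, "min")    # ~41.7 minutes on HDD
```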
29. REAL-TIME APPLICATIONS OF SSD
● Read-intensive video-on-demand (VOD) and image-retrieval applications
● Emerging applications (big data / Hadoop / cloud)
HARDWARE
Fig. Comparison of boot times using SSD & HDD
30. 2011: throughput 250 MB/s, capacity 512 GB
2014: data transfer 1,000 MB/s, capacity 4 TB, standard 2.5-inch form factor
Further scale-down of flash lithography leads to continued performance gains and greater capacity points.
HARDWARE
Solid-State Drives (SSDs) & Moore's Law
Fig 1. HDD areal density follows Moore's Law
Fig 2. Average price comparison of SSD vs. HDD
33. RAID (REDUNDANT ARRAY OF INDEPENDENT DISKS)
● Originally designed for small-capacity disks.
● As capacity increases, restoring a failed drive takes longer.
● To shorten these longer rebuild cycles, RAID systems ship with faster processors, leading to high energy consumption.
REPLICATION
● Copies add cost: typically 133% or more additional storage is needed for each additional copy.
● The storage system gets more expensive as the amount of data increases.
HARDWARE
Limitations of Traditional Approaches
34. How Does it Work?
● Information dispersal algorithms (IDAs) separate data into unrecognizable slices of information.
● The slices are then dispersed to storage nodes in disparate storage locations.
● Dispersal can be implemented locally or distributed.
● Only a pre-defined subset of the slices from the dispersed storage nodes is needed to fully retrieve all of the data (see the sketch below).
HARDWARE
Information Dispersal - a Better Approach?
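A minimal sketch of the dispersal idea, using a 2-of-3 XOR code so that any two slices reconstruct the data. Production IDAs use Reed-Solomon-style erasure codes for general k-of-n parameters, so treat this as an illustration of the property, not the algorithm vendors ship.

```python
def disperse(data: bytes) -> dict:
    """Split data into 3 slices such that ANY 2 of them reconstruct it."""
    if len(data) % 2:
        data += b"\x00"                      # pad to an even length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return {"A": a, "B": b, "P": parity}     # each slice goes to its own node

def reconstruct(slices: dict) -> bytes:
    """Recover the original data from any two of the three slices."""
    if "A" in slices and "B" in slices:
        a, b = slices["A"], slices["B"]
    elif "A" in slices:                      # node B lost: B = A xor P
        a = slices["A"]
        b = bytes(x ^ y for x, y in zip(a, slices["P"]))
    else:                                    # node A lost: A = B xor P
        b = slices["B"]
        a = bytes(x ^ y for x, y in zip(b, slices["P"]))
    return a + b

slices = disperse(b"patient record 42")
del slices["B"]                              # simulate losing one storage node
assert reconstruct(slices).rstrip(b"\x00") == b"patient record 42"
```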
35. ● It is resilient against natural disasters and technological failures, such as drive failures, system crashes, and network failures.
● Data can still be accessed in real time even if there are multiple simultaneous failures across a string of hosting devices, servers, or networks.
● Reliability of five nines or more is guaranteed with overhead as low as 20%, as opposed to three copies requiring 200% overhead (worked numbers below).
HARDWARE
Benefits of Information Dispersal
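The 20% vs. 200% figures follow directly from the overhead formula for a k-of-n dispersal, where n slices are stored and each slice is 1/k of the data's size. The 10-of-12 parameterization below is an assumption consistent with the slide's ~20% figure, not a number taken from it.

```python
def overhead(n_slices: int, k_needed: int) -> float:
    # raw bytes stored / useful bytes - 1, for a k-of-n dispersal
    return n_slices / k_needed - 1.0

print(overhead(3, 1))    # 2.0 -> three full replicas = 200% overhead
print(overhead(12, 10))  # 0.2 -> a 10-of-12 dispersal = ~20% overhead
```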
37. When looking at the number of years without data loss at a 99.99999% confidence level, information dispersal doesn't even appear on the chart: even for a large storage amount like 524,000 TB, the expected time without data loss is not within anyone's lifetime (theoretically over 79 million years).
HARDWARE
Cost Savings from IDA in Petabyte Storage over RAID and Replication
40. How do we match a huge dataset to ICD-10?
ALGORITHMS
Dealing with Huge Data
41. ICD-10 Clinical Modifications (ICD-10-CM): 69,823 codes
● 3-7 characters
● Character 1 is alpha
● Character 2 is numeric
● Characters 3-7 can be alpha or numeric
ICD-10 Procedure Coding System (ICD-10-PCS): 76,000 codes
● 7 characters
● Each one can be alpha or numeric
● Numbers 0-9; letters A-H, J-N, P-Z (see the validator sketch below)
ALGORITHMS
ICD-10 Introduction
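A small pair of validators built only from the character rules on this slide. Note that real ICD-10-CM codes are often written with a decimal point after the third character, which this compact form ignores, and the example codes are shape checks rather than clinically verified entries.

```python
import re

# ICD-10-CM: 3-7 chars; char 1 alpha, char 2 numeric, chars 3-7 alphanumeric.
ICD10_CM = re.compile(r"^[A-Z][0-9][A-Z0-9]{1,5}$")
# ICD-10-PCS: exactly 7 chars, each 0-9 or A-H, J-N, P-Z (no I or O,
# so codes are never confused with the digits 1 and 0).
ICD10_PCS = re.compile(r"^[0-9A-HJ-NP-Z]{7}$")

print(bool(ICD10_CM.match("E119")))      # True  (compact form of E11.9)
print(bool(ICD10_PCS.match("0DQ68ZZ")))  # True  (valid 7-character shape)
print(bool(ICD10_PCS.match("0DQ68IO")))  # False (I and O are excluded)
```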
42. Fig. Huge non-standard data sources, characterized by the 4 Vs (volume, velocity, variety, veracity), feed data feature selection and huge multi-character mapping databases, which in turn feed data analytics: machine-learning and image-retrieval systems.
ALGORITHMS
Why We Need Big Data
44. Diagnosis is a relatively straightforward machine-learning problem. Clinical decision making is highly suited to rule-based systems because of the nature of the data, such as ICD-10 codes and medications (see the sketch below).
ALGORITHMS
Machine Learning in Medical Diagnosis
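A minimal sketch of the rule-based flavor of decision support described above. The ICD-10 codes are real, but the rules and medication lists are invented for illustration and are not clinical guidance.

```python
def flag_followup(record: dict) -> list:
    """Apply simple hand-written rules over coded patient data."""
    flags = []
    codes = set(record.get("icd10", []))
    meds = set(record.get("medications", []))

    # Rule 1: diabetes coded (E11.9) but no glucose-lowering medication on file.
    if "E119" in codes and not meds & {"metformin", "insulin"}:
        flags.append("diabetes coded but no glucose-lowering medication")

    # Rule 2: hypertension (I10) plus an NSAID -> flag potential interaction.
    if "I10" in codes and "ibuprofen" in meds:
        flags.append("NSAID may blunt antihypertensive therapy")

    return flags

patient = {"icd10": ["E119", "I10"], "medications": ["ibuprofen"]}
print(flag_followup(patient))   # both rules fire for this synthetic record
```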
47. *ImageCLEF medical - a competition on medical image processing
Two main tasks:
● Image-based retrieval
● Case-based retrieval
Source: http://www.imageclef.org/
Fig. Number of images in the ImageCLEF database
ALGORITHMS
Database of the ImageCLEF Medical Competition
48. • This is the classic medical retrieval task.
• Similar to query by image example: given the query image, find the most similar images (see the sketch below).
Source: http://www.imageclef.org/
ALGORITHMS
Image-Based Retrieval Algorithm
Performance = difficulty × accuracy
Fig. Retrieval performance: mean average precision vs. number of images
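A minimal query-by-example sketch: rank database images by cosine similarity between feature vectors. How the vectors are produced (color histograms, CNN embeddings, ...) is left abstract, and the 8-dimensional random vectors below are purely synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
db_features = rng.random((1000, 8))   # 1,000 "images", 8-dim feature vectors
query = rng.random(8)                 # feature vector of the query image

# Normalize so a dot product equals cosine similarity.
db_norm = db_features / np.linalg.norm(db_features, axis=1, keepdims=True)
q_norm = query / np.linalg.norm(query)

scores = db_norm @ q_norm             # cosine similarity per database image
top10 = np.argsort(scores)[::-1][:10] # indices of the 10 most similar images
print(top10, scores[top10])
```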
49. • This is a more complex task, closer to the clinical workflow.
• A case description is provided, with patient demographics, limited symptoms, and test results including imaging studies (but not the final diagnosis).
• The goal is to retrieve cases, including images, that might best suit the provided case description (see the sketch below).
Source: http://www.imageclef.org/
ALGORITHMS
Case-Based Retrieval Algorithm
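A minimal sketch of the textual side of case-based retrieval, using TF-IDF and cosine similarity. The three case descriptions are invented one-liners; a real system would also fold in demographics and the image features discussed above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

cases = [  # synthetic case descriptions, not real patient data
    "65yo male, chest pain on exertion, abnormal ECG, elevated troponin",
    "8yo female, fever and productive cough, infiltrate on chest x-ray",
    "54yo male, polyuria and fatigue, HbA1c 9.1, no retinopathy",
]
query = "60 year old man with exertional chest pain and ECG changes"

vec = TfidfVectorizer()
case_matrix = vec.fit_transform(cases)          # TF-IDF vector per stored case
scores = cosine_similarity(vec.transform([query]), case_matrix)[0]

best = int(scores.argmax())
print(best, scores[best])   # case 0 ranks first for this query
```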
50.
                   Manual calculation   Software and algorithms
Speed              Slow                 Fast
Accuracy           Hard to maintain     Precise
Level of study     Quite hard           Easy to learn
Solution level     Shallow              Deep
Machine learning   No                   Yes
Result             Hard to explain      Explainable through visualization
ALGORITHMS
Manual Calculation vs. Software and Algorithms
53. ● More data can be gathered to identify patterns and interactions
● Doctors will use it for diagnosis and decision-making
● Health care costs will decrease
● Individual patient care will improve
TECHNOLOGICAL FUSION
CONCLUSION