“Big Data” 
For 
Medicine & Health Care - 
An Introductory Tutorial 
Frank W Meissner, MD, RDMS, RDCS 
FACP, FACC, FCCP, FASNC, CPHIMS, CCDS 
Diplomate- Subspecialty Board of Advanced Heart Failure & Transplant Cardiology 
Diplomate - Certification Board of Cardiovascular Computed Tomography 
Certified Professional Health Information and Management Systems 
Diplomate- Subspecialty Board of Cardiovascular Diseases 
Diplomate - Subspecialty Board of Critical Care Medicine 
Diplomate - Certification Board of Nuclear Cardiology 
Diplomate - American Board of Forensic Medicine 
Diplomate- American Board of Internal Medicine 
Diplomate - National Board of Echocardiography 
Certified Cardiac Device Specialist - Physician
Big Data - Definition 
(With Apologies to Douglas Adams) 
Big Data - 
“You just won't believe how vastly, 
hugely, 
mind-bogglingly big it is.” 
The Hitchhiker’s Guide to the Galaxy
Seriously, 
Big Data - A Real Definition 
Big data is an evolving term that describes any 
voluminous amount of structured, semi-structured and 
unstructured data that has the potential to be mined for 
information 
Although big data doesn't refer to any specific quantity, 
the term is often used when speaking about petabytes 
(PB) and exabytes (EB) of data 
1 PB = 1,000,000,000,000,000 B = 10^15 bytes = 1000 terabytes 
1 EB = 1000^6 bytes = 10^18 bytes = 1,000,000,000,000,000,000 B = 1000 
petabytes = 1 million terabytes = 1 billion gigabytes
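The unit arithmetic above can be checked with a minimal sketch. It assumes decimal (SI) units, as the slide does; the `UNITS` table and the `convert` helper are illustrative names, not part of the original deck.

```python
# Decimal (SI) byte units, expressed as powers of ten (assumption: SI, not binary, prefixes).
UNITS = {"B": 0, "KB": 3, "MB": 6, "GB": 9, "TB": 12, "PB": 15, "EB": 18}


def convert(value, from_unit, to_unit):
    """Convert a quantity between SI byte units using powers of ten."""
    exponent = UNITS[from_unit] - UNITS[to_unit]
    return value * 10 ** exponent


if __name__ == "__main__":
    print("1 PB in TB:", convert(1, "PB", "TB"))
    print("1 EB in GB:", convert(1, "EB", "GB"))
```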
Data Source/Streams 
Big Data Analytics
‘Big Data’ - An Operational Definition 
Big Data is 
High Volume 
High Speed 
High Variety 
High Veracity 
The data demands new types and forms of information processing to 
support decision-making, insight discovery, and process 
optimization
A Prototypical Big Data Project 
(With More Apologies to Douglas Adams) 
“O Deep Thought computer," he said, 
"the task we have designed you to perform is this. 
We want you to tell us...." he paused, "The Answer." 
"The Answer?" said Deep Thought. "The Answer to what?" 
"Life!" urged Fook. 
"The Universe!" said Lunkwill. 
"Everything!" they said in chorus. 
Deep Thought paused for a moment's reflection. 
"Tricky," he said finally. 
"But can you do it?" 
Again, a significant pause. 
"Yes," said Deep Thought, "I can do it." 
"There is an answer?" said Fook with breathless excitement. 
"Yes," said Deep Thought. "Life, the Universe, and Everything. 
There is an answer. But, I'll have to think about it."
The 3 Dimensions of Big Data 
Volume, Velocity, Variety
Data Validity/Veracity 
The 4th Dimension of Big Data 
Raw data may not be valid 
May be incomplete (missing attributes or values) 
May be ‘noisy’ (contains outliers or errors) 
May be inconsistent (invalid data, e.g., state/ZIP code mismatch)
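A minimal sketch of how the three validity problems above might be flagged in a single record. Everything specific here is an assumption made for illustration: the field names, the plausible heart-rate range, and the deliberately incomplete `ZIP_PREFIXES` lookup.

```python
# Illustrative, incomplete lookup of ZIP prefixes per state (assumption for this sketch).
ZIP_PREFIXES = {
    "TX": ("75", "76", "77", "78", "79"),
    "NY": ("10", "11", "12", "13", "14"),
}

# Hypothetical required attributes of a patient record.
REQUIRED = ("patient_id", "state", "zip", "heart_rate")


def validate(record):
    """Return a list of problems found in one record (empty list = no problem found)."""
    problems = []

    # Incomplete: missing attributes or values.
    for field in REQUIRED:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    # Noisy: value outside a plausible range (the 20-300 range is an assumption).
    hr = record.get("heart_rate")
    if isinstance(hr, (int, float)) and not 20 <= hr <= 300:
        problems.append("implausible heart_rate")

    # Inconsistent: state/ZIP code mismatch.
    prefixes = ZIP_PREFIXES.get(record.get("state"))
    zip_code = record.get("zip")
    if prefixes and zip_code and not str(zip_code).startswith(prefixes):
        problems.append("state/zip mismatch")

    return problems
```

A record that passes all three checks returns an empty list; each failed check adds one short message.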
Data Variety 
Aggregating structured and unstructured data 
in preparation for data analysis 
Nontrivial & complex task 
As in all informatics efforts, standards for data 
exchange are essential
Data Velocity 
Salient Issue #1 - How often to sample your 
data 
Salient Issue #2 - How much can you afford to 
pay for data sampling 
Answers to #1 & #2 define data velocity
Data Volume 
Not just the magnitude of storage 
Wide variety of data also essential driver for 
the ‘Big’ in Big Data 
So Volume & Variety inextricably intertwined 
In fact Data Volume is directly proportional to 
data Variety & Velocity, i.e., specify Variety of 
data sources & Velocity of data streams => 
Data Volume Requirements
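The proportionality above can be sketched as a back-of-the-envelope estimate: once the variety of sources and the velocity of each stream are specified, the yearly storage requirement follows. The function name and the example figures (100 monitors, 250 samples per second, 2 bytes per sample) are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_volume_bytes(sources):
    """Estimate bytes generated per year.

    `sources` is a list of (number_of_streams, samples_per_second, bytes_per_sample)
    tuples, one tuple per kind of data source (the "variety").
    """
    return sum(
        streams * rate * size * SECONDS_PER_YEAR
        for streams, rate, size in sources
    )


if __name__ == "__main__":
    # Hypothetical example: 100 bedside monitors, 250 samples/s, 2 bytes per sample.
    total = yearly_volume_bytes([(100, 250, 2)])
    print(f"{total / 10**12:.2f} TB per year")
```

Doubling either the number of sources or the sampling rate doubles the estimate, which is the sense in which volume is proportional to variety and velocity.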
By 2015 Average Hospital 
Generates 2/3 Petabyte of Patient Data Per Year
Predictable 
‘Big Data’ Challenges 
Analysis, 
Capture, 
Curation, 
Search, 
Sharing, 
Storage, 
Transfer, 
Visualization, 
Privacy Violations
Knowledge Discovery 
Data Warehouse vs Big Data 
Data Warehouse 
Predefined & Structured Data 
Non-operational relational data-base 
On Line Analytical Processing of Data 
Conventional SQL Query Tools 
Exploratory Statistical Analysis 
Data Visualization Techniques 
K-nearest neighbor analysis 
Decision Trees & Association Rules 
Construction of Genetic Algorithms & Neural Networks
Knowledge Discovery 
Via 
The Data Warehouse
Knowledge Discovery 
Data Warehouse vs Big Data 
Big Data Approach 
Undefined & Unstructured Data 
Non-relational databases via Hadoop Distributed File 
System 
Massively Distributed Data Processing via Hadoop 
(open-source Java-based programming framework 
for processing large datasets in a distributed 
computing environment) (Currently version 0.23) 
Economical - traditional data storage $5 per 
gigabyte - Hadoop storage $0.25 per gigabyte
Other Open Source Tools 
Avro - data serialization system 
Cassandra - scalable multi-master database (critical design feature: no single 
point of failure) 
Chukwa - data collection system for managing large distributed systems 
HBase - scalable distributed database supporting structured data storage of large 
tables 
Hive - data warehouse infrastructure providing data summarization & ad hoc 
query capacities 
Mahout - scalable machine learning & data mining library 
Pig - high-level data-flow language and execution framework for parallel 
computation 
ZooKeeper - high performance coordination service for distributed applications
Big Data System Architecture
Q: Why Hadoop? 
A: Bigger Slice of the Info-Pie!
Classical Relational Data Model
Hadoop Data Model 
Flat File Structure, Any Format 
No data schema 
Files automatically partitioned into defined blocks
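A toy sketch of the partitioning idea: a file of any format is treated as a flat run of bytes and cut into fixed-size blocks with no reference to a schema. The 4-byte block size is chosen only so the output is readable; real HDFS blocks are far larger, and the function name is an assumption.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into consecutive blocks of at most `block_size` bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", 4)
    print(blocks)
    # Reassembling the blocks in order recovers the original file contents.
    print(b"".join(blocks))
```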
Classical Distributed 
Database Model 
Transactional & 
State Dependent 
Atomicity 
Consistency 
Isolation 
Durability
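As an illustration of the transactional model, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are hypothetical; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from row `src` to row `dst`; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # This debit violates the CHECK constraint if funds are insufficient,
            # which raises IntegrityError and undoes the credit above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()
```

Calling `transfer(conn, 1, 2, 500)` fails on the second statement, and the first statement's credit is rolled back with it.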
Hadoop Distributed 
Database Model 
Database “Job” 
Job Divided into Tasks 
Map-Reduce Computing Model 
Every Task either a Map 
or 
Reduce
Hadoop Computing Framework 
Two conceptual layers 
Hadoop Distributed File System 
File broken into definable blocks 
Each block stored on 3 servers by default for fault tolerance 
Execution engine (MapReduce) 
Reduces file requests into smaller requests 
Optimizes scalable use of CPU resources
A Simple Example: Word Count 
Count Each Occurrence of a Single Word in a Dataset
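The word count example can be sketched in a single process. This is not Hadoop code: the three functions below only mimic the map, shuffle, and reduce stages in memory, and their names are assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big deal"]))
```

On a cluster, each map call would run on the node holding its block and each reduce call would handle a subset of the keys; here they simply run one after another.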
A More Complex Task: Join Databases 
The network functions here like any peer-to-peer distributed file sharing 
system such as that seen with the BitTorrent protocol
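One common way to express a join in the MapReduce style is a reduce-side join: the map step tags each row with its table of origin and emits it under the join key, and the reduce step pairs up rows that share a key. The sketch below runs in memory and uses hypothetical `patients` and `labs` tables; it is not taken from the original slide.

```python
from collections import defaultdict


def map_tagged(table_name, rows, key):
    """Map: emit (join_key, (table_name, row)) for every row of one table."""
    for row in rows:
        yield row[key], (table_name, row)


def reduce_join(tagged_rows):
    """Reduce: combine every 'patients' row with every 'labs' row sharing the key."""
    left = [row for table, row in tagged_rows if table == "patients"]
    right = [row for table, row in tagged_rows if table == "labs"]
    return [{**l, **r} for l in left for r in right]


def join(patients, labs, key="patient_id"):
    groups = defaultdict(list)
    emitted = list(map_tagged("patients", patients, key)) + list(map_tagged("labs", labs, key))
    for join_key, tagged in emitted:
        groups[join_key].append(tagged)
    joined = []
    for join_key in sorted(groups):
        joined.extend(reduce_join(groups[join_key]))
    return joined
```

This behaves like an inner join: a patient with no lab rows produces no output rows.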
A Generalized Schema 
MapReduce Generalized Flow Schema
Hadoop Cluster 
Hadoop File System (HDFS) building block of the computing cluster 
HDFS breaks incoming files into blocks and stores with triple 
redundancy across the network 
Computation on the block occurs at the storage node 
The well-known SETI@home project serves as an easily 
understandable example of this computing model
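A toy sketch of why triple redundancy gives fault tolerance. Blocks are assigned to three distinct nodes in simple round-robin order, which is an assumption made for brevity and does not reproduce HDFS's actual placement policy; the node names are hypothetical.

```python
def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (round-robin, illustrative only)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def surviving_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a healthy node."""
    failed = set(failed_nodes)
    return [block for block, holders in placement.items() if set(holders) - failed]


if __name__ == "__main__":
    placement = place_replicas(4, ["n1", "n2", "n3", "n4", "n5"])
    print(placement)
    print(surviving_blocks(placement, ["n1", "n2"]))
```

With three replicas per block, any two simultaneous node failures still leave every block readable, and a computation can be scheduled on whichever surviving node holds the block.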
File Characteristics 
‘Write Once’ files - original input data not modified - 
triple redundantly stored 
Input data streamed into HDFS - processed by 
MapReduce - any results stored back in HDFS 
Obviously, HDFS is not a general-purpose file system
HDFS System Architecture
MapReduce 
Programming Model Enabling Massive Distributed Parallel Computations 
Originally proprietary Google Technology 
Map() procedure performs filtering and sorting 
Reduce() procedure performs summary operation 
Model was inspired by, but is not strictly analogous to, the functional 
programming map & reduce functions 
The power of the model lies in the multi-threading capability that is 
its essential design feature 
Some have criticized the problem set approachable by this technique
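For comparison, the functional-programming map and reduce that inspired the model look like this in plain Python. The heart-rate readings are made-up values. Note the difference: here `map` transforms one element into one element and `reduce` folds the whole list into one value, whereas the MapReduce Map() emits key/value pairs and Reduce() runs once per key.

```python
from functools import reduce

# Hypothetical heart-rate readings, used only as sample input.
readings = [72, 88, 95, 110]

# Functional map: apply a function to every element, one output per input.
doubled = list(map(lambda x: x * 2, readings))

# Functional reduce: fold the whole sequence into a single summary value.
total = reduce(lambda acc, x: acc + x, readings, 0)

if __name__ == "__main__":
    print(doubled)
    print(total)
```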
Data Architecture Designs 
Hadoop (HDFS) 
Hadoop File System - data storage component of open source Apache Hadoop Project 
Stores any type of data - structured, semi-structured, 
& unstructured, 
e.g., email, social data, XML data, videos, audio files, photos, GPS, satellite images, 
sensor data, spreadsheets, web log data, mobile data, RFID tags, pdf docs 
A Massively Distributed File System Optimized for Parallel Processing
Data Architecture Designs 
Minimally intrusive addition of Hadoop to enterprise architecture 
Data Staging Platform 
Employing data processing power of Hadoop with structured data 
Process Data
Data Architecture Designs 
Processing Structured & Unstructured Data 
Process Data 
Global Archiving of all Data 
Total Global Data Storage
Data Architecture Designs 
Processing Structured & Unstructured Data, Access via EDW 
Processing Structured & Unstructured Data, Access via Hadoop 
Preserving The Classical Data Model 
Embracing The Future Data Model
High Yield Areas for Use 
Pharmacological Research 
Genomic and Genetic Research 
Psychiatry / Behavioral Health 
Novel Sensors & Sensor Analysis Algorithms 
Epidemiological Research 
Much Talked About - Little Concrete 
Actionable Effects
Conclusion 
“Things have never been more like the way they are today in history.” 
Dwight D Eisenhower 
“Things are more like they are now than they’ve ever been before.” 
Gerald Ford 
“Those who cannot remember the past are condemned to repeat it.” 
George Santayana
Random Smattering of Articles 
Predicting Breast Cancer Survivability Using Data Mining Techniques Bellaachia A & Guven 
E. Age 2006, 58:10-110. 
A. McKenna, M. Hanna, E. Banks et al., “The genome analysis toolkit: a MapReduce 
framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, 
no. 9, pp.1297–1303, 2010. 
R. C. Taylor, “An overview of the Hadoop/MapReduce/HBase framework and its current 
applications in bioinformatics,” BMC Bioinformatics, vol. 11, no. 12, article S1, 2010. 
J. D. Osborne, J. Flatow, M. Holko et al., “Annotating the human genome with disease 
ontology,” BMC Genomics, vol. 10, supplement 1, article S6, 2009. 
B. Giardine, C. Riemer, R. C. Hardison et al., “Galaxy: a platform for interactive large-scale 
genome analysis,” Genome Research, vol. 15, no. 10, pp. 1451–1455, 2005. 
Steinberg GB, Church BW, McCall CJ, Scott AB, Kalis BP. Novel predictive models for 
metabolic syndrome risk: a "big data" analytic approach. Am J Manag Care. 2014 Jun 
1;20(6):e221-8. 
Vaitsis C, Nilsson G, Zary N. Big data in medical informatics: improving education through 
visual analytics. Stud Health Technol Inform. 2014;205:1163-7. 
Ross MK, Wei W, Ohno-Machado L. "Big data" and the electronic health record. Yearb Med 
Inform. 2014 Aug 15;9(1):97-104. doi: 10.15265/IY-2014-0003.

More Related Content

What's hot

User Experience - How Sensors and Big Data will change your Healthcare experi...
User Experience - How Sensors and Big Data will change your Healthcare experi...User Experience - How Sensors and Big Data will change your Healthcare experi...
User Experience - How Sensors and Big Data will change your Healthcare experi...
Mark D'Cunha
 
Healthcare and Big Data - May 2017
Healthcare and Big Data -  May 2017Healthcare and Big Data -  May 2017
Healthcare and Big Data - May 2017
paul young cpa, cga
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Health Catalyst
 
BIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in HealthcareBIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in Healthcare
Skillspeed
 
Health care and big data with hadoop – Beacuse prevention is better than cure
Health care and big data with hadoop – Beacuse prevention is better than cureHealth care and big data with hadoop – Beacuse prevention is better than cure
Health care and big data with hadoop – Beacuse prevention is better than cure
Edureka!
 
Big data and the Healthcare Sector
Big data and the Healthcare Sector Big data and the Healthcare Sector
Big data and the Healthcare Sector
Chris Groves
 
Big-Data in HealthCare _ Overview
Big-Data in HealthCare _ OverviewBig-Data in HealthCare _ Overview
Big-Data in HealthCare _ Overview
Hamdaoui Younes
 
HealthCare and Big Data with Hadoop
HealthCare and Big Data with HadoopHealthCare and Big Data with Hadoop
HealthCare and Big Data with Hadoop
Edureka!
 
Big Data Solutions for Healthcare
Big Data Solutions for HealthcareBig Data Solutions for Healthcare
Big Data Solutions for Healthcare
Odinot Stanislas
 
Big Data in Medicine
Big Data in MedicineBig Data in Medicine
Big Data in Medicine
Nasir Arafat
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
Perficient, Inc.
 
The data explosion along the care cycle (Dell Healthcare)
The data explosion along the care cycle (Dell Healthcare)The data explosion along the care cycle (Dell Healthcare)
The data explosion along the care cycle (Dell Healthcare)
Eric Van 't Hoff
 
Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Data Lake vs. Data Warehouse: Which is Right for Healthcare?Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Health Catalyst
 
Seven Ways DOS™ Simplifies the Complexities of Healthcare IT
Seven Ways DOS™ Simplifies the Complexities of Healthcare ITSeven Ways DOS™ Simplifies the Complexities of Healthcare IT
Seven Ways DOS™ Simplifies the Complexities of Healthcare IT
Health Catalyst
 
Big Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and ClinicalBig Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and Clinical
Adrish Sannyasi
 
Big implications of Big Data in healthcare
Big implications of Big Data in healthcareBig implications of Big Data in healthcare
Big implications of Big Data in healthcare
Guires
 
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataMicrosoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Health Catalyst
 
Deploying Predictive Analytics in Healthcare
Deploying Predictive Analytics in HealthcareDeploying Predictive Analytics in Healthcare
Deploying Predictive Analytics in Healthcare
Health Catalyst
 
Hadoop Enabled Healthcare
Hadoop Enabled HealthcareHadoop Enabled Healthcare
Hadoop Enabled Healthcare
DataWorks Summit
 
Big data in Healthcare & Life Sciences
Big data in Healthcare & Life SciencesBig data in Healthcare & Life Sciences
Big data in Healthcare & Life Sciences
Matthias Vallaey
 

What's hot (20)

User Experience - How Sensors and Big Data will change your Healthcare experi...
User Experience - How Sensors and Big Data will change your Healthcare experi...User Experience - How Sensors and Big Data will change your Healthcare experi...
User Experience - How Sensors and Big Data will change your Healthcare experi...
 
Healthcare and Big Data - May 2017
Healthcare and Big Data -  May 2017Healthcare and Big Data -  May 2017
Healthcare and Big Data - May 2017
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
 
BIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in HealthcareBIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in Healthcare
 
Health care and big data with hadoop – Beacuse prevention is better than cure
Health care and big data with hadoop – Beacuse prevention is better than cureHealth care and big data with hadoop – Beacuse prevention is better than cure
Health care and big data with hadoop – Beacuse prevention is better than cure
 
Big data and the Healthcare Sector
Big data and the Healthcare Sector Big data and the Healthcare Sector
Big data and the Healthcare Sector
 
Big-Data in HealthCare _ Overview
Big-Data in HealthCare _ OverviewBig-Data in HealthCare _ Overview
Big-Data in HealthCare _ Overview
 
HealthCare and Big Data with Hadoop
HealthCare and Big Data with HadoopHealthCare and Big Data with Hadoop
HealthCare and Big Data with Hadoop
 
Big Data Solutions for Healthcare
Big Data Solutions for HealthcareBig Data Solutions for Healthcare
Big Data Solutions for Healthcare
 
Big Data in Medicine
Big Data in MedicineBig Data in Medicine
Big Data in Medicine
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
 
The data explosion along the care cycle (Dell Healthcare)
The data explosion along the care cycle (Dell Healthcare)The data explosion along the care cycle (Dell Healthcare)
The data explosion along the care cycle (Dell Healthcare)
 
Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Data Lake vs. Data Warehouse: Which is Right for Healthcare?Data Lake vs. Data Warehouse: Which is Right for Healthcare?
Data Lake vs. Data Warehouse: Which is Right for Healthcare?
 
Seven Ways DOS™ Simplifies the Complexities of Healthcare IT
Seven Ways DOS™ Simplifies the Complexities of Healthcare ITSeven Ways DOS™ Simplifies the Complexities of Healthcare IT
Seven Ways DOS™ Simplifies the Complexities of Healthcare IT
 
Big Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and ClinicalBig Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and Clinical
 
Big implications of Big Data in healthcare
Big implications of Big Data in healthcareBig implications of Big Data in healthcare
Big implications of Big Data in healthcare
 
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big DataMicrosoft: A Waking Giant In Healthcare Analytics and Big Data
Microsoft: A Waking Giant In Healthcare Analytics and Big Data
 
Deploying Predictive Analytics in Healthcare
Deploying Predictive Analytics in HealthcareDeploying Predictive Analytics in Healthcare
Deploying Predictive Analytics in Healthcare
 
Hadoop Enabled Healthcare
Hadoop Enabled HealthcareHadoop Enabled Healthcare
Hadoop Enabled Healthcare
 
Big data in Healthcare & Life Sciences
Big data in Healthcare & Life SciencesBig data in Healthcare & Life Sciences
Big data in Healthcare & Life Sciences
 

Viewers also liked

Differential Dx Chest Pain
Differential Dx Chest Pain Differential Dx Chest Pain
Differential Dx Chest Pain
Frank Meissner
 
Point of Care Cardiac U/S
Point of Care Cardiac U/S Point of Care Cardiac U/S
Point of Care Cardiac U/S
Frank Meissner
 
Acute Coronary Syndrome
Acute Coronary Syndrome Acute Coronary Syndrome
Acute Coronary Syndrome
Frank Meissner
 
Shock 2011
Shock 2011Shock 2011
Shock 2011
Frank Meissner
 
ChemBio Tutorial
ChemBio Tutorial ChemBio Tutorial
ChemBio Tutorial
Frank Meissner
 
Therapeutic hypothermia
Therapeutic hypothermiaTherapeutic hypothermia
Therapeutic hypothermia
Frank Meissner
 
Chronotropic Incompetence
Chronotropic Incompetence Chronotropic Incompetence
Chronotropic Incompetence
Frank Meissner
 
Pe final
Pe finalPe final
Pe final
Frank Meissner
 
Dining With Cannibals
Dining With CannibalsDining With Cannibals
Dining With Cannibals
Frank Meissner
 
Practical thanatology
Practical thanatologyPractical thanatology
Practical thanatology
Frank Meissner
 
Big data -strategia
Big data  -strategiaBig data  -strategia
Big data -strategia
ivoriofinland
 
Big data mita se on 10 casea
Big data mita se on 10 caseaBig data mita se on 10 casea
Big data mita se on 10 caseaASML
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Ola Spjuth
 
Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...
Daniel Nüst
 
New sources of big data for precision medicine: are we ready?
New sources of big data for precision medicine: are we ready?New sources of big data for precision medicine: are we ready?
New sources of big data for precision medicine: are we ready?
Health and Biomedical Informatics Centre @ The University of Melbourne
 
A&e answers
A&e answersA&e answers
A&e answers
Kirie Kozanegawa
 
Best Mobile Medical Apps in ED
Best Mobile Medical Apps in EDBest Mobile Medical Apps in ED
Best Mobile Medical Apps in ED
Sun Yai-Cheng
 
Precision Medicine in the Big Data World
Precision Medicine in the Big Data WorldPrecision Medicine in the Big Data World
Precision Medicine in the Big Data World
Cloudera, Inc.
 
ACLS 2015 Updates - The Malaysian Perspective
ACLS 2015 Updates - The Malaysian PerspectiveACLS 2015 Updates - The Malaysian Perspective
ACLS 2015 Updates - The Malaysian Perspective
Chew Keng Sheng
 
Principles Of Trauma Care (2)
Principles Of Trauma Care (2)Principles Of Trauma Care (2)
Principles Of Trauma Care (2)
MD Specialclass
 

Viewers also liked (20)

Differential Dx Chest Pain
Differential Dx Chest Pain Differential Dx Chest Pain
Differential Dx Chest Pain
 
Point of Care Cardiac U/S
Point of Care Cardiac U/S Point of Care Cardiac U/S
Point of Care Cardiac U/S
 
Acute Coronary Syndrome
Acute Coronary Syndrome Acute Coronary Syndrome
Acute Coronary Syndrome
 
Shock 2011
Shock 2011Shock 2011
Shock 2011
 
ChemBio Tutorial
ChemBio Tutorial ChemBio Tutorial
ChemBio Tutorial
 
Therapeutic hypothermia
Therapeutic hypothermiaTherapeutic hypothermia
Therapeutic hypothermia
 
Chronotropic Incompetence
Chronotropic Incompetence Chronotropic Incompetence
Chronotropic Incompetence
 
Pe final
Pe finalPe final
Pe final
 
Dining With Cannibals
Dining With CannibalsDining With Cannibals
Dining With Cannibals
 
Practical thanatology
Practical thanatologyPractical thanatology
Practical thanatology
 
Big data -strategia
Big data  -strategiaBig data  -strategia
Big data -strategia
 
Big data mita se on 10 casea
Big data mita se on 10 caseaBig data mita se on 10 casea
Big data mita se on 10 casea
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
 
Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...
 
New sources of big data for precision medicine: are we ready?
New sources of big data for precision medicine: are we ready?New sources of big data for precision medicine: are we ready?
New sources of big data for precision medicine: are we ready?
 
A&e answers
A&e answersA&e answers
A&e answers
 
Best Mobile Medical Apps in ED
Best Mobile Medical Apps in EDBest Mobile Medical Apps in ED
Best Mobile Medical Apps in ED
 
Precision Medicine in the Big Data World
Precision Medicine in the Big Data WorldPrecision Medicine in the Big Data World
Precision Medicine in the Big Data World
 
ACLS 2015 Updates - The Malaysian Perspective
ACLS 2015 Updates - The Malaysian PerspectiveACLS 2015 Updates - The Malaysian Perspective
ACLS 2015 Updates - The Malaysian Perspective
 
Principles Of Trauma Care (2)
Principles Of Trauma Care (2)Principles Of Trauma Care (2)
Principles Of Trauma Care (2)
 

Similar to Big Data In Medicine

BigDataInMedicine.pptx
BigDataInMedicine.pptxBigDataInMedicine.pptx
BigDataInMedicine.pptx
Frank Meissner
 
Seminar presentation
Seminar presentationSeminar presentation
Seminar presentation
Klawal13
 
Is Hadoop a Necessity for Data Science
Is Hadoop a Necessity for Data ScienceIs Hadoop a Necessity for Data Science
Is Hadoop a Necessity for Data Science
Edureka!
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
Pankajkumar496281
 
Smith T Bio Hdf Bosc2008
Smith T Bio Hdf Bosc2008Smith T Bio Hdf Bosc2008
Smith T Bio Hdf Bosc2008
bosc_2008
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
Duncan Hull
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational Science
Chelle Gentemann
 
Big Data & Data Mining
Big Data & Data MiningBig Data & Data Mining
Big Data & Data Mining
Md Mizanur Rahman
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
almaraniabwmalk
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
c.titus.brown
 
Big Data in Healthcare Made Simple Where It Stands Today and Where .pdf
Big Data in Healthcare Made Simple Where It Stands Today and Where .pdfBig Data in Healthcare Made Simple Where It Stands Today and Where .pdf
Big Data in Healthcare Made Simple Where It Stands Today and Where .pdf
annamalaiagencies
 
Thesis blending big data and cloud -epilepsy global data research and inform...
Thesis  blending big data and cloud -epilepsy global data research and inform...Thesis  blending big data and cloud -epilepsy global data research and inform...
Thesis blending big data and cloud -epilepsy global data research and inform...
Anup Singh
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
Sandip Tipayle Patil
 
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
San Diego Supercomputer Center
 
Unit 1
Unit 1Unit 1
Big data
Big dataBig data
Big data
Mohamed Salman
 
Big_data_ppt
Big_data_ppt Big_data_ppt
Big_data_ppt
Sadhana Singh
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
Polash Halder
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
ijtsrd
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
Shivanee garg
 

Similar to Big Data In Medicine (20)

BigDataInMedicine.pptx
BigDataInMedicine.pptxBigDataInMedicine.pptx
BigDataInMedicine.pptx
 
Seminar presentation
Seminar presentationSeminar presentation
Seminar presentation
 
Is Hadoop a Necessity for Data Science
Is Hadoop a Necessity for Data ScienceIs Hadoop a Necessity for Data Science
Is Hadoop a Necessity for Data Science
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Smith T Bio Hdf Bosc2008
Smith T Bio Hdf Bosc2008Smith T Bio Hdf Bosc2008
Smith T Bio Hdf Bosc2008
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational Science
 
Big Data & Data Mining
Big Data & Data MiningBig Data & Data Mining
Big Data & Data Mining
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Big Data in Healthcare Made Simple Where It Stands Today and Where .pdf
Big Data in Healthcare Made Simple Where It Stands Today and Where .pdfBig Data in Healthcare Made Simple Where It Stands Today and Where .pdf
Big Data in Healthcare Made Simple Where It Stands Today and Where .pdf
 
Thesis blending big data and cloud -epilepsy global data research and inform...
Thesis  blending big data and cloud -epilepsy global data research and inform...Thesis  blending big data and cloud -epilepsy global data research and inform...
Thesis blending big data and cloud -epilepsy global data research and inform...
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
 
Unit 1
Unit 1Unit 1
Unit 1
 
Big data
Big dataBig data
Big data
 
Big_data_ppt
Big_data_ppt Big_data_ppt
Big_data_ppt
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 

Big Data In Medicine

  • 1. “Big Data” For Medicine & Health Care - An Introductory Tutorial Frank W Meissner, MD, RDMS, RDCS FACP, FACC, FCCP, FASNC, CPHIMS, CCDS Diplomate- Subspecialty Board of Advanced Heart Failure & Transplant Cardiology Diplomate - Certification Board of Cardiovascular Computed Tomography Certified Professional Health Information and Management Systems Diplomate- Subspecialty Board of Cardiovascular Diseases Diplomate - Subspecialty Board of Critical Care Medicine Diplomate - Certification Board of Nuclear Cardiology Diplomate - American Board of Forensic Medicine Diplomate- American Board of Internal Medicine Diplomate - National Board of Echocardiography Certified Cardiac Device Specialist - Physician
  • 2. Big Data - Definition (With Apologies to Douglas Adams) Big Data - “You just won't believe how vastly, hugely, mind-bogglingly big it is.” The Hitchhiker’s Guide to the Galaxy
  • 3. Seriously, Big Data A Real Definition Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information Although big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes (PB) and exabytes (EB) of data 1 PB = 1000000000000000 B = 10^15 bytes = 1000 terabytes 1 EB = 1000^6 bytes = 10^18 bytes = 1000000000000000000 B = 1000 petabytes = 1 million terabytes = 1 billion gigabytes
  • 4. Data Source/Streams -4- Big Data Analytics
  • 5. ‘Big Data’ - An Operational Definition Big Data is High Volume High Speed High Variety High Veracity THE Data demands new types and forms of info processing to support decision support, insight discovery, and process optimization
  • 6. A Proto-Typical Big Data Project (With More Apologies to Douglas Adams) “O Deep Thought computer," he said, "the task we have designed you to perform is this. We want you to tell us...." he paused, "The Answer." "The Answer?" said Deep Thought. "The Answer to what?" "Life!" urged Fook. "The Universe!" said Lunkwill. "Everything!" they said in chorus. Deep Thought paused for a moment's reflection. "Tricky," he said finally. "But can you do it?" Again, a significant pause. "Yes," said Deep Thought, "I can do it." "There is an answer?" said Fook with breathless excitement. "Yes," said Deep Thought. "Life, the Universe, and Everything. There is an answer. But, I'll have to think about it."
  • 7. The 3-Dimensions of Big Data Volume, Velocity, Variety
  • 8. Data Validity/Veracity The 4th Dimension of Big Data Raw data may not be valid May be incomplete (missing attributes or values) May be ‘noisy’ (contains outliers or errors) May be inconsistent (Invalid data, e.g., state/zip code mismatch )
  • 9. Data Variety Aggregating structured and unstructured data in preparation for data analysis Nontrivial & complex task As in all Informatics efforts standards for data exchange are essential & vital
  • 10. Data Velocity Salient Issue #1 - How often to sample your data Salient Issue #2 - How much can you afford to pay for data sampling Answers to #1 & #2 define data velocity
  • 11. Data Volume Not just the magnitude of storage Wide variety of data also essential driver for the ‘Big’ in Big Data So Volume & Variety inexorably intertwined In fact Data Volume is directly proportional to data Variety & Velocity, i.e., specify Variety of data sources & Velocity of data streams => Data Volume Requirements
  • 12. By 2015 Average Hospital Generates 2/3 Petabyte Patient Data Per Year
  • 13. Predictable ‘Big Data’ Challenges Analysis, Capture, Curation, Search, Sharing, Storage, Transfer, Visualization, Privacy Violations
  • 14. Knowledge Discovery Data Warehouse vs Big Data Data Warehouse Predefined & Structured Data Non-operational relational data-base On Line Analytical Processing of Data Conventional SQL Query Tools Exploratory Statistical Analysis Data Visualization Techniques K-nearest neighbor analysis Decision Trees & Association Rules Construction of Genetic Algorithms & Neural Network
  • 15. Knowledge Discovery Via The Data Warehouse
  • 16. Knowledge Discovery Data Warehouse vs Big Data Big Data Approach Undefined & UnStructured Data Non relational data-bases via Hadoop Distributed File System Massively Distributed Data Processing VIA Hadoop (open-source Java-based programming framework for processing large datasets in a distributed computing environment) (Currently version 0.23) Economical - traditional data storage $5 per gigabyte - Hadoop storage $0.25 per gigabyte
  • 17. Other Open Source Tools Avro - data serialization system Cassandra - scalable multi-master database (critical design feature no single points of failure) Chukwa - data collection system for managing large distributed systems Hbase - scalable distributed database supporting structured data storage of large tables Hive - data warehouse infrastructure providing data summarization & ad hoc query capacities Mahout - scalable machine learning & data mining library PIG - high-level data-flow language and execution framework for parallel computation ZooKeeper - high performance coordination service for distributed applications
  • 18. Big Data System Architecture
  • 19. Q: Why Hadoop? A: Bigger Slice of the Info- Pie!
  • 21. Hadoop Data Model Flat File Structure any Format No data schema Files automatically partitioned into defined blocks
  • 22. Classical Distributed Database Model Transactional & State Dependent Atomicity Consistency Isolation Durability
  • 23. Hadoop Distributed Database Model Database “Job” Job Divided into Tasks Map-Reduce Computing Model Every Task either a Map or Reduce
  • 24. Hadoop Computing Framework Two conceptual layers Hadoop Distributed File System File broken into definable blocks Stored on minimum of 3 servers for fault tolerance Execution engine (MapReduce) Reduces file requests into smaller requests Optimizes scalable use of CPU resources
  • 25. A Simple Example: Word Count Count Each Occurrence of a Single Word in a Dataset
  • 26. A More Complex Task Join Databases The network functions here like any peer-to-peer distributed file sharing system such as that seen with the BitTorrent protocol
  • 27. A Generalized Schema MapReduce Generalized Flow Schema
  • 28. Hadoop Cluster Hadoop File System (HDFS) building block of the computing cluster HDFS breaks incoming files into blocks and stores with triple redundancy across the network Computation on the block occurs at the storage node The Well Known SETI@home project serves as easily understandable example of this computing model
  • 29. File Characteristics ‘Write Once’ files - original input data not modified - triple redundantly stored Input data streamed into HDFS - processed by MapReduce - any results stored back in HDFS Obviously HDFS not general purpose file system
  • 31. MapReduce Programming Model Enabling Massive Distributed Parallel Computations Originally proprietary Google Technology Map() procedure performs filtering and sorting Reduce() procedure performs summary operation Model was inspired by, but is not strictly analogous to, the functional programming map & reduce functions The power of the model lies within the multi-threading capability that is its essential design feature Some have criticized the problem set approachable by this technique
  • 32. Data Architecture Designs Hadoop (HDFS) Hadoop File System data storage component of open source Apache Hadoop Project Stores any type of data - structured, semi-structured, & unstructured, e.g., email, social data, XML data, videos, audio files, photos, GPS, satellite images, sensor data, spreadsheets, web log data, mobile data, RFID tags, pdf docs A Massively Distributed File System Optimized for Parallel Processing
  • 33. Data Architecture Designs Minimally intrusive addition of Hadoop to enterprise architecture Data Staging Platform Employing data processing power of Hadoop with structured data Process Data
  • 34. Data Architecture Designs Processing Structured & Unstructured Data Process Data Global Archiving of all Data Total Global Data Storage
  • 35. Data Architecture Designs Processing Structured & Unstructured Data Access via EDW Processing Structured & Unstructured Data Access via Hadoop Preserving The Classical Data Model Embracing The Future Data Model
  • 36. High Yield Areas 4 Use Pharmacological Research Genomic and Genetic Research Psychiatry / Behavioral Health Novel Sensors & Sensor Analysis Algorithms Epidemiological Research Much Talked About - Little Concrete Actionable Effects
  • 37. Conclusion “Things have never been more like the way they are today in history.” Dwight D Eisenhower “Things are more like they are now than they’ve ever been before.” Gerald Ford “Those who cannot remember the past are condemned to repeat it.” George Santayana
  • 38. Random Smattering of Articles Predicting Breast Cancer Survivability Using Data Mining Techniques Bellaachia A & Guven E. Age 2006, 58:10-110. A. McKenna, M. Hanna, E. Banks et al., “The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, no. 9, pp. 1297–1303, 2010. R. C. Taylor, “An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics,” BMC Bioinformatics, vol. 11, no. 12, article S1, 2010. J. D. Osborne, J. Flatow, M. Holko et al., “Annotating the human genome with disease ontology,” BMC Genomics, vol. 10, supplement 1, article S6, 2009. B. Giardine, C. Riemer, R. C. Hardison et al., “Galaxy: a platform for interactive large-scale genome analysis,” Genome Research, vol. 15, no. 10, pp. 1451–1455, 2005. Steinberg GB, Church BW, McCall CJ, Scott AB, Kalis BP. Novel predictive models for metabolic syndrome risk: a "big data" analytic approach. Am J Manag Care. 2014 Jun 1;20(6):e221-8. Vaitsis C, Nilsson G, Zary N. Big data in medical informatics: improving education through visual analytics. Stud Health Technol Inform. 2014;205:1163-7. Ross MK, Wei W, Ohno-Machado L. "Big data" and the electronic health record. Yearb Med Inform. 2014 Aug 15;9(1):97-104. doi: 10.15265/IY-2014-0003.

Editor's Notes

  1. This talk will discuss basic concepts to allow understanding of the basic features of ‘big data’ and its analysis.
  2. According to the author Douglas Adams, Big Data is vastly, hugely, mind-bogglingly big. Of course, the sophisticated members of my audience will recognize that Adams was referring to the Universe itself in this quote, not to that subset of the universe called ‘Big Data.’
  3. This is a more conventional definition of Big Data. However, it does not alter one whit the mind-boggling character of Big Data conveyed by the previous slide’s definition.
  4. This slide represents the most common data streams independent of knowledge domain that are incorporated into a ‘Big Data’ project.
  5. This is the conventional and most commonly articulated ‘definition’ of Big Data.
  6. The central story idea in The Hitchhiker’s Guide to the Galaxy revolves around the Earth as the universe’s most advanced computational device, designed by trans-dimensional beings to answer the Big Data query noted above. Current plans and expectations for Big Data, now early in its hype cycle, seem only slightly less ambitious than answering the ultimate question.
  7. This illustration envisions Big Data as a data tsunami that feeds upon itself in an ever-expanding cycle of ever greater velocities, varieties and quantities of data.
  8. “There is a fifth dimension beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition, and it lies between the pit of man's fears and the summit of his knowledge. This is the dimension of imagination. It is an area which we call the Twilight Zone.” While there is no 5th dimension to Big Data, its now classical 3-dimensional representation is often augmented with a 4th dimension: data validity/veracity. For those of us interested in applying Big Data analytic products to the scientific practice of medicine, this is the most important of all the data dimensions. As noted in this slide, raw data can be invalid in a multitude of ways: it may be incomplete, noisy, or inconsistent.
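The three kinds of defect named on this slide (incomplete, noisy, inconsistent) can be sketched as simple record-level checks. What follows is only a minimal illustration in Python; the field names, the plausible heart-rate range, and the two-entry ZIP-prefix table are assumptions made for the example and are not part of the talk.

```python
# Hypothetical record-level validity checks: incomplete, noisy, inconsistent.
REQUIRED_FIELDS = ("patient_id", "heart_rate", "state", "zip")  # assumed schema

# Tiny illustrative lookup; a real system would use a full ZIP-to-state table.
ZIP_PREFIX_TO_STATE = {"100": "NY", "902": "CA"}


def validity_problems(record):
    """Return a list of human-readable problems found in one record."""
    problems = []

    # Incomplete: a required attribute is missing or empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"incomplete: missing {field}")

    # Noisy: a value outside an assumed plausible range (outlier or error).
    heart_rate = record.get("heart_rate")
    if isinstance(heart_rate, (int, float)) and not 20 <= heart_rate <= 300:
        problems.append(f"noisy: implausible heart_rate {heart_rate}")

    # Inconsistent: state and ZIP code disagree.
    zip_code, state = record.get("zip"), record.get("state")
    if zip_code and state:
        expected = ZIP_PREFIX_TO_STATE.get(str(zip_code)[:3])
        if expected is not None and expected != state:
            problems.append(f"inconsistent: zip {zip_code} is not in {state}")

    return problems
```

A record that passes all three checks returns an empty list, so the function can be used as a filter before any downstream analysis.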
  9. Certainly the most exciting and potentially the most important dimension in the Big Data Tsunami is the free-form mixture of structured and unstructured data elements. The grandest unrealized challenge within the formal Grand Challenges of Medical Informatics (D. F. Sittig. Grand challenges in medical informatics? J Am Med Inform Assoc. 1994 Sep-Oct; 1(5): 412–413.) has been the struggle to deal with the stubborn preference of medical practitioners for unstructured and often idiosyncratic formulations of diagnostic findings, hypotheses, diagnoses, and case summations. The all-out pursuit of a unified controlled medical vocabulary, which has obsessed the field literally since its inception, seems doomed given the expanding and ever accelerating volume of new knowledge and biomedical concepts discovered every calendar year. Thus an analytical methodology able to deal efficiently with unstructured but relevant and germane data seems to make the unified controlled medical vocabulary grand challenge, if not irrelevant, at least theoretically manageable. The third line in this slide emphasizes that, unlike the futile and hopeless dream that one can formally structure all clinical data input, all that is required for a ‘Big Data’ analysis is that the interface for data exchange is well formulated and has a relevant and pre-agreed data standard. Conceptually, the difference can be visualized by analogy with a fax machine. Rather than trying to specify all possible fax transmission messages by type with a unified nomenclature, all that is required is that fax transmission messages conform to a unified interface standard, so that (A)Fax_machine can exchange text data of any conceptual type with (B)Fax_machine without prior knowledge of what type of content is being exchanged.
  10. This slide emphasizes that articulation and specification of sampling frequency, coupled with an accurate estimate of the costs associated with data sampling and storage, are critical planning factors prior to developing and implementing a ‘Big Data’ project. As such, in project management terms, specifying data velocity is in essence determining the scope of your project.
  11. This slide emphasizes that while Data Volume is conceptualized as an independent element of the Data Tsunami, in fact Data Volume appears to be a linear function of the other two dimensions, i.e., if one can accurately specify the sources of the data streams while simultaneously specifying the velocity of those data streams, then the data volume requirements for the project are uniquely and deterministically defined.
  12. It has been estimated that by next year the average hospital in the US will generate a total of 2/3 petabyte of patient data of all types (predominantly video data), emphasizing the necessity of deploying ‘Big Data’ tools and techniques to tame the data tsunami that is threatening to wash away the foundations of US healthcare.
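The relationship described in this note can be written out as simple arithmetic: once the set of streams (variety) and their sampling rates (velocity) are fixed, the yearly volume follows. The Python sketch below is only an illustration; the stream names, rates, and sample sizes are hypothetical placeholders, not measurements from any hospital.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


@dataclass
class DataStream:
    name: str                  # the "variety": what kind of source this is
    samples_per_second: float  # the "velocity": sampling frequency
    bytes_per_sample: int


def yearly_volume_bytes(streams):
    """Volume follows from variety (which streams) and velocity (how fast)."""
    return sum(
        s.samples_per_second * s.bytes_per_sample * SECONDS_PER_YEAR
        for s in streams
    )


# Hypothetical streams; the numbers are placeholders, not measurements.
streams = [
    DataStream("bedside_monitor", samples_per_second=250, bytes_per_sample=2),
    DataStream("lab_results", samples_per_second=0.01, bytes_per_sample=512),
]
total = yearly_volume_bytes(streams)
print(f"{total / 1e9:.1f} GB per year")
```

Doubling any one stream's sampling rate doubles that stream's contribution, which is the sense in which volume is linear in velocity for a fixed set of sources.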
  13. Just as there are Grand Challenges for the field of medical informatics, there remain predictable challenges for ‘Big Data.’ The elephant in the room here seems to me to be the potential for privacy violations and the compromise of HIPAA-mandated privacy laws and regulations, as well as of the bedrock ethical principle that patient-provider confidentiality is central to the medical encounter and must be preserved and safeguarded. Set against these legal and ethical mandates has to be our understanding that, for the average layman, direct knowledge of ‘Big Data’ programs will probably be limited to highly and recently publicized NSA programs such as Stellarwind and PRISM. As such, ‘Big Data’ programs within the medical domain have to be meticulous and proactive in defining and describing their safeguards, so that data accumulation, manipulation, and aggregation can occur while privacy and anonymity are guaranteed.
  14. This slide details the classical data warehouse approach to knowledge discovery.
  15. This flow diagram is taken from my own paper on knowledge discovery via use of the data warehouse (Bothner U, Meissner FW. Wissen aus medizinischen Datenbanken nutzen. Dt Arztebl 1998;95: A-1336-1338 [Heft 20]). In many ways the exploration of ‘Big Data’ is identical in terms of the analytical tools involved in the analysis of the data set. Specifically, all the online analytical tools mentioned in the previous slide have been used for the analysis of Big Data sets. However, one critical difference characterizes the analysis of Big Data sets: the analysis is done over the entire set of data rather than over extracted data subsets. As such, any statistical analysis is done over the entire universe of discourse rather than over sample sets, as is done in conventional statistical analysis.
  16. The principal take-home message from this slide is the enormous cost efficiency of the Hadoop Distributed File System.
  17. The analysis of ‘Big Data’ is facilitated by open source tools and techniques, which contribute to its cost effectiveness. The tools discussed above are used with Hadoop to provide a full-featured computing environment.
  18. The relationships between these tools & the Hadoop Distributed File System are made explicit in this block diagram.
  19. This slide emphasizes that in the current ‘new data’ world, the vast majority of data is unstructured and resistant to relational database techniques for organizing and analyzing the data.
  20. For comparison and contrast, consider the relational database data model as illustrated above.
  21. Now consider the data model for Hadoop. In a relational data model, each data element is characterized in relationship to other data elements, and all are related to a key field. The Hadoop model, by contrast, is intrinsically flat, and no predefined relationships are imposed on the data prior to data manipulation. The data is partitioned into defined blocks that are then distributed in a decentralized storage and computation schema.
  22. This slide illustrates the classical distributed database model. Conceptually, database operations are visualized as state dependent processes with a limited behavioral repertoire (insert data field, update data field, delete data field) with a final commit behavior once the data field manipulation is completed in the absence of error. In case of error or failure, the database state is returned to its pre-operation state.
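The commit-or-roll-back behavior described in this note can be seen in miniature with SQLite, which ships with Python. This is a single-node sketch chosen only for illustration (the talk concerns distributed databases); the `vitals` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vitals (patient_id INTEGER PRIMARY KEY, heart_rate INTEGER NOT NULL)"
)
conn.execute("INSERT INTO vitals VALUES (1, 72)")
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE vitals SET heart_rate = 80 WHERE patient_id = 1")
        conn.execute("INSERT INTO vitals VALUES (1, 95)")  # duplicate key: fails
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the UPDATE, has been rolled back

# The database is back in its pre-operation state.
heart_rate = conn.execute(
    "SELECT heart_rate FROM vitals WHERE patient_id = 1"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM vitals").fetchone()[0]
```

Because the second statement fails, the first statement's update is undone as well, which is the atomicity property in the ACID list on the slide.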
  23. The Hadoop distributed database model is completely different. Each database operation is conceptualized as a ‘job’ with each job being divided into tasks by the Map-Reduce function. With each iteration of the job, either the task is reduced to a mapping function and the database job is concluded, or the task is further reduced to another sub-task and the process repeated until the task set is reduced to a mapping function.
  24. While this slide might have been clearer placed before the last slide, this order was selected to allow for comparison and contrast with the relational distributed database model. In any case, at the highest level of system analysis the Hadoop computing framework consists of two layers. The first is the Hadoop Distributed File System (HDFS), which is responsible for breaking even the largest data sets into definable and uniform computational chunks. Additionally, HDFS is responsible for establishing, at a minimum, triple redundancy for the data write operation. The other layer of the framework is the MapReduce execution engine, which takes the data file blocks and further reduces file-sized manipulation requests into smaller so-called task requests. The MapReduce function not only breaks the large data chunks into smaller tasks, it also tracks the tasks. In this way, optimal and maximal use of network CPU resources occurs. To reiterate and for emphasis, Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
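The storage layer described in this note, fixed-size blocks each stored on three servers, can be sketched in a few lines of Python. The function names and node names are hypothetical, and the round-robin placement is a deliberate simplification (real HDFS placement is rack-aware); the sketch only shows the idea of block splitting and triple redundancy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes, replication: int = 3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Real HDFS placement is rack-aware; round-robin is a simplification.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# A 10-byte "file" with a 4-byte block size yields blocks of 4, 4 and 2 bytes.
blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"])
```

With every block held on three distinct nodes, the loss of any single node leaves at least two copies of each block available.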
  25. Let us consider how MapReduce performs a simple word count task. In this scenario, input files of a defined size are received from the Hadoop Distributed File System for processing. Once they arrive at the MapReduce engine, the input data set is divided into splits, in this illustration two splits that each hold half of the original data instances, and each split is handed to a mapper. The mapper emits a key-value pair for every item it encounters in its split, with the item as the key. The mapper output is then sorted and shuffled so that all pairs sharing a key are brought together; the shuffle is a regrouping by key, not a randomization. For the sake of illustration, this operation has produced groups of different sizes, each containing a single type of element. In the real processing of large data sets, the sort and shuffle moves data between many nodes before every group is complete. At this point, each group is reduced to a single key-value pair (fruit type, number of instances within the input data set). These key-value pairs then represent the final program output. Now imagine this process occurring over a petabyte data set and one can get a feel for the power of MapReduce.
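The word count described above can be sketched in a few lines of plain Python. This is a single-process imitation of the map, shuffle, and reduce phases, not Hadoop code; the function names and the sample fruit data are hypothetical.

```python
from collections import defaultdict


def map_words(split):
    """Map phase: emit a (word, 1) pair for every word in one input split."""
    for word in split.split():
        yield (word.lower(), 1)


def shuffle(pairs):
    """Shuffle/sort phase: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key, values):
    """Reduce phase: collapse one key's values into a single (key, total) pair."""
    return (key, sum(values))


def word_count(splits):
    mapped = [pair for split in splits for pair in map_words(split)]
    grouped = shuffle(mapped)
    return [reduce_counts(key, values) for key, values in grouped.items()]


# Two hypothetical splits, each holding half of the input.
counts = word_count(["apple banana apple", "cherry banana apple"])
# counts == [("apple", 3), ("banana", 2), ("cherry", 1)]
```

On a cluster, each call to `map_words` would run on the node holding that split, and each key's reduction could run on a different node.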
  26. Here is a more complicated MapReduce task. The goal is to take elements of two different data sets and join them into an integrated data set. As noted, the network functions much like any peer-to-peer distributed sharing system, such as those built on the BitTorrent protocol. The difference is that, in addition to sharing the data across the network, operations on the data are performed at the same network nodes that function as storage nodes.
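One common way to express such a join in MapReduce terms is a reduce-side join: the map step tags each record with its source, the shuffle brings records with the same key together, and the reduce step pairs them up. The sketch below is a single-process illustration under that assumption; the slide may depict a different join strategy, and the record layouts and sample values are hypothetical.

```python
from collections import defaultdict


def map_tagged(records, tag):
    """Map phase: emit (key, (source_tag, value)) for each input record."""
    for key, value in records:
        yield key, (tag, value)


def reduce_join(key, tagged_values):
    """Reduce phase: pair every left value with every right value for one key."""
    left = [value for tag, value in tagged_values if tag == "L"]
    right = [value for tag, value in tagged_values if tag == "R"]
    return [(key, l, r) for l in left for r in right]


def join(left_records, right_records):
    # Shuffle: group tagged records from both data sets by their shared key.
    groups = defaultdict(list)
    tagged = list(map_tagged(left_records, "L")) + list(map_tagged(right_records, "R"))
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined


# Hypothetical data sets keyed by a patient identifier.
patients = [(1, "Ann"), (2, "Bob")]
labs = [(1, "HbA1c 6.1"), (1, "LDL 130"), (3, "TSH 2.0")]
joined = join(patients, labs)
# joined == [(1, "Ann", "HbA1c 6.1"), (1, "Ann", "LDL 130")]
```

Keys that appear in only one data set produce no output, so this behaves like an inner join.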
  27. Another way to look at MapReduce is as a five-step parallel and distributed computation. (1) Prepare the Map() input: the MapReduce system designates Map processors, assigns the input key value K1 that each processor will work on, and provides that processor with all the input data associated with that key value. (2) Run the user-provided Map() code: Map() is run exactly once for each K1 key value, generating output organized by key values A1…C8. (3) Shuffle the Map output to the Reduce processors: the MapReduce system designates Reduce processors, assigns the A1…C8 key value each processor should work on, and provides that processor with all the Map-generated data associated with that key value. (4) Run the user-provided Reduce() code: Reduce() is run exactly once for each A1…C8 key value produced by the Map step. (5) Produce the final output: the MapReduce system collects all the Reduce output and sorts it by A1…C8 to produce the final outcome. These five steps can be thought of logically as running in sequence, with each step starting only after the previous step is completed, although in practice they can be interleaved as long as the final result is not affected. MapReduce can take advantage of locality of data, processing it on or near the storage assets in order to reduce the distance over which it must be transmitted. In summary: in the Map step, each worker node applies the map() function to its local data and writes the output to temporary storage, while a master node ensures that only one of the redundant copies of the input data is processed. In the Shuffle step, worker nodes redistribute data based on the output keys (produced by the map() function), so that all data belonging to one key is located on the same worker node. In the Reduce step, worker nodes process each group of output data, per key, in parallel.
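The five steps can be written as a small generic driver that accepts any user-provided map and reduce functions. This is a sequential sketch under simplifying assumptions: on a real cluster, steps 2 and 4 run in parallel across nodes. The driver name, the heart-rate example, and its sample values are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    # Step 1: prepare the Map() input as (K1, data) pairs.
    map_inputs = sorted(inputs.items())
    # Step 2: run the user-provided Map() once per K1 key.
    intermediate = []
    for k1, data in map_inputs:
        intermediate.extend(map_fn(k1, data))
    # Step 3: shuffle, so all values for one output key end up together.
    groups = defaultdict(list)
    for k2, value in intermediate:
        groups[k2].append(value)
    # Step 4: run the user-provided Reduce() once per output key.
    reduced = [reduce_fn(k2, values) for k2, values in groups.items()]
    # Step 5: collect the Reduce output and sort it by output key.
    return sorted(reduced)


def emit_readings(filename, readings):
    """Map: pass through (patient, heart_rate) pairs found in one file."""
    return list(readings)


def max_rate(patient, rates):
    """Reduce: keep the highest heart rate seen for one patient."""
    return (patient, max(rates))


# Hypothetical input: two files of (patient, heart_rate) readings.
inputs = {
    "file1": [("p1", 72), ("p2", 88)],
    "file2": [("p1", 95), ("p2", 60)],
}
result = run_mapreduce(inputs, emit_readings, max_rate)
# result == [("p1", 95), ("p2", 88)]
```

Only `emit_readings` and `max_rate` are specific to the problem; the driver itself never changes, which is the division of labor between user code and framework that the five steps describe.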
  28. In addition to my appeal to the BitTorrent file-sharing protocol as a means of understanding MapReduce and the Hadoop file system, I encourage the audience to recall the SETI@Home project, which was probably the first well-known example of massively parallel computing that most laypeople have been exposed to. In a similar way to the SETI system, Hadoop distributes data blocks across the Hadoop file-sharing and information-processing cluster, resulting in a massively parallel effort to process large data sets in search of simple comparisons across those data sets, e.g., returning a list of similar books ordered by customers who bought the book you just bought on amazon.com. This is a search result we all now take for granted, but conceptually we can now understand how it occurs in real time, without implementing truly impossible relational database structures.
  29. Order is conferred on this very ad hoc file system in two ways: by triple redundancy, and by ensuring that all input files are ‘write once’ files, i.e., no modification of input files is allowed, which ensures absolute data integrity.
  30. This slide illustrates the high-level system architecture of HDFS. The NameNode is a single node in the computing cluster that is responsible for keeping track of the file system metadata. It additionally keeps a list of all the blocks within HDFS as well as a list of all the DataNodes that host these blocks. I conceptualize the NameNode as analogous to a Domain Name System (DNS) server on a TCP/IP network. Since it is a single point of failure in the system, it is provisioned on a resilient, highly available server. The DataNodes form a shared-nothing cluster of computers that store the blocks and execute the workload components of the system.
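The bookkeeping role of the NameNode can be sketched as two lookup tables: one from file names to block identifiers, and one from block identifiers to the DataNodes holding them. The class below is a toy model with hypothetical method names and block naming, not the Hadoop API.

```python
class NameNode:
    """Toy metadata service: it tracks where blocks live, never the data itself."""

    def __init__(self):
        self.file_blocks = {}      # file name -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNode names

    def register_file(self, filename, placements):
        """Record one list of hosting DataNodes per block of the file."""
        block_ids = []
        for index, nodes in enumerate(placements):
            block_id = f"{filename}#blk{index}"
            self.block_locations[block_id] = list(nodes)
            block_ids.append(block_id)
        self.file_blocks[filename] = block_ids

    def locate(self, filename):
        """Answer a client lookup: which nodes hold each block of this file?"""
        return [(b, self.block_locations[b]) for b in self.file_blocks[filename]]

    def blocks_on(self, datanode):
        """List the blocks that would need re-replication if this node failed."""
        return sorted(
            b for b, nodes in self.block_locations.items() if datanode in nodes
        )


# Hypothetical two-block file with triple redundancy on a four-node cluster.
name_node = NameNode()
name_node.register_file(
    "ecg.dat",
    [["node1", "node2", "node3"], ["node2", "node3", "node4"]],
)
locations = name_node.locate("ecg.dat")
```

As with a DNS lookup, the client asks the NameNode only where the data lives and then reads the blocks directly from the DataNodes.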
  31. A reiteration and summarization of the past several slides.
  32. Hadoop can be integrated into an enterprise-wide information system in various configurations. This slide contrasts the independent enterprise data warehouse (EDW) with a standalone Hadoop file system.
  33. Hadoop can be integrated with the EDW (enterprise data warehouse) as a highly efficient distributed storage and data processing system for use with existing structured data sources.
  34. Alternatively, with the enterprise data warehouse left as the sole vehicle for data analysis, Hadoop can ingest and process unstructured as well as structured data and feed it to the EDW. It can also be used as an efficient data archive in which all enterprise data is archived and stored on Hadoop nodes.
  35. In this configuration, the EDW remains the single point of entry to all the available data, but conventional analytical programs can use Hadoop to analyze large data sets with defined tools. The final architectural design uses Hadoop as the sole point of contact for all enterprise-wide data and data analytics. The point of these last few slides is to emphasize the flexibility of the Hadoop system, as well as to defeat the false dichotomy of either EDW or Hadoop. In fact, Hadoop plays well with others.
  36. This slide shows both current and projected areas of Big Data effort in biomedicine. Given the enormous combinatorial complexity of genomics research, the application of Big Data techniques there seems axiomatic. Likewise, given the financial resources consumed by drug research, simulation and advanced analysis systems have the potential to dramatically reduce drug development costs. By way of analogy, the advent of modern ‘supercomputers’ was necessitated by treaty obligations that prevented all atomic weapons testing. Once high-speed simulation of weapons effects became a national priority, high-speed computing became the focus of a technological revolution. Less obviously, this type of computing (highly distributed, massively parallel) was pioneered by consumer-driven, web-based enterprises trying to understand ‘individual consumer choices’, so psychiatric and behavioral health analysis and applications seem as axiomatic as genomic or pharmacological ones. Epidemiological research, by reason of the potential size of its data sets, also promises to yield significant insights from this computing methodology. Novel sensor analysis seems to me a long-term beneficiary of this type of computational capacity. For example, while heart rate variability analysis has been a tool of cardiology for as long as my career, it has always been applied to the isolated clinical case. Having massive amounts of heart rate data linked to personal activity logs and temporal data promises to yield dramatic insights into sudden cardiac death, chronotropic dependences of AMI, neurohumoral and temporal factors dictating the onset of atrial fibrillation, relationships between exercise and the onset of cardiac disease, etc.
  37. While real results will be derived from this powerful new set of data manipulations, the reality is that we are on the ascending limb of the hype curve, and it is too soon to say whether this is an evolutionary or a revolutionary change in computing methodology.