BIG DATA AS A GAME-CHANGER OF CLINICAL RESEARCH STRATEGIES
Rafael San Miguel Carrasco
PhD Javier Gómez Pavón
PhD Beatriz Ares Castro-Conde
In the backpack:
•  Having read a few books about R, SAS, Hadoop and statistics
•  The feeling that healthcare would be a good place to start
How it all started
“I am not sure what you are trying to achieve.”
Starbucks, February 2015
How it all ended
“We should create a company to sell this as a product.”
Irish Pub, September 2015
Community
Traditional clinical research
[Diagram: Hypothesis, Custom dataset, Statistical Analysis, leading to either New knowledge or a Discarded hypothesis; 2-3 months]
•  Linear
•  1 max potential outcome
•  High probability of failure
•  Long play
Big data as a game-changer
[Diagram labels: Big Data; Visual Analytics; Predictive Modeling; Statistical Analysis; Business process improvement; Not expected new insights; Facts-based questions; Non-intuitive relationships; New knowledge]
•  Continuous
•  Multiple potential outcomes
Goals
•  How can big data enhance procedures used to build
predictive models over traditional approaches, like
hypothesis-based clinical research?
•  Can big data help to measure ROI from research
initiatives and programs, in terms of patients' quality
of life or cost?
•  Can big data produce net new knowledge for the
medical community? If so, is it useful to optimize
limited resources and enhance planning and
forecasting processes?
•  How can exploratory analysis be made available
through ecosystems like Hadoop?
•  Can big data help improve prediction
accuracy of clinical performance over
traditional inference techniques from
small samples?
3 key areas
STATISTICAL ANALYSIS
Efficiency analysis of programs geared towards providing
assistance to nursing homes.
PREDICTIVE MODELING
Generation of models that connect admission-related data with key
target variables such as length of stay, admission rate or mortality rate.
DATA VISUALIZATION
Hadoop-based visual analytics platform for domain experts to
perform exploratory analysis and on-the-go clinical research.
Database
Not a sample: full database
12,000 clinical records from elderly patients
admitted to the Acute Unit from 2006 to today
Demographics
Diagnosis
Treatment
Functional and mental status
Admission-related complications
Visits to emergency units
Lab tests
Administered drugs
Well-known tools
Statistical analysis
Evaluating ROI of specialized assistance programs
Background
Programs geared towards providing assistance
from a geriatrics doctor to nursing homes
•  Give better care to elderly patients
•  Less support needed from hospitals
•  Lowering the cost to deliver healthcare
Goal: validate that these programs can provide
the expected ROI
Target parameters
•  CRF (functional status)
•  CRM (mental status)
•  Barthel index
•  Number of complications (lower)
•  Admissions
•  Readmissions
•  LOS
•  Survival
•  Lab tests
•  Number of administered drugs
Methodology
A chi-square test (χ² statistic) was used to check for
a statistically significant difference in key clinical
variables between patients receiving specialized
assistance and the control group.
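As a rough illustration of the test described above, the sketch below computes the Pearson chi-square statistic for a 2x2 table (assisted vs. control group, outcome present vs. absent). The function name and the counts are made up for the example; the deck does not report the underlying tables or say which tool ran the test.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows: assisted group (a, b) and control group (c, d).
    Columns: outcome present, outcome absent.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square tail probability reduces to erfc.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, for illustration only (not taken from the study):
# 30 of 200 assisted patients and 60 of 200 controls had a complication.
stat, p_value = chi_square_2x2(30, 170, 60, 140)
print(f"chi-square = {stat:.2f}, p = {p_value:.5f}")
```

A small p-value only says the two groups differ on that variable; it does not by itself say why.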
Results
These clinical parameters were found to improve:
•  CRF (functional status)
•  CRM (mental status)
•  Barthel index
•  Number of complications (lower)
This means that these patients have better quality of
life, which proves that these programs provide a true
return on investment.
Elderly patients receiving specialized
assistance exhibit better functional and
mental status, less disability and fewer
complications during admission.
New knowledge
Predictive modeling
What key clinical variables can be predicted?
Background
Understanding what features of a nursing home
lead to higher performance constitutes a desirable
goal for the medical community, because …
Length of stay becomes a key clinical variable after
an elderly patient is admitted, because …
Patient mortality is a key clinical variable. I guess
this raises no discussion …
Methodology
	
Diagrams Transformation
Visual inspection
Quantile-based comparison
Model fit indicators
Agnostic approach
No prior questions or hypotheses for the analysis
No focus on particular input variables
… and multiple iterations/configurations to come up
with as many results as possible
Adverse reactions to drugs leading to higher mortality rate
ITERATION 1
Mortality can be fully predicted through another variable:
Place of Exitus
ITERATION 2
Mortality can be predicted through another variable:
Morphine
ITERATION 3
Mortality can be predicted through several variables:
Digoxin
Number of lab tests
Occurrence of pressure ulcers
Non-intuitive relationships
We can predict length of stay
LOS can be predicted through:
Previous number of admissions
Gender
Diagnoses such as acute kidney failure, respiratory infection and acute bronchitis
Barthel index prior to admission
Falls
Total amount of administered drugs
Need for urinary catheter
Infections at hospital
The resulting model was found to be statistically
significant, accounting for 95.23% (R-Square) of
the target variable's variance.
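R-Square is the share of the target's variance that the model's predictions account for: one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch, using made-up length-of-stay values (in days) rather than study data:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: what the predictions fail to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the target around its mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical length-of-stay values in days (illustration only).
observed_los = [4, 7, 9, 12, 15]
predicted_los = [5, 7, 8, 12, 15]
score = r_squared(observed_los, predicted_los)
print(f"R-Square = {score:.4f}")
```

A value computed on the same records used to fit the model tends to be optimistic, so a held-out set is the usual check.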
Business process improvement
We can predict admission rate
The transformed variable that represents the inverse of the
number of beds in the nursing home can accurately predict
the admission rate from that nursing home.
The model can explain up to 86% of the target's variance.
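One way to picture a model built on the inverse of the number of beds is a straight-line fit of admission rate against 1 / beds. The sketch below assumes ordinary least squares with a single predictor; the function names and the nursing-home figures are invented for the example, and the deck does not state the exact model form or the unit of the admission rate.

```python
def fit_inverse_beds(beds, admission_rates):
    """Least-squares fit of admission_rate = intercept + slope * (1 / beds)."""
    if len(beds) != len(admission_rates) or len(beds) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    x = [1.0 / b for b in beds]  # the transformed predictor
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(admission_rates) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all nursing homes have the same number of beds")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, admission_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_admission_rate(intercept, slope, beds):
    """Apply the fitted line to a nursing home with the given number of beds."""
    return intercept + slope / beds


# Made-up nursing homes (illustration only): rate = 2 + 50 / beds exactly.
beds = [25, 50, 100, 200]
rates = [4.0, 3.0, 2.5, 2.25]
intercept, slope = fit_inverse_beds(beds, rates)
print(predict_admission_rate(intercept, slope, 125))
```

Once fitted, the same two coefficients give a forecast for a nursing home that is not yet served, which is the planning use the slide describes.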
A Geriatric Unit can accurately forecast the number of
admissions from currently served nursing homes, and
make better choices with regard to new nursing
homes to be served in the future.
Business process improvement
Data visualization
Playing with datasets and finding new insights on-the-go
Background
Providing domain experts with an effective tool to
discover patterns or relationships among data variables
through exploratory analysis and visual inspection.
A tool to go beyond reported findings and discover new
insights in the datasets.
Using Hadoop to ensure that it could scale out to process
millions of clinical records with literally no changes to
the current architecture.
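The deck does not show how the platform is implemented. As a sketch of why a map-and-reduce style of aggregation scales out, the plain-Python example below counts admissions per year: each record is mapped independently to a (year, 1) pair, and the pairs are then summed per key. The "year,diagnosis_code" record layout and all function names are assumptions for illustration; this is not Hadoop API code.

```python
from collections import defaultdict


def map_record(line):
    """Map step: emit (admission_year, 1) for each well-formed record."""
    fields = line.strip().split(",")
    if len(fields) != 2 or not fields[0].isdigit():
        return []  # skip malformed lines instead of failing the whole job
    year, _diagnosis_code = fields
    return [(int(year), 1)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def admissions_per_year(lines):
    """Run the map step over every record, then reduce the emitted pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_record(line))
    return reduce_counts(pairs)


# Hypothetical records (illustration only).
sample = ["2007,486", "2007,428", "2014,486", "not a record"]
print(admissions_per_year(sample))
```

Because the map step looks at one record at a time, the records can be split across machines and the partial counts merged afterwards, which is the property that lets the same logic run on far more records.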
Architecture
So, what does it look like?
•  ICD-9 (CIE-9) code 428 is selected from the diagnosis list
•  Charts and indicators related to gender, age, year of
admission, CRF, CRM and Barthel index are updated
and displayed.
What are the defining characteristics of a
patient with a cardiac insufficiency?
The most frequent profile is a woman aged 85-95
with a Barthel index higher than average.
[Chart panels: Diagnosis 428, All diagnoses]
New knowledge
•  ICD-9 (CIE-9) code 486 is selected from the diagnosis list.
The chart displaying yearly number of admissions is
updated.
Which trends are observed in admissions
related to pneumonia?
The number of admissions related to pneumonia was 81
in 2007 and 230 in 2014, which means that it has
increased almost threefold (about 2.8 times) in 7 years.
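The ratio between the two yearly counts can be checked directly. The helper names below are made up for this sketch, and the compound yearly rate assumes steady growth between 2007 and 2014, which the slide does not claim.

```python
def growth_factor(start_count, end_count):
    """How many times larger the end count is than the start count."""
    if start_count <= 0:
        raise ValueError("start_count must be positive")
    return end_count / start_count


def average_yearly_growth(start_count, end_count, years):
    """Compound yearly growth rate that would turn start_count into end_count."""
    if years <= 0:
        raise ValueError("years must be positive")
    return growth_factor(start_count, end_count) ** (1.0 / years) - 1.0


# Pneumonia admissions quoted on the slide: 81 in 2007, 230 in 2014.
factor = growth_factor(81, 230)
yearly = average_yearly_growth(81, 230, 2014 - 2007)
print(f"growth factor = {factor:.2f}, average yearly growth = {yearly:.1%}")
```

With the slide's figures the growth factor comes out at about 2.84.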
Not expected new insights
Case #3
•  Hospital A is selected from the Source of Admission list.
The number of readmissions is 548 from a total of 4,248;
therefore, the readmission rate is about 13%.
•  For Hospital B, the number of readmissions is 164 from a
total of 1,830, so the readmission rate is only about 9%.
Are there variations in readmission rates for
patients from Hospital A and Hospital B?
Therefore, readmission rate for patients from Hospital A
seems to be higher than from Hospital B.
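Whether the gap between the two hospitals is larger than chance would explain can be checked with a two-proportion z-test on the counts quoted above. This is an added sketch, not a step reported in the deck; the function name is hypothetical, and the test assumes independent admissions and ignores differences in patient mix.

```python
import math


def two_proportion_z_test(events_a, total_a, events_b, total_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (rate_a, rate_b, z, p_value) using the pooled standard error.
    """
    rate_a = events_a / total_a
    rate_b = events_b / total_b
    pooled = (events_a + events_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Readmission counts quoted on the slide for Hospital A and Hospital B.
rate_a, rate_b, z, p_value = two_proportion_z_test(548, 4248, 164, 1830)
print(f"A = {rate_a:.1%}, B = {rate_b:.1%}, z = {z:.2f}, p = {p_value:.2g}")
```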
New knowledge
•  2014 is selected from the list of years
How was workload distributed among
doctors in the Acute Unit in 2014?
Most patients are admitted
by noon
Doctors’ workload requires
further review
Doctors should specialize in
UTI (ITU in Spanish) and respiratory diseases
Business process improvement
Not expected new insights
New knowledge
Looking into the future
What we did
Identified needs
Scenarios for implementation
[Diagram labels: Research, Doctors, Managers, Stakeholders, Management, Pharma, Finance, HR]
Takeaways
1. Big data analytics can be more efficient than hypothesis-based research
2. Exploratory analysis is a must when it comes to discovering net new knowledge
3. Predictive modeling is geared towards business process improvement
4. Extracting insights shouldn’t be a one-shot activity, but a continuous process
Thanks!
Keep calm … and send us
feedback
rsanmcar@gmail.com

More Related Content

Viewers also liked

Begin at the beginning: Feature selection for Big Data by Amparo Alonso at Bi...
Begin at the beginning: Feature selection for Big Data by Amparo Alonso at Bi...Begin at the beginning: Feature selection for Big Data by Amparo Alonso at Bi...
Begin at the beginning: Feature selection for Big Data by Amparo Alonso at Bi...Big Data Spain
 
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...Big Data Spain
 
Big Data as a game-changer of clinical research strategies by Rafael San Migu...
Big Data as a game-changer of clinical research strategies by Rafael San Migu...Big Data as a game-changer of clinical research strategies by Rafael San Migu...
Big Data as a game-changer of clinical research strategies by Rafael San Migu...Big Data Spain
 
Predicting failures on complex machines by Ion Marqués at Big Data Spain 2015
Predicting failures on complex machines by Ion Marqués at Big Data Spain 2015Predicting failures on complex machines by Ion Marqués at Big Data Spain 2015
Predicting failures on complex machines by Ion Marqués at Big Data Spain 2015Big Data Spain
 
Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015Big Data Spain
 
Euclid & Big Data from dark space by Guillermo Buenadicha at Big Data Spain 2015
Euclid & Big Data from dark space by Guillermo Buenadicha at Big Data Spain 2015Euclid & Big Data from dark space by Guillermo Buenadicha at Big Data Spain 2015
Euclid & Big Data from dark space by Guillermo Buenadicha at Big Data Spain 2015Big Data Spain
 
Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...Big Data Spain
 
Location analytics by Marc Planaguma at Big Data Spain 2014
 Location analytics by Marc Planaguma at Big Data Spain 2014 Location analytics by Marc Planaguma at Big Data Spain 2014
Location analytics by Marc Planaguma at Big Data Spain 2014Big Data Spain
 
Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
 Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014 Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014Big Data Spain
 
Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
 Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ... Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...Big Data Spain
 
Getting the best insights from your data using Apache Metamodel by Alberto Ro...
Getting the best insights from your data using Apache Metamodel by Alberto Ro...Getting the best insights from your data using Apache Metamodel by Alberto Ro...
Getting the best insights from your data using Apache Metamodel by Alberto Ro...Big Data Spain
 
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...Big Data Spain
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Intro to the Big Data Spain 2014 conference
Intro to the Big Data Spain 2014 conferenceIntro to the Big Data Spain 2014 conference
Intro to the Big Data Spain 2014 conferenceBig Data Spain
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012Big Data Spain
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...Big Data Spain
 

Viewers also liked (16)

Begin at the beginning: Feature selection for Big Data by Amparo Alonso at Bi...
Begin at the beginning: Feature selection for Big Data by Amparo Alonso at Bi...Begin at the beginning: Feature selection for Big Data by Amparo Alonso at Bi...
Begin at the beginning: Feature selection for Big Data by Amparo Alonso at Bi...
 
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...
 
Big Data as a game-changer of clinical research strategies by Rafael San Migu...
Big Data as a game-changer of clinical research strategies by Rafael San Migu...Big Data as a game-changer of clinical research strategies by Rafael San Migu...
Big Data as a game-changer of clinical research strategies by Rafael San Migu...
 
Predicting failures on complex machines by Ion Marqués at Big Data Spain 2015
Predicting failures on complex machines by Ion Marqués at Big Data Spain 2015Predicting failures on complex machines by Ion Marqués at Big Data Spain 2015
Predicting failures on complex machines by Ion Marqués at Big Data Spain 2015
 
Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015
 
Euclid & Big Data from dark space by Guillermo Buenadicha at Big Data Spain 2015
Euclid & Big Data from dark space by Guillermo Buenadicha at Big Data Spain 2015Euclid & Big Data from dark space by Guillermo Buenadicha at Big Data Spain 2015
Euclid & Big Data from dark space by Guillermo Buenadicha at Big Data Spain 2015
 
Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...
 
Location analytics by Marc Planaguma at Big Data Spain 2014
 Location analytics by Marc Planaguma at Big Data Spain 2014 Location analytics by Marc Planaguma at Big Data Spain 2014
Location analytics by Marc Planaguma at Big Data Spain 2014
 
Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
 Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014 Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
 
Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
 Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ... Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
 
Getting the best insights from your data using Apache Metamodel by Alberto Ro...
Getting the best insights from your data using Apache Metamodel by Alberto Ro...Getting the best insights from your data using Apache Metamodel by Alberto Ro...
Getting the best insights from your data using Apache Metamodel by Alberto Ro...
 
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Intro to the Big Data Spain 2014 conference
Intro to the Big Data Spain 2014 conferenceIntro to the Big Data Spain 2014 conference
Intro to the Big Data Spain 2014 conference
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 

More from Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
 

More from Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

PROTEUS: Scalable online Machine Learning for predictive analytics by Rubén Casado at Big Data Spain 2015

  • 2. BIG DATA AS A GAME-CHANGER OF CLINICAL RESEARCH STRATEGIES. Rafael San Miguel Carrasco; PhD Javier Gómez Pavón; PhD Beatriz Ares Castro-Conde
  • 3. In the backpack:
      •  Having read a few books about R, SAS, Hadoop and statistics
      •  The feeling that healthcare would be a good place to start
  • 4. How it all started: “I am not sure what you are trying to achieve.” Starbucks, February 2015
  • 5. How it all ended: “We should create a company to sell this as a product.” Irish Pub, September 2015
  • 6. Traditional clinical research. Diagram labels: Community, Hypothesis, Custom dataset, Statistical Analysis, New knowledge, Discarded hypothesis, 2-3 months.
      •  Linear
      •  1 max potential outcome
      •  High probability of failure
      •  Long play
  • 7. Big data as a game-changer. Diagram labels: Big Data, Statistical Analysis, Predictive Modeling, Visual Analytics, Facts-based questions, Non-intuitive relationships, Not expected new insights, Business process improvement, New knowledge.
      •  Continuous
      •  Multiple potential outcomes
  • 8. Goals
      •  How can big data enhance procedures used to build predictive models over traditional approaches, like hypothesis-based clinical research?
      •  Can big data help to measure ROI from research initiatives and programs, in terms of patients' quality of life or cost?
      •  Can big data produce net new knowledge for the medical community? If so, is it useful to optimize limited resources and enhance planning and forecasting processes?
      •  How can exploratory analysis be made available through ecosystems like Hadoop?
      •  Can big data help improve prediction accuracy of clinical performance over traditional inference techniques from small samples?
  • 9. 3 key areas
      •  STATISTICAL ANALYSIS: Efficiency analysis of programs geared towards providing assistance to nursing homes.
      •  PREDICTIVE MODELING: Generation of models that connect admission-related data with key target variables such as length of stay, admission rate or mortality rate.
      •  DATA VISUALIZATION: Hadoop-based visual analytics platform for domain experts to perform exploratory analysis and on-the-go clinical research.
  • 10. Database. Not a sample: full database. 12,000 clinical records from elderly patients admitted to the Acute Unit from 2006 to the present:
      •  Demographics
      •  Diagnosis
      •  Treatment
      •  Functional and mental status
      •  Admission-related complications
      •  Visits to emergency units
      •  Lab tests
      •  Administered drugs
  • 12. Statistical analysis: Evaluating ROI of specialized assistance programs
  • 13. Background. Programs geared towards providing assistance from a geriatrics doctor to nursing homes:
      •  Give better care to elderly patients
      •  Less support from hospitals
      •  Lower cost of delivering healthcare
      Goal: validate that these programs can provide the expected ROI.
      Target parameters:
      •  CRF (functional status)
      •  CRM (mental status)
      •  Barthel index
      •  Number of complications (lower)
      •  Admissions
      •  Readmissions
      •  LOS
      •  Survival
      •  Lab tests
      •  Number of administered drugs
  • 14. Methodology. A chi-square test (χ² statistic) was used to check for a statistically significant difference in key clinical variables between patients receiving specialized assistance and the control group.
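The comparison described in the methodology can be sketched for a single binary variable as a 2x2 chi-square test. This is a minimal illustration, not the authors' code: the function name `chi_square_2x2` and the counts are hypothetical, no continuity correction is applied, and the p-value shortcut only holds for one degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table.

    table = [[a, b], [c, d]]: rows are groups, columns are outcomes.
    No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and outcome.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts, NOT taken from the study:
# rows = assisted group vs. control group, columns = complication yes / no.
hypothetical_counts = [[30, 170], [60, 140]]
stat, p = chi_square_2x2(hypothetical_counts)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

A small p-value would indicate that the difference between the two groups is unlikely to be due to chance alone; variables with more than two categories would need a general contingency-table test instead.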
  • 15. Results. These clinical parameters were found to improve:
      •  CRF (functional status)
      •  CRM (mental status)
      •  Barthel index
      •  Number of complications (lower)
      This means that these patients have better quality of life, which indicates that these programs provide a true return on investment. Elderly patients receiving specialized assistance exhibit better functional and mental status, less disability and fewer complications during admission.
      Label: New knowledge
  • 16. Predictive modeling: What key clinical variables can be predicted?
  • 17. Background
      •  Understanding what features of a nursing home lead to higher performance constitutes a desirable goal for the medical community, because …
      •  Length of stay becomes a key clinical variable after an elderly patient is admitted, because …
      •  Patient mortality is a key clinical variable. I guess this needs no discussion …
  • 19. Agnostic approach
      •  No prior questions or hypotheses for the analysis
      •  No focus on particular input variables
      •  … and multiple iterations/configurations to come up with as many results as possible
  • 20. Adverse reactions to drugs leading to higher mortality rate
      •  ITERATION 1: Mortality can be fully predicted through another variable: place of death (exitus)
      •  ITERATION 2: Mortality can be predicted through another variable: morphine
      •  ITERATION 3: Mortality can be predicted through several variables: digoxin, number of lab tests, occurrence of pressure ulcers
      Label: Non-intuitive relationships
  • 21. We can predict length of stay. LOS can be predicted through:
      •  Previous number of admissions
      •  Gender
      •  Diagnoses such as acute kidney failure, respiratory infection and acute bronchitis
      •  Barthel index prior to admission
      •  Falls
      •  Total amount of administered drugs
      •  Need for urinary catheter
      •  Infections at hospital
      The resulting model was found to be statistically significant, accounting for 95.23% (R-Square) of the target variable's variance.
      Label: Business process improvement
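R-Square is the share of the target's variance that the model's predictions account for. The sketch below shows how such a figure is computed from observed and predicted values; the helper name `r_squared` and the length-of-stay values are made up for illustration and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: what the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variance of the target around its mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical lengths of stay in days (NOT from the study).
observed_los = [4, 7, 10, 12, 15]
predicted_los = [5, 7, 9, 12, 15]

r2 = r_squared(observed_los, predicted_los)
print(f"R-Square = {r2:.2%}")
```

A value computed on the same records used to fit the model tends to be optimistic, so it is usually worth checking it on held-out records as well.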
  • 22. We can predict admission rate. The transformed variable that represents the inverse of the number of beds in the nursing home can accurately predict the admission rate from that nursing home. The model can explain up to 86% of the target's variance. A Geriatric Unit can accurately forecast the number of admissions from currently served nursing homes, and make better choices with regard to new nursing homes to be served in the future.
      Label: Business process improvement
  • 23. Data visualization: Playing with datasets and finding new insights on the go
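The idea of regressing admission rate on the inverse of the number of beds can be sketched with a one-variable least-squares fit. Everything below is an assumption for illustration: the helper names, the bed counts and the admission rates are hypothetical, and the rates were constructed to be exactly linear in 1/beds so that the effect of the transformation is easy to see.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Share of the variance of y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical nursing homes (NOT study data): rate = 0.02 + 10 / beds.
beds = [20, 40, 50, 100, 200]
admission_rate = [0.52, 0.27, 0.22, 0.12, 0.07]

# Transformed predictor: inverse of the number of beds.
inverse_beds = [1 / b for b in beds]

slope_inv, intercept_inv = fit_line(inverse_beds, admission_rate)
r2_inverse = r_squared(inverse_beds, admission_rate, slope_inv, intercept_inv)

# Same fit on the untransformed predictor, for comparison.
slope_raw, intercept_raw = fit_line(beds, admission_rate)
r2_raw = r_squared(beds, admission_rate, slope_raw, intercept_raw)

print(f"R-Square with 1/beds: {r2_inverse:.3f}, with beds: {r2_raw:.3f}")
```

On this constructed data the transformed predictor fits far better than the raw bed count, which is the kind of gain a variable transformation is meant to deliver; real data would show a noisier picture.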
  • 24. Background
      •  Providing domain experts with an effective tool to discover patterns or relationships among data variables through exploratory analysis and visual inspection.
      •  A tool to go beyond reported findings and discover new insights in the datasets.
      •  Using Hadoop to ensure that it could scale out to process millions of clinical records with literally no changes to the current architecture.
  • 26. So, what does it look like?
  • 27. What are the defining characteristics of a patient with heart failure (cardiac insufficiency)?
      •  ICD-9 value 428 is selected from the diagnosis list
      •  Charts and indicators related to gender, age, year of admission, CRF, CRM and Barthel index are updated and displayed.
      The most frequent profile is a woman aged 85-95 with a Barthel index higher than average.
      Chart legend: Diagnosis 428, All diagnoses
      Label: New knowledge
  • 28. Which trends are observed in admissions related to pneumonia?
      •  ICD-9 value 486 is selected from the diagnosis list. The chart displaying yearly number of admissions is updated.
      The number of admissions related to pneumonia was 81 in 2007 and 230 in 2014, which means that it has nearly tripled (a factor of about 2.8) in 7 years.
      Label: Not expected new insights
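The size of this trend follows directly from the two counts quoted on the slide. A short sketch of the arithmetic, where the compound annual growth figure is an added illustration that assumes steady year-over-year growth (something the slide does not state):

```python
# Pneumonia-related admissions quoted on the slide.
first_year, first_count = 2007, 81
last_year, last_count = 2014, 230

# Ratio between the last and the first yearly count.
growth_factor = last_count / first_count
percent_increase = (growth_factor - 1) * 100

# Compound annual growth rate, assuming steady growth between the endpoints.
years = last_year - first_year
annual_growth = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.2f}x")
print(f"Increase: {percent_increase:.0f}%")
print(f"Average annual growth over {years} years: {annual_growth:.1%}")
```

Only the two endpoint years are used here, so the result says nothing about how admissions moved in the years in between.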
  • 29. Case #3. Are there variations in readmission rates for patients from Hospital A and Hospital B?
      •  Hospital A is selected from the Source of Admission list. The number of readmissions is 548 from a total of 4,248; therefore, the readmission rate is about 13% (12.9%).
      •  For Hospital B, the number of readmissions is 164 from a total of 1,830, so the readmission rate is only about 9% (9.0%).
      Therefore, the readmission rate for patients from Hospital A seems to be higher than for patients from Hospital B.
      Label: New knowledge
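The two readmission rates can be recomputed from the counts on the slide, and one way to check whether "seems to be higher" holds up is a two-proportion z-test. That test is an addition for illustration, not something the slides report, and the function name `two_proportion_z_test` is hypothetical.

```python
import math

def two_proportion_z_test(events_a, total_a, events_b, total_b):
    """Rates, z statistic and two-sided p-value for two independent proportions."""
    rate_a = events_a / total_a
    rate_b = events_b / total_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (events_a + events_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value

# Counts quoted on the slide: readmissions and total admissions per hospital.
rate_a, rate_b, z, p_value = two_proportion_z_test(548, 4248, 164, 1830)
print(f"Hospital A: {rate_a:.1%}, Hospital B: {rate_b:.1%}, z = {z:.2f}")
```

The test treats admissions as independent observations and ignores differences in patient mix between the two hospitals, so it is a first check rather than a full comparison.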
  • 30. How was workload distributed among doctors in the Acute Unit in 2014?
      •  2014 is selected from the list of years
      •  Most patients are admitted by noon
      •  Doctors’ workload requires further review
      •  Doctors should specialize in ITU and respiratory diseases
      Labels: Business process improvement; Not expected new insights; New knowledge
  • 32. Takeaways
      1. Big data analytics can be more efficient than hypothesis-based research
      2. Exploratory analysis is a must when it comes to discovering net new knowledge
      3. Predictive modeling is geared towards business process improvement
      4. Extracting insights shouldn’t be a one-shot activity, but a continuous process
  • 33. Thanks! Keep calm … and send us feedback rsanmcar@gmail.com