Big Data in Clinical Research
Michael Hogarth, MD, FACMI, FACP
Clinical Research Information Officer, UC San Diego Health System
Professor, Dept of Medicine, UCSD Health
What is Big Data?
https://searchdatamanagement.techtarget.com/definition/big-data
The genesis of the “big data” movement
• Google started the new database
development revolution because
relational databases could not
efficiently handle the volume or type
of data needed to provide Google search
• What Google built – “BigTable”
– Innovation 1 – column-oriented, with
each row as a web page
– Innovation 2 – data is stored across
multiple machines (nodes) using the
“Google File System”
• The entire web table is split into mini-tables
(“tablets”) of a few gigabytes each – 100,000
– Innovation 3 – a “map-reduce”
computing engine
– Scales to petabytes!
Data Today
https://twitter.com/nafisalam/status/867359592006733824/photo/1
Importance of “big data” in healthcare
https://www.incoutlook.com/2019/08/02/big-data-a-game-changer-in-healthcare-industry/
Big Data in Clinical Research
Clinical trial efficiency
• Predicting “feasibility” of a trial and reducing
uncertainty
• Improving accrual through “smart matching”
of trials to potential participants
Real-world data and real-world evidence (RWE)
• Hypothesis generation
• Pragmatic/large-scale trials – real-world
“evidence” (RWE)
• Uncovering patterns
Healthcare AI/ML models
• Predictive algorithms
• Assistive systems (image
analysis/enhancement)
Drug design
• Pharmacokinetic simulation (in-silico drug
discovery)
Big Data and Real-World Evidence (RWE)
Using “Real World Evidence” (RWE)
Adoption of health IT has resulted in
massive amounts of
biomedical “digital” data
“RWE will not replace the need for data
from traditional trials; however,
technologies supporting RWD are
enabling far richer and more diverse
information to be collected during drug
development.”
Swift et al. “Innovation at the Intersection of Clinical Trials and Real-
World Data Science to Advance Patient Care.” Clin Transl Sci (2018)
00, 1-11. https://www.ncbi.nlm.nih.gov/pubmed/29768712
(subtext: the randomized clinical trial is still
here and is not dead - but is perhaps
becoming an endangered species under
pressure from new trial designs and RWE)
An Example of an RWE Trial
• ADAPTABLE – Aspirin Dosing: A
Patient-Centric Trial Assessing
Benefits and Long-Term
Effectiveness
• Compares two aspirin doses
(81mg vs. 325mg)
• Randomizes 20,000 patients with
cardiovascular disease (CVD) to one of the two doses
• Currently underway through the
National Patient-Centered Clinical
Research Network (PCORnet) –
had 600,000 eligible patients
Wearable Sensors – Billions of data points
Mobile Sensors
RWE Trial Using a Wearable Sensor
Clinical Genomics
https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost
RNAseq – a key advancement for post-
translational/DNA processes
Comparing genomic data types in
terms of “big data”
Nature Reviews, Genetics, Volume 19, April 2018
Biomedical research often requires you to become
a “data wrangler”
Obtaining and preparing
“real world data” (RWD) from an
EHR
Privacy and your data stewardship
Challenges with healthcare data
Your Data Wrangler Toolbelt
RCHSD Clinical Research Seminar Series -
07/28/2020
Types of Biomedical Research Data
– Pre-Clinical Experiments
• a broad range of data from wet-lab
experiments, animal models, etc.
• typically created/stored in
lab/bioinformatics systems
– Clinical Trial data
• specific trial-related data collection
• typically collected using electronic case
report forms (eCRFs)
– Clinical Practice data
• generated during routine clinical practice
events
• typically created/stored in an EHR
• clinical practice data = real world data
(RWD)
What is real world data (RWD)?
– “any health record information not
collected as part of a randomized
controlled trial”
– “With RWD, we mean data that are not
collected under experimental conditions,
but data generated in routine care”
Real-World Data in an EHR
Privacy and Healthcare Data --
Being a responsible steward of patient data
Covered entities may disclose PHI for
research with “individual authorization”
Circumstances under which research
use can proceed without authorization
• IRB (or privacy board) issues a “waiver of
HIPAA authorization”
• must satisfy 3 criteria:
• use/disclosure involves no more than minimal
risk to privacy of the individuals
• the research could not be conducted without the
waiver
• the research could not be conducted without
access to PHI.
• minimal risk means:
• a plan to protect identifiers from disclosure
• a plan to destroy the identifiers at the earliest
opportunity
• written assurance the PHI will not be reused or
disclosed to others
https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#protected
What is a “HIPAA Limited Data Set (LDS)”?
Obtaining and Managing UCSDH Data
• Data Stewards - honest brokers
– UCSDH requires that you use a
‘data steward’ as an honest
broker
• You cannot access the warehouse
yourself – HIPAA principle of
“minimum necessary”
– Data stewards are allowed to query “all
records” to find specific cohort
• ACTRI Data Extraction Concierge
Service (DECS)
• The “federated data stewards
program”
– Ophthalmology
– Family Medicine
– O’Brien Center (Nephrology)
– Etc...
• UCSD Health Nightingale data sets – coming in early
2022 – LDS under DUA
– AKI
– Cancer
– COVID
UCSD Health Virtual Research Desktop (VRD)
Shin SY, et al. Healthc Inform Res. 2014;20(2):109-116.
The ACTRI Data Extraction Concierge Service (DECS)
Epic SlicerDicer and cohort discovery
Using SlicerDicer to facilitate DECS exports
The UCSD Health Virtual Research Desktop
What is a VRD?
Common challenges with healthcare
data
The nature of EHR data
(“dirta”)
• Harmonizing to a
common set of codes
and values
• EHR data is often a
proxy for what really
happened
• Missing values are
common -- charting by
exception
• Data can be in multiple
places and in different
forms
Unstructured narrative text
contains many of the
desired data points
• Need to ‘find’ information
in narrative text
• Often requires some
form of text mining or
natural language
processing
Comorbidities
• Elixhauser
• Charlson
“Dirta” and why we “harmonize” data
Blood Pressure – what do we really mean?
If we do an analysis, are we comparing the same things?
Blood pressure stored in system #1 and system #2 (figure):
John S | BP | 1-14-2015
BP: “150/70 mmHg”
Mary P | BP | 1-14-2015
Systolic BP | Diastolic BP | Type | Encounter
150 | 70 | Arterial Line | In Patient
What happens when we want to compute with data from both systems?
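One way to approach the question above: before the two records can be compared, they have to be brought into a common shape. A minimal sketch in Python, assuming one source stores a single text value and the other stores separate fields, as in the slide's example. The function and field names (`parse_bp_text`, `harmonize`, `bp_text`, `method`) are hypothetical and not taken from any EHR product.

```python
import re
from typing import Optional

# Matches values such as "150/70 mmHg" (the unit is optional).
BP_PATTERN = re.compile(r"^\s*(\d{2,3})\s*/\s*(\d{2,3})\s*(?:mm\s*Hg)?\s*$", re.IGNORECASE)


def parse_bp_text(value: str) -> Optional[dict]:
    """Split a free-text blood pressure into numeric systolic/diastolic parts."""
    match = BP_PATTERN.match(value)
    if match is None:
        return None  # leave unparseable values for manual review
    return {"systolic": int(match.group(1)), "diastolic": int(match.group(2))}


def harmonize(record: dict) -> dict:
    """Map either source layout onto one common (hypothetical) layout."""
    if "bp_text" in record:  # free-text layout
        parsed = parse_bp_text(record["bp_text"]) or {"systolic": None, "diastolic": None}
        method = None  # the text value does not say how BP was measured
    else:  # structured layout
        parsed = {"systolic": record["systolic"], "diastolic": record["diastolic"]}
        method = record.get("type")
    return {"patient": record["patient"], "date": record["date"], **parsed, "method": method}


text_record = {"patient": "John S", "date": "1-14-2015", "bp_text": "150/70 mmHg"}
structured_record = {"patient": "Mary P", "date": "1-14-2015",
                     "systolic": 150, "diastolic": 70, "type": "Arterial Line"}

harmonized = [harmonize(text_record), harmonize(structured_record)]
```

Both rows now carry numeric systolic and diastolic values, but `method` is still missing for the text record, so the analysis cannot yet confirm that the two readings are the same kind of measurement.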
Standard coding systems used with harmonized EHR data
• Conditions: ICD-10-CM / SNOMED CT
• Labs, vitals, reports (observations): LOINC
• Medical procedures: CPT
• Medications: RxNorm
How many blood pressure measurement
types could there possibly be?
477 types of blood
pressure
measurements
The challenge of using “drug class”
The University of California Clinical Data Warehouse –
one of the largest OMOP CDM repositories today
The UCDW
has harmonized
data for
15M patients
in OMOP
The Observational Medical
Outcomes Partnership Common
Data Model (OMOP CDM)
The UC COVID Research Data Set (UC-CORDS)
• How Much Data? (Dec 2021)
– 687,239 patients
– 1,061,471,489 (>1.06 billion) observations
(lab results, vital signs)
– 23,243,659 clinical encounters
• Provided to UC Health researchers
through UC research informatics
units/groups in their respective health
systems
• Cohort
– All COVID tested patients (positive or
negative)
• De-identified to HIPAA LDS - no
personal identifiers
• Data
– Demographics, Diagnoses, Medications,
Laboratory results, Encounters (if seen by
our doctors or hospitalized)
• Refreshed
– Every 2 months
Learning to analyze very large data sets
Google BigTable (2004) -- managing “big data”
• The web circa 2000:
– 2Billion web pages
– 45Terabytes of data
– Contemporary databases
(RDBMS) were not able to cope
• Needed to invent a new
database
– Adam Bosworth and Jeff Dean
• Google BigTable
– Google distributed File System – “GFS”
– Virtualized single database “table” across
thousands of computers
– Stores data in a “column-oriented” database
design
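The idea of one logical table spread over many machines can be sketched in a few lines. This is a toy model only, assuming the data model described in Google's published Bigtable paper: a sorted map from row key, column, and timestamp to a value, with the sorted row range cut into tablets. The names `put` and `split_into_tablets`, the sample rows, and the tiny tablet size are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Toy table: row key -> column name -> {timestamp: value}.
table = defaultdict(lambda: defaultdict(dict))


def put(row: str, column: str, timestamp: int, value: str) -> None:
    """Store one cell version; each row here stands for one web page."""
    table[row][column][timestamp] = value


def split_into_tablets(rows_per_tablet: int) -> list:
    """Cut the sorted row keys into contiguous ranges ("tablets")."""
    keys = sorted(table)
    return [keys[i:i + rows_per_tablet] for i in range(0, len(keys), rows_per_tablet)]


put("com.example/index", "contents:html", 1, "<html>v1</html>")
put("com.example/index", "contents:html", 2, "<html>v2</html>")
put("com.example/about", "contents:html", 1, "<html>about</html>")
put("org.example/home", "anchor:com.example/index", 1, "Example")

tablets = split_into_tablets(rows_per_tablet=2)
# The newest version of a cell is the one with the highest timestamp.
latest = max(table["com.example/index"]["contents:html"].items())[1]
```

In a real deployment each tablet would be served by a different machine and split by size rather than by row count; here the tablets are just lists of row keys.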
“Map-Reduce” (Hadoop/Spark)
– analyzing large data sets efficiently and at scale
• Google needed to process
large amounts of raw data,
such as “crawled” (acquired)
documents, web request
logs, etc.
• Created a simple computational
model called ‘map-reduce’ which can
process very large data sets
• Hadoop = open source version of a
map-reduce execution engine
https://www.guru99.com/introduction-to-mapreduce.html
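The map-reduce model described above can be illustrated on a single machine. A minimal sketch in plain Python, not Hadoop or Spark code; the function names and the two sample documents are made up for illustration. In a real engine the map and reduce calls would run in parallel on many nodes, with the framework handling the shuffle between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


documents = ["big data in research", "Big data at scale"]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across machines without changing the result.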
Machine Learning
A form of artificial intelligence
• Identifies correlated patterns in
order to create predictions
• Traditional “machine learning” –
linear regression, logistic
regression.
• Today’s “machine learning” -
leverages artificial neural networks
(ANNs)
• “Deep learning” – a deep neural
network which has many nodal
connections within its hidden layer
– designed to work like the visual
cortex... Most mature in image
analysis
– A convolutional neural network (CNN)
is a typical deep learning network
Rashidi, et al. Academic Pathology. 2019
“Machine Learning” and AI in Healthcare
• AI is not new to healthcare, but “machine learning” and “deep learning” are
new AI techniques enabled by large data sets.
https://mc.ai/awesome-ai-the-guide-to-master-artificial-intelligence/
Deep Networks -- “Convolutional” Neural Networks (CNN) began
as an approach to image recognition
• a type of multi-layered (deep) ANN designed for
recognition of visual features - “feedforward neural
network whose architectures are tailored for
minimizing the sensitivity to translations, local
rotations and distortions of the input image.”
• originally devised in 1988 by Yann LeCun at AT&T Labs
to recognize handwritten digits
• inspired by the connectivity pattern seen between
neurons in the visual cortex
• Convolution operation emulates response of an
individual neuron to visual stimuli. Pooling coalesces
outputs from one layer into a single neuron in the
next layer
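The convolution and pooling steps in the last bullet can be shown on a tiny image. A minimal sketch in plain Python, assuming a stride of 1, no padding, and 2x2 max pooling; the function names, the toy image, and the edge-detecting kernel are illustrative choices, not taken from the slide or from any specific network.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# 5x5 toy image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)
pooled = max_pool(feature_map)
```

The feature map is non-zero only where the edge sits, and pooling keeps that response while shrinking the map, which is part of what makes the network less sensitive to small shifts in the input.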
Example: Deep Learning and Interpretation
of Retinal Images for diabetic retinopathy
• trained an ML system
with 494,661 retinal
images
• validated (tested) using
dataset of 71,896
images from 14,880
patients
• detection of vision-
threatening diabetic
retinopathy: AUC 0.958
(95% CI)
• detection of referable
diabetic retinopathy:
AUC 0.936 (95% CI)
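For readers unfamiliar with the AUC figures quoted above: the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that definition in Python; the labels and scores below are made up for illustration and have nothing to do with the retinal image study.

```python
def auc(labels, scores):
    """Probability that a random positive case scores above a random negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = disease present, 0 = absent; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

value = auc(labels, scores)
```

Here 11 of the 12 positive/negative pairs are ranked correctly, so the AUC is 11/12, roughly 0.92. An AUC of 0.5 would mean the scores rank cases no better than chance.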
ML-based Chest X-Ray Assistive Interpretation in COVID
Did it make a difference?
1 out of 5 felt it had an impact
30% felt it affected the treatment plan
Big Data in Drug Design:
“in-silico” drug design through structural
bioinformatics and computer-based simulation
http://www.vls3d.com/index.php/virtual-screening/comments-about-virtuel-screening?start=3
Computational Drug Discovery as In-silico
Drug, Genome, Proteome interactions
Requires large-scale computing!
Supercomputers - massive parallel processing
• supercomputers split problems
into pieces and execute each at
the same time – massive
parallel processing
• Designed for mathematical
computation, which occurs in
simulations and optimization
problems --- where multiple co-
dependencies exist
Current top supercomputers
Uses of supercomputers 1970-2020
Computer Simulation and COVID
The SUMMIT supercomputer was used
to predict pharmacological therapeutic
targets to interfere with SARS-CoV-2
spike protein binding to ACE2 receptor
in type 2 pneumocytes
Limits of classical computers in
simulation
Going beyond “classical” computers
• Only having two states (1 or 0)
for a single “bit” creates some
limitations:
– The larger the number of
input states you want (i.e., a
larger number or a higher
number of simultaneous bits
to transmit), the higher the
number of “wires” you need
– The higher number of wires
means more power, heat,
unless you can make it all
smaller and more compact
• The Apple M1 CPU (8-core, 64-
bit) has 16 billion transistors
and implements a 5 nanometer
(nm) “transistor gate length”
(the smallest yet achieved).
– At some point, you reach a
limit in terms of what you can
compute with a “classical
computer” (binary computer)
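The link between input states and “wires” in the first bullet is logarithmic: n two-state wires can distinguish at most 2^n states. A small sketch of that arithmetic in Python; the function name `bits_needed` and the sample state counts are hypothetical.

```python
def bits_needed(states: int) -> int:
    """Smallest number of two-state wires that can distinguish `states` values."""
    if states < 1:
        raise ValueError("states must be at least 1")
    return (states - 1).bit_length()


wires = {n: bits_needed(n) for n in (2, 256, 1_000_000)}
```

Every doubling of the number of states costs one more wire, so representing a million distinct states already takes 20 of them.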
Investment in Quantum Computing
Have we reached “quantum supremacy”?
• A Google-designed quantum computer
was used in an experiment to perform
a calculation that would require a
classical supercomputer 10,000 years
to complete
• The calculation was to predict the
likelihood of outcomes from a random-
number generator (a problem first
crafted by Google physicists in 2016)
• The 53-qubit quantum computer
performed the calculation in 200
seconds
• Was it a contrived experiment?
Questions?
Mt Whitney - 14,505ft
Lone Pine, California (Feb 2021)

Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...narwatsonia7
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Serviceparulsinha
 
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune) Girls Service
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune)  Girls ServiceCALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune)  Girls Service
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune) Girls ServiceMiss joya
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...narwatsonia7
 
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safenarwatsonia7
 
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls ServiceCall Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Servicenarwatsonia7
 

Recently uploaded (20)

High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
 
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy GirlsCall Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
 
Aspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliAspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas Ali
 
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
 
Bangalore Call Girls Nelamangala Number 7001035870 Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 7001035870  Meetin With Bangalore Esc...Bangalore Call Girls Nelamangala Number 7001035870  Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 7001035870 Meetin With Bangalore Esc...
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
 
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service CoimbatoreCall Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatore
 
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls AvailableVip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
 
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
 
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls JaipurCall Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
 
(Rocky) Jaipur Call Girl - 9001626015 Escorts Service 50% Off with Cash ON De...
(Rocky) Jaipur Call Girl - 9001626015 Escorts Service 50% Off with Cash ON De...(Rocky) Jaipur Call Girl - 9001626015 Escorts Service 50% Off with Cash ON De...
(Rocky) Jaipur Call Girl - 9001626015 Escorts Service 50% Off with Cash ON De...
 
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
 
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCREscort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
 
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune) Girls Service
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune)  Girls ServiceCALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune)  Girls Service
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune) Girls Service
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
 
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
 
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls ServiceCall Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
 

Big Data in Clinical Research

  • 1. Big Data in Clinical Research Michael Hogarth, MD, FACMI, FACP Clinical Research Information Officer, UC San Diego Health System Professor, Dept of Medicine, UCSD Health
  • 2. What is Big Data? https://searchdatamanagement.techtarget.com/definition/big-data
  • 3. The genesis of the “big data” movement • Google began the new database development revolution because relational databases could not handle the volume or type of data efficiently enough to power Google Search • What Google built – “BigTable” – Innovation 1 – column-oriented, with each row as a web page – Innovation 2 – data is stored across multiple machines (nodes) using “Google File System” • The entire web table is split into mini-tables (tablets), each a few gigabytes – 100,000 – Innovation 3 – a “map-reduce” computing engine – Scales to petabytes!
  • 5. Importance of “big data” in healthcare https://www.incoutlook.com/2019/08/02/big-data-a-game-changer-in-healthcare-industry/
  • 6. Big Data in Clinical Research • Predicting ‘feasibility’ of a trial and reducing uncertainty • Improving accrual through “smart matching” of trials to potential participants Clinical trial efficiency • Hypothesis generation • Pragmatic/large-scale trials - Real world “evidence” (RWE) • Uncovering patterns Real world data and Real-World Evidence (RWE) • predictive algorithms • assistive systems (image analysis/enhancement) Healthcare AI/ML models • pharmacokinetic simulation (in-silico drug discovery) Drug design
  • 7. Big Data and Real-World Evidence (RWE)
  • 8. Using “Real World Evidence” (RWE) Adoption of health IT has resulted in large-scale (massive) amounts of biomedical “digital” data “RWE will not replace the need for data from traditional trials; however, technologies supporting RWD are enabling far richer and more diverse information to be collected during drug development.” Swift et al. “Innovation at the Intersection of Clinical Trials and Real-World Data Science to Advance Patient Care.” Clin Transl Sci (2018) 00, 1-11. https://www.ncbi.nlm.nih.gov/pubmed/29768712 (subtext: the randomized clinical trial is still here and is not dead, but is perhaps becoming an endangered species under pressure from new trial designs and RWE)
  • 9. An Example of an RWE Trial • ADAPTABLE – Aspirin Dosing: A Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness • Compares two aspirin doses (81 mg vs. 325 mg) • Randomizes 20,000 patients with CVD to one of the two doses • Currently underway through the National Patient-Centered Clinical Research Network (PCORnet) – had 600,000 eligible patients
  • 10. Wearable Sensors – Billions of data points
  • 12. RWE Trial Using a Wearable Sensor
  • 14. RNAseq – a key advancement for post-translational/DNA processes
  • 15. Comparing genomic data types in terms of “big data” Nature Reviews, Genetics, Volume 19, April 2018
  • 16. Biomedical research often requires you to become a “data wrangler” Obtaining and preparing “real world data” (RWD) from an EHR Privacy and your data stewardship Challenges with healthcare data Your Data Wrangler Toolbelt
  • 17. RCHSD Clinical Research Seminar Series - 07/28/2020 Types of Biomedical Research Data – Pre-Clinical Experiments • a broad range of data from wet-lab experiments, animal models, etc. • typically created/stored in lab/bioinformatics systems – Clinical Trial data • specific trial-related data collection • typically collected using electronic case report forms (eCRFs) – Clinical Practice data • generated during routine clinical practice events • typically created/stored in an EHR • clinical practice data = real world data (RWD) What is real world data (RWD)? – “any health record information not collected as part of a randomized controlled trial” – “With RWD, we mean data that are not collected under experimental conditions, but data generated in routine care”
  • 19. Privacy and Healthcare Data -- Being a responsible steward of patient data Covered entities may disclose PHI for research with “individual authorization” Circumstances under which research use can proceed without authorization • IRB (or privacy board) issues a “waiver of HIPAA authorization” • must satisfy 3 criteria: • use/disclosure involves no more than minimal risk to privacy of the individuals • the research could not be conducted without the waiver • the research could not be conducted without access to PHI. • minimal risk means: • a plan to protect identifiers from disclosure • a plan to destroy the identifiers at the earliest opportunity • written assurance the PHI will not be reused or disclosed to others
  • 21. Obtaining and Managing UCSDH Data • Data Stewards - honest brokers – UCSDH requires that you use a ‘data steward’ as an honest broker • You cannot access the warehouse yourself – HIPAA principle of “minimum necessary” – Data stewards are allowed to query “all records” to find specific cohort • ACTRI Data Extraction Concierge Service (DECS) • The “federated data stewards program” – Ophthalmology – Family Medicine – O’Brien Center (Nephrology) – Etc... • UCSD Health Nightingale data sets – coming in early 2022 – LDS under DUA – AKI – Cancer – COVID UCSD Health Virtual Research Desktop (VRD) Shin SY, et al. Healthc Inform Res. 2014;20(2):109-116.
  • 22. The ACTRI Data Extraction Concierge Service (DECS)
  • 23. Epic SlicerDicer and cohort discovery
  • 24. Using SlicerDicer to facilitate DECS exports
  • 25. The UCSD Health Virtual Research Desktop What is a VRD?
  • 26. Common challenges with healthcare data The nature of EHR data (“dirta”) • Harmonizing to a common set of codes and values • EHR data is often a proxy for what really happened • Missing values are common -- charting by exception • Data can be in multiple places and in different forms Unstructured narrative text contains many of the desired data points • Need to ‘find’ information in narrative text • Often requires some form of text mining or natural language processing Comorbidities • Elixhauser • Charlson
  • 27. “Dirta” and why we “harmonize” data Blood pressure – what do we really mean? If we do an analysis, are we comparing the same things? Example records: John S | BP | 1-14-2015 and Mary P | BP | 1-14-2015, with blood pressure stored in system #1 and in system #2. One system stores the reading as a single text value (BP “150/70 mmHg”); the other stores it as separate fields (Systolic BP 150, Diastolic BP 70, Type: Arterial Line, Encounter: In Patient). What happens when we want to compute with data from both systems?
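To make the question on this slide concrete, here is a minimal Python sketch of harmonizing the two storage styles shown: a free-text value such as “150/70 mmHg” and separate systolic/diastolic fields. The class name `BloodPressure`, the helper functions, and the column names `systolic_bp` and `diastolic_bp` are hypothetical and chosen only for illustration; they are not taken from any UCSD Health system.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BloodPressure:
    systolic: int   # mmHg
    diastolic: int  # mmHg


# Matches free-text readings such as "150/70 mmHg" or "150 / 70".
_BP_TEXT = re.compile(
    r"^\s*(\d{2,3})\s*/\s*(\d{2,3})\s*(?:mm\s*Hg)?\s*$", re.IGNORECASE
)


def from_text(value: str) -> Optional[BloodPressure]:
    """Parse a free-text reading; return None when it cannot be parsed."""
    match = _BP_TEXT.match(value)
    if match is None:
        return None
    return BloodPressure(int(match.group(1)), int(match.group(2)))


def from_columns(row: dict) -> BloodPressure:
    """Build a reading from separate columns (hypothetical column names)."""
    return BloodPressure(int(row["systolic_bp"]), int(row["diastolic_bp"]))


text_reading = from_text("150/70 mmHg")
column_reading = from_columns({"systolic_bp": 150, "diastolic_bp": 70})
same_values = text_reading == column_reading
```

Even when the numbers agree, the free-text value carries no measurement type or encounter setting, so a real harmonization step would also need to map each source to a common code before the readings are compared.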
  • 28. Standard coding systems used with harmonized EHR data Conditions: ICD-10-CM / SNOMED CT Labs, vitals, reports - observations: LOINC Medical procedures: CPT Medications: RxNorm
  • 29. How many blood pressure measurement types could there possibly be? 477 types of blood pressure measurements
  • 30. The challenge of using “drug class”
  • 31. The University of California Clinical Data Warehouse – one of the largest OMOP CDM repositories today The UCDW has harmonized data for 15M patients in OMOP The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM)
  • 32. The UC COVID Research Data Set (UC-CORDS) • How Much Data? (Dec 2021) – 687,239 patients – 1,061,471,489 (>1.06 billion) observations (lab results, vital signs) – 23,243,659 clinical encounters • Provided to UC Health researchers through UC research informatics units/groups in their respective health systems • Cohort – All COVID tested patients (positive or negative) • De-identified to HIPAA LDS - no personal identifiers • Data – Demographics, Diagnoses, Medications, Laboratory results, Encounters (if seen by our doctors or hospitalized) • Refreshed – Every 2 months
  • 33. Learning to analyze very large data sets
  • 34. Google BigTable (2004) -- managing “big data” • The web circa 2000: – 2 billion web pages – 45 terabytes of data – Contemporary databases (RDBMS) were not able to cope • Needed to invent a new database – Adam Bosworth and Jeff Dean • Google BigTable – Google distributed file system – “GFS” – Virtualized single database “table” across thousands of computers – Stores data in a “column-oriented” database design
  • 35. “Map-Reduce” (Hadoop/Spark) – analyzing large data sets efficiently and at scale • Google needed to process large amounts of raw data, such as “crawled” (acquired) documents, web request logs, etc. • Created a simple computational model called “map-reduce” which can process very large data sets • Hadoop = open source version of a map-reduce execution engine https://www.guru99.com/introduction-to-mapreduce.html
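The map-reduce model on this slide can be illustrated with a small single-machine Python sketch that counts words across documents. It is only a toy under stated assumptions: the function names are hypothetical, and nothing here uses Hadoop or Spark, which distribute the same three steps (map, group by key, reduce) across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # In a real engine each document (or block) would be mapped on a
    # different node; here the map calls simply run one after another.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data in research", "big data in healthcare"])
```

Because each map call looks at one document only and each reduce call looks at one key only, both steps can be run in parallel, which is what lets the model scale to very large data sets.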
  • 36. Machine Learning A form of artificial intelligence • Identifies correlated patterns in order to create predictions • Traditional “machine learning” – linear regression, logistic regression. • Today’s “machine learning” - leverages artificial neural networks (ANNs) • “Deep learning” – a deep neural network which has many nodal connections within its hidden layer – designed to work like the visual cortex... Most mature in image analysis – A convolutional neural network (CNN) is a typical deep learning network Rashidi, et al. Academic Pathology. 2019
  • 37. “Machine Learning” and AI in Healthcare • AI is not new to healthcare, but “machine learning” and “deep learning” are new AI techniques enabled by large data sets. https://mc.ai/awesome-ai-the-guide-to-master-artificial-intelligence/ 1983
  • 38. Deep Networks -- “Convolutional” Neural Networks (CNN) began as an approach to image recognition • a type of multi-layered (deep) ANN designed for recognition of visual features - “feedforward neural network whose architectures are tailored for minimizing the sensitivity to translations, local rotations and distortions of the input image.” • originally devised in 1988 by Yann LeCun at AT&T Labs to recognize handwritten digits • inspired by the connectivity pattern seen between neurons in the visual cortex • Convolution operation emulates response of an individual neuron to visual stimuli. Pooling coalesces outputs from one layer into a single neuron in the next layer
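The convolution and pooling operations named on this slide can be sketched in plain Python on a tiny grayscale “image”. This is an illustrative toy, not how a production CNN is built: the image, the edge-detecting kernel, and the function names are assumptions made for the example, and the “convolution” is computed without flipping the kernel (cross-correlation), as deep learning libraries commonly do.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = len(feature_map) - len(feature_map) % size
    cols = len(feature_map[0]) - len(feature_map[0]) % size
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(0, cols, size)
        ]
        for r in range(0, rows, size)
    ]


# A 4x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)  # strongest response at the edge
pooled = max_pool(feature_map)           # coarser summary of the response
```

The feature map is largest where the kernel sits on the edge, and pooling then summarizes each window with a single value, which is one reason the network becomes less sensitive to small shifts in the input.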
  • 39. Example: Deep Learning and Interpretation of Retinal Images for diabetic retinopathy • trained an ML system with 494,661 retinal images • validated (tested) using dataset of 71,896 images from 14,880 patients • detection of vision-threatening diabetic retinopathy: AUC 0.958 (95% CI) • detection of referable diabetic retinopathy: AUC 0.936 (95% CI)
  • 40. ML-based Chest X-Ray Assistive Interpretation in COVID
  • 41. Did it make a difference? 1 out of 5 felt it had an impact 30% felt it affected the treatment plan
  • 42. Big Data in Drug Design: “in-silico” drug design through structural bioinformatics and computer-based simulation http://www.vls3d.com/index.php/virtual-screening/comments-about-virtuel-screening?start=3
  • 43. Computational Drug Discovery as In-silico Drug, Genome, Proteome interactions Requires large-scale computing!
  • 44. Supercomputers - massive parallel processing • supercomputers split problems into pieces and execute each at the same time – massive parallel processing • Designed for mathematical computation, which occurs in simulations and optimization problems --- where multiple co-dependencies exist Current top supercomputers Uses of supercomputers 1970-2020
  • 45. Computer Simulation and COVID The Summit supercomputer was used to predict pharmacological therapeutic targets to interfere with SARS-CoV-2 spike protein binding to the ACE2 receptor in type 2 pneumocytes
  • 46. Limits of classical computers in simulation
  • 47. Going beyond “classical” computers • Only having two states (1 or 0) for a single “bit” creates some limitations: – The larger the number of input states you want (i.e., a larger number, or a higher number of simultaneous bits to transmit), the higher the number of “wires” you need – A higher number of wires means more power and heat, unless you can make it all smaller and more compact • The Apple M1 CPU (8-core, 64-bit) has 16 billion transistors and implements a 5 nanometer (nm) “transistor gate length” (the smallest yet achieved). – At some point, you reach a limit in terms of what you can compute with a “classical computer” (binary computer)
  • 49. Have we reached “quantum supremacy”? • A Google-designed quantum computer was used in an experiment to perform a calculation that would require a classical supercomputer 10,000 years to complete • The calculation was to predict the likelihood of outcomes from a random-number generator (a problem first crafted by Google physicists in 2016) • The 53-qubit quantum computer performed the calculation in 200 seconds • Was it a contrived experiment?
  • 50. Questions? Mt Whitney - 14,505ft Lone Pine, California (Feb 2021)