Lies, Damn Lies, and Big Data
Applications, Limitations, Misconceptions
Brian Bissett
Senior Member
Institute of Electrical and Electronics Engineers (IEEE)
5/23/2015
Overview
What is Big Data
Common Attributes of Big Data
Challenges of Working with Big Data
Validity Space
Outliers
Variance
Correlation and Causality
Summary
What is “Big Data”?
Depends on who you ask...
Gartner – defined by the “three Vs”: Volume,
Velocity and Variety.
Oracle - the derivation of value from traditional
relational database-driven business decision
making, augmented with new sources of
unstructured data.
Intel – the generation of a median of 300
terabytes of data a week.
What is “Big Data”?
Microsoft - the process of applying serious computing
power—the latest in machine learning and artificial
intelligence—to seriously massive and often highly
complex sets of information.
The Method for an Integrated Knowledge Environment
(MIKE) project argues that big data is not a function of
the size but of complexity. (A high degree of
permutations and interactions within a data set defines
big data.)
National Institute of Standards and Technology (NIST) -
big data “exceed(s) the capacity or capability of current
methods and systems.”
The Current 8 V’s of Big Data
Volume
Velocity
Variety
Value – is this worth something to someone?
Validity – is this correct?
Viability – can this stand independently?
Variability – is the same result reported consistently?
Verifiability – do we know where this came from?
The 5 P’s for Biomedical Big Data
Evidence Based, Outcome Driven, and Affordable
Health Care will Require the Five P’s:
Predictive
Precise
Preventive
Personalized
Patient-Centric
The Cancer Genome Atlas (TCGA)
Challenges of Dealing with Big Data
Management – In 10 Years at Zettabyte Levels!
Infrastructure
Performance Analytics – TBD.
Unstructured – Lacks any Meaningful Standards.
Data Visualization – Humans see in 3D Only.
Navigation – Siloed Data is Difficult to Access.
Missing Data – Average of 30% from HIT Data.
Incorrect Data – Average of 25% - 30%.
The Three C’s (Challenges)
Collection
– is it worth saving?
– Value = Actionable
Consolidation
– Clean it up! "Not Collected Here"
Consumption
– Easy to Add Processors
– Difficult to move Data.
Transactions: Real Time & Queued
Real Time – must be done ASAP
– Retail: Credit Card Transactions
– Security: Is Passenger on the “no fly list”
– NICS Checks for Firearms Purchases
– Stock Purchases
Queued – Everything else that can wait
– Traffic Data, process images from Traffic Cameras to
determine speed and volume.
– Daily Customer Counts
– Daily or Monthly Volume for Stock Transactions
When are the Conclusions Drawn
from Big Data Most Accurate?
Big Data is most reliable when working in Two and
sometimes Three Dimensional Matrices.
Where the Conclusion to be derived is Boolean.
Where the Data Acquired is known to be of Good
Quality.
Example: Traffic Data at Checkpoint
– Record: Number of Cars, Time, Maybe Speed
– Derive: Is Traffic Flowing without Delay?
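The checkpoint example above can be sketched in a few lines of Python. The record fields, the 45 mph threshold, and the function name are assumptions made for illustration; the slide only says that car counts, times, and possibly speeds are recorded and that the derived answer is a yes or no.

```python
from dataclasses import dataclass


@dataclass
class CheckpointRecord:
    minute: int            # time of the observation interval (hypothetical unit)
    cars: int              # number of cars counted in the interval
    mean_speed_mph: float  # average speed in the interval, if recorded


def traffic_flowing(records, min_speed_mph=45.0):
    """Return True when every interval that saw cars met the speed threshold.

    The threshold is an assumed value, not one given in the talk.
    An empty list of records yields True (nothing indicates a delay).
    """
    return all(r.mean_speed_mph >= min_speed_mph for r in records if r.cars > 0)


free = [CheckpointRecord(0, 42, 58.0), CheckpointRecord(1, 40, 61.5)]
jam = free + [CheckpointRecord(2, 55, 12.0)]

print(traffic_flowing(free))  # low-dimensional data, Boolean answer
print(traffic_flowing(jam))
```

The point of the sketch is the shape of the problem: two or three recorded dimensions, good-quality data, and a Boolean conclusion.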
Big Data = Big Problems
More Excess Data as Compared to Real Signals =
More Spurious Relationships.
Source: N.N. Taleb
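A small simulation can illustrate the claim that more excess data yields more spurious relationships. This is only a sketch under stated assumptions: the columns are independent random noise, and the column counts, the 10 observations per column, and the 0.8 cutoff are arbitrary choices, not figures from the talk or from Taleb.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(columns, threshold=0.8):
    """Count column pairs whose |r| exceeds the threshold."""
    hits = 0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson_r(columns[i], columns[j])) > threshold:
                hits += 1
    return hits


rng = random.Random(0)  # fixed seed so the run is repeatable
# 200 unrelated variables, 10 observations each: there is no real signal here.
columns = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]

few = count_spurious(columns[:10])   # 45 pairs
many = count_spurious(columns)       # 19,900 pairs
print("strong correlations among 10 variables:", few)
print("strong correlations among 200 variables:", many)
```

Every "strong" correlation found is spurious by construction, and the number of candidate pairs grows roughly with the square of the number of variables.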
Outliers: Goldmine or Nuisance
An Outlier can either be a Goldmine (the needle in
the haystack sought) or a Nuisance (an artifact to be
ignored)
Example: Lipinski’s Rule of 5 (Ro5)
16% of oral drugs violate at least one of the criteria,
and 6% fail two or more.
Billion Dollar Drugs that have failed the Ro5 criteria:
Lipitor, Singulair
Outliers: Goldmine or Nuisance
Example: Nuisance Outlier
The speed of the Motorcycle in no
way reflects the true speed of the
Traffic.
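The motorcycle example can be made concrete with a few made-up speed readings. The numbers below are hypothetical, chosen only to show how a single nuisance outlier moves the mean while leaving the median almost untouched.

```python
import statistics

# Hypothetical speeds (mph) at a checkpoint; the last reading is the motorcycle.
speeds = [62, 64, 65, 63, 61, 66, 64, 140]

mean_speed = statistics.mean(speeds)      # pulled upward by the one outlier
median_speed = statistics.median(speeds)  # robust to the one outlier

print("mean:", mean_speed)
print("median:", median_speed)
```

Using a robust statistic such as the median is one common way to keep a nuisance outlier from distorting the summary; it is not a rule for deciding whether a given outlier is a nuisance or a goldmine.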
Outliers – Bonedigger and Milo
No rigid mathematical definition exists of what
constitutes an outlier, or when an Outlier may be
omitted from an analysis.
Mahalanobis Distance - distance between a data point
and a multivariate space's centroid (overall mean),
scaled by the covariance of the data.
(Commonly used in Linear Regression)
Bonedigger the lion and Milo the sausage dog are inseparable. The
friendship between an 11-pound wiener dog and a 500-pound lion is
the only one ever seen in the world.
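The Mahalanobis distance mentioned on this slide can be sketched for the two-variable case in plain Python. The data points and the function name are hypothetical, and the 2x2 covariance matrix is inverted by hand to keep the example dependency-free; a real analysis would more likely use a numerical library.

```python
import math


def mahalanobis_2d(point, data):
    """Mahalanobis distance of `point` from the centroid of 2-D `data`.

    Uses the sample covariance matrix S of the data and returns
    sqrt(d^T S^-1 d), where d = point - centroid.
    """
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))


# Hypothetical measurements with a clear upward trend.
data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]

on_trend = mahalanobis_2d((5, 6), data)   # follows the trend
off_trend = mahalanobis_2d((5, 1), data)  # breaks the trend
print(on_trend, off_trend)
```

A point that breaks the pattern of the data scores a much larger distance than one that follows it, even when both lie a similar straight-line distance from the centroid. The cutoff above which a point counts as an outlier remains a judgment call, as the slide notes.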
Melanoma Example
Dealing with Variance
Impossible to Positively Discern without Biopsy
Melanoma ~ 80% Diagnostic Rate
with Current Image Algorithms
Because Melanoma can present in all Colors, Shapes,
Granularities, and Textures; More Data is unlikely to
improve Current Diagnostic Image algorithms.
Sensitivity – Rule out Condition when Negative
= true positives / (true positives + false negatives)
80% Sensitive Test will Detect 8 out of 10 Cancers.
Specificity – Rule in Condition when Positive
= true negatives / (true negatives + false positives)
95% Specific Test -> False Positive rate of 5%
Sensitivity and Specificity Trade Off: Moving a Test's Threshold to Raise One Generally Lowers the Other.
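The two formulas on this slide translate directly into code. The counts below are hypothetical, picked to reproduce the slide's own figures (8 of 10 cancers detected, 5% false positives); the function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


def false_positive_rate(true_neg, false_pos):
    """Fraction of actual negatives flagged as positive (1 - specificity)."""
    return false_pos / (true_neg + false_pos)


# Hypothetical counts: 10 melanomas (8 caught), 100 benign lesions (5 flagged).
print(sensitivity(true_pos=8, false_neg=2))
print(specificity(true_neg=95, false_pos=5))
print(false_positive_rate(true_neg=95, false_pos=5))
```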
Variance – The Batch Effect
High-throughput technologies.
Batch Effects when measurements are affected by
laboratory conditions, reagent lots, and personnel
differences.
Pharmaceutical Mergers - Particularly troubling
when merging data sets from different labs.
Normalization for Batch Effects is extremely
difficult.
“What level is your pain on a scale from 1 to 10?”
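As a rough illustration of what normalizing for batch effects involves, the sketch below applies the simplest possible correction, subtracting each batch's own mean. The lab names and values are hypothetical, and this is not presented as the method used in practice; it assumes every batch measured a comparable mix of samples, which is exactly the assumption that tends to fail when data sets from different labs are merged.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(measurements):
    """Subtract each batch's mean from its values.

    `measurements` is a list of (batch_id, value) pairs; the result keeps
    the same order. Assumes batches are comparable in composition.
    """
    by_batch = defaultdict(list)
    for batch_id, value in measurements:
        by_batch[batch_id].append(value)
    batch_means = {b: mean(vs) for b, vs in by_batch.items()}
    return [(b, v - batch_means[b]) for b, v in measurements]


# Hypothetical: the same three samples read 5 units higher in lab B.
raw = [("lab_A", 10.0), ("lab_A", 12.0), ("lab_A", 11.0),
       ("lab_B", 15.0), ("lab_B", 17.0), ("lab_B", 16.0)]

centered = center_by_batch(raw)
print(centered)
```

If lab B had genuinely measured higher-valued samples, the same subtraction would erase a real difference, which is one reason batch normalization is so difficult.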
Qualitative Variance
Massachusetts General Hospital Harvard Medical
School investigated discrepancy rates for the
interpretation of Radiology Films.
60 examinations - 30 previously interpreted by
themselves and 30 interpreted by their peers.
Interobserver Disagreement Rate = 26%.
Intraobserver Disagreement Rate = 32%.
Radiologists agreed with other Radiologists more
than themselves.
Correlation vs. Causation
Correlation is easy to prove.
How much of a Correlation is Easy to Prove.
R² = 1.0 – Perfect Correlation.
R² = 0.0 – No Linear Correlation.
Causation is nearly Impossible to Prove.
US Spending on Science, Space, and Technology
correlates Nearly Perfectly (R² = 0.99208) with
Suicides by Hanging, Strangulation and Suffocation.
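To show how easily a high R² arises without any causal link, the sketch below computes R² for two made-up series that simply both rise over time. The values are invented for illustration and are not the spending or suicide figures cited on the slide.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


years = range(10)
# Hypothetical series A: rises by one unit per year.
series_a = [20.0 + 1.0 * t for t in years]
# Hypothetical series B: an unrelated quantity that also trends upward,
# with small fixed wobbles added by hand.
wobble = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, -0.5]
series_b = [50.0 + 3.0 * t + w for t, w in zip(years, wobble)]

print(round(r_squared(series_a, series_b), 4))
```

Any two quantities that trend in the same direction over the same period will score this way, which is why a high R² on its own says nothing about causation.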
Bradford Hill Causality Proof
Strong – Five or Ten Fold Increase
Consistent – Holds Across Populations and Time
Specific – A Link (a location, mechanism, etc.)
Temporal – Cause Precedes Effect
Gradient – Association Increases with Exposure
Plausible – A Credible Mechanism Exists
Coherent – Agrees with Established Knowledge
Experiment – Experimental Evidence Supports
Analogy – Similar Behavior in Analogous Situations
Big Data Governance Does not Exist
No laws exist to address the utilization of big data.
Concerns about citizen privacy and business liability
have yet to be addressed.
Critical Challenge to the Federal Government.
Federal Agencies that Utilize Big Data do so on an
ad-hoc basis.
Little guidance exists on using petabyte sizes of
private citizen data for predictive analytics.
– Privacy Act of 1974 and HIPAA 1996.
Hierarchy of Evidence
Big Data = Observational Study
Data is not Collected to Examine a Specific
Problem using a Protocol.
The Treatment Group and the Control Group are
outside the control of the Investigator.
Groups Differing in Outcome are identified and
compared on the basis of a supposed causal
attribute.
Longitudinal - repeated observations of the same
variables over long periods of time.
Summary
The World is Accumulating a Lot of Data.
Nobody Agrees on What “Big” is.
On Average, 25% to 30% of the Data is Incorrect.
On Average, 30% of the Data is Missing.
Correlation is the Easy Part.
Bradford Hill gives Guidance on Proving Causation.
There is a Hierarchy of Evidence, and Expert Opinion
and Big Data are at the Bottom of it.
Selected Publications
Automated Data Analysis with Excel
– Softcover: 442 Pages
– Chapman & Hall (June 2007)
– Second Edition Coming in 2016
– ISBN: 1-58488-885-7
Practical Pharmaceutical Laboratory
Automation
– Hardcover: 464 pages
– Publisher: CRC Press (May 2003)
– ISBN: 0849318149
References
Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL. Quantifying the chemical beauty of drugs. Nat Chem. 2012;4:90–98. doi: 10.1038/nchem.1243. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3524573/
The Big Data Conundrum: How to Define It? http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
Abujudeh HH, Boland GW, Kaewalai R, et al. Abdominal and Pelvic Computed Tomography (CT) Interpretation: discrepancy rates among experienced radiologists. Eur Radiol. 2010;20(8):1952-7.
Ramezani M, Karimian A, Moallem P. Automatic Detection of Malignant Melanoma using Macroscopic Images. J Med Signals Sens. 2014 Oct-Dec;4(4):281–290. PMCID: PMC4236807

More Related Content

What's hot

Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Beyond Proofs of Concept for Biomedical AI
Beyond Proofs of Concept for Biomedical AIBeyond Proofs of Concept for Biomedical AI
Beyond Proofs of Concept for Biomedical AIPaul Agapow
 
Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...BenVanCalster
 
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019Ewout Steyerberg
 
Make clinical prediction models great again
Make clinical prediction models great againMake clinical prediction models great again
Make clinical prediction models great againBenVanCalster
 
Program theory evaluation
Program theory evaluationProgram theory evaluation
Program theory evaluationMatti Heino
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchGreg Landrum
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 
Data Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That WayData Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That WayMelinda Thielbar
 
Drug and Vaccine Discovery: Knowledge Graph + Apache Spark
Drug and Vaccine Discovery: Knowledge Graph + Apache SparkDrug and Vaccine Discovery: Knowledge Graph + Apache Spark
Drug and Vaccine Discovery: Knowledge Graph + Apache SparkDatabricks
 
Interpreting Complex Real World Data for Pharmaceutical Research
Interpreting Complex Real World Data for Pharmaceutical ResearchInterpreting Complex Real World Data for Pharmaceutical Research
Interpreting Complex Real World Data for Pharmaceutical ResearchPaul Agapow
 
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...University of Twente
 
AI at GSK_Kim Branson_mHealth Israel
AI at GSK_Kim Branson_mHealth IsraelAI at GSK_Kim Branson_mHealth Israel
AI at GSK_Kim Branson_mHealth IsraelLevi Shapiro
 
Introduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIIntroduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIMaarten van Smeden
 
Assumptions Schmassumptions; Did my intervention work or not?!
Assumptions Schmassumptions; Did my intervention work or not?!Assumptions Schmassumptions; Did my intervention work or not?!
Assumptions Schmassumptions; Did my intervention work or not?!Matti Heino
 
ML & AI in pharma: an overview
ML & AI in pharma: an overviewML & AI in pharma: an overview
ML & AI in pharma: an overviewPaul Agapow
 
Artificial intelligence in drug discovery
Artificial intelligence in drug discoveryArtificial intelligence in drug discovery
Artificial intelligence in drug discoveryRAVINDRABABUKOPPERA
 
Growth, Engagement & Search Metrics: Snake Oil or North Stars
Growth, Engagement & Search Metrics: Snake Oil or North StarsGrowth, Engagement & Search Metrics: Snake Oil or North Stars
Growth, Engagement & Search Metrics: Snake Oil or North StarsJune Andrews
 

What's hot (20)

Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Beyond Proofs of Concept for Biomedical AI
Beyond Proofs of Concept for Biomedical AIBeyond Proofs of Concept for Biomedical AI
Beyond Proofs of Concept for Biomedical AI
 
Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...
 
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
 
Make clinical prediction models great again
Make clinical prediction models great againMake clinical prediction models great again
Make clinical prediction models great again
 
Program theory evaluation
Program theory evaluationProgram theory evaluation
Program theory evaluation
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical research
 
Math in data
Math in dataMath in data
Math in data
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 
Data Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That WayData Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That Way
 
Drug and Vaccine Discovery: Knowledge Graph + Apache Spark
Drug and Vaccine Discovery: Knowledge Graph + Apache SparkDrug and Vaccine Discovery: Knowledge Graph + Apache Spark
Drug and Vaccine Discovery: Knowledge Graph + Apache Spark
 
Interpreting Complex Real World Data for Pharmaceutical Research
Interpreting Complex Real World Data for Pharmaceutical ResearchInterpreting Complex Real World Data for Pharmaceutical Research
Interpreting Complex Real World Data for Pharmaceutical Research
 
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
 
AI at GSK_Kim Branson_mHealth Israel
AI at GSK_Kim Branson_mHealth IsraelAI at GSK_Kim Branson_mHealth Israel
AI at GSK_Kim Branson_mHealth Israel
 
Introduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIIntroduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part II
 
Assumptions Schmassumptions; Did my intervention work or not?!
Assumptions Schmassumptions; Did my intervention work or not?!Assumptions Schmassumptions; Did my intervention work or not?!
Assumptions Schmassumptions; Did my intervention work or not?!
 
ML & AI in pharma: an overview
ML & AI in pharma: an overviewML & AI in pharma: an overview
ML & AI in pharma: an overview
 
Artificial intelligence in drug discovery
Artificial intelligence in drug discoveryArtificial intelligence in drug discovery
Artificial intelligence in drug discovery
 
Growth, Engagement & Search Metrics: Snake Oil or North Stars
Growth, Engagement & Search Metrics: Snake Oil or North StarsGrowth, Engagement & Search Metrics: Snake Oil or North Stars
Growth, Engagement & Search Metrics: Snake Oil or North Stars
 

Viewers also liked

Pre coord subj hdgs to-fast 2016-04-04rev
Pre coord subj hdgs to-fast 2016-04-04revPre coord subj hdgs to-fast 2016-04-04rev
Pre coord subj hdgs to-fast 2016-04-04revJohn Riemer
 
Addressable Location Indicator Apparatus and Method
Addressable Location Indicator Apparatus and MethodAddressable Location Indicator Apparatus and Method
Addressable Location Indicator Apparatus and MethodBrian Bissett
 
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...Brian Bissett
 
Multivariate Analysis Of Energy Policy Options Using Lindo
Multivariate Analysis Of Energy Policy Options Using LindoMultivariate Analysis Of Energy Policy Options Using Lindo
Multivariate Analysis Of Energy Policy Options Using LindoBrian Bissett
 
ElogDoct: A Tool for Lipophilicity Determination in Drug Discovery. 2. Basic ...
ElogDoct: A Tool for Lipophilicity Determination in Drug Discovery. 2. Basic ...ElogDoct: A Tool for Lipophilicity Determination in Drug Discovery. 2. Basic ...
ElogDoct: A Tool for Lipophilicity Determination in Drug Discovery. 2. Basic ...Brian Bissett
 

Viewers also liked (7)

Pre coord subj hdgs to-fast 2016-04-04rev
Pre coord subj hdgs to-fast 2016-04-04revPre coord subj hdgs to-fast 2016-04-04rev
Pre coord subj hdgs to-fast 2016-04-04rev
 
Leo Burnett
Leo BurnettLeo Burnett
Leo Burnett
 
Addressable Location Indicator Apparatus and Method
Addressable Location Indicator Apparatus and MethodAddressable Location Indicator Apparatus and Method
Addressable Location Indicator Apparatus and Method
 
Leo Burnett
Leo BurnettLeo Burnett
Leo Burnett
 
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
 
Multivariate Analysis Of Energy Policy Options Using Lindo
Multivariate Analysis Of Energy Policy Options Using LindoMultivariate Analysis Of Energy Policy Options Using Lindo
Multivariate Analysis Of Energy Policy Options Using Lindo
 
ElogDoct: A Tool for Lipophilicity Determination in Drug Discovery. 2. Basic ...
ElogDoct: A Tool for Lipophilicity Determination in Drug Discovery. 2. Basic ...ElogDoct: A Tool for Lipophilicity Determination in Drug Discovery. 2. Basic ...
ElogDoct: A Tool for Lipophilicity Determination in Drug Discovery. 2. Basic ...
 

Similar to Lies, Damn Lies, and Big Data

D1S1T3N4_Pratibha Jalui & Reetabrata Bhattacharyya
D1S1T3N4_Pratibha Jalui & Reetabrata BhattacharyyaD1S1T3N4_Pratibha Jalui & Reetabrata Bhattacharyya
D1S1T3N4_Pratibha Jalui & Reetabrata BhattacharyyaReetabrata Bhattacharyya
 
hisory of computers in pharmaceutical research presentation.pptx
hisory of computers in pharmaceutical research presentation.pptxhisory of computers in pharmaceutical research presentation.pptx
hisory of computers in pharmaceutical research presentation.pptxDhanaa Dhoni
 
20160811 Big Data for Health and Medicine
20160811 Big Data for Health and Medicine20160811 Big Data for Health and Medicine
20160811 Big Data for Health and MedicineBrian Bot
 
AAPM Foster July 2009
AAPM Foster July 2009AAPM Foster July 2009
AAPM Foster July 2009Ian Foster
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...D3 Consutling
 
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...marcus evans Network
 
Clinical Decision Support: Driving the Last Mile
Clinical Decision Support: Driving the Last MileClinical Decision Support: Driving the Last Mile
Clinical Decision Support: Driving the Last MileHealth Catalyst
 
AI in Healthcare
AI in HealthcareAI in Healthcare
AI in HealthcarePaul Agapow
 
IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ...
IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ...IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ...
IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ...IRJET Journal
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancerpaperpublications3
 
[DSC Europe 23][DigiHealth] Dimitrios Kalogeropoulos A Sustainable Future for...
[DSC Europe 23][DigiHealth] Dimitrios Kalogeropoulos A Sustainable Future for...[DSC Europe 23][DigiHealth] Dimitrios Kalogeropoulos A Sustainable Future for...
[DSC Europe 23][DigiHealth] Dimitrios Kalogeropoulos A Sustainable Future for...DataScienceConferenc1
 
Improving health care outcomes with responsible data science #escience2018
Improving health care outcomes with responsible data science #escience2018Improving health care outcomes with responsible data science #escience2018
Improving health care outcomes with responsible data science #escience2018Wessel Kraaij
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsVijay Raghavan
 
Predicting the Future of Predictive Analytics in Healthcare
Predicting the Future of Predictive Analytics in HealthcarePredicting the Future of Predictive Analytics in Healthcare
Predicting the Future of Predictive Analytics in HealthcareDale Sanders
 
Is Big Data Always Good Data?
Is Big Data Always Good Data? Is Big Data Always Good Data?
Is Big Data Always Good Data? MyMeds&Me
 
Big Data in Disease Management
Big Data in Disease ManagementBig Data in Disease Management
Big Data in Disease ManagementInterpretOmics
 
ai in clinical trails.pptx
ai in clinical trails.pptxai in clinical trails.pptx
ai in clinical trails.pptxRajdeepMaji3
 
aiinclinicaltrails-221008052225-c7ed8a95.pdf
aiinclinicaltrails-221008052225-c7ed8a95.pdfaiinclinicaltrails-221008052225-c7ed8a95.pdf
aiinclinicaltrails-221008052225-c7ed8a95.pdfMartaHC1
 

Similar to Lies, Damn Lies, and Big Data (20)

Big data in healthcare
Big data in healthcareBig data in healthcare
Big data in healthcare
 
D1S1T3N4_Pratibha Jalui & Reetabrata Bhattacharyya
D1S1T3N4_Pratibha Jalui & Reetabrata BhattacharyyaD1S1T3N4_Pratibha Jalui & Reetabrata Bhattacharyya
D1S1T3N4_Pratibha Jalui & Reetabrata Bhattacharyya
 
hisory of computers in pharmaceutical research presentation.pptx
hisory of computers in pharmaceutical research presentation.pptxhisory of computers in pharmaceutical research presentation.pptx
hisory of computers in pharmaceutical research presentation.pptx
 
20160811 Big Data for Health and Medicine
20160811 Big Data for Health and Medicine20160811 Big Data for Health and Medicine
20160811 Big Data for Health and Medicine
 
AAPM Foster July 2009
AAPM Foster July 2009AAPM Foster July 2009
AAPM Foster July 2009
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
 
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
 
Clinical Decision Support: Driving the Last Mile
Clinical Decision Support: Driving the Last MileClinical Decision Support: Driving the Last Mile
Clinical Decision Support: Driving the Last Mile
 
AI in Healthcare
AI in HealthcareAI in Healthcare
AI in Healthcare
 
IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ...
IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ...IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ...
IRJET- Extending Association Rule Summarization Techniques to Assess Risk of ...
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancer
 
[DSC Europe 23][DigiHealth] Dimitrios Kalogeropoulos A Sustainable Future for...
[DSC Europe 23][DigiHealth] Dimitrios Kalogeropoulos A Sustainable Future for...[DSC Europe 23][DigiHealth] Dimitrios Kalogeropoulos A Sustainable Future for...
[DSC Europe 23][DigiHealth] Dimitrios Kalogeropoulos A Sustainable Future for...
 
Improving health care outcomes with responsible data science #escience2018
Improving health care outcomes with responsible data science #escience2018Improving health care outcomes with responsible data science #escience2018
Improving health care outcomes with responsible data science #escience2018
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and Applications
 
Predicting the Future of Predictive Analytics in Healthcare
Predicting the Future of Predictive Analytics in HealthcarePredicting the Future of Predictive Analytics in Healthcare
Predicting the Future of Predictive Analytics in Healthcare
 
Is Big Data Always Good Data?
Is Big Data Always Good Data? Is Big Data Always Good Data?
Is Big Data Always Good Data?
 
Big Data in Disease Management
Big Data in Disease ManagementBig Data in Disease Management
Big Data in Disease Management
 
ai in clinical trails.pptx
ai in clinical trails.pptxai in clinical trails.pptx
ai in clinical trails.pptx
 
aiinclinicaltrails-221008052225-c7ed8a95.pdf
aiinclinicaltrails-221008052225-c7ed8a95.pdfaiinclinicaltrails-221008052225-c7ed8a95.pdf
aiinclinicaltrails-221008052225-c7ed8a95.pdf
 

More from Brian Bissett

Automating Data Analysis with Excel Bio-IT World 2018
Automating Data Analysis with Excel Bio-IT World 2018Automating Data Analysis with Excel Bio-IT World 2018
Automating Data Analysis with Excel Bio-IT World 2018Brian Bissett
 
Deaths by Shooting in Baltimore before and after the Firearms Safety Act of 2...
Deaths by Shooting in Baltimore before and after the Firearms Safety Act of 2...Deaths by Shooting in Baltimore before and after the Firearms Safety Act of 2...
Deaths by Shooting in Baltimore before and after the Firearms Safety Act of 2...Brian Bissett
 
Bio-IT 2017 Automation
Bio-IT 2017 AutomationBio-IT 2017 Automation
Bio-IT 2017 AutomationBrian Bissett
 
Presentation given at Bio-IT World 2016 as a Senior Member of the IEEE on the...
Presentation given at Bio-IT World 2016 as a Senior Member of the IEEE on the...Presentation given at Bio-IT World 2016 as a Senior Member of the IEEE on the...
Lies, Damn Lies, and Big Data

  • 1. Lies, Damn Lies, and Big Data: Applications, Limitations, Misconceptions. Brian Bissett, Senior Member, Institute of Electrical and Electronics Engineers (IEEE). 5/23/2015
  • 2. Overview: What is Big Data; Common Attributes of Big Data; Challenges of Working with Big Data; Validity Space; Outliers; Variance; Correlation and Causality; Summary.
  • 3. What is “Big Data”? Depends who you ask. Gartner: defined by the “three Vs”: Volume, Velocity, and Variety. Oracle: the derivation of value from traditional relational database-driven business decision making, augmented with new sources of unstructured data. Intel: the generation of a median of 300 terabytes of data a week.
  • 4. What is “Big Data”? Microsoft: the process of applying serious computing power (the latest in machine learning and artificial intelligence) to seriously massive and often highly complex sets of information. The Method for an Integrated Knowledge Environment (MIKE) project argues that big data is a function not of size but of complexity. (A high degree of permutations and interactions within a data set defines big data.) National Institute of Standards and Technology (NIST): big data “exceed(s) the capacity or capability of current methods and systems.”
  • 5. The Current 8 V’s of Big Data: Volume. Velocity. Variety. Value: is this worth something to someone? Validity: is this correct? Viability: can this stand independently? Variability: is the same result reported consistently? Verifiability: do we know where this came from?
  • 6. The 5 P’s for Biomedical Big Data. Evidence-Based, Outcome-Driven, and Affordable Health Care will Require the Five P’s: Predictive, Precise, Preventive, Personalized, Patient-Centric. The Cancer Genome Atlas (TCGA).
  • 7. Challenges of Dealing with Big Data. Management: in 10 years at zettabyte levels! Infrastructure. Performance Analytics: TBD. Unstructured: lacks any meaningful standards. Data Visualization: humans see in 3D only. Navigation: siloed data is difficult to access. Missing Data: average of 30% from HIT data. Incorrect Data: average of 25% to 30%.
  • 8. The Three C’s (Challenges). Collection: is it worth saving? (Value = Actionable.) Consolidation: clean it up! (“Not Collected Here.”) Consumption: easy to add processors, difficult to move data.
  • 9. Transactions: Real Time & Queued. Real Time (must be done ASAP): retail credit card transactions; security (is the passenger on the “no fly list”?); NICS checks for firearms purchases; stock purchases. Queued (everything else that can wait): traffic data (processing images from traffic cameras to determine speed and volume); daily customer counts; daily or monthly volume for stock transactions.
  • 10. When are the Conclusions Drawn from Big Data Most Accurate? Big Data is most reliable when working in two- and sometimes three-dimensional matrices, where the assumption to be derived is Boolean, and where the data acquired is known to be of good quality. Example: traffic data at a checkpoint. Record: number of cars, time, maybe speed. Derive: is traffic flowing without delay?
  • 11. Big Data = Big Problems. More excess data as compared to real signals = more spurious relationships. Source: N.N. Taleb.
  • 12. Outliers: Goldmine or Nuisance. An outlier can be either a goldmine (the needle in the haystack being sought) or a nuisance (an artifact to be ignored). Example: Lipinski’s Rule of 5 (Ro5). 16% of oral drugs violate at least one of the criteria, and 6% fail two or more. Billion-dollar drugs that have failed the Ro5 criteria: Lipitor, Singulair.
  • 13. Outliers: Goldmine or Nuisance. Example: Nuisance Outlier. The speed of the motorcycle in no way reflects the true speed of the traffic. No rigid mathematical definition exists of what constitutes an outlier, or of when an outlier may be omitted from an analysis. Mahalanobis distance: the distance between a data point and the centroid (overall mean) of a multivariate space. (Commonly used in linear regression.)
  • 14. Outliers: Bonedigger and Milo. Bonedigger the lion and Milo the sausage dog are inseparable. The friendship between an 11-pound wiener dog and a 500-pound lion is the only one ever seen in the world.
  • 15. Melanoma Example: Dealing with Variance. Impossible to positively discern without biopsy.
  • 16. Melanoma: ~80% Diagnostic Rate with Current Image Algorithms. Because melanoma can present in all colors, shapes, granularities, and textures, more data is unlikely to improve current diagnostic image algorithms. Sensitivity (rule out the condition when negative) = true positives / (true positives + false negatives). An 80% sensitive test will detect 8 out of 10 cancers. Specificity (rule in the condition when positive) = true negatives / (true negatives + false positives). A 95% specific test has a false positive rate of 5%. Sensitivity and specificity typically trade off against each other.
  • 17. Variance: The Batch Effect. High-throughput technologies. Batch effects occur when measurements are affected by laboratory conditions, reagent lots, and personnel differences. Pharmaceutical mergers: particularly troubling when merging data sets from different labs. Normalization for batch effects is extremely difficult. “What level is your pain on a scale from 1 to 10?”
  • 18. Qualitative Variance. Massachusetts General Hospital / Harvard Medical School investigated discrepancy rates for the interpretation of radiology films. 60 examinations: 30 previously interpreted by the radiologists themselves and 30 interpreted by their peers. Interobserver disagreement rate = 26%. Intraobserver disagreement rate = 32%. Radiologists agreed with other radiologists more than with themselves.
  • 19. Correlation vs. Causation. Correlation is easy to prove. How much of a correlation is also easy to prove: R² = 1.0 is perfect correlation; R² = 0.0 is no correlation. Causation is nearly impossible to prove. US spending on science, space, and technology correlates nearly perfectly (R² = 0.99208) with suicides by hanging, strangulation, and suffocation.
  • 20. Bradford Hill Causality Proof. Strong: five- or ten-fold increase. Consistent: population or time does not affect the association. Specific: a link (a location, mechanism, etc.). Temporal: association increases with duration. Gradient: association increases with exposure. Plausible: association easily seen. Coherent: experimental evidence supports similar behavior in analogous situations.
  • 21. Big Data Governance Does Not Exist. No laws exist to address the utilization of big data. Concerns about citizen privacy and business liability have yet to be addressed. A critical challenge to the Federal Government: Federal agencies that utilize big data do so on an ad hoc basis. Little guidance exists on using petabyte sizes of private citizen data for predictive analytics (Privacy Act of 1974 and HIPAA 1996).
  • 23. Big Data = Observational Study. Data is not collected to examine a specific problem using a protocol. The treatment group and the control group are outside the control of the investigator. Groups differing in outcome are identified and compared on the basis of a supposed causal attribute. Longitudinal: repeated observations of the same variables over long periods of time.
  • 24. Summary. The world is accumulating a lot of data. Nobody agrees on what “big” is. On average, 30% of the data is incorrect. On average, 30% of the data is missing. Correlation is the easy part. Bradford Hill gives guidance on proving causation. There is a hierarchy of evidence, and expert opinion and Big Data are at the bottom of it.
  • 25. Selected Publications. Automated Data Analysis with Excel. Softcover: 442 pages. Chapman & Hall (June 2007). Second edition coming in 2016. ISBN: 1-58488-885-7. Practical Pharmaceutical Laboratory Automation. Hardcover: 464 pages. Publisher: CRC Press (May 2003). ISBN: 0849318149.
  • 26. References. Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL. Quantifying the chemical beauty of drugs. Nat Chem. 2012;4:90–98. doi: 10.1038/nchem.1243. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3524573/ The Big Data Conundrum: How to Define It? http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/ Abujudeh HH, Boland GW, Kaewlai R, et al. Abdominal and Pelvic Computed Tomography (CT) Interpretation: discrepancy rates among experienced radiologists. Eur Radiol. 2010;20(8):1952–7. Ramezani M, Karimian A, Moallem P. Automatic Detection of Malignant Melanoma using Macroscopic Images. J Med Signals Sens. 2014 Oct-Dec;4(4):281–290. PMCID: PMC4236807.
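
The sensitivity and specificity definitions on slide 16 can be checked with a few lines of Python. This is a minimal sketch, not part of the original talk; the function names and the counts (8 cancers detected out of 10, 95 healthy cases cleared out of 100) are hypothetical values chosen only to reproduce the slide's 80% and 95% figures.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual negatives that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts chosen to match the slide's percentages.
sens = sensitivity(true_pos=8, false_neg=2)    # 8 of 10 cancers detected
spec = specificity(true_neg=95, false_pos=5)   # 95 of 100 healthy cases cleared
false_positive_rate = 1 - spec                 # complement of specificity

print(f"sensitivity = {sens:.2f}")
print(f"specificity = {spec:.2f}")
print(f"false positive rate = {false_positive_rate:.2f}")
```

The false positive rate is simply the complement of specificity, which is why a 95% specific test mislabels 5 of every 100 healthy cases.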

Editor's Notes

  1. 1:55 Lies, Damn Lies, and Big Data: How to Best Utilize Data to Drive Decisions Brian Bissett, Senior Member, Institute of Electrical and Electronics Engineers Big Data is hailed as the solution to many problems in industry. In many respects this is a fallacy because it only takes a small amount of erroneous data to corrupt the usefulness of a large dataset. While Big Data can be extremely useful in predicting patterns for the masses such as traffic patterns and peak usage hours for a utility, its usefulness begins to diminish in situations where quality is more important than quantity. In addition, the underlying assumption of Big Data that the behavior of the masses is the correct course of action is not always true. The audience will gain an appreciation for how to best utilize data to drive decisions. Common fallacies will be addressed including the notion that Big Data sets are always superior to smaller data sets. The limitations of big data sets, the importance of quality data, effective display of quantitative information, boundary conditions, and the evaluation of quantitative and qualitative factors will all be discussed.
  2. Variability: when you go to the doctor, “What is your level of pain from 1 to 10?”
  3. 1 ZB = 1 billion TB, or 10^21 bytes. How do you store it? How do you visualize more than 3 parameters? How do you effectively query something with so many parameters?
  4. Lane splitting.
  5. Interobserver: between two different radiologists. Intraobserver: disagreeing with oneself. Pathologists are worse, with a 40% discrepancy rate with each other.
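
Slide 11's point, that more excess data relative to real signal yields more spurious relationships, can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions rather than anything from the talk: it fills a table with independent random noise (so no real relationship exists) and reports the strongest pairwise correlation found as the number of variables grows. The function names, the 20 observations per variable, and the seed are all hypothetical choices.

```python
import itertools
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_variables, n_observations, rng):
    """Largest |r| among all pairs of independent pure-noise columns."""
    columns = [
        [rng.gauss(0, 1) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    return max(
        abs(pearson_r(a, b)) for a, b in itertools.combinations(columns, 2)
    )


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
for n_variables in (5, 50, 200):
    best = strongest_chance_correlation(n_variables, 20, rng)
    print(f"{n_variables:>3} noise variables: strongest |r| = {best:.3f}")
```

Because every column is pure noise, any strong correlation the search finds is an artifact of how many pairs were compared, and the strongest one tends to grow as variables are added.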