The Pros and Cons of Big Data in an ePatient World


PYA Principal Dr. Kent Bottles, who is also PYA Analytics’ Chief Medical Officer, presented “The Pros and Cons of Big Data in an ePatient World” at the ePatient Connections 2013 conference.


  1. 1. The Pros and Cons of Big Data in an ePatient World Kent Bottles, MD Chief Medical Officer, PYA Analytics ePatient Connections/2013 Philadelphia, Pennsylvania September 16, 2013
  2. 2. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, and the relationship between citizens and governments. • Causality is replaced by correlation • Not knowing why but only what
  3. 3. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Statistics allows richest findings using the smallest amount of data • Randomness trumped sample size • 2007 300 exabytes of stored data • 2013 1,200 exabytes of stored data • 2013 only 2% is non-digital
  4. 4. Sizing Up Big Data Steve Lohr, NY Times, June 20, 2013 • Bundle of technologies – Web pages, browsing habits, sensor signals, social media, GPS location data, genomic information, surveillance videos – Advances in data storage and processing – Machine learning/AI software to find actionable correlations from the big data
  5. 5. Sizing Up Big Data Steve Lohr, NY Times, June 20, 2013 • Philosophy about how decisions should be made – Decisions based on data and analysis – Less based on experience and gut intuition – Eliminates anchoring bias and confirmation bias • Revolution in measurement – Digital equivalent of the telescope – Digital equivalent of the microscope
  6. 6. Big Data WSJ March 11, 2013 • 1950s 600 megabytes (John Hancock) • 1960s 807 megabytes (AA Sabre) • 1970s 80 gigabytes (Federal Express Cosmos) • 1980s 450 gigabytes (CitiCorp NAIB) • 1990s 180 terabytes (WalMart) • 2000s 25 petabytes (Google) • 2010s 100 petabytes (Facebook)
  7. 7. Big Data WSJ March 11, 2013 • 1 Bit = Binary Digit • 8 Bits = 1 Byte • 1000 Bytes = 1 Kilobyte • 1000 Kilobytes = 1 Megabyte • 1000 Megabytes = 1 Gigabyte • 1000 Gigabytes = 1 Terabyte • 1000 Terabytes = 1 Petabyte • 1000 Petabytes = 1 Exabyte • 1000 Exabytes = 1 Zettabyte
  8. 8. Jeffrey Hammerbacher • All industries are being disrupted – Moneyball, 538, Large Hadron Collider • McKinsey: Big Data: The Next Frontier for Competition – $338 billion potential annual value to US healthcare – $165 billion in clinical operations – $105 billion in research and development
  9. 9. Jeffrey Hammerbacher • Oracle: From Overload to Impact – Healthcare executives say collecting & managing more business information today than 2 years ago – Average increase 85% per year • Frost & Sullivan: US Hospital Health Data Analytics Market – 2011 10% of US hospitals use data analytic tools – 2016 50% of US hospitals will use data analytic tools
  10. 10. Jeffrey Hammerbacher on Moneyball • Triple Crown in MLB: Batting average, RBI, HR • OPS (on base plus slugging) • GPA (gross production average) • TOB (times on base) • The outcome is how many runs we score and allow; A’s have Matt Stairs; Need stat that reflects both runs produced at bat & runs saved by defense • WAR (“Wins above replacement”)
  11. 11. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • To analyze & understand the world we used to test hypotheses driven by theories • Big data discards theories & causality for correlations • University of Ontario premature baby studies • 1,260 data points per second • Diagnose infections 24 hours before apparent • Very constant vital signs indicate impending infection
  12. 12. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Google Nature article predicts flu spread in USA • Compared 50 million search terms with CDC data on spread of flu from 2003 to 2008 • 450 million different mathematical models • 45 search terms had strong correlation with spread of flu • H1N1 crisis in 2009 Google approach worked
  13. 13. New Tools to Combat Epidemics Amy O’Leary, NY Times, June 20, 2013 • Google Flu overestimates spread of flu in 2013 • Google Flu does not track new diseases • BioMosaic – Combines airline records, disease reports, demographic data – Website and iPad app – Showed 5 counties in Florida, 5 counties in NY were most at risk from cholera epidemic in Haiti in 2010
  14. 14. New York City’s Office of Policy & Strategic Planning • 1 terabyte of data flows into office every day • 95% success rate in identifying restaurants dumping cooking oil into sewers • Doubled the hit rate of finding stores selling bootleg cigarettes • Sped removal of trees toppled by Sandy • Guided building inspectors to increase citation rate from 13 to 80% for buildings likely to have catastrophic house fires
  15. 15. Algorithms Mine Public Data • Atul Butte combined data from 130 studies of gene activity levels in diabetic & healthy tissue • Butte identified a new gene associated with Type 2 DM because it stood out in 78/130 studies • Algorithm looking for drugs & diseases that had opposing effects on gene expression – Cimetidine for lung adenocarcinomas – Topiramate for Crohn’s disease
  16. 16. Algorithms Mine Public Data • Russ Altman used algorithms to mine Stanford Translational Research Integrated Database Environment & FDA adverse event reports database • Patients taking SSRI antidepressants and thiazide are at increased risk for long QT syndrome, a serious cardiac arrhythmia
  17. 17. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • GPS allows us to establish location quickly, cheaply, and without requiring specialized knowledge • UPS uses geo-loc data from sensors, wireless modules, and GPS on vehicles • 2011 UPS shaved 30 million miles off routes, saved 3 million gallons of fuel, and 30,000 metric tons of carbon dioxide emissions
  18. 18. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Datafication of acts of living • Zeo large database of sleep patterns • Asthmapolis attaches a sensor to an inhaler that tracks location via GPS to identify environmental triggers • Fitbit and Jawbone • iTrem monitors Parkinson’s tremors almost as well as the tri-axial accelerometer used in specialized office medical equipment
  19. 19. Big Data for Cancer Care Ron Winslow, WSJ, March 27, 2013 • ASCO • Database of hundreds of thousands of patients • Prototype has collected 100,000 breast cancer patients from 27 groups who have different EMRs • “Recognition that big data is imperative for the future of medicine” Lynn Etheredge • Less than 5% of adult cancer patients participate in randomized clinical trials
  20. 20. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Recombinant data • Danish Cancer Society study on cell phone/cancer • Cellphone users from 1987 to 1995 (358,403) • Brain cancer patients (10,729) • Registry of education and disposable income • Combining the three databases found no increase in risk of cancer for those who used cell phones • Not based on sample size; based on N=all
  21. 21. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Multiple uses of same database • Data exhaust: digital trail people leave in their wake • Google spell-checking system uses bad data to improve search, autocomplete feature in Gmail, Google Docs, and translation system
  22. 22. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Paralyzing privacy – Notice and consent – Cannot give informed consent for secondary uses – Anonymization does not work • AOL 2006 20 million search queries from 657,000 users: NY Times identified user number 4417749 as Thelma Arnold (“My goodness, it’s my whole personal life. I had no idea somebody was looking over my shoulder”) • Netflix Prize 100 million rental records from 500,000 users. Mother and closeted lesbian in Midwest was reidentified
  23. 23. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Probability and punishment – Minority Report: People are imprisoned not for what they did, but for what they are foreseen to do, even though they never actually commit the crime – Blue CRUSH (Crime Reduction, Utilizing Statistical History in Memphis, Tennessee) – Homeland Security FAST (Future Attribute Screening Technology) – Big data based on correlation unsuitable tool to judge causality and thus assign individual culpability
  24. 24. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Dictatorship of Data – Relying on numbers when they are far more fallible than we think – Robert McNamara’s body count numbers in Vietnam – Michael Eisen tried to buy The Making of a Fly on Amazon in April 2011. Two established sellers offering the book for $1,730,045 and $2,198,177. Two week escalation to a peak of $23,698,655.93 on April 18 – Unsupervised algorithms priced the books for the two sellers.
  25. 25. Big Data Viktor Mayer-Schonberger & Kenneth Cukier, 2013 • Regulatory shift from “privacy by consent” to “privacy through accountability” • “Differential privacy” through deliberately blurring the data so hard to reidentify people • Openness, certification, disprovability • Algorithmists to perform “audits”
  26. 26. What Big Data Can’t Do David Brooks, NY Times, February 26, 2013 • Data struggles with the social • Data struggles with context • Data creates bigger haystacks (spurious correlations that are statistically significant) • Data has trouble with big problems • Data favors memes over masterpieces • Data obscures values
  27. 27. What Big Data Will Never Explain • “To datafy a phenomenon,” they explain, “is to put it in a quantified format so it can be tabulated and analyzed.” • Sentiment analysis mathematical model for grief called Good Grief Algorithm • “The mathematization of subjectivity will founder upon the resplendent fact that we are ambiguous beings. We frequently have mixed feelings, and are divided against ourselves.”
  28. 28. The Hidden Biases of Big Data • Big Data vs. Data with Depth • “With enough data, the numbers speak for themselves.” Chris Anderson • Can numbers actually speak for themselves? Sadly, they can't. Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. • Hidden biases in both the collection and analysis stages
  29. 29. The Hidden Biases of Big Data • Google Flu Trends vs. CDC – 11% vs. 6% of US population infected – Media coverage affected Google Flu Trends • Boston’s StreetBump smartphone app – 20,000 potholes a year need to be patched – Poor areas have fewer cell phones, less service • Hurricane Sandy 20 million tweets + Foursquare – Grocery shopping day before – Night life peaked day after – Illusion Manhattan was hub of disaster
  30. 30. Automate This Christopher Steiner, 2012 • Dr. Bot – Always be convenient and available – Know all your strengths and weaknesses – Know every risk factor past conditions might signal – Know your complete medical history – Know medical history of last 3 generations of family – Never make careless mistake in prescription
  31. 31. Automate This Christopher Steiner, 2012 • Dr. Bot – Always be up-to-date on treatments and discoveries – Never fall into bad habits or ruts – Monitor you at all times – Always be searching for the hint of a problem by monitoring pulse, cholesterol, blood pressure, weight, lung capacity, bone density, changes in the air you expel
  32. 32. Computers Are Just Not That Smart • Eric Horvitz, MD of Microsoft • Medical kiosk avatar interviews mother & child with diarrhea • Avatar decides child does not need to go to ER • Avatar makes appointment with clinic • The moderator of AI panel thought the avatar was much more compassionate than the human triage nurses she has encountered in NYC ERs
  33. 33. Vinod Khosla (Sun Microsystems) • Being part of the health care system is a disadvantage to disrupting the status quo • Machine learning system will be cheaper, more accurate, and more objective than physicians • Machine expertise would need to be in the 80th percentile of human physician expertise
  34. 34. Vinod Khosla (Sun Microsystems) • Do we need doctors or algorithms? • “Health is like witchcraft and just based on tradition” • 80% of physicians will be replaced by machines • 80% of doctors are below the top 20% • We will not need average doctors • Still need “doctors like Gregory House who solve biomedical puzzles beyond our best input ability”
  35. 35. Will Robots Steal Your Job? ml • “At this moment, there's someone training for your job. He may not be as smart as you are—in fact, he could be quite stupid—but what he lacks in intelligence he makes up for in drive, reliability, consistency, and price. He's willing to work for longer hours, and he's capable of doing better work, at a much lower wage. He doesn't ask for health or retirement benefits, he doesn't take sick days, and he doesn't goof off when he's on the clock. What's more, he keeps getting better at his job.”
  36. 36. How Robots Will Replace Doctors doctors/2011/08/25/gIQASA17AL_blog.html • “We’re not sitting in that room wrapped in a garment made of the finest recycled sandpaper because we were hoping for a good conversation. We’re there because we’re sick…, and we’re hoping this arrogant, hurried, credentialed genius can tell us what’s wrong. We go to doctors not because they’re great empaths, but because we’re hoping medical school has made them into the closest thing the human race has developed to robots.”
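
The decimal unit ladder on slide 7 and the storage figures on slides 3 and 6 lend themselves to a quick arithmetic check. What follows is a minimal Python sketch and not part of the original deck: the helper names `to_bytes` and `annual_growth` are hypothetical, and the annualized rate assumes smooth compound growth between 2007 and 2013, which the slides themselves do not claim.

```python
# Decimal units as listed on slide 7: each step up is a factor of 1000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]


def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes (hypothetical helper)."""
    return value * 1000 ** UNITS.index(unit)


def annual_growth(start, end, years):
    """Compound annual growth rate; assumes smooth growth (an assumption)."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Slide 3: 300 exabytes stored in 2007, 1,200 exabytes in 2013.
    stored_2007 = to_bytes(300, "exabyte")
    stored_2013 = to_bytes(1200, "exabyte")
    print("Growth factor 2007-2013:", stored_2013 / stored_2007)
    print("Implied annual growth:", round(annual_growth(300, 1200, 6), 4))

    # Slide 6: John Hancock in the 1950s vs. Facebook in the 2010s.
    ratio = to_bytes(100, "petabyte") / to_bytes(600, "megabyte")
    print("Facebook / John Hancock:", round(ratio))
```

Under these assumptions, the stored-data figures on slide 3 correspond to a fourfold increase over six years, or roughly 26% compound growth per year, and the Facebook figure on slide 6 is on the order of 167 million times the John Hancock figure.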