
Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei

Presented at Women in Data Science Taipei, Mar 31, 2019 https://www.widstaipei.org/conference-2019

Published in: Data & Analytics

  1. Behavioral Big Data & Healthcare Research. WiDS Taipei, 31 March 2019. Galit Shmueli 徐茉莉, Institute of Service Science. [Diagram: Behavioral Big Data, Researcher, Human Subjects, Research Question.] In memory of Prof. Aya Cohen (1940-2019).
  2. My Academic Path: 1991-1994, BA in Statistics & Psychology, University of Haifa, Israel; 1994-2000, MSc + PhD in Statistics, Israel Institute of Technology, Faculty of IE & M; 2000-2002, Carnegie Mellon University, Department of Statistics; 2002-2012, University of Maryland, Smith School of Business; 2011-2014, Indian School of Business, Hyderabad, India; 2014-…, National Tsing Hua University, Institute of Service Science. My Research: ‘entrepreneurial’ statistical & data mining modeling; interdisciplinary statistical strategy • To Explain or To Predict? • Information Quality • Data Mining for Causality • Predicting with Causal Models • Behavioral Big Data
  3. What is Behavioral Big Data (BBD)? A special type of Big Data: • Behavioral: people’s measurable “everyday” behavior, interactions, self-reported opinions, thoughts, feelings • Human and social aspects: intentions, deception, emotion, reciprocation, herding,… When people are aware of data collection, they modify their behavior (legal risks, embarrassment, unwanted solicitation).
  4. BBD vs. Inanimate Big Data. [Diagram: Behavioral Big Data, Researcher, Human Subjects, Research Question, compared with Inanimate Big Data, Researcher, Research Question.] Human subjects: 1. are aware of, and in ongoing interaction with, the BBD, “contaminating” it with intention, deception, emotion, herding…; 2. can be harmed by BBD.
  5. Physiological Big Data, Human Subjects. Figure 1: The types of physiological data points and the wearable sensors under development or on the market to monitor them. Elenko, Underwood & Zohar (2015), “Defining Digital Medicine”, Nature Biotechnology 33, 456-461.
  6. BBD vs. Physio Big Data. Physio Big Data: • Individual bodies • Physical measurements • Medical systems set data collection timing • Clinical trials: awareness & vested interest. BBD: • Collection of connected people • Measurable behaviors: actions, interactions, self-reported feelings, opinions, thoughts • User chooses data generation content & timing • Experiments: users unaware; not always in user’s best interest. Different research methods in life sciences and behavioral sciences: • Measurement instruments • Models (latent variable models, social network analysis) • Human subjects risks
  7. Getting Closer
  8. “The main products of the 21st century economy will not be textiles, vehicles, and weapons but bodies, brains, and minds” (https://www.ynharari.com/homo-deus-impact-digitalization-society/). “If you wear biometric sensors (such as a Fitbit band) and these sensors are connected to the computer, the computer will know exactly what your heart rate, blood pressure and adrenalin level are, and based on this information, it can identify your emotional state better than any human psychologist” (https://www.yediot.co.il/articles/0,7340,L-4948868,00.html). Physiological data translated to BBD.
  9. He’s part of a small but growing group of people who are wearing CGMs (continuous glucose monitors) to track, and then hack, what goes on in their own bodies. Physiological data collection turns into BBD.
  10. BBD in Healthcare Research
  11. Landscape, Players, Value
  12. Landscape of health-related BBD
  13. Data from a typical hospital, about… Patients: personal info; medical history (visits, tests, medication, hospitalization...); scheduled events, billing. Physicians: scheduled + actual appointments, procedures, prescriptions,…; entries of patient info/data. Nurses: location, work hours,… Pharmacy staff: speed of service, quality of service. Lab staff: speed of service, quality of service. Other staff: finance/accounting, cleaning, receptionists, volunteers, food court! Data collection technologies: • Medical devices • HIT systems (EHR, HR for Health Info System) • WiFi (smart hospital) • Cameras • Sensors • GPS • IoT
  14. New data #1: Recorded interactions between patients and doctors/nurses, doctors and other doctors, patients and other patients, patient family and hospital staff, patients and social network ”friends”, ...
  15. Chiu, C. C., Tripathi, A., Chou, K., Co, C., Jaitly, N., Jaunzeikare, D., ... & Tansuwan, J. (2017). Speech recognition for medical conversations. arXiv preprint arXiv:1711.07274. Data: • 90,000 conversations between doctors and patients during clinical visits • 151 types of medical visits for different purposes • Each conversation is typically between a single doctor and a patient, sometimes also including a nurse or family member.
  16. New data #2: Smart hospital remote medical services: telemedicine/telehealth, remote patient monitoring, mHealth/eHealth
  17. Mobile health apps and wearable devices that use artificial intelligence to help diagnose or even treat medical conditions pose a new regulatory challenge for the U.S. Food and Drug Administration. This comes at a time when medical devices have evolved from fairly self-contained gadgets into implants and wearables that communicate wirelessly with medical software on separate computers or in the cloud. The definition of medical device has also stretched as smartphone apps and online services, often backed by machine-learning algorithms, promise to deliver medical diagnoses that once would have required a visit to a doctor's office and specialized lab equipment.
  18. This is where it becomes ethically challenging: Who’s collecting the data and for what purpose? Are users aware of the data collection and usage? What are users’ benefits & risks from sharing their data?
  19. New data #3: Health-related online behavior
  20. Health-related BBD: Online. Sources: • Medical/health websites • Online forums • Social networks • Search engines. Data voluntarily entered by users: personal details, photos, comments, messages, search terms, likes, payment information, connections with “friends”. Passive footprints: duration on the website, pages browsed, sequence, referring website, Internet browser, operating system, location, IP address.
  21. New data #4: Health-related behavior self-logged on apps. Every day, women manually log around 1.4M new data points including cycle history, ovulation and pregnancy test results, age, height, weight, and lifestyle statistics about sleep, activity, and nutrition. In addition, more data comes from wearable devices like Fitbit & Apple Watch. Data voluntarily entered by users: health condition, symptoms, behaviors (eating, exercise, sleep, sex, parking, feelings…). Passive footprints: app log times, pages browsed, sequence, location…
  22. Flo became the most downloaded app worldwide in its category within months after introducing neural networks to its prediction algorithm. In addition to logging a menstruation and health diary, users can join a number of different themed groups including weight loss, clothing, fitness, relationships, and travel. These groups look and work much like a “message board”-style social network. To date, Meet You has reportedly accumulated two million daily active users, 1.2 million daily active users of its social network, and over 800,000 daily posts.
  23. Health-related BBD: Gaming. Sea Hero Quest, a mobile app that measures spatial navigation ability (credit: Hugo Spiers et al.). Since its launch in May 2016, some 2.5 million people have played Sea Hero Quest.
  24. New data #5: Health-related behavior from IoT
  25. New data #6: Health-“unrelated” (implicit) behaviors
  26. “Some hospitals are collecting new information from patients directly, while others have sought data from companies that sell consumer and financial information, or federal agencies that provide statistics on poverty, housing density and unemployment.” The big obstacle: access to the data. Doctors and nurses have limited time to collect new data, and patients bombarded with questions about their lives may suffer “interview fatigue”.
  27. This is where it becomes ethically challenging (the same three questions as slide 18).
  28. Quantified self devices also collect…
  29. Before starting Mindstrong, Paul Dagum, its founder and CEO, paid for two Bay Area-based studies to figure out whether there might be a systemic measure of cognitive ability (or disability) hidden in how we use our phones. 150 research subjects came into a clinic and underwent a standardized neurocognitive assessment. Subjects went home with an app that measured the ways they touched their phone’s display (swipes, taps, and keyboard typing). Memory problems… can be spotted by looking at things including how rapidly you type and what errors you make (such as how frequently you delete characters), as well as by how fast you scroll down a list of contacts.
  30. “Thousands of people are using the app, and the company now has five years of clinical study data to confirm its science and technology.” PRIVACY: “While Mindstrong says it protects users’ data, collecting such data at all could be a scary prospect for many of the people it aims to help. Companies may be interested in, say, including it as part of an employee wellness plan, but most of us wouldn’t want our employers anywhere near our mental health data.”
  31. At home/school/work: Microsoft Xbox 360 comes with a microphone, a camera and technology that recognizes a user's voice and face. Data collected: • sign in and sign off • games you played • game-score statistics • Xbox console hardware & operating performance data • manufacturing codes from game discs • network performance data • data that indicates the quality of the Xbox service to prevent cheating • IP address • operating system • Xbox Live software version to improve your experience • Bing search terms • samples of voice commands to perform search • what you watched on Xbox One’s TV service • music & videos you watched or listened to using Xbox Live
  32. At work
  33. Uber Health: a ride-hailing platform available specifically to healthcare providers, letting clinics, hospitals, rehab centers and more easily assign rides for their patients and clients from a centralized dashboard, without requiring that the rider even have the Uber app, or a smartphone. Uber Health’s creation was rooted in some alarming statistics about patient care and healthcare client absentee rates.
  34. Researchers Using Health BBD
  35. Research Fields Using Health BBD: • Operations researchers and industrial engineers, for hospital management and operations (staffing, scheduling,…) • Medical/healthcare researchers & clinicians, for improved medical treatment (safety, effectiveness,…) • Information systems researchers, for improved design & use of medical IS (value of IS, effectiveness, standardization,…) • Also: marketing, advertising, insurance, machine learning, social science
  36. How Do Researchers Get Health BBD? 1. Open/publicly available data: constantly refreshed or single data dump; API, web scraping; hacked data. 2. Partner with a company/organization: both parties interested in the research question; data purchase; personal connections, sabbaticals, internships; partnership between school and organization; third party (WCAI). 3. Crowdsourcing. 4. China (!)
  37. Research Using New Health BBD: Challenges. [Diagram linking Researcher, Human Subjects, Behavioral Big Data, and Research Question, with a larger distance between researcher and human subjects.] • Different (conflicting) goals: scientific vs. clinical vs. commercial; explain vs. predict • Acquire + analyze data: technical expertise • Data contaminated by: users (self-selection, spill-over, knowledge of allocation, network); company algorithms • Generalization challenges: unit of analysis vs. unit of measurement; under/over-coverage; average effect vs. individual effect (ATE vs. individual) • New risks: privacy, liability, security, HIPAA compliance • New ethical challenges: new modes of connection & information (social networks, forums, IoT, apps) • Research question: old Q, new data: operationalize new variables; new Q: lack of literature
  38. Value
  39. Two examples of high-profile studies using new health BBD: • Emotional contagion in social networks, Kramer et al. (PNAS, 2014) • Detecting influenza epidemics using search engine query data, Ginsberg et al. (Nature, 2009)
  40. Example #1
  41. • No ethics board review (IRB): “[The work] was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.” • PNAS editorial Expression of Concern • Varied responses from the public, academia, press, ethicists, corporates. Where do data scientists get ethics training?
  42. (Challenges diagram from slide 37, repeated.)
  43. Example #2 • “Up-to-date influenza estimates may enable public health officials and health professionals to better respond to seasonal epidemics” • BBD: automated search results for 50M keywords on Google.com (2003-2007); for each query: {query text, IP address} • Fit 450M different models, correlating each query text with CDC data; combined the 45 queries with the highest correlation
  44. Researchers: epidemiologists + data science academics. Dalton et al. (2016), “Flutracking weekly online community survey of influenza-like illness annual report, 2015”, Communicable Diseases Intelligence Quarterly Report. Challenge: Acquire data
  45. • Does the algorithm detect “flu” or “winter”? • Persistent over-estimation • Performs worse than 3-week-old lagged CDC data • The 45 terms used were never released • Lazer et al. recommend combining/calibrating GFT with CDC data. But most importantly…
  46. Changes Google made to its search algorithm, to display potential diagnoses and recommend searches for treatment (more advertising), led to increased searching.
  47. This type of BBD research is still popular
  48. (Challenges diagram from slide 37, repeated.)
  49. Uses Google searches to measure sensitive behaviors/opinions/thoughts on racism, self-induced abortion, depression, child abuse, hateful mobs, the science of humor, sexual preference, anxiety, son preference, and sexual insecurity, among many other topics.
  50. Let’s Discuss Data Privacy (challenges diagram from slide 37, repeated).
  51. Data Privacy is a Big Issue Right Now (challenges diagram from slide 37, repeated and partially cut off).
  52. What we’ve learned… is that we need to take a more proactive role in a broader view of our responsibility. It’s not enough to just build tools, we need to make sure that they’re used for good.
  53. “patients as well as medical staff will be communicating in a non-private environment. It is very important to understand, monitor and control your own content for its privacy implications. More dangerous and needing control will be the reach of patient-to-patient identification and communication.”
  54. Medical data privacy is typically regulated. What about BBD? Several Israeli hospitals have been conducting a pilot program which used AI software to assist in deciding whether patients should undergo surgery. However, these patients were subjected to these tests without their knowledge. The software has been developed by a startup named MEDecide in Tel Aviv and used
  55. Recent data privacy regulations are reshaping the collection & use of BBD
  56. Using BBD for Research: Human Subjects. Institutional Review Board (IRB), or “ethics committee”: a university-level committee designated to approve, monitor, and review biomedical and behavioral research involving humans. Medical and behavioral researchers are aware of the IRB. What about data science researchers?
  57. The “Final Rule” (July 19, 2018): Update to the “Common Rule”. • New exemption category: research involving “benign behavioral interventions” • Exemption for secondary research using identifiable private information or identifiable biospecimens • No review needed under certain circumstances: publicly available data; participant cannot readily be identified; the investigator’s use of the data is regulated under HIPAA for purposes of “health-care operations,” “research,” or “public health activities” (but not where the investigator plans to report individual research results)
  58. • Am I respecting the rights of my data subjects? • Are my data pseudonymized? • Is my research “minimal risk”? • Do I have broad consent for secondary analysis? Greene, Shmueli, Ray, and Fell (2019), Adjusting to the GDPR: The Impact on Data Scientists and Behavioral Researchers
  59. New healthcare BBD offers new research opportunities: health-related behavior and health-“unrelated” behavior
  60. … and new challenges (challenges diagram from slide 37, repeated)
  61. Analytics, Humanity, Responsibility. Galit Shmueli 徐茉莉, Institute of Service Science. Shmueli, G. (2017), Research Dilemmas With Behavioral Big Data, Big Data, vol. 5, issue 2, pp. 98-119
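
Slide 29 mentions typing speed and how often characters are deleted as possible behavioral markers. As a rough illustration only, the sketch below computes two such summary statistics from a keystroke log. The `KeyEvent` record, the `"BACKSPACE"` marker, and the feature names are hypothetical; this is not Mindstrong's method, which the talk does not describe.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    timestamp: float  # seconds since the session started (hypothetical log format)
    key: str          # the character typed, or "BACKSPACE" for a deletion


def typing_features(events: list[KeyEvent]) -> dict[str, float]:
    """Summarize one typing session with two simple behavioral features."""
    if len(events) < 2:
        raise ValueError("need at least two key events")
    # Average time between consecutive key presses: lower means faster typing.
    duration = events[-1].timestamp - events[0].timestamp
    mean_interkey = duration / (len(events) - 1)
    # Share of key presses that delete a character.
    deletions = sum(1 for e in events if e.key == "BACKSPACE")
    return {
        "mean_interkey_seconds": mean_interkey,
        "deletion_rate": deletions / len(events),
    }


# Hypothetical session: the user types "he", deletes the "e", then types "ey".
session = [
    KeyEvent(0.0, "h"),
    KeyEvent(0.5, "e"),
    KeyEvent(1.0, "BACKSPACE"),
    KeyEvent(1.5, "e"),
    KeyEvent(2.0, "y"),
]
print(typing_features(session))
```

Features like these would only carry clinical meaning after validation against assessments such as the standardized neurocognitive assessment described in slide 29, computed over many sessions per person.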
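
Slide 37 lists “average effect vs. individual effect (ATE vs. individual)” among the generalization challenges. A toy potential-outcomes example makes the gap concrete: in the made-up numbers below, the average treatment effect is zero even though the treatment shifts every individual's outcome by two units, up for some users and down for others.

```python
# Hypothetical potential outcomes for four users:
# (outcome if treated, outcome if not treated). A real study observes only one per user.
potential_outcomes = [(5, 3), (1, 3), (6, 4), (2, 4)]

# Individual treatment effect: treated outcome minus untreated outcome.
individual_effects = [treated - untreated for treated, untreated in potential_outcomes]


def average_treatment_effect(effects: list[float]) -> float:
    """Mean of the individual effects: the quantity a randomized experiment targets."""
    return sum(effects) / len(effects)


ate = average_treatment_effect(individual_effects)
print("individual effects:", individual_effects)
print("ATE:", ate)
```

Because each user reveals only one of the two outcomes, individual effects are not directly observed; a randomized experiment's difference in group means estimates the average, which can hide exactly this kind of heterogeneity.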
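
Slide 43 describes the Google Flu Trends selection step: correlate each query's time series with CDC data and keep the best-matching queries. A minimal sketch of that idea, assuming made-up query names and weekly values, and assuming the selected queries are combined by simply summing them (a simplification, not Ginsberg et al.'s actual pipeline):

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_top_queries(query_series: dict[str, list[float]], target: list[float], k: int) -> list[str]:
    """Rank queries by correlation with the target series and keep the k highest."""
    scores = {name: pearson(series, target) for name, series in query_series.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def combined_signal(query_series: dict[str, list[float]], selected: list[str]) -> list[float]:
    """Sum the weekly values of the selected queries into a single predictor series."""
    n_weeks = len(query_series[selected[0]])
    return [sum(query_series[name][week] for name in selected) for week in range(n_weeks)]


cdc_ili = [1.0, 2.0, 4.0, 3.0, 1.5]  # hypothetical weekly CDC influenza-like-illness values
queries = {                           # hypothetical weekly search counts per query
    "flu symptoms": [10, 21, 39, 31, 14],
    "fever": [5, 9, 22, 15, 8],
    "winter coat": [30, 28, 25, 20, 15],
}
top = select_top_queries(queries, cdc_ili, k=2)
print(top)
print(combined_signal(queries, top))
```

Because the ranking is purely correlational, any query that merely follows the same seasonal timing as the flu can score well, which is the “flu” or “winter”? concern raised in slide 45.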
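
Slide 45 notes that Google Flu Trends performed worse than 3-week-old lagged CDC data. That comparison is a benchmark against a naive lagged forecast; a minimal sketch, with made-up weekly values and a hypothetical model that persistently over-estimates (here, by a factor of two):

```python
def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute gap between predictions and observed values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def lagged_baseline(observed: list[float], lag: int = 3) -> list[float]:
    """Naive forecast: predict week t with the value observed `lag` weeks earlier.

    The forecasts cover weeks lag, lag + 1, ..., so there are len(observed) - lag of them.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 week")
    return observed[:-lag]


def compare_to_lagged_baseline(model_pred, observed, lag=3):
    """Return (model MAE, baseline MAE), scored only on weeks where the baseline exists."""
    targets = observed[lag:]
    model_mae = mean_absolute_error(model_pred[lag:], targets)
    baseline_mae = mean_absolute_error(lagged_baseline(observed, lag), targets)
    return model_mae, baseline_mae


observed = [10, 12, 15, 16, 17, 18, 19]  # hypothetical weekly ILI values
model = [20, 24, 30, 32, 34, 36, 38]     # hypothetical model that over-estimates
model_mae, baseline_mae = compare_to_lagged_baseline(model, observed)
print(f"model MAE: {model_mae}, 3-week lagged baseline MAE: {baseline_mae}")
```

With these invented numbers the lagged baseline's error (4.25) is far below the over-estimating model's (17.5), the same qualitative pattern the slide reports.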
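
Slide 58 asks “Are my data pseudonymized?”. One common approach is to replace direct identifiers with a keyed hash and store the key separately from the research data. The sketch below uses Python's standard `hmac` and `hashlib` modules; the record fields, the `patient_id` name, and the key are hypothetical. Under the GDPR, pseudonymized data are still personal data, because whoever holds the key (or enough auxiliary information) can re-identify people.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records can
    still be linked; without the key, the mapping cannot feasibly be recomputed.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records: list[dict], id_field: str, secret_key: bytes) -> list[dict]:
    """Return copies of the records with the identifier field replaced by its pseudonym."""
    result = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        copy[id_field] = pseudonymize(record[id_field], secret_key)
        result.append(copy)
    return result


# Hypothetical records and key; in practice the key is stored apart from the data.
key = b"keep-this-key-separate"
records = [
    {"patient_id": "A-1001", "steps": 8200},
    {"patient_id": "A-1002", "steps": 4300},
    {"patient_id": "A-1001", "steps": 9100},
]
for row in pseudonymize_records(records, "patient_id", key):
    print(row)
```

A keyed hash is used instead of a plain hash because anyone could hash a list of candidate patient IDs with an unkeyed function and match the results.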
