© 2016. 8 Path Solutions LLC.
Data, Decisions, and MongoDB
Tuesday, November 29, 2016
© 2015 8 Path Solutions LLC. All Rights Reserved.
	
Validating an Open Society
Jennifer Shin
Founder, 8 Path Solutions
Senior Principal Data Scientist, Nielsen

Introduction
● Background
  • Native New Yorker
  • Undergraduate degree in Economics, Mathematics & Creative Writing from Columbia University
  • Graduate degree in Statistics from Columbia University

Introduction
● Professional Experience
  • Founder & Chief Data Scientist at 8 Path Solutions
  • Senior Principal Data Scientist at Nielsen
  • Management consultant at Fortune 100 companies
  • Top Contributor for IBM Data Magazine
  • Faculty in the MIDS Graduate Program at UC Berkeley

Introduction
● Recent Talks & Presentations
  • Institute of Computational & Experimental Research in Mathematics (ICERM) at Brown University
  • TDWI Accelerate 2016
  • Data Dialogs Conference – UC Berkeley
  • IBM World of Watson 2016

Today’s Talk
• Adverse Drug Reactions (ADR)
• FDA Adverse Events Reporting System (FAERS)
• openFDA API
• openFDA + MongoDB

Pharmacovigilance
• Monitoring the safety of medicinal products
• Adverse Drug Reactions (ADR): unwanted, uncomfortable, or dangerous effects that a drug may have
• In the US, 3 to 7% of all hospitalizations are due to ADR [1]

DATA SOURCE
• FDA Adverse Event Reporting System (FAERS)
  ● A computerized information database designed to support the FDA's post-marketing safety surveillance program for all approved drug & therapeutic biologic products
  ● Used to monitor for new adverse events and medication errors that might occur with these marketed products

DATA SOURCE
Option 1: Quarterly FAERS Data Files
○ Available each quarter from the FDA
○ Data from 2004 to 2012 available in ASCII/SGML; data after 2012 available in ASCII/XML

DATA CHALLENGES
○ Requires downloading the quarterly reports and consolidating them in a database.

DATA ISSUES
○ Duplicate reports
○ Spelling errors
○ Inaccurate information
○ One field for all drug names (e.g. Brand Name & Generic) and active ingredients
○ Multiple drugs included in a single report
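
The consolidation step and the duplicate-report issue can be sketched in Python. This is a minimal illustration, not the method used in the talk (which consolidated the files with SQL Server 2012). It assumes each quarterly DRUG file is "$"-delimited ASCII text with a header row containing ISR and DRUGNAME columns. The two in-memory "quarters", the ISR values other than 4990905, and the helper name consolidate are hypothetical stand-ins for the real downloads.

```python
import csv
import io

# Hypothetical stand-ins for two quarterly DRUG files ("$"-delimited ASCII).
Q1 = "ISR$DRUGNAME\n4990905$DROSPIRENONE AND ETHINYL ESTRADIOL\n5000001$YAZ\n"
Q2 = "ISR$DRUGNAME\n5000001$YAZ\n5000002$YAZ (24)\n"


def consolidate(quarterly_texts):
    """Stack quarterly files into one list of rows, dropping exact repeats."""
    seen = set()
    rows = []
    for text in quarterly_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="$")
        for row in reader:
            key = (row["ISR"], row["DRUGNAME"])
            if key in seen:
                continue  # same report/drug pair was already loaded
            seen.add(key)
            rows.append(row)
    return rows


rows = consolidate([Q1, Q2])
print(len(rows))  # number of distinct report/drug rows
```

Keying on the pair (ISR, DRUGNAME) only catches rows repeated verbatim. Duplicates that differ by a spelling error or a reworded drug name would pass through, which is the harder problem the slide points at.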

openFDA API

Food & Drug Administration openFDA API
• Beta launch - June 2014
• New website

API OBJECTIVES
• Facilitate access to and use of big, important FDA public datasets by developers, researchers, and the public through harmonization of data across disparate FDA datasets, provided via application programming interfaces (APIs)

API DATA
• Drug adverse events: Reports of drug side effects, product use errors, product quality problems, and therapeutic failures.
• Drug product labeling: Structured product information, including prescribing information, for approved drug products.
• Drug recall enforcement reports: Drug product recall enforcement reports.

API DATA
○ Access to the FAERS database using API calls

API DATA
3 ways to download data from openFDA:
• Download manually. There’s a downloads section on each endpoint’s openFDA page.
• Write code to download the data automatically. Use a special API query (see below) to get a list of all the current data files for each endpoint.
• Synchronize with the openFDA S3 bucket.

DATA ISSUE SOLUTION
Drug Name Harmonization Process
○ Benefits
  • Harmonizes the FAERS data on drug identifiers using other data sources, such as NDC & RxNorm
  • Separate data fields for brand names, generic names, and active ingredients

DATA ISSUE SOLUTION
Drug Name Harmonization Process
○ Limitations
  • Cannot harmonize misspelled drug names
  • Validation process requires using FAERS data files
    - Not necessarily easier

CASE STUDY: DROSPIRENONE

FAERS vs. OPENFDA
FAERS Data Files
Data from 2004 Q1 to 2012 Q3
2 out of the 7 Reports: DEMO, DRUG, REAC, RPSR, THER, OUTC, INDI
Consolidated using SQL Server 2012

FAERS vs. OPENFDA
Drug Name Mapping
DRUGNAME
  YAZ
  DROSPIRENONE W/ETHINYLESTRADIOL (YAZ)
  YAZ /06358701/
  YAZ BAYER HEALTHCARE
  YAZ N/A BAYER HEALTHCARE
  YAZ (24)
  YAZ (DROSPIRENONE + ETHINYLESTRADIOL 20µG (24+4) [YAZ]
  YAZ (DROSPIRENONE/ESTRADIOL)
  YAZ (ORAL CONTRACEPTATIVE NOS)

FAERS vs. OPENFDA
Data Fields
Field                                          FAERS Data Files   openFDA API
Brand Name (Yaz)                               DRUGNAME           patient.drug.openfda.brand_name
Generic Name (Drospirenone Ethinyl Estradiol)  DRUGNAME           patient.drug.openfda.generic_name
Case Report Identifier                         ISR                safetyreportid

FAERS vs. OPENFDA
○  Inconsistent Query Data
•  Running the same query on 8/03/14 &
8/10/14 produced different results
•  No information as to the cause of these
changes could be found on the FDA website
•  According to the GitHub records, there were
no updates made between these two dates
•  For our brand name data analysis, the most
recent results were selected

DATA ISSUES SOLUTION?
Drug Name Harmonization Process
○ FAERS Data Files vs. openFDA API Query
  • Cannot harmonize misspelled drug names
  • Validation process requires using FAERS data files
    - Not necessarily easier

RESULTS 1: BRAND NAME
Comparing Reports for Yaz from Q1 2004 to Q3 2012
Year   openFDA   FAERS
2004         1       0
2005         0       0
2006        20      19
2007       105     215
2008     2,502   2,498
2009     2,321   2,261
2010     1,472   1,365
2011     8,857   7,881
2012     6,750   5,551

RESULTS 1: BRAND NAME
openFDA API query results for safetyreportid "4990905-5"
DRUGNAME for ISR number "4990905" only includes DROSPIRENONE AND ETHINYL ESTRADIOL

RESULTS II: GENERIC NAME
Initial Query
https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20120930]+AND+patient.drug.openfda.generic_name:"DROSPIRENONE+ETHINYL+ESTRADIOL"&count=patient.drug.openfda.brand_name
openFDA Results
  Total: 714
  Yaz: 107
Revised Query
https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20120930]+AND+patient.drug.openfda.generic_name:"DROSPIRENONE"+AND+patient.drug.openfda.generic_name:"ETHINYL"+AND+patient.drug.openfda.generic_name:"ESTRADIOL"&count=patient.drug.openfda.brand_name
openFDA Results
  Total: 31,051
  Yaz: 22,028
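
The two queries on this slide differ only in how the generic name is searched: the initial query asks for the three words as one quoted phrase, while the revised query requires each word separately. The Python sketch below assembles both URLs from the same parts so the difference is easy to see. The helper build_url and the constant names are hypothetical; the field names and date range are taken from the slide. It only builds strings and does not call the API.

```python
BASE_URL = "https://api.fda.gov/drug/event.json"
DATE_RANGE = "receivedate:[20040101+TO+20120930]"
GENERIC = "patient.drug.openfda.generic_name"
COUNT_FIELD = "patient.drug.openfda.brand_name"


def build_url(search_terms, count_field):
    """Join search terms with +AND+ and append the count parameter."""
    search = "+AND+".join(search_terms)
    return f"{BASE_URL}?search={search}&count={count_field}"


# Initial query: the generic name as a single quoted phrase.
initial = build_url(
    [DATE_RANGE, f'{GENERIC}:"DROSPIRENONE+ETHINYL+ESTRADIOL"'],
    COUNT_FIELD,
)

# Revised query: one quoted term per word, all of them required.
words = ["DROSPIRENONE", "ETHINYL", "ESTRADIOL"]
revised = build_url(
    [DATE_RANGE] + [f'{GENERIC}:"{word}"' for word in words],
    COUNT_FIELD,
)

print(initial)
print(revised)
```

Generating both URLs from shared constants keeps the date range and count field identical across runs, so any change in the totals can be attributed to the search terms or to the data behind the API.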
  

Results
The drug harmonization process incorrectly associated reports for Drospirenone Ethinyl Estradiol with the drug Yaz.
• Raises concerns about the drug harmonization process for Yaz as well as other drugs
• Further study is needed to validate the accuracy of the openFDA data

RESULTS II: GENERIC NAME
The reported cases for Yaz from the openFDA API and the FAERS Data Files varied widely when compared by year of the report.
• For 2007, the API included only 105 cases, 51% fewer than the 215 cases in FAERS.
• For 2011, the API included 8,857 cases, 12.4% more than the 7,881 cases in FAERS.
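
The two percentages quoted on this slide can be reproduced with a short Python check. Both are measured relative to the FAERS count; the function name pct_diff is a hypothetical helper introduced here.

```python
def pct_diff(api_count, faers_count):
    """Difference of the API count from the FAERS count, as a percent of FAERS."""
    return (api_count - faers_count) / faers_count * 100


# 105 API cases vs. 215 FAERS cases
low_year = round(pct_diff(105, 215), 1)
# 8,857 API cases vs. 7,881 FAERS cases
high_year = round(pct_diff(8857, 7881), 1)

print(low_year, high_year)
```

A negative value means the API returned fewer cases than the FAERS files, a positive value means it returned more.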

Data, Trust, and Reproducibility

Which is better?
• Traditional methods vs. newer approaches
• Data processing & data validation
• Access via API vs. database
• Implications for pharmaceutical research, data science, data technology & development

Data Dependence
• Risk of relying on API data
  EX: http://download.open.fda.gov/

openFDA Data
3 ways to download data from openFDA:
• Download manually. There’s a downloads section on each endpoint’s openFDA page.
• Write code to download the data automatically. Use a special API query (see below) to get a list of all the current data files for each endpoint.
• Synchronize with the openFDA S3 bucket.

MongoDB
• Collecting query records
• Storing query results
• Setting up data environment
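
One way to picture "collecting query records" and "storing query results" is a document per API call that keeps the URL, the run time, and a hash of the response, so that two runs of the same query can be compared later. The Python sketch below builds such documents with the standard library only. The document layout, the helper names, and the sample payloads are hypothetical and are not taken from the live demo. Writing a record to MongoDB (for example with pymongo's insert_one on a collection) is left out so the sketch runs without a database.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_query_record(url, response_body, run_at):
    """Build one document describing a single API call and its result."""
    # Canonical JSON so identical results always produce the same hash.
    canonical = json.dumps(response_body, sort_keys=True)
    return {
        "url": url,
        "run_at": run_at.isoformat(),
        "result_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "results": response_body,
    }


def results_changed(older, newer):
    """True when two runs of the same URL returned different results."""
    return (
        older["url"] == newer["url"]
        and older["result_hash"] != newer["result_hash"]
    )


URL = (
    "https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20120930]"
    '+AND+patient.drug.openfda.brand_name:"Yaz"'
)

# Made-up payloads for illustration only; the totals are not real results.
first = make_query_record(
    URL, {"meta": {"results": {"total": 100}}},
    datetime(2014, 8, 3, tzinfo=timezone.utc),
)
second = make_query_record(
    URL, {"meta": {"results": {"total": 101}}},
    datetime(2014, 8, 10, tzinfo=timezone.utc),
)

print(results_changed(first, second))
```

Keeping the full response next to its hash means a changed result can be inspected after the fact, which matters when the API gives no explanation for why the same query returns different data on different dates.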

LIVE DEMO
openFDA API website
https://open.fda.gov/index.html
FDA’s Example Query
https://open.fda.gov/api/reference/#example-query
Original Query
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:"nonsteroidal+anti-inflammatory+drug"&count=patient.reaction.reactionmeddrapt.exact
Our Query
https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20120930]+AND+patient.drug.openfda.brand_name:"Yaz"
https://api.fda.gov/drug/event.json?api_key=83k0zAKbRQbk5rCbedjpDs8DdKwoagWvojeW2ATf&search=receivedate:[20040101+TO+20120930]+AND+receiptdate:[20040101+TO+20120930]+AND+patient.drug.medicinalproduct:%22YAZ%22&count=patient.drug.openfda.brand_name.exact
THANK YOU
JENNIFER SHIN
jshin@8pathsolutions.com
@8pathsolutions

Additional Resources
○ http://www.ibmbigdatahub.com/blog/exploring-public-open-data-project-part-1
○ http://www.ibmbigdatahub.com/blog/exploring-public-open-data-project-part-2
○ http://www.ibmbigdatahub.com/blog/exploring-public-open-data-project-part-3
○ http://www.ibmbigdatahub.com/blog/exploring-public-open-data-project-part-4

Footnotes
1. http://www.merckmanuals.com/professional/clinical-pharmacology/adverse-drug-reactions/adverse-drug-reactions
