- 1. © 2016. 8 Path Solutions LLC.
Data, Decisions, and MongoDB
Tuesday, November 29, 2016
© 2015 8 Path Solutions LLC. All Rights Reserved.
Validating an Open Society
Jennifer Shin
Founder, 8 Path Solutions
Senior Principal Data Scientist, Nielsen
- 2.
Introduction
● Background
○ Native New Yorker
○ Undergraduate degree in Economics, Mathematics & Creative Writing from Columbia University
○ Graduate degree in Statistics from Columbia University
- 3.
Introduction
● Professional Experience
○ Founder & Chief Data Scientist at 8 Path Solutions
○ Senior Principal Data Scientist at Nielsen
○ Management consultant at Fortune 100 companies
○ Top Contributor for IBM Data Magazine
○ Faculty in the MIDS Graduate Program at UC Berkeley
- 4.
Introduction
● Recent Talks & Presentations
○ Institute for Computational and Experimental Research in Mathematics (ICERM) at Brown University
○ TDWI Accelerate 2016
○ Data Dialogs Conference, UC Berkeley
○ IBM World of Watson 2016
- 5.
Today’s Talk
○ Adverse Drug Reactions (ADR)
○ FDA Adverse Event Reporting System (FAERS)
○ openFDA API
○ openFDA + MongoDB
- 6.
Pharmacovigilance
○ Monitoring the safety of medicinal products
○ Adverse Drug Reactions (ADRs): unwanted, uncomfortable, or dangerous effects that a drug may have
○ In the US, 3 to 7% of all hospitalizations are due to ADRs [1]
- 7.
DATA SOURCE
○ FDA Adverse Event Reporting System (FAERS)
• A computerized information database designed to support the FDA's post-marketing safety surveillance program for all approved drug and therapeutic biologic products
• Used to monitor for new adverse events and medication errors that might occur with these marketed products
- 8.
DATA SOURCE
Option 1: Quarterly FAERS Data Files
○ Available each quarter from the FDA
○ Data from 2004 Q1 to 2012 Q3 available in ASCII/SGML
○ Data from 2012 Q4 onward available in ASCII/XML
- 9.
DATA CHALLENGES
○ Requires downloading the quarterly reports and consolidating them into a database.
- 10.
DATA ISSUES
○ Duplicate reports
○ Spelling errors
○ Inaccurate information
○ A single field for all drug names (e.g., brand and generic names) and active ingredients
○ Multiple drugs included in a single report
- 11.
openFDA API
- 12.
Food & Drug Administration openFDA API
○ Beta launch: June 2014
○ New website
- 13.
API OBJECTIVES
○ Facilitate access to and use of big, important FDA public datasets by developers, researchers, and the public through harmonization of data across disparate FDA datasets, provided via application programming interfaces (APIs)
- 14.
API DATA
○ Drug adverse events: Reports of drug side effects, product use errors, product quality problems, and therapeutic failures.
○ Drug product labeling: Structured product information, including prescribing information, for approved drug products.
○ Drug recall enforcement reports: Drug product recall enforcement reports.
- 16.
API DATA
○ Access to the FAERS database via API calls
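A minimal sketch of such an API call in Python, using only the standard library. The helper names `fetch_json` and `result_window` are hypothetical, and the response layout (`meta.last_updated` plus a `meta.results` window with `skip`, `limit`, and `total`) is our assumption about openFDA's JSON responses, not something stated in this talk.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Send a GET request to an openFDA endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def result_window(payload: dict) -> dict:
    """Pull paging metadata and the data timestamp out of a response.

    Assumes the layout {"meta": {"last_updated": ..., "results":
    {"skip": ..., "limit": ..., "total": ...}}, "results": [...]}.
    Missing keys fall back to None or 0 instead of raising.
    """
    meta = payload.get("meta", {})
    window = meta.get("results", {})
    return {
        "last_updated": meta.get("last_updated"),
        "skip": window.get("skip", 0),
        "limit": window.get("limit", 0),
        "total": window.get("total", 0),
        "returned": len(payload.get("results", [])),
    }
```

For example, `result_window(fetch_json("https://api.fda.gov/drug/event.json?limit=1"))` would report how many reports match and when the data was last updated; keeping that date next to each result helps explain later differences between runs.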
- 17.
API DATA
3 ways to download data from openFDA:
○ Download manually.
There’s a downloads section on each endpoint’s openFDA page.
○ Write code to download the data automatically.
Use a special API query to get a list of all the current data files for each endpoint.
○ Synchronize with the openFDA S3 bucket.
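The second option can be sketched as follows. We assume the "special API query" is openFDA's download manifest at `https://api.fda.gov/download.json`, and that the manifest nests file entries as `results` → category → endpoint → `partitions`, each partition carrying a `file` URL; both are assumptions to check against the current openFDA documentation. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed URL of openFDA's download manifest.
MANIFEST_URL = "https://api.fda.gov/download.json"


def load_manifest(url: str = MANIFEST_URL, timeout: float = 30.0) -> dict:
    """Fetch and decode the manifest (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def partition_files(manifest: dict, category: str = "drug", endpoint: str = "event") -> list:
    """List the download URLs for one endpoint, e.g. drug/event.

    Assumes manifest["results"][category][endpoint]["partitions"] is a list
    of dicts that each carry a "file" URL; entries without one are skipped.
    """
    info = manifest.get("results", {}).get(category, {}).get(endpoint, {})
    return [part["file"] for part in info.get("partitions", []) if "file" in part]
```

Each listed file could then be fetched with `urllib.request.urlretrieve`; saving the manifest itself alongside the files records which snapshot of the data an analysis used.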
- 18.
DATA ISSUE SOLUTION
Drug Name Harmonization Process
○ Benefits
• Harmonizes FAERS drug identifiers using other data sources, such as the National Drug Code (NDC) directory and RxNorm
• Separate data fields for brand names, generic names, and active ingredients
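The idea of splitting one free-text drug name into separate brand, generic, and active-ingredient fields can be sketched with a small lookup table. The table below is a hypothetical stand-in for NDC/RxNorm reference data, not openFDA's actual harmonization code. It also illustrates the first limitation on the next slide: a misspelled name finds no entry.

```python
from typing import Optional

# Hypothetical reference table standing in for NDC / RxNorm lookups.
REFERENCE = {
    "YAZ": {
        "brand_name": ["YAZ"],
        "generic_name": ["DROSPIRENONE AND ETHINYL ESTRADIOL"],
        "active_ingredients": ["DROSPIRENONE", "ETHINYL ESTRADIOL"],
    },
}


def normalize(raw_name: str) -> str:
    """Upper-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(raw_name.upper().split())


def harmonize(raw_name: str) -> Optional[dict]:
    """Map a raw drug name to separate identifier fields, or None if unknown."""
    return REFERENCE.get(normalize(raw_name))
```

`harmonize(" yaz ")` returns the three separate fields, while a misspelling such as `"YAAZ"` returns `None`, because exact lookup has no notion of "close enough".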
- 19.
DATA ISSUE SOLUTION
Drug Name Harmonization Process
○ Limitations
• Cannot harmonize misspelled drug names
• The validation process still requires the FAERS data files, so it is not necessarily easier
- 20.
CASE STUDY: DROSPIRENONE
- 21.
FAERS vs. OPENFDA
FAERS Data Files
○ Data from 2004 Q1 to 2012 Q3
○ 2 out of the 7 reports: DEMO, DRUG, REAC, RPSR, THER, OUTC, INDI
○ Consolidated using SQL Server 2012
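The consolidation step can be sketched with Python's built-in `sqlite3` module standing in for SQL Server 2012, so it runs without a database server. The sketch assumes the legacy quarterly files are `$`-delimited text with a header row, that DEMO and DRUG share the `ISR` key, and that DEMO carries an `FDA_DT` date in YYYYMMDD form; the column subset and function names are illustrative.

```python
import csv
import io
import sqlite3


def load_table(conn: sqlite3.Connection, name: str, text: str) -> None:
    """Load one '$'-delimited quarterly file (header row first) into a table.

    Table and column names come straight from the file header, so this is
    only suitable for trusted FDA files. Repeated calls append rows, which
    assumes every quarter uses the same header.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="$", quoting=csv.QUOTE_NONE))
    header, body = rows[0], rows[1:]
    columns = ", ".join(f'"{col}" TEXT' for col in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', body)


def reports_by_year(conn: sqlite3.Connection, drugname_pattern: str) -> dict:
    """Count distinct ISRs per FDA_DT year whose DRUGNAME matches a LIKE pattern."""
    query = """
        SELECT substr(d.FDA_DT, 1, 4) AS year, COUNT(DISTINCT d.ISR) AS reports
        FROM DEMO AS d
        JOIN DRUG AS g ON g.ISR = d.ISR
        WHERE g.DRUGNAME LIKE ?
        GROUP BY year
        ORDER BY year
    """
    return dict(conn.execute(query, (drugname_pattern,)).fetchall())
```

`COUNT(DISTINCT d.ISR)` matters because one report can list several drugs; counting joined rows instead would inflate the totals for reports with multiple matching DRUG entries.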
- 22.
FAERS vs. OPENFDA
Drug Name Mapping
DRUGNAME values:
○ YAZ
○ DROSPIRENONE W/ETHINYLESTRADIOL (YAZ)
○ YAZ /06358701/
○ YAZ BAYER HEALTHCARE
○ YAZ N/A BAYER HEALTHCARE
○ YAZ (24)
○ YAZ (DROSPIRENONE + ETHINYLESTRADIOL 20µG (24+4) [YAZ]
○ YAZ (DROSPIRENONE/ESTRADIOL)
○ YAZ (ORAL CONTRACEPTATIVE NOS)
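A rough sketch of grouping these free-text DRUGNAME values: a case-insensitive, word-boundary match on the brand token catches variants that an exact string comparison with "YAZ" would miss. This regex approach is our illustration, not the method used in the study; it still misses misspellings and, by design, any report that names only the generic ingredients.

```python
import re

# Word-boundary match: "YAZ" must appear as its own token, in any letter case.
BRAND_PATTERN = re.compile(r"\bYAZ\b", re.IGNORECASE)


def mentions_brand(drugname: str) -> bool:
    """Return True if a raw DRUGNAME value contains the brand token."""
    return BRAND_PATTERN.search(drugname) is not None


def group_variants(drugnames):
    """Split raw DRUGNAME values into brand matches and everything else."""
    matched, other = [], []
    for name in drugnames:
        (matched if mentions_brand(name) else other).append(name)
    return matched, other
```

A value such as "DROSPIRENONE AND ETHINYL ESTRADIOL" lands in `other`, which is exactly the kind of report whose brand attribution has to come from somewhere else.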
- 23.
FAERS vs. OPENFDA
Data Fields (FAERS Data Files → openFDA API)
○ Brand Name, e.g., Yaz: DRUGNAME → patient.drug.openfda.brand_name
○ Generic Name, e.g., Drospirenone, Ethinyl Estradiol: DRUGNAME → patient.drug.openfda.generic_name
○ Case Report Identifier: ISR → safetyreportid
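To line the two sources up field by field, the nested openFDA paths above can be read from one adverse-event record as sketched below. The record shape (a `patient.drug` list whose `openfda` sub-objects hold lists of names) follows the dotted field names on this slide; the helper names are hypothetical, and the ISR mapping is only an assumption based on the "4990905-5" / "4990905" pair shown later in the talk.

```python
def openfda_fields(record: dict) -> dict:
    """Collect brand/generic names and the report ID from one event record."""
    brands, generics = set(), set()
    for drug in record.get("patient", {}).get("drug", []):
        openfda = drug.get("openfda", {})  # absent when no harmonization matched
        brands.update(openfda.get("brand_name", []))
        generics.update(openfda.get("generic_name", []))
    return {
        "safetyreportid": record.get("safetyreportid"),
        "brand_name": sorted(brands),
        "generic_name": sorted(generics),
    }


def isr_from_safetyreportid(report_id: str) -> str:
    """Assumed mapping: drop the '-N' suffix, e.g. '4990905-5' -> '4990905'."""
    return report_id.split("-", 1)[0]
```

Comparing `openfda_fields(record)["brand_name"]` with the DRUGNAME values for the matching ISR is one way to spot reports where the API attributes a brand that the original report never mentioned.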
- 24.
FAERS vs. OPENFDA
○ Inconsistent Query Data
• Running the same query on 8/03/14 and 8/10/14 produced different results
- 25.
FAERS vs. OPENFDA
○ Inconsistent Query Data
• No information about the cause of these changes could be found on the FDA website
• According to the GitHub records, no updates were made between these two dates
• For our brand name analysis, the most recent results were used
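One way to make such changes visible is to keep each run's count results and compare them term by term. A minimal sketch, assuming a count query returns a `results` list of `term`/`count` pairs (our understanding of openFDA count responses); the function names are hypothetical.

```python
def to_counts(results: list) -> dict:
    """Turn openFDA count results [{"term": ..., "count": ...}] into a dict."""
    return {item["term"]: item["count"] for item in results}


def diff_counts(before: dict, after: dict) -> dict:
    """Return {term: (old, new)} for every term whose count changed.

    Terms missing from one run are treated as a count of 0 in that run.
    """
    changes = {}
    for term in sorted(set(before) | set(after)):
        old, new = before.get(term, 0), after.get(term, 0)
        if old != new:
            changes[term] = (old, new)
    return changes
```

An empty result means the two runs agree; any entry pinpoints which terms moved, which is more useful than a single changed total when asking the data provider what happened.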
- 26.
DATA ISSUES SOLUTION?
Drug Name Harmonization Process
○ FAERS Data Files vs. openFDA API Query
• Cannot harmonize misspelled drug names
• The validation process still requires the FAERS data files, so it is not necessarily easier
- 27.
RESULTS I: BRAND NAME
- 28.
RESULTS I: BRAND NAME
Comparing Reports for Yaz from Q1 2004 to Q3 2012
Year    openFDA    FAERS
2004          1        0
2005          0        0
2006         20       19
2007        105      215
2008      2,502    2,498
2009      2,321    2,261
2010      1,472    1,365
2011      8,857    7,881
2012      6,750    5,551
- 30.
RESULTS I: BRAND NAME
openFDA API query results for safetyreportid “4990905-5”
DRUGNAME for ISR number “4990905” only includes DROSPIRENONE AND ETHINYL ESTRADIOL
- 32.
RESULTS II: GENERIC NAME
Initial Query
https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20120930]+AND+patient.drug.openfda.generic_name:"DROSPIRENONE+ETHINYL+ESTRADIOL"&count=patient.drug.openfda.brand_name
openFDA results: Total 714; Yaz 107
Revised Query
https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20120930]+AND+patient.drug.openfda.generic_name:"DROSPIRENONE"+AND+patient.drug.openfda.generic_name:"ETHINYL"+AND+patient.drug.openfda.generic_name:"ESTRADIOL"&count=patient.drug.openfda.brand_name
openFDA results: Total 31,051; Yaz 22,028
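The two queries differ only in how the generic name is matched. In the URLs each `+` stands for a space, so the search expressions can be sketched as plain strings before URL encoding. Under openFDA's query syntax as we understand it, the quoted multi-word value is matched as a phrase, while the revised query only requires each word to appear somewhere in the generic-name field; that reading is our assumption, not a claim from the talk. The helper names are hypothetical.

```python
def phrase_search(field: str, words: list) -> str:
    """Quote all words together, as in the initial query."""
    return f'{field}:"{" ".join(words)}"'


def all_words_search(field: str, words: list) -> str:
    """Require each word separately, joined with AND, as in the revised query."""
    return " AND ".join(f'{field}:"{word}"' for word in words)


def term_count(results: list, term: str) -> int:
    """Read one term's count from count results [{"term": ..., "count": ...}]."""
    for item in results:
        if item["term"].upper() == term.upper():
            return item["count"]
    return 0
```

Either string can be passed as the `search` parameter and encoded with `urllib.parse.urlencode`, which turns spaces back into `+`; `term_count(results, "YAZ")` then reads the Yaz figure from the brand-name counts.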
- 33.
Results
The drug harmonization process incorrectly associated reports for drospirenone and ethinyl estradiol with the drug Yaz.
○ Raises concerns about the drug harmonization process for Yaz as well as for other drugs
○ Further study is needed to validate the accuracy of the openFDA data
- 34.
RESULTS II: GENERIC NAME
The reported cases for Yaz from the openFDA API and the FAERS data files varied widely when compared by report year.
○ For 2007, the API included only 105 cases, which is 51% fewer than the 215 cases in FAERS.
○ For 2011, the API included 8,857 cases, which is 12.4% more than the 7,881 cases in FAERS.
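The percentages can be checked with a few lines of Python. The yearly counts below are read from the "Comparing Reports for Yaz" chart in year order (2004 to 2012), which places the 105 vs. 215 pair in 2007; the helper names are hypothetical.

```python
# Yearly Yaz report counts read from the "Comparing Reports for Yaz" chart.
OPENFDA = {2004: 1, 2005: 0, 2006: 20, 2007: 105, 2008: 2502,
           2009: 2321, 2010: 1472, 2011: 8857, 2012: 6750}
FAERS = {2004: 0, 2005: 0, 2006: 19, 2007: 215, 2008: 2498,
         2009: 2261, 2010: 1365, 2011: 7881, 2012: 5551}


def percent_difference(api_count: int, faers_count: int):
    """Relative difference of the API count versus FAERS, in percent."""
    if faers_count == 0:
        return None  # undefined when FAERS has no reports that year
    return 100.0 * (api_count - faers_count) / faers_count


def yearly_differences(api: dict, faers: dict) -> dict:
    """Percent difference for every year present in the API counts."""
    return {year: percent_difference(api[year], faers[year]) for year in sorted(api)}
```

Rounded to one decimal, 2007 gives -51.2% and 2011 gives +12.4%. Summing the columns gives 22,028 for openFDA (the same figure as the Yaz count from the revised generic-name query) and 19,790 for FAERS.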
- 35.
Data, Trust, and Reproducibility
- 36.
Which is better?
@8PATHSOLUTIONS
○ Traditional methods vs. newer approaches
○ Data processing & data validation
○ Access via API vs. database
○ Implications for pharmaceutical research, data science, data technology & development
- 37.
Data Dependence
○ Risk of relying on API data
Example: http://download.open.fda.gov/
- 38.
openFDA Data
3 ways to download data from openFDA:
○ Download manually.
There’s a downloads section on each endpoint’s openFDA page.
○ Write code to download the data automatically.
Use a special API query to get a list of all the current data files for each endpoint.
○ Synchronize with the openFDA S3 bucket.
- 39.
MongoDB
○ Collecting query records
○ Storing query results
○ Setting up the data environment
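A minimal sketch of the first two uses: each API call is stored as one document holding the URL, retrieval time, the data's `last_updated` date, a hash of the response, and the results themselves, so reruns of the same query (like the 8/03/14 and 8/10/14 runs) can be compared later. The field names are our own, the openFDA `meta` layout is assumed, and `collection` is expected to behave like a pymongo `Collection`, whose `insert_one` returns a result with an `inserted_id`.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_query_record(url: str, payload: dict) -> dict:
    """Build one document describing an API call and what it returned."""
    # Canonical JSON (sorted keys) so identical responses hash identically.
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    meta = payload.get("meta", {})
    return {
        "url": url,
        "retrieved_at": datetime.now(timezone.utc),
        "last_updated": meta.get("last_updated"),
        "total": meta.get("results", {}).get("total"),
        "sha256": hashlib.sha256(canonical).hexdigest(),
        "results": payload.get("results", []),
    }


def save_query_record(collection, record: dict):
    """Insert the record into a MongoDB-style collection and return its id."""
    return collection.insert_one(record).inserted_id
```

With a running server this might look like `save_query_record(MongoClient()["openfda"]["queries"], record)`; the database and collection names are hypothetical. Two records with the same URL but different `sha256` values flag a changed response even when `last_updated` did not move.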
- 40.
LIVE DEMO
- 41.
LIVE DEMO
openFDA API website
https://open.fda.gov/index.html
FDA’s Example Query
https://open.fda.gov/api/reference/#example-query
- 42.
LIVE DEMO
FDA’s Example Query: https://open.fda.gov/api/reference/#example-query
Original Query
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:"nonsteroidal+anti-inflammatory+drug"&count=patient.reaction.reactionmeddrapt.exact
Our Query
https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20120930]+AND+patient.drug.openfda.brand_name:"Yaz"
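Rather than editing these URLs by hand, the query parameters can be assembled and encoded in code. A minimal sketch with the standard library: `build_event_url` is a hypothetical helper, and leaving `:`, `[` and `]` unencoded is a readability choice we assume the API accepts, as the hand-written URLs on this slide suggest. Quotes are encoded as `%22`, matching the style of the next slide's URL.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_event_url(search: str, count: Optional[str] = None,
                    limit: Optional[int] = None, api_key: Optional[str] = None) -> str:
    """Assemble an openFDA drug event query URL from its parameters."""
    params = {}
    if api_key:
        params["api_key"] = api_key
    params["search"] = search
    if count:
        params["count"] = count
    if limit is not None:
        params["limit"] = str(limit)
    # urlencode uses quote_plus: spaces become "+", quotes become %22;
    # ":", "[" and "]" are left as-is so the URL stays readable.
    return BASE_URL + "?" + urlencode(params, safe=":[]")
```

For example, `build_event_url('receivedate:[20040101 TO 20120930] AND patient.drug.openfda.brand_name:"Yaz"')` reproduces "Our Query" with `%22` in place of the quotes. To keep a key out of shared URLs, `api_key` could be read from an environment variable such as `OPENFDA_API_KEY` (a hypothetical name) via `os.environ.get`.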
- 44.
○ https://api.fda.gov/drug/event.json?api_key=83k0zAKbRQbk5rCbedjpDs8DdKwoagWvojeW2ATf&search=receivedate:[20040101+TO+20120930]+AND+receiptdate:[20040101+TO+20120930]+AND+patient.drug.medicinalproduct:%22YAZ%22&count=patient.drug.openfda.brand_name.exact
- 45.
THANK YOU
JENNIFER SHIN
jshin@8pathsolutions.com
@8pathsolutions
- 46.
Additional Resources
○ http://www.ibmbigdatahub.com/blog/exploring-public-open-data-project-part-1
○ http://www.ibmbigdatahub.com/blog/exploring-public-open-data-project-part-2
○ http://www.ibmbigdatahub.com/blog/exploring-public-open-data-project-part-3
○ http://www.ibmbigdatahub.com/blog/exploring-public-open-data-project-part-4
- 47.
Footnotes
1. http://www.merckmanuals.com/professional/clinical-pharmacology/adverse-drug-reactions/adverse-drug-reactions