
Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017

Graphs are used to map relations in unstructured data. Most company data lives in databases and is mined with traditional data mining approaches. However, modeling relational data as a graph can reveal useful insights and uncover relations among data that traditional data mining techniques ignore. In this work we used graphs to map physician relationships, using claims data as a proxy, and this approach revealed interesting insights for a health insurance company.

  1. Ana Paula Appel, Data Scientist & Master Inventor: Discovering the hidden treasure of data using graph analytic
  2. © 2015 IBM Corporation
  3. IBM Research – Brazil (view from the Rio de Janeiro lab). Mission: to be known for our science and technology and to be vital to IBM, to Brazil, and to our clients in the region and worldwide.
  4. Healthcare Data
     • Transactional data on medical care
     • Large health insurance company in Brazil
     • Nationwide coverage
     • Spanning 1.5 years (2013-2014)
     • 0.6 TB (compressed)
  5. Healthcare Data: Stakeholders (diagram): physicians, patients, healthcare providers, health services, claims, and the health insurance company
  6. Healthcare Data: Claims
     • Paid claims: 109M in total
     • Doctors: 220k (almost half of all doctors in Brazil)
     • Patients: 2.2M
     • Unique doctor-patient pairs: 11.6M
     • Other supporting data: company, providers, authorizations (~3M), claim denials (~13M), geolocation, ...
     • Over 40 tables, hundreds of fields
     • CLAIM record: Physician ID, Patient ID, Timestamp, Service code, Disease (ICD9), plus 80+ further fields
  7. A Complex Network Perspective
  8. A Network Approach
     Sample claim rows:
       PhysID    ICD9   PatientID   Date
       SP45962   -      1001        09/04/13
       SP45962   Z017   1001        26/04/13
       SP47108   Z017   1001        06/12/13
       SP47108   Z017   1001        16/12/13
       SP45962   -      1002        11/07/13
       SP45962   Z017   1002        12/07/13
       SP45962   -      1002        19/08/13
       SP59938   Z000   1002        24/10/13
       …         …      …           …
     (Figure: the same claims viewed as a bipartite graph, a weighted graph and a directed graph)
     • Bipartite network of doctors and patients: |V| = 2.4M, |E| = 11.6M
     • Keep only the largest connected component (92%-99% of all links)
     • Remove multiple edges and map them to weights
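The preprocessing on this slide (collapse repeated doctor-patient claims into weighted edges, then keep the largest connected component) can be sketched as below. This is a minimal, dependency-free illustration, not the authors' pipeline: it assumes each claim is a `(physician, icd, patient, date)` tuple, stores the slide's "-" ICD entries as `None`, and uses the eight sample rows shown above.

```python
from collections import Counter, defaultdict

# Sample rows from the slide: (physician_id, icd_code, patient_id, date); "-" stored as None.
CLAIMS = [
    ("SP45962", None, "1001", "09/04/13"),
    ("SP45962", "Z017", "1001", "26/04/13"),
    ("SP47108", "Z017", "1001", "06/12/13"),
    ("SP47108", "Z017", "1001", "16/12/13"),
    ("SP45962", None, "1002", "11/07/13"),
    ("SP45962", "Z017", "1002", "12/07/13"),
    ("SP45962", None, "1002", "19/08/13"),
    ("SP59938", "Z000", "1002", "24/10/13"),
]


def build_weighted_bipartite(claims):
    """Collapse repeated (physician, patient) claims into one edge weighted by the visit count."""
    return Counter((phys, patient) for phys, _icd, patient, _date in claims)


def largest_component(edges):
    """Keep only the edges of the largest connected component (iterative DFS, undirected)."""
    adj = defaultdict(set)
    for phys, patient in edges:
        # Tag node types so a physician ID can never collide with a patient ID.
        adj[("phys", phys)].add(("pat", patient))
        adj[("pat", patient)].add(("phys", phys))
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, stack = {start}, [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    stack.append(nb)
        if len(component) > len(best):
            best = component
    # An edge is in the component iff its physician endpoint is.
    return {e: w for e, w in edges.items() if ("phys", e[0]) in best}
```

On the sample rows, SP45962 ends up with weight 2 to patient 1001 and weight 3 to patient 1002, and all four edges fall in one component. At the slide's scale (11.6M edges) a graph library or a distributed union-find would be the more practical choice.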
  9. Patient-Sharing Networks: links represent a shared patient
     • Phys-Patient: 402 nodes, 403 links
     • Patient-Patient: 377 nodes, 5,488 links
     • Phys-Phys: 25 nodes, 30 links
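A physician-physician patient-sharing network is a one-mode projection of the bipartite graph. The sketch below links two physicians whenever they share at least one patient. Using the number of distinct shared patients as the link weight is an assumption; the slide only states that links represent a shared patient.

```python
from collections import defaultdict
from itertools import combinations


def project_physicians(edges):
    """Project (physician, patient) edges onto physicians.

    Returns {(phys_a, phys_b): n_shared_patients} with phys_a < phys_b.
    """
    doctors_of = defaultdict(set)
    for phys, patient in edges:
        doctors_of[patient].add(phys)
    shared = defaultdict(int)
    for doctors in doctors_of.values():
        # Every pair of doctors seen by the same patient gets one more shared patient.
        for a, b in combinations(sorted(doctors), 2):
            shared[(a, b)] += 1
    return dict(shared)
```

Because every patient seen by m doctors contributes m(m-1)/2 pairs, projections can be much denser than the bipartite graph, which is consistent with the slide's example: 403 bipartite links become 5,488 patient-patient links.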
  10. Physician and Patient Degree Distributions (patient and physician histograms)
     • One patient saw 123 different physicians
     • 409k patients saw only 1 physician
     • 26 physicians had more than 5k different patients, and 1 had 30k (possibly spurious)
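The histograms on this slide count how many nodes have each degree, where degree means the number of distinct partners on the other side. A minimal sketch, assuming the same `(physician, patient)` edge pairs as before; the `physicians_above` helper and its threshold are illustrative, mirroring the slide's "more than 5k different patients" check.

```python
from collections import Counter


def degree_histograms(edges):
    """Return (patient_hist, physician_hist): degree k -> number of nodes with k distinct partners."""
    unique = set(edges)  # repeated claims must not inflate degrees
    phys_deg = Counter(phys for phys, _ in unique)
    pat_deg = Counter(pat for _, pat in unique)
    return Counter(pat_deg.values()), Counter(phys_deg.values())


def physicians_above(edges, threshold):
    """Hypothetical outlier filter: physicians with more than `threshold` distinct patients."""
    deg = Counter(phys for phys, _ in set(edges))
    return sorted(p for p, d in deg.items() if d > threshold)
```

Deduplicating the pairs first matters: with raw claims, a patient who visits one doctor ten times would otherwise count as degree 10.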
  11. Network-Derived Metrics
     • Aim: extend the description of each doctor with relevant metrics
     • Metrics that, in combination with other data, make it possible to classify, filter and reduce
     (Example table: one row of metric values per doctor, e.g. 35, 0.1, 3.2, 0, 4%, 7%, … and 17, 0.2, 5.1, 1, 9%, 1%, …, grouped into compliant and non-compliant doctors)
  12. Case: Building Metrics to Describe Physicians Using Complex Networks: Mutual Reference, Centrality, Loyalty
  13. Health Insurance: Similarity between Complex Networks (Friendship Network vs. Physician Network)
  14. Mutual Reference
  15. Mutual Reference
     Goal: identify strong connections between each pair of physicians, in particular the outliers.
     • The same patient visits two doctors, and this happens in both directions: a reciprocal link
     • Diagram: patients visit doctor a and then doctor b, or b and then a; w(ab) = 17 with Δt = 7 days, w(ba) = 8 with Δt = 2 days
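One way to compute w(ab), w(ba) and the associated Δt from claims is sketched below. It assumes, without confirmation from the slides, that a reference a → b is counted whenever a patient's visit to a is immediately followed by a visit to a different physician b, and that Δt is the mean gap in days. The `max_gap_days` window is a hypothetical parameter.

```python
from collections import defaultdict
from datetime import date


def directed_references(visits, max_gap_days=None):
    """visits: iterable of (patient, physician, date).

    Returns {(a, b): (w_ab, mean_gap_days)} counting consecutive visits a -> b
    by the same patient to two different physicians.
    """
    by_patient = defaultdict(list)
    for patient, phys, day in visits:
        by_patient[patient].append((day, phys))
    gaps = defaultdict(list)
    for seq in by_patient.values():
        seq.sort()  # chronological order per patient
        for (d1, a), (d2, b) in zip(seq, seq[1:]):
            gap = (d2 - d1).days
            if a != b and (max_gap_days is None or gap <= max_gap_days):
                gaps[(a, b)].append(gap)
    return {pair: (len(g), sum(g) / len(g)) for pair, g in gaps.items()}


def reciprocal_links(refs):
    """Keep only pairs referenced in both directions: {(a, b): (w_ab, w_ba)} with a < b."""
    return {
        (a, b): (refs[(a, b)][0], refs[(b, a)][0])
        for (a, b) in refs
        if a < b and (b, a) in refs
    }


if __name__ == "__main__":
    demo = [("p1", "A", date(2013, 1, 1)), ("p1", "B", date(2013, 1, 8))]
    print(directed_references(demo))
```

Keeping only reciprocal pairs mirrors the slide's "happens in both directions" condition; ranking those pairs by weight is one way to surface the outliers mentioned in the goal.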
  16. Mutual Reference (figure: networks of the top 50 and top 20 physician pairs for the states BA, DF, SP, PE and RJ, each annotated with its density; density values shown: 0.809, 0.447, 0.805, 0.845, 0.913, 0.963, 0.834, 0.568, 0.802, 0.576)
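The slide reports a density for each top-N network but does not define it. The sketch below assumes the standard density of a simple undirected graph, 2|E| / (|V|(|V| - 1)), computed over the subgraph formed by the N highest-weight pairs; both the formula choice and the `top_n_density` helper are assumptions for illustration.

```python
def density(edges):
    """Density of a simple undirected graph: 2|E| / (|V| (|V| - 1))."""
    undirected = {tuple(sorted(e)) for e in edges if e[0] != e[1]}  # drop direction and self-loops
    nodes = {v for e in undirected for v in e}
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(undirected) / (n * (n - 1))


def top_n_density(pair_weights, n):
    """Hypothetical helper: density of the graph formed by the n highest-weight pairs."""
    top = sorted(pair_weights, key=pair_weights.get, reverse=True)[:n]
    return density(top)
```

A density close to 1 means the physicians in the top pairs are almost all referring to one another, i.e. a tightly knit group rather than isolated dyads.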
  17. Mutual Reference (figure panels: Allergy; Ophthalmology)
  18. Mutual Reference Conclusions and Insights
     • Claims data is rich enough to identify connections among physicians and how their partnerships work.
     • Mutual reference is an indicator of a relationship between physicians and can potentially support further analyses, especially on large volumes of data.
     • The proposed metric makes it feasible to analyze these relationships computationally and on a regular basis.
     • Specialties that appear most often: ophthalmology to ophthalmology; gynecology and obstetrics to gynecology and obstetrics.
     • DF has the most consultations at irregular intervals.
     • MDF010 and MDF009: 267 consultations with an average interval of 0 days.
     • Top pair: 205 from MMS028 to MMS027 and 196 from MMS027 to MMS028.
       Physician A   Physician B   rm     Rank
       MMS028        MMS027        1      1
       MSP145        MSP144        0.31   10
  19. Patient Loyalty
  20. Patient Loyalty
     Goal: identify (and quantify) doctors who have recurring patients in a systematic way, suggesting 'loyalty'.
     1. Consider patients with many visits to doctors.
     2. Compute the relative weight of each doctor visited.
     3. Count the relative number of 'loyal' patients for each doctor.
     (Plot: consultations over time)
  21. Patient Loyalty (São Paulo)
     • Weight w_ij: the number of visits of patient i to doctor j
     • Strength s_i: the sum of the weights of the links attached to node i (i.e., all visits by patient i)
     • Relative weight rw_ij = w_ij / s_i: the fraction of patient i's visits that go to doctor j
     (Plot: strength s vs. degree k, highlighting patients with high rw and low rw)
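The three steps and the definitions of w_ij, s_i and rw_ij can be combined into a per-doctor score. This is a minimal sketch, not the slide's `mf` metric (which is not defined in the deck): it scores each doctor by the fraction of their patients who have strength s_i >= `min_strength` and rw_ij >= `min_rw`, and both thresholds are hypothetical.

```python
from collections import defaultdict


def loyalty_scores(weights, min_strength=5, min_rw=0.5):
    """weights: {(patient, doctor): w_ij} visit counts.

    Returns {doctor: fraction of that doctor's patients who are 'loyal'}, where a
    patient is loyal if s_i >= min_strength and rw_ij = w_ij / s_i >= min_rw.
    The thresholds are illustrative assumptions.
    """
    strength = defaultdict(int)  # s_i: total visits of patient i
    for (patient, _doctor), w in weights.items():
        strength[patient] += w
    loyal, total = defaultdict(int), defaultdict(int)
    for (patient, doctor), w in weights.items():
        total[doctor] += 1
        rw = w / strength[patient]
        if strength[patient] >= min_strength and rw >= min_rw:
            loyal[doctor] += 1
    return {doctor: loyal[doctor] / total[doctor] for doctor in total}
```

Requiring a minimum strength keeps patients with only one or two visits from counting as loyal, because for them rw_ij is trivially high.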
  22. Patient Loyalty
     • The more patients a doctor has with high rw and high s, the more likely that doctor is a candidate for 'loyalty' capacity.
     • Stability: many doctors sustain their metric values over time.
       • One doctor is at rank 1 or 2 in all 5 quarters.
       • Mean turnover across quarters is 20%.
     • Top specialties among physicians with higher loyalty (mf > 0.5):
       • Orthopedics and traumatology (5 in the top 10)
       • Ophthalmology (3)
       • Gynecology and obstetrics (2)
       • Pediatrics (1)
       Physician   mf     Rank
       MSP 139     1.54   175
       MSP 261     1.18   432
     (Plots: relative weight vs. strength for two cardiology physicians)
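The "20% mean turnover across quarters" can be made concrete. The slides do not say how turnover was measured, so the sketch below assumes it is the fraction of the top-k physicians in one quarter who drop out of the top k in the next, averaged over consecutive quarters; `k` is a free parameter.

```python
def top_k(scores, k):
    """Set of the k highest-scoring physicians (ties broken by dict order)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def mean_turnover(quarterly_scores, k):
    """quarterly_scores: list of {physician: score}, one dict per quarter, in time order.

    Assumed definition: share of the previous quarter's top k that is no longer
    in the current quarter's top k, averaged over consecutive quarter pairs.
    """
    tops = [top_k(scores, k) for scores in quarterly_scores]
    rates = [len(prev - cur) / len(prev) for prev, cur in zip(tops, tops[1:])]
    return sum(rates) / len(rates)
```

With five quarters there are four transitions, so a 20% mean turnover under this definition would mean roughly one in five top-k physicians is replaced per quarter.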
  23. Centrality
  24. Physician Centrality
     Goal: identify physicians' roles in the network by their relative importance compared with other physicians.
     • We applied several centrality measures: eigenvector, degree, betweenness, closeness.
     • Do the values of these metrics change over time? Are they seasonal?
       Physician   Eigenvector   Rank   Degree   (2Q 2014)
       MSP 153     1             1      253
       MSP 139     0.55          8      335
     Centrality Conclusions and Insights
     • Centrality indicates which physicians are important in the physician community.
     • There is a set of physicians with high scores.
     • This set of physicians shares a large number of patients, forming a block.
     • Relative centrality is positively correlated among neighboring physicians.
     • This high-scoring group is stable over time, with few changes from quarter to quarter.
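Three of the four measures listed above can be sketched in a few lines on an undirected physician network given as `{node: set(neighbours)}`. This is a minimal illustration, not the authors' code. Eigenvector centrality uses power iteration on (A + I), which shares A's eigenvectors but avoids oscillation on bipartite-like structures, and it scales scores so the top node gets 1.0. That resembles the slide's table (top physician = 1), but the normalization actually used is not stated. Betweenness (Brandes' algorithm) is omitted for length; NetworkX provides `betweenness_centrality` along with `eigenvector_centrality`, `degree_centrality` and `closeness_centrality`.

```python
from collections import deque


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of BFS distances to them), per node."""
    out = {}
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total = sum(dist.values())
        out[s] = (len(dist) - 1) / total if total else 0.0
    return out


def eigenvector_centrality(adj, iters=200, tol=1e-9):
    """Power iteration on (A + I); scores scaled so the largest equals 1.0."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(nxt.values()) or 1.0
        nxt = {v: s / top for v, s in nxt.items()}
        if max(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x
```

Computing these per quarter and comparing the rankings is one way to approach the slide's questions about change over time and seasonality.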
  25. Summary & Take-Home Messages
     • Networks are all about relationships, as is most data.
     • Network-derived insights are usually not reachable through other analyses.
     • Complex network methods are very valuable to data science.
     • We used a large healthcare claims database from a Brazilian insurance company.
     • We applied complex network methods to find out how physicians build their networks.
     • Examples: temporality, reciprocity and 'loyalty'.
  26. Where to find more information: introductory/basic and advanced material
  27. Graph Analytics: databases, APIs, visualization
  28. Thanks! apappel@br.ibm.com
