
Introduction to Doctor Social Graph Project



The Doctor Social Graph Project, hosted on MedStartr (a healthcare version of Kickstarter), is a healthcare data project opening up physician referral data and much more.

Published in: Health & Medicine


  1. 1. Introduction to the Doctor Social Graph project● Brandon Weinberg: November 29, 2012● This presentation is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
  2. 2. Before I Start...● As the Doctor Social Graph project rapidly progresses, obsolescence will kick in, rendering this content stale and "old news"● This presentation was published on Slideshare 11/29/2012 when the Doctor Social Graph Project was quite new● Details as of 11/29 are gradually emerging; most content in these slides is paraphrased from official project announcements thus far● Let's get started!
  3. 3. Organizer● Fred Trotter● Celebrated Health IT Expert in USA● One of the Designees of the Direct Project (Mandated HIE Protocol in USA)● Co-Authored First Health IT Book for O'Reilly and Most Popular Book on Meaningful Use Standards: Meaningful Use And Beyond● Values Open Source
  4. 4. Announcement● Strata RX 2012: O'Reilly Strata Conference● October 16, 2012● San Francisco● Fred's Keynote Titled "The Ethos of Healthcare Data Science"● This Was When Data Was Initially Released (Open Source Licensed), For Healthcare Data Scientists in Audience
  5. 5. Social Dataset● Collaborative Relationship Data● How Doctors, Hospitals, Labs and Other Healthcare Providers Collaborate To Treat Medicare Patients● Data Includes: Referrals to Specialists● Data Includes: Lab Providers and Hospitals A Doctor Often Works With● Data Includes: Real Names and Addresses● Representative of How USA Healthcare System is Delivering Care
  6. 6. Doctor Social Graphs● Graphical Representations of Group Interactions During Medicare Treatment● Diagrams Based on Math Models● Use Nodes and Connections● Nodes: Providers, e.g. Doctors, Hospitals, Labs, Etc● Connections: Degree to Which Providers Work Together Treating Specific Patients● Will Be Largest Real-Name Social Graph That is Publicly Available, Of Any Kind!
  7. 7. Doctor Social Graphs● Visualization of Social Graph begins at 1:10
  8. 8. Other Social Graphs● Facebook, Twitter, LinkedIn Exemplify Private Big Data Social Graphs● Most Portions of Data Remain In-House● Do You Know Any Data Scientists Good at Graphing and Graph Theory? They May Appreciate Doctor Social Graph
  9. 9. Preparing Data● Initial Dataset Was Obtained by Fred Trotter● He Filed A Freedom of Information Act Request Against Medicare Claims Database● For Phase 1 Improvement, He Purchased Board Credentialing Data in All 50 States● Was $50-$1,000 Per State to Download● Board Credentialing Data is Analogous to "Credit History" for Doctors, e.g. Medical Schools, Board Certifications and Board-Imposed Punishments
  10. 10. Preparing Data● After Merging Initial (Referral and Teaming) Dataset with State Credentialing Data, the Data Was Formatted For Usability, e.g. Disparate Data Sources Will Be Formatted in CSV, JSON, XML● Merged Dataset To Be Released in Late November or Early December to MedStartr Backers (Explained Later)
  11. 11. Doctor Performance● Fairly Evaluate Doctor Quality in USA● "My Most Important Project For This Data Is Simple: I Want To Create Algorithms To Rate Doctors That Patients Find Useful And That Doctors Find Fair." Fred Trotter (paragraph 10)● "The Development of Objective, Fair and Useful Doctor Rating Systems" Fred Trotter
  12. 12. Doctor Performance● Referrals From Doctors, For Example, May Be Used As Doctor "Votes" For Each Other● Scroll Down to Third Paragraph of "Why This Matters To Patients" For Challenges and Biases in Current Doctor Rating Systems● Examples Abound of How Patients, Doctors, Insurance Companies, Hospitals, Labs, Academics, Scientists, Health Policy Makers and Others May Leverage Data For Their Particular Research Interests
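One way to read the "votes" idea on this slide is to sum, for each provider, the counts on the connections that point to it. The sketch below assumes rows shaped like the project's sample data (NPI seen first, NPI seen second, count). The function name `inbound_votes` is hypothetical, and treating the raw sum as a vote tally is an illustrative assumption, not the project's rating algorithm.

```python
from collections import Counter

def inbound_votes(edges):
    """Sum counts per receiving provider (the NPI seen second).

    edges: iterable of (npi_seen_first, npi_seen_second, seen_count) tuples.
    """
    votes = Counter()
    for _first, second, count in edges:
        votes[second] += count
    return votes

# Three rows taken from the project's published sample.
edges = [
    ("1184710477", "1548387418", 55),
    ("1548387418", "1326047754", 62),
    ("1548387418", "1598971913", 24),
]
votes = inbound_votes(edges)
top_npi, top_score = votes.most_common(1)[0]
```

A raw tally like this favors high-volume labs and hospitals, so a usable rating would need some normalization on top of it.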
  13. 13. Hospital Performance● Hospital Performance Data Sources Will Be Merged to Improve Dataset● e.g. Phase 3● Example Question: Which Cardiologists Refer to Hospitals With Poor Central Line Infection Rates?● "We Want to Turn This Into the Ultimate Source For Open Doctor and Hospital Data." Fred Trotter
  14. 14. Overview of Data● 2011 Dataset is 1.3 GB file● 3.7 Million Entries● Contains Nearly One Million Nodes● Node = Person or Organization That Provided Health Care Service to a Medicare Patient● Graph Data is Keyed Using National Provider Identifiers (NPIs)
  15. 15. NPI● NPI = Unique Provider Number● Individual and Organization Providers● NPI is Mandated by HIPAA (as a Replacement for UPIN)● Doctors and Hospitals Must Use Their NPI for Medical Billing, e.g. Medicare Billing or Prescribing Medication
  16. 16. Sample Data● A few lines from a random search (grep) on a specific NPI:
        grep 1548387418 refer.2011.csv > Methodist_Hospital_Referrals.csv
        NPI_Seen_First,NPI_Seen_Second,Seen_Count
        1184710477,1548387418,55
        1548387418,1326047754,62
        1548387418,1598971913,24
      ● Pretty Cool, Huh? Full Sample is on Pastebin
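The grep search on this slide matches the NPI anywhere in a line. A field-aware version can be sketched in Python as follows. The helper name `rows_for_npi` is hypothetical, the first three rows come from the slide, and the last row is a made-up placeholder added only to show a non-match.

```python
import csv
import io

# First three rows are from the slide; the fourth is a placeholder non-match.
SAMPLE = """\
1184710477,1548387418,55
1548387418,1326047754,62
1548387418,1598971913,24
1112223334,5556667778,1111
"""

def rows_for_npi(lines, npi):
    """Yield (first_npi, second_npi, count) rows where npi is either party."""
    for first, second, count in csv.reader(lines):
        if npi in (first, second):
            yield first, second, int(count)

matches = list(rows_for_npi(io.StringIO(SAMPLE), "1548387418"))
```

For the real file, passing an open file handle for refer.2011.csv in place of the `io.StringIO` object should work the same way, assuming the file keeps the three-column layout shown on the slide.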
  17. 17. Tip For Providers● Are You A Health Care Provider?● Good Time To Update Your NPI Record● e.g. No Need to List Your Home Address● Public Database● Updated Weekly● Fred Built a Very Clean NPI Search Tool● Or Use Government NPI Search Tool
  18. 18. Referral and Teaming● Graph Has 49,685,586 Referring Party Pairs (Collaborative Relationships)● When Providers Work On The Same Group of Patients Within The Same Time Frame = Teaming Relationship● Interactions Traditionally Considered Referral Relationships = Majority of Data● If Provider A Sees the Same Patient As Provider B Within 30 Days, It Counts As +1
  19. 19. Referral and Teaming● What's Counted is How Often Two Providers Bill Medicare For The Same Patients in 30 Days● How Can Patient Identification Be Avoided, You May Ask?● For Each Entry in Dataset, At Least 11 Patients Were Involved in Transaction● 11 = CMS Standard● 11 Solves "Elvis Problem"
  20. 20. Elvis Problem● Everyone Knows Elvis' Doctor● Everyone Knows Elvis' Doctor Has One Patient● If Elvis' Doctor Refers to a Cardiologist, Then Everyone Knows Elvis Has Heart Problems● At Least 11 Patients Take Part In Each Given "Referral Count"● Enforcing a Minimum of 11 Patients in the Transaction Addresses Said Problem
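The counting rule and the 11-patient minimum can be illustrated with a small sketch. It is not the procedure actually used to produce the dataset: the visit schema (patient, NPI, date), the function name `pair_counts`, and the handling of same-day visits are all assumptions made for illustration. Only the 30-day window and the minimum of 11 patients come from the slides.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # window stated on the slides
MIN_PATIENTS = 11            # CMS minimum stated on the slides

def pair_counts(visits):
    """Count ordered provider pairs that saw the same patient within WINDOW.

    visits: iterable of (patient_id, npi, visit_date) tuples (hypothetical schema).
    Returns {(npi_first, npi_second): count}, keeping only pairs that
    involve at least MIN_PATIENTS distinct patients.
    """
    by_patient = defaultdict(list)
    for patient, npi, day in visits:
        by_patient[patient].append((day, npi))

    counts = defaultdict(int)
    patients = defaultdict(set)
    for patient, seen in by_patient.items():
        seen.sort()  # chronological order; same-day ties fall back to NPI order
        for i, (day_a, npi_a) in enumerate(seen):
            for day_b, npi_b in seen[i + 1:]:
                if day_b - day_a > WINDOW:
                    break  # later visits are even further away
                if npi_a != npi_b:
                    counts[(npi_a, npi_b)] += 1
                    patients[(npi_a, npi_b)].add(patient)

    return {pair: n for pair, n in counts.items()
            if len(patients[pair]) >= MIN_PATIENTS}
```

Suppressing any pair backed by fewer than 11 distinct patients is what keeps a single well-known patient from being inferred from a published count.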
  21. 21. Privacy● Aside From Knowing a Score Reflects 11 or More Patients, Little Else Can Be Derived From Relationship Scores About Patients● e.g. Referral Relationship Score = 1,100● You Know it Reflects 11 or More Patients● Was It 11 Patients With 100 Referrals?● Was It 100 Patients With 11 Referrals?● Bottom Line: Data Reflects the Relationship Score Between Two Nodes, While Omitting Patient-Specific Data
  22. 22. Privacy● No Patient-Specific Data is Released in Dataset; Patient-Specific Data is Entirely Omitted (Not Deidentified)● Doctors Who Bill Medicare Are Government Contractors; Some Will Be Surprised As Public Data Becomes Increasingly Accessible● Freedom of Information Act Makes Government Contractor Data Available to Public for Accountability
  23. 23. Privacy● It is Fair to Presume Organizations Are Already Using Such Healthcare Data● e.g. Insurance Companies, Pharmacy Chains, Government, Etc● Ironically, Patients and Doctors Have Had Least Access To Study Such Data
  24. 24. Data Overlay● Information Will Be Discoverable By Overlaying Private or Public Data On Top of the Dataset● Dataset With Medicare Referral and Teaming Patterns Was a Starting Point to Merge Data● Dataset Will Be Steadily Improved● In Phase 2, For Example, Publicly Available Nursing Home Data To Be Merged
  25. 25. Geo-Encoded● Each Provider Identifier Contains Practice Location Address and Mailing Address● Data Can Be Overlaid Geographically and Merged With Geo Databases● Graph Gets Input From a Geo-Encoded Key● 80%: Specific Latitude or Longitude● 20%: Zip Code for General Location● Localized Healthcare Data
  26. 26. Sample Data, Re-Examined● 1112223334,5556667778,1111● 1112223334 = NPI of Node That Saw Medicare Patient First● 5556667778 = NPI of Node That Saw Medicare Patient Second● 1111 = Number of Times This Happened in a 30-Day Period During A Year (Connection)● 1111 = Relationship Score Between Real-Named Nodes and Connections● Often (Not Always) the PCP = First Variable
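The three fields described on this slide map naturally onto a small record type. The sketch below is only an illustration: the `Edge` type and `parse_edge` helper are hypothetical names, and the NPIs are kept as strings on the assumption that they are identifiers rather than quantities.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    npi_seen_first: str   # provider that saw the patient first
    npi_seen_second: str  # provider that saw the patient second
    seen_count: int       # relationship score for this ordered pair

def parse_edge(line):
    """Parse one 'first,second,count' line into an Edge."""
    first, second, count = line.strip().split(",")
    return Edge(first, second, int(count))

edge = parse_edge("1112223334,5556667778,1111")
```

Because each row is ordered (first, then second), the data reads as a directed, weighted graph rather than a symmetric one.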
  27. 27. Most Popular Referrals● Fred Uploaded the Top 100 Organizations by Number of Nodes in Dataset to Pastebin● One of Most Frequent "Referrals" is to Get Lab Work Done at LabCorp, Quest or Other Local Lab Providers● Also Very Common "Referrals" are to Hospital Emergency Departments and Treatment Facilities Like DaVita
  28. 28. Taxonomy● Public NPI File Has Provider-Type Ontology Classifying Doctor and Organization Types● Hospitals, Primary Care Doctors, Specialist Types and Labs are Coded in NPI File in This Provider-Type Ontology, Which is Maintained by AMA's National Uniform Claim Committee● Not Perfect, But Usually Accurate
  29. 29. Funding Overview● Funding is Occasionally Needed to Improve Dataset and Fred Uses Crowdfunding Model● Project is Currently Hosted on MedStartr (Healthcare Version of Kickstarter)● Backers Can Receive Early Access (6 Months) to a Rich Healthcare Dataset● Entire Dataset Will Become Open Sourced in Mid-2013 and Free to the Public● License To Be Creative Commons Attribution-ShareAlike 3.0 Unported License
  30. 30. Funding Overview● MedStartr Backers Have Bought 1 of 2 Data Licenses● Open Source Data License● $100-$120: Access to Entire Database and Sharing of Any Integrated Data Required● Proprietary-Friendly Data License● $1,200-$5,000: Access to Entire Database and Sharing of Integrated Data Not Required
  31. 31. Funding Details● For Phase 1 Improvements to the Initial Dataset, $23,720 Was Collected From 88 MedStartr Backers; 51 Receive Data● 39 Get Open Source Data License and 12 Get Proprietary-Friendly Data License● Data Price Rises Per Phase Between Now and Mid-2013 (Until Data Becomes Free)● Dual-License = No Data Hoarding; Lets Organizations Pay Steep Price to Innovate in Private, Without Blocking Open Research
  32. 32. Crowdfunding● Fred Effectively Said, "If A Few Hundred People Want To Pool Small Amounts of Money Together For This Project, I'll Buy and Prepare Public-Yet-Inaccessible Healthcare Data So Scientists Can Use It To Improve Healthcare, and It Will Never Be Hoarded."● Clinical Trial Fundraiser Diabetes App● Extend Features Patient Relationship App● Not-Just-For-Profits: Transparent Funding
  33. 33. Call To Innovators● "All of The Cool Discoveries in This Dataset Should Happen in the First Six Months." Fred Trotter● "All of the Really Amazing Discoveries in This Dataset Will Be Made in the Next Few Months, By Those Who Either Attended Strata RX, or Who Participate in This Project." Fred Trotter● Phase 2 Underway on MedStartr
  34. 34. Conclusion● This presentation was made for people learning about the Doctor Social Graph project● I hope it provides them a few things which make understanding the project and data easier and faster● Have fun using the Doctor Social Graph● Questions/Comments: Brandon Weinberg● Email: