
Using advertising data to model migration, poverty and digital gender gaps


Talk given at the Machine Learning and Data Analytics Symposium (MLDAS 2019). https://qcai.qcri.org/index.php/events/mldas-2019/.
Contact me if you're interested in the topic of poverty mapping or data for development in general.

Published in: Science


  1. Using Advertising Data to Model Migration, Poverty and Digital Gender Gaps. Ingmar Weber (@ingmarweber), April 1, 2019, MLDAS
  2. Great Collaborators
     • Mapping poverty in the Philippines – with UNICEF and Thinking Machines
     • Tracking digital gender gaps – with Data2X and University of Oxford
     • Monitoring the Venezuelan exodus – with UNHCR, UNICEF and iMMAP
     • Joao Palotti, Masoomali Fatehkia
  3. https://business.facebook.com/adsmanager/creation/
  4. http://fb-doha.qcri.org
  5. http://fb-doha.qcri.org
  6. http://fb-nyc.qcri.org
  7. Mapping Poverty
  8. Why Map Poverty?
     • Monitor sustainable development
     • Plan better poverty reduction interventions
     • Assess the impact of interventions (low latency is a huge plus)
  9. Obtaining Training Data
     • 2017 household survey implemented by the Philippine Statistics Authority (PSA)
     • Representative sample of ~40 households in each of n = 1214 "clusters"
     • Asset-ownership-based wealth index (y = WI), which makes this a standard regression task
  10. Sources of Ground Truth Noise
      • Sampling noise
        – The wealth index depends on the particular households sampled
        – Expected R^2 = .95 (bootstrap estimate)
      • Spatial perturbation
        – True location is (x, y), but it is reported at (x', y')
        – Protects privacy
        – Expected R^2 = .89 (simulations)
      • Combined
        – Expected R^2 = .84, an "expected upper bound"
  11. Features to Map Poverty: 24 variables on connection type, device manufacturer and device type
  12. Modeling the Wealth Index
      • Model selection using LASSO:
        Wealth Index / 1000 = -96 + 115 * (frac. FB users with 4G) + 216 * (frac. FB users with WiFi) + 48 * (frac. FB users with iOS) - 89 * (frac. FB users with Cherry Mobile) + 11 * (frac. FB users with high-end phones) + 30 * (FB penetration) + 3 * (log population density)
      • Tried regression trees; they didn't help
  13. Modeling the Wealth Index (map panels: 2017, 2019)
      • R^2 = .58 (10-fold CV)
      • Offline baseline: R^2 = .37
      • Upper bound: R^2 = .84, due to DHS noise
  14. Performance Across Distribution
      Decile      0-10%  10-20%  20-30%  30-40%  40-50%  50-60%  60-70%  70-80%  80-90%  90-100%
      Kendall τ   .213   .104    .082    .115    .011    .094    .146    .145    .060    .180
      • Ongoing: look more closely at which features help at which point of the distribution
  15. Summary
      - Challenging in low-population areas (k-anonymity)
      - Can it catch temporal changes? Unclear.
      + Potentially more "causal" than satellite features
      + Supports demographic disaggregation
      + Does not break down at the lowest decile
      + Promising to combine with other data sources
      • Interested? Launching a poverty mapping initiative
  16. Additional Slides
  17. Tracking Digital Gender Gaps
  18. Digital Gender Gaps
  19. Digital Gender Gaps
  20. Digital Gender Gaps
  21. Digital Gender Gaps
  22. www.digitalgendergaps.org
  23. www.digitalgendergaps.org
  24. www.digitalgendergaps.org
  25. www.digitalgendergaps.org
  26. www.digitalgendergaps.org
  27. Model Evaluation
  28. Monitoring the Venezuelan Exodus
  29. Monitoring the Venezuelan Exodus
  30. Previously Unavailable Estimates (map panels: Brazil, Peru, Ecuador; source: Facebook, Feb 2019)
  31. Predicted Income Based on OS
  32. Advertising Audience Estimates
      + Global reach with over 2 billion users
      + FB, LinkedIn, Google, Snapchat, IG, ...
      + Real-time estimates
      + Uses anonymous and aggregate data
      + Gender, age, location, country of origin, …
  33. Advertising Audience Estimates
      - Black box on how attributes are inferred
      - Needs modeling for bias correction
      - Usage patterns change over time
      - Only includes people who are online
      - Could create "use FB!" incentives
      - Risk of misuse
  34. Thanks!
      iweber@hbku.edu.qa
      https://www.slideshare.net/IngmarWeber/
      https://ingmarweber.de/publications/
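Slide 10 says that label noise alone caps the attainable R^2: about .95 from sampling noise, .89 from spatial perturbation and .84 combined. The sketch below illustrates that idea under a simplifying assumption that does not come from the talk. It treats both noise sources as independent, additive noise on a unit-variance wealth index. Under that assumption, a ceiling r corresponds to a noise variance of 1/r - 1, and a perfect predictor scored against the noisy labels reaches roughly 1/(1 + σ1² + σ2²). In reality, spatial displacement corrupts the features rather than the label, so this is only a rough approximation. All function names are hypothetical.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def noise_var_for_ceiling(r2_ceiling):
    """Noise variance (signal variance = 1) at which a perfect model scores r2_ceiling."""
    return 1 / r2_ceiling - 1


def simulated_ceiling(noise_vars, n=50_000, seed=0):
    """R^2 of a perfect predictor scored against labels carrying independent additive noise."""
    rng = random.Random(seed)
    truth = [rng.gauss(0, 1) for _ in range(n)]
    labels = [t + sum(rng.gauss(0, v ** 0.5) for v in noise_vars) for t in truth]
    # The "model" predicts the true value exactly; only label noise costs R^2.
    return r_squared(labels, truth)


sampling_var = noise_var_for_ceiling(0.95)  # slide 10: sampling noise
spatial_var = noise_var_for_ceiling(0.89)   # slide 10: spatial perturbation
print(round(simulated_ceiling([sampling_var]), 3))
print(round(simulated_ceiling([sampling_var, spatial_var]), 3))
print(round(1 / (1 + sampling_var + spatial_var), 3))  # analytic value under this model
```

Under this additive model the combined ceiling comes out near .85. That is close to the .84 on the slide and also to the product .95 × .89 ≈ .85.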
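Slide 12 reports the LASSO-selected model. The function below simply evaluates that published equation for a single cluster. The slide leaves several things unstated, so the sketch assumes the following. Every "frac." feature and FB penetration is a fraction between 0 and 1. "log population density" is the natural logarithm. The dictionary keys are hypothetical names.

```python
import math

# Coefficients from the LASSO equation on slide 12 (the model predicts Wealth Index / 1000).
# Feature names are hypothetical; fractions are assumed to lie in [0, 1].
INTERCEPT = -96
COEFFICIENTS = {
    "frac_4g": 115,             # fraction of FB users on 4G
    "frac_wifi": 216,           # fraction of FB users on WiFi
    "frac_ios": 48,             # fraction of FB users on iOS
    "frac_cherry_mobile": -89,  # fraction of FB users with Cherry Mobile devices
    "frac_high_end": 11,        # fraction of FB users with high-end phones
    "fb_penetration": 30,       # FB users relative to population (assumed fraction)
}
LOG_DENSITY_COEF = 3


def predict_wealth_index(features, population_density):
    """Evaluate the slide-12 equation for one cluster; the natural log is an assumption."""
    score = INTERCEPT + sum(coef * features[name] for name, coef in COEFFICIENTS.items())
    score += LOG_DENSITY_COEF * math.log(population_density)
    return 1000 * score


example_cluster = {
    "frac_4g": 0.5,
    "frac_wifi": 0.2,
    "frac_ios": 0.1,
    "frac_cherry_mobile": 0.3,
    "frac_high_end": 0.4,
    "fb_penetration": 0.6,
}
print(round(predict_wealth_index(example_cluster, population_density=1.0)))  # 5200
```

For this made-up cluster, log(1) = 0 removes the density term. The bracketed score is therefore -96 + 101.2 = 5.2, which corresponds to a wealth index of about 5200.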
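Slide 13 quotes R^2 = .58 under 10-fold cross-validation. A minimal sketch of that evaluation loop follows. The slide does not say how per-fold scores were combined, so this version pools all out-of-fold predictions into a single R^2. Averaging the per-fold R^2 values is another common choice. A univariate least-squares fit stands in for the LASSO model, and the synthetic data are purely illustrative. All names are hypothetical.

```python
import random


def fit_simple_ols(xs, ys):
    """Univariate least squares; returns a prediction function (stand-in for LASSO)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validated_r2(xs, ys, fit, k=10, seed=0):
    """Pooled out-of-fold R^2: each point is predicted by a model trained without it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic example with 1214 "clusters", matching the survey size on slide 9.
rng = random.Random(1)
xs = [rng.uniform(0, 1) for _ in range(1214)]
ys = [2 * x + rng.gauss(0, 0.3) for x in xs]
print(round(cross_validated_r2(xs, ys, fit_simple_ols), 2))
```

Pooling keeps every cluster's prediction strictly out-of-sample. It also avoids unstable R^2 values from small folds.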
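Slide 14 breaks performance down by wealth decile using Kendall's τ. This sketch assumes one plausible reading, which the slide does not confirm. Clusters are sorted by their ground-truth wealth index and split into ten equal-sized groups. Kendall's τ between true and predicted values is then computed within each group. Restricting the data to a narrow wealth band typically lowers rank correlations, so these values are not directly comparable with a whole-sample score. The sketch uses tau-a, which counts tied pairs as neither concordant nor discordant; the slide does not say which variant was used. The data are synthetic.

```python
import random


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; tied pairs count as neither."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                score += 1
            elif sign < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def tau_by_decile(truth, predicted, bins=10):
    """Split clusters into equal-sized bins by true wealth, then compute tau within each bin."""
    order = sorted(range(len(truth)), key=lambda i: truth[i])
    taus = []
    for b in range(bins):
        chunk = order[b * len(order) // bins:(b + 1) * len(order) // bins]
        taus.append(kendall_tau_a([truth[i] for i in chunk], [predicted[i] for i in chunk]))
    return taus


rng = random.Random(0)
truth = [rng.gauss(0, 1) for _ in range(1214)]
predicted = [t + rng.gauss(0, 0.8) for t in truth]  # synthetic, noisy predictions
print([round(tau, 3) for tau in tau_by_decile(truth, predicted)])
```
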
