Big data for development


Talk at the Computer Lab, University of Cambridge

Published in: Education

  1. 1. Big data for development (BD4D). Junaid Qadir, Information Technology University (ITU), Pakistan
  2. 2. Session outline. Section 0: The big data revolution • What’s driving big data? What’s new about big data? Section 1: Big data for development (BD4D) • BD4D techniques • BD4D data sources • BD4D applications: big data for health; mobile-based BD4D applications. Section 2: BD4D research at ITU/Punjab IT Board • Main ideas pursued • Some success stories. Section 3: BD4D challenges and pitfalls • Data ownership, privacy, bias, causation, and false positives
  3. 3. Section 0: the big data revolution
  4. 4. Big interest in big data
  5. 5. Applications of big data: business, recommendations, sports, people operations, transport, smart buildings and energy analytics
  6. 6. what’s driving big data?
  7. 7. Digital life and Data exhaust
  8. 8. What’s driving the data deluge: the Internet of Things
  9. 9. Democratization of data
  10. 10. what’s new about big data?
  11. 11. More data trumps clever algorithms
  12. 12. Section 1: big data for development (BD4D)
  13. 13. Traditional use of data “Today the data is siloed off and unavailable. When data is in silos you can't make use of it either for evil or for the public good, and we need the public good. We need to stop pandemics. We need to make a greener world. We need to make a fairer world.” Alex “Sandy” Pentland
  14. 14. What are we using big data for?
  15. 15. Can we use big data for development? How can we use data, and our ability to store and process it, for social good and for human development (especially in underdeveloped countries)? (Under revision at the Big Data Analytics (BDAN) journal.)
  16. 16. BD4D techniques: big data analytics/data science for development
  17. 17. Various faces of data analytics. “Although the buzzwords describing the field have changed - from ‘Knowledge Discovery’ to ‘Data Mining’ to ‘Predictive Analytics’, and now to ‘Data Science’, the essence has remained the same - discovery of what is true and useful in the mountains of data.” (Gregory Piatetsky-Shapiro, data mining pioneer)
  18. 18. Types of analytics: the BD4D research frontier
  19. 19. Data science techniques for BD4D: visual analytics
  20. 20. Data science techniques for BD4D: machine learning (diagram labels: Data, Output, Computer, Program/Model). Supervised learning: regression methods to predict a quantity; classification methods (trees, forests, kNN, etc.) to predict a category. Unsupervised learning: clustering methods (k-means, etc.) to self-organize data into categories.
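The three families named on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`knn_predict`, `fit_line`, `kmeans`), the `LABELED` points, and the starting centers are assumptions made up for this example and do not come from the talk.

```python
from collections import Counter
from math import dist

# Made-up labeled points (feature vector, label), for illustration only.
LABELED = [((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
           ((8, 8), "high"), ((8, 9), "high"), ((9, 8), "high")]

def knn_predict(train, query, k=3):
    """Supervised classification: majority label among the k nearest points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_line(xs, ys):
    """Supervised regression: least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

def kmeans(points, centers, iterations=10):
    """Unsupervised clustering: Lloyd's algorithm from the given start centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its closest current center.
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

label = knn_predict(LABELED, (2, 2))                      # category prediction
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])   # quantity prediction
centers, clusters = kmeans([p for p, _ in LABELED], [(0, 0), (10, 10)])
```

The supervised functions need labels or target values; `kmeans` sees only the points, which is the distinction the slide draws.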
  21. 21. Data science techniques for BD4D: time series analysis, deep learning
  22. 22. BD4D data sources
  23. 23. Mobile phones. Mobile phones reach almost four-fifths of the world’s people. More households in developing countries own a mobile phone than have access to electricity or improved sanitation. The “leapfrogging” effect: Kenya’s M-Pesa reached 80% of households in 4 years!
  24. 24. Crowdsourcing through social media (for data generation and big data processing). Food requests after the Haiti earthquake: the Ushahidi-Haiti crisis map helps organizations intuitively ascertain where supplies are most needed.
  25. 25. Crowdsourcing (crowd computing) (for data generation and big data processing). The aggregation of information in groups results in decisions that are often better than could have been made by any single member of the group.
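A toy illustration of the aggregation claim, assuming a simple estimation task. The guesses and the true value below are invented numbers, and `crowd_vs_individuals` is a hypothetical helper; the sketch shows one case where the crowd median beats most individuals and does not show that aggregation always wins.

```python
from statistics import median

def crowd_vs_individuals(guesses, truth):
    """Compare the crowd's median guess with each individual guess."""
    crowd = median(guesses)
    crowd_error = abs(crowd - truth)
    # Count individuals whose own error is strictly larger than the crowd's.
    beaten = sum(abs(g - truth) > crowd_error for g in guesses)
    return crowd, crowd_error, beaten / len(guesses)

# Invented example: nine people estimate a quantity whose true value is 100.
GUESSES = [70, 85, 90, 95, 104, 110, 120, 135, 150]
crowd, crowd_error, share_beaten = crowd_vs_individuals(GUESSES, truth=100)
print(crowd, crowd_error, round(share_beaten, 2))
```

The median is used instead of the mean here because it is less sensitive to a few extreme guesses.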
  26. 26. BD4D applications: 1. Emergency/crisis response 2. Healthcare 3. Better governance 4. Education 5. Agriculture, hunger, food
  27. 27. Big data for development: humanitarian emergencies/crisis response
  28. 28. Use of technology in Haiti (2010): the dawn of digital humanitarianism
  29. 29. The precursor of big data geo-analytics. A classic example of crisis mapping analytics is John Snow’s cholera map. Snow studied the severe 1854 outbreak of cholera near Broad Street in London, England.
  30. 30. What’s new: big crisis data analytics. The Internet, open source, and open data; mobilization of the wisdom of the crowds; crowd mapping; crowd computing.
  31. 31. Big data for health: healthcare is the killer development ‘app’. As the United Nations launches a 17-point agenda for helping the world's poor, 267 economists from 44 countries on Friday published a declaration advocating one particular way: Make people healthier.
  32. 32. What’s wrong with diagnosis today? It is hit and miss and done on intuition. MIT graduate student Steven Keating's life was saved by his curiosity and by the medical health data that he curated. How medical selfies can save your life!
  33. 33. Big data will revolutionize healthcare. Since the beginning of time, most people have been isolated, without information about or access to the best health practices. But in just the last decade, this situation has changed completely: through the spread of cell phone networks, the vast majority of humanity now has a two-way digital connection that can send voice, text, and most recently, images and digital sensor data.
  34. 34. Mobile phone based development analytics: call detail records (CDRs)
  35. 35. Mobile big data dimensions. Mobility: as mobile phone users send and receive calls and messages through different cell towers, it is possible to “connect the dots” and reconstruct the movement patterns of a community. Social interaction: the geographic distribution of one’s social connections may be useful both for building demographic profiles of aggregated call traffic and for understanding changes in behavior. Economic activity: mobile network operators use monthly airtime expenses to estimate the household income of anonymous subscribers in order to target appropriate services to them through advertising.
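The “connect the dots” idea for mobility can be sketched as follows. The record layout (anonymized subscriber id, ISO timestamp, tower id), the sample rows, and the `trajectories` helper are all assumptions for illustration; real CDR schemas differ between operators.

```python
from collections import defaultdict

# Hypothetical CDR rows: (anonymized subscriber id, ISO timestamp, cell tower id).
CDR_ROWS = [
    ("u1", "2015-03-01T09:10", "T2"),
    ("u1", "2015-03-01T08:05", "T1"),
    ("u2", "2015-03-01T08:30", "T3"),
    ("u1", "2015-03-01T08:40", "T1"),
    ("u1", "2015-03-01T18:20", "T1"),
]

def trajectories(rows):
    """Order each subscriber's events by time and collapse repeated towers."""
    by_user = defaultdict(list)
    for user, timestamp, tower in rows:
        by_user[user].append((timestamp, tower))
    paths = {}
    for user, events in by_user.items():
        path = []
        # ISO-8601 timestamps of equal format sort chronologically as strings.
        for _, tower in sorted(events):
            if not path or path[-1] != tower:
                path.append(tower)
        paths[user] = path
    return paths

PATHS = trajectories(CDR_ROWS)
print(PATHS)
```

Mapping each tower id to its coordinates would turn these sequences into coarse movement paths; aggregating them over many subscribers gives the community-level patterns the slide refers to.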
  36. 36. Some CDR-based applications: disaster response, migration analysis, digital epidemiology, proxy census, mapping socioeconomic indicators, optimizing transportation
  37. 37. Mobile-based reality mining: unobtrusive personal analytics
  38. 38. Computational social science. “What was observed by us … may be observed so well that all the disputes that for so many generations have vexed philosophers are destroyed by visible certainty, and we are liberated from wordy arguments.” (Galileo) “Because of the success of science, there is a kind of a pseudo-science. Social science is an example of a science which is not a science.” (Richard Feynman)
  39. 39. Section 2: BD4D research at ITU/PITB (~120 million people)
  40. 40. Smartphones and good governance
  41. 41. Promoting open data (
  42. 42. The use of crowdsourcing (e.g., for open education data accountability)
  43. 43. Creating new mobile apps: DATAPLUG
  44. 44. Some Pakistani success stories
  45. 45. Greater transparency
  46. 46. Disease prevention: dengue epidemic (Lahore, 2011). 21,000 confirmed patients in Punjab; 17,000 patients in Lahore alone; 352 deaths.
  47. 47. Dengue activity tracking system (DATs)
  48. 48. Crime analytics
  49. 49. Section 3: BD4D challenges/pitfalls: data ownership, privacy, bias, causation, and false positives
  50. 50. What can we do with big data? (the optimistic view) “With enough data and the ability to crunch it, virtually any challenge facing humanity today can be solved.” Eric Schmidt et al, How Google Works, 2014
  51. 51. What can we do with big data? (the pragmatic view) “Technology amplifies human intent and capacity; it doesn't substitute for them.” (Kentaro Toyama) Digital dividends depend on key “analog complements” that include appropriate policies, regulatory frameworks, accountability, and a capable human workforce.
  52. 52. Social networks and development. Human behavior is massively influenced by social networks. Connections can completely transform you! Dev. Insight!
  53. 53. human development is complex … Human behavior is complex to model, influence, and predict. What drives human/ societal development is the right mindset, attitude, and intent—which is not necessarily dependent on data. Should governments shape individual choices?
  54. 54. Personal data & data ownership. “Personal data is the new oil of the internet and the new currency of the digital world.” (Meglena Kuneva, European Consumer Commissioner)
  55. 55. Privacy and big data. In a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. Many ethical questions about data use, transparency, and privacy remain open.
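The re-identification risk described on this slide can be illustrated with a toy uniqueness check. This is a minimal sketch, not the method behind the 95% figure: the three traces are invented, and `unicity` is a hypothetical helper that exhaustively checks every subset of p points instead of sampling.

```python
from itertools import combinations

# Invented traces: each subscriber is a set of (hour, antenna) points.
TRACES = {
    "a": {(8, "A1"), (9, "A2"), (18, "A1")},
    "b": {(8, "A1"), (9, "A3"), (18, "A1")},
    "c": {(8, "A1"), (9, "A2"), (18, "A4")},
}

def unicity(traces, p):
    """Fraction of p-point subsets of a trace that match exactly one trace."""
    unique = total = 0
    for trace in traces.values():
        for subset in combinations(sorted(trace), p):
            # How many subscribers' traces contain all p known points?
            matches = sum(set(subset) <= other for other in traces.values())
            unique += matches == 1
            total += 1
    return unique / total

for p in (1, 2, 3):
    print(p, round(unicity(TRACES, p), 2))
```

Even in this tiny example, knowing more points about a person quickly narrows the candidates to one, which is why removing names alone does not anonymize location data.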
  56. 56. The (big?) bias/noise of big data. “Is there any point to which you would wish to draw my attention?” “To the curious incident of the dog in the night-time.” “The dog did nothing in the night-time.” “That was the curious incident,” remarked Sherlock Holmes. Sometimes what is not in the data is more important; i.e., the data may be an unrepresentative sample. A related pitfall is mistaking the noise in big data for the signal. Another self-introduced bias in big data can come from outlier filtering/deletion.
  57. 57. Correlation vs. causation. Is induction the new big-data-era mode of science? We should not use the same information to both construct and test a hypothesis. Correlation-based analysis often suffices for traditional big data applications, but will it for BD4D?
  58. 58. Machine predictions going awry! In the movie Minority Report, the cop tackles and handcuffs individuals who have committed no crime (yet), proclaiming stuff like: “By mandate of the District of Columbia Precrime Division, I’m placing you under arrest for the future murder of Sarah Marks and Donald Dubin.” The arrested person confronts Cruise and asks: “You ever get any false positives?” In fact, it is very easy to design a pre-crime criminal-catching algorithm that will catch ALL the criminals!
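The closing remark can be made concrete: an algorithm that flags everyone catches every criminal, at the cost of an enormous number of false positives. The population size and offender count below are invented for illustration, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall, false positives) for boolean predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)       # flagged and guilty
    fp = sum(p and not a for p, a in pairs)   # flagged but innocent
    fn = sum(a and not p for p, a in pairs)   # guilty but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, fp

# Invented population: 10 future offenders among 10,000 people.
ACTUAL = [True] * 10 + [False] * 9990
FLAG_EVERYONE = [True] * len(ACTUAL)

precision, recall, false_positives = precision_recall(FLAG_EVERYONE, ACTUAL)
print(precision, recall, false_positives)
```

Recall alone ("catches ALL the criminals") says nothing about how many innocent people are flagged, so any such system has to be judged on precision and false-positive counts as well.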
  59. 59. Open research challenges: 1. Mitigating the highlighted pitfalls/challenges 2. Multimodal BD4D analytics 3. Predictive BD4D analytics 4. Combining humans, crowds, and AI 5. Unsupervised BD4D analytics
  60. 60. Credits/ Acknowledgments Figures from various sources: Digital life and data exhaust: “Demography, meet Big Data; Big Data, meet Demography”, Emmanuel Letouzé Mobile CDR-based applications from: “Big Data in Action For Development” (The World Bank). Cover pages of various magazines and research papers Various PITB managed websites and applications. Wikipedia/ Online Pictures of Feynman, Galileo, Galton, Pentland, Shapiro. Snapshots from Youtube Videos and Online Websites. Stock photos from various places; the respective owners are the rightful owners of the content. Books Connected (Christakis & Fowler), Inside the Nudge Unit (David Halpern), Geek Heresy (Toyama), The Patient Will See You Now (Eric Topol), Work Rules! (Bock), World Bank Annual Reports. These resources have been used in these lecture slides for educational purpose under the fair use doctrine. The ownership of these resources, if copyrighted, is retained by their respective copyright owners. Various presentations at slideshare, including:
  61. 61. Concluding remarks. The area of BD4D offers an opportunity to do good research that also leads to tangible human impact and development. We would love to hear your ideas on how we can use (big data & networking) technology for the development of Pakistan (and other underdeveloped countries).