Alex Rascanu delivered the "Big Data for International Development" presentation at the International Development Conference, held on February 7, 2015 at the University of Toronto Scarborough.
1. BIG DATA FOR
INTERNATIONAL DEVELOPMENT
Alex Rascanu
Digital marketing strategist
Founder of EVENTS& and Marketers Without Borders
February 7, 2015
International Development Conference
at University of Toronto Scarborough
2. AGENDA
1. What is big data?
2. How can we better
understand and use big data?
3. How to work with big data
(human and technical requirements)
4. Challenges & considerations
when working with big data
5. Opportunities to use big
data in the development sector
6. Conclusion & useful
resources (reports, books, events)
3. 1. WHAT IS BIG DATA?
Big Data = large amounts of digital data continually
generated by the global population
40% is the rate at which the amount of
available digital data is
projected to increase
annually
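As a rough illustration of what a 40% annual increase implies, the sketch below compounds that rate over several years and computes the doubling time. It assumes the rate stays constant from year to year, which the slide does not claim; the function names are made up for this example.

```python
import math

ANNUAL_GROWTH = 0.40  # the 40% projection quoted on the slide


def growth_factor(years: int, rate: float = ANNUAL_GROWTH) -> float:
    """Return how many times larger the data volume is after `years`,
    assuming the same rate compounds every year."""
    return (1.0 + rate) ** years


def doubling_time(rate: float = ANNUAL_GROWTH) -> float:
    """Return the number of years needed for the volume to double."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years: x{growth_factor(years):.1f}")
    print(f"doubling time: {doubling_time():.2f} years")
```

Under that constant-rate assumption, the volume roughly doubles every two years and grows about 29-fold in a decade.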
4. 2. HOW CAN WE BETTER UNDERSTAND AND
USE BIG DATA?
1) Start with questions, not with
data.
2) Figure out what resources and
process you need.
3) Create actionable insights with
the objective of influencing
behaviour inside your organization
or within the society at large.
5. 2. HOW CAN WE BETTER UNDERSTAND AND
USE BIG DATA? (continued)
6. 3. HOW TO WORK WITH BIG DATA
Software: MS Access is good for basic analysis but scales poorly once you
reach dozens of gigabytes of data. For complex analysis, use Hadoop (a free,
Java-based programming framework), Google Cloud Platform, Microsoft Azure, or
Amazon’s EC2.
You should feel comfortable using APIs (Application Programming Interfaces).
Collect the following information to ensure that your analysis is transparent regarding
its assumptions: “the type of information contained in the data”, “the observer or
reporter”, “the channel through which the data was acquired”, “whether the data
is quantitative or qualitative,” and “the spatio-temporal granularity of the data, i.e.
the level of geographic disaggregation (province, village, or household) and the
interval at which data is collected”
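One way to keep these assumptions attached to a dataset is a small metadata record, sketched below. This is only an illustration: the class name, field names, and sample values are hypothetical and do not come from the presentation or from any particular tool.

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import List


class DataKind(Enum):
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"


@dataclass(frozen=True)
class DatasetProvenance:
    """Assumptions to document before analysing a dataset."""
    information_type: str     # the type of information contained in the data
    observer: str             # the observer or reporter
    channel: str              # the channel through which the data was acquired
    kind: DataKind            # quantitative or qualitative
    spatial_granularity: str  # e.g. "province", "village", "household"
    collection_interval: str  # the interval at which data is collected

    def missing_fields(self) -> List[str]:
        """Return the names of text fields that were left blank."""
        return [name for name, value in asdict(self).items()
                if isinstance(value, str) and not value.strip()]


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    record = DatasetProvenance(
        information_type="market prices for staple foods",
        observer="local market vendors",
        channel="",  # not yet documented
        kind=DataKind.QUANTITATIVE,
        spatial_granularity="village",
        collection_interval="weekly",
    )
    print(record.missing_fields())
```

Checking `missing_fields()` before analysis starts is a cheap way to notice that one of the listed items was never written down.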
7. 3. HOW TO WORK WITH BIG DATA (continued)
Your staffing needs: a good data scientist needs to have computer science
and math skills as well as a “deep, wide-ranging curiosity, is innovative and is
guided by experience as well as data”. Other necessary skills include the
ability to clean and organize large data sets, particularly those that are
unstructured, and to communicate insights in actionable language.
An intimate knowledge of the real-world situation of interest is also critical.
Limited budget? Get help from volunteer technical communities through
initiatives such as hackathons.
8. 4. CHALLENGES & CONSIDERATIONS WHEN
WORKING WITH BIG DATA
1. Privacy (most sensitive issue)
2. Access & sharing (some new data sources
are openly available on the web, but most are
privately held by corporations)
3. Analysis & interpretation (what type of
data is being analyzed? is the sample
representative of the population? remember that correlation
doesn’t necessarily mean causation)
4. Anomaly detection (figuring out (ab)normality
in human ecosystems is very difficult; you need to
come up with ways to characterize and detect
socioeconomic anomalies in their context)
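To make the anomaly detection point more concrete, here is a minimal sketch of one very basic approach: flag a value when it deviates strongly from a rolling baseline of recent observations. The presentation does not name a method, so the rolling z-score used here, the window and threshold values, and the sample data are all assumptions chosen for illustration. Real socioeconomic data would also need the contextual judgement the slide calls for.

```python
from statistics import mean, stdev
from typing import List, Sequence


def rolling_zscore_anomalies(values: Sequence[float], window: int = 7,
                             threshold: float = 3.0) -> List[int]:
    """Return indices whose value deviates from the mean of the preceding
    `window` observations by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A flat baseline gives no scale to measure against; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Synthetic daily counts (made up for illustration), with one spike.
    daily_counts = [20, 22, 19, 21, 20, 23, 21, 22, 20, 60, 21, 22]
    print(rolling_zscore_anomalies(daily_counts))
```

The baseline is local on purpose: what counts as "normal" is defined by the recent history of the same series, not by a global constant. A spike also inflates the baseline for the following days, which is one reason simple detectors like this need review in context.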
9. 4. CHALLENGES & CONSIDERATIONS WHEN
WORKING WITH BIG DATA (continued)
It’s hard to collect household data in
real-time, so development progress
is difficult to track
Correlation does not necessarily
equal causation
11. 5. OPPORTUNITIES TO USE BIG DATA IN THE
DEVELOPMENT SECTOR (continued)
Timeframe to
intervene is relative
to the context:
Twitter-based vs.
Official Influenza
Rate in the U.S.
13. 5. OPPORTUNITIES TO USE BIG DATA IN THE
DEVELOPMENT SECTOR (continued)
Detailed reports for these projects are available at www.unglobalpulse.org/research/projects.
14. 6. CONCLUSION & USEFUL RESOURCES
How can Big Data achieve its potential in international
development?
1. Create incentives for the private sector to share
data
2. Create opportunities for academic researchers to
collaborate
3. Develop new partnerships and technologies for the safe and
responsible sharing and use of data for the public good