A THINKING PERSON’S
GUIDE TO:
BIG DATA FOR
DEVELOPMENT:
MYTHS, OPPORTUNITIES, AND
PITFALLS
Junaid Qadir
Associate Professor,
Information Technology
University (ITU), Pakistan
Big interest in big data
Applications of big data
Business
Recommendations
Sports
People Operations
Transport
smart buildings and
energy analytics
Opportunities—for details see:
https://www.slideshare.net/junaidq/
https://ihsanlab.itu.edu.pk
Talk’s agenda:
1 Is big data a panacea?
2 Why are human social systems complex?
• Does the data speak for itself?
• Big data’s big bias problem
• What about missing data?
• Most patterns are just illusions
• Are proper experiments no longer necessary?
• Big data obviates the need for nuanced subjective understanding
• The law of unintended consequences
3 Preventing big data from becoming a weapon of math destruction
• When models become self-enforcing
• The way forward
Is big data a panacea?
1
Big data / technological utopians
Forget taxonomy,
ontology, and psychology.
Who knows why people do
what they do? The point is
they do it, and we can track
and measure it with
unprecedented fidelity.
With enough data,
the numbers speak
for themselves.
The article attributed to Peter Norvig
(of Google) the claim: “All models are wrong, and
increasingly you can succeed without them.”
Does the data speak for
itself?
Copernicus / Ptolemy
Big Data Hubris: Big Data Fundamentalism
Norvig's response: “To set the record straight:
That's a silly statement,
I didn't say it, and I disagree with it.”
Using data and statistics is not a new
idea in science, and it is always done
with respect to a theory.
Prediction depends on theory:
• Prediction requires a leap of faith (i.e., belief in some assumptions)
• e.g., the assumption that the future will be similar to the past
Sometimes things happen just by chance
Science requires differentiating effects from chance
Does the ‘magic tablet’
work as advertised or is it
just a gimmick?
"If you’re trying to establish cause-and-effect relationships,
do try to do so with a properly designed experiment." - Robert
Hooke.
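To make "differentiating effects from chance" concrete, here is a minimal sketch of a permutation test, assuming hypothetical recovery scores for people who took the 'magic tablet' and people who took a placebo. It asks how often a difference at least as large as the observed one would appear if the group labels had been assigned purely by chance. The function name and all numbers are illustrative, not taken from the talk.

```python
from statistics import fmean
import random


def permutation_test(treated, control, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = fmean(treated) - fmean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # pretend the "tablet" labels were handed out at random
        diff = fmean(pooled[:n_treated]) - fmean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical recovery scores (illustrative numbers, not real data).
tablet = [5.1, 6.3, 4.8, 5.9, 6.0, 5.4]
placebo = [5.0, 5.8, 4.9, 5.5, 6.1, 5.2]
diff, p = permutation_test(tablet, placebo)
print(f"observed difference = {diff:.2f}, p = {p:.3f}")
```

With these made-up numbers the tablet group does slightly better, but a gap that size shows up often under random relabelling, so the "effect" is indistinguishable from chance. Treating labels as exchangeable is only justified when treatment really was assigned at random, which is exactly what a properly designed experiment provides.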
Big data’s big bias
Is not “data’s bigness” sufficient?
1936 US Elections: A. Landon vs. F. Roosevelt
Literary Digest poll
People polled: 10 million
Replies received: 2.3 million
Prediction: landslide victory for Landon
Gallup poll
People polled: 50,000
Prediction: Roosevelt
Result: Roosevelt won in a landslide
No amount of big data can help in drawing correct
inferences if the dataset is systematically biased
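A small simulation shows why a huge but biased sample loses to a modest random one. It is a sketch under loudly stated assumptions: a hypothetical electorate where 20% of voters are affluent and lean Landon, and a sampling frame (loosely inspired by the phone directories and car registrations often blamed for the 1936 miss) that reaches affluent voters ten times more often. All parameters and function names are hypothetical.

```python
import random


def roosevelt_share(sample):
    """Fraction of sampled voters who favour Roosevelt."""
    return sum(votes_fdr for _, votes_fdr in sample) / len(sample)


def build_electorate(n, rng):
    # Hypothetical: 20% of voters are affluent and lean Landon (30% FDR);
    # the rest lean Roosevelt (67.5% FDR), giving about 60% FDR overall.
    voters = []
    for _ in range(n):
        affluent = rng.random() < 0.20
        p_fdr = 0.30 if affluent else 0.675
        voters.append((affluent, rng.random() < p_fdr))
    return voters


def biased_sample(voters, rng):
    # Assumed frame: affluent voters are 10x likelier to be reached.
    return [v for v in voters if rng.random() < (0.50 if v[0] else 0.05)]


rng = random.Random(1936)
electorate = build_electorate(200_000, rng)
big_biased = biased_sample(electorate, rng)
small_random = rng.sample(electorate, 2_000)

print(f"true share:              {roosevelt_share(electorate):.3f}")
print(f"biased, n={len(big_biased):,}:     {roosevelt_share(big_biased):.3f}")
print(f"random, n={len(small_random):,}:      {roosevelt_share(small_random):.3f}")
```

The biased sample is more than ten times larger, yet it confidently predicts a Landon win; the small random sample lands close to the true 60%. Adding more rows from the same biased frame only makes the wrong answer more precise.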
Survivorship bias or survival bias is the logical error of concentrating
only on the people or things that made it past some selection process
Bullet holes in the plane:
which part to strengthen?
Abraham Wald
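Wald's insight can be reproduced in a few lines. This is a hypothetical sketch, not historical data: every section of a plane is equally likely to be hit, but the assumed lethality of a hit differs by section, and only planes that return can be inspected. The section names and lethality values are illustrative assumptions.

```python
import random
from collections import Counter

SECTIONS = ["engine", "cockpit", "fuselage", "wings"]
# Hypothetical probability that a single hit to this section downs the plane.
LETHALITY = {"engine": 0.8, "cockpit": 0.6, "fuselage": 0.05, "wings": 0.05}


def fly_missions(n_planes=10_000, hits_per_plane=3, seed=42):
    rng = random.Random(seed)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Hits land uniformly at random across sections.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        shot_down = any(rng.random() < LETHALITY[s] for s in hits)
        if not shot_down:
            survivor_hits.update(hits)  # only returning planes can be inspected
    return all_hits, survivor_hits


all_hits, survivor_hits = fly_missions()
for s in SECTIONS:
    print(f"{s:9s} all planes: {all_hits[s]:6d}   returning planes: {survivor_hits[s]:5d}")
```

Engines are hit as often as anything else, but few engine hits show up on returning planes, because planes hit there rarely come back. The missing bullet holes are the informative ones: armour goes where the survivors show no damage.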
What about the missing data?
“Is there any point to which you would wish to draw my attention?”
“To the curious incident of the dog in the night-time.”
“The dog did nothing in the night-time.”
“That was the curious incident,” remarked Sherlock Holmes.
The central challenge is not to be fooled by randomness,
which is hard to avoid if your findings are not guided by
theory.
It is easy to mistake noise for signal.
Most patterns are just illusions
Deriving a theory
after seeing the data
is dangerous
‘Proofiness’ prejudice
“The first lesson that you must learn is
that, when I call for statistics about the
rate of infant mortality, what I want is
proof that fewer babies died when I was
Prime Minister than when anyone else
was Prime Minister.
That is a political statistic.” – Churchill
“The art of using bogus mathematical arguments to prove something
that you know in your heart is true—even when it’s not.”
“If you torture the data enough, it will confess”
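"Torturing the data" often means testing many hypotheses and reporting the ones that pass. The sketch below, with hypothetical sizes (50 observations, 100 candidate predictors), generates predictors that are pure noise by construction and counts how many still look "significant". The cutoff 0.279 is the approximate two-sided 5% critical value of Pearson's r for n = 50.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def torture_the_data(n_obs=50, n_predictors=100, seed=7):
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    # Every predictor is pure noise, independent of the outcome by construction.
    predictors = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_predictors)]
    return [pearson_r(p, outcome) for p in predictors]


R_CRIT = 0.279  # approx. two-sided 5% critical |r| for n = 50
rs = torture_the_data()
hits = [i for i, r in enumerate(rs) if abs(r) > R_CRIT]
print(f"{len(hits)} of {len(rs)} noise variables look 'significant'")
print(f"strongest spurious correlation: {max(rs, key=abs):+.2f}")
```

At a 5% threshold, roughly 5 of 100 noise variables are expected to "confess". A correction such as Bonferroni (dividing the significance level by the number of tests), or better, a hypothesis fixed before looking at the data, guards against this.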
The problem of overfitting in data mining
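Overfitting can be shown with a straight line versus a polynomial flexible enough to pass through every data point. This sketch assumes a hypothetical true trend y = 2x + 1 and a fixed, made-up list of measurement noise so the result is reproducible; both models are then judged on unseen points halfway between the training points.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_interpolant(xs, ys):
    """Degree n-1 polynomial through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for k, xk in enumerate(xs):
                if k != j:
                    basis *= (x - xk) / (xj - xk)
            total += yj * basis
        return total
    return predict


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_trend(x):
    return 2 * x + 1  # hypothetical underlying relationship


# Hypothetical measurement noise, fixed so the example is reproducible.
noise = [0.8, -1.1, 0.5, -0.4, 1.2, -0.9, 0.3, -1.0, 0.7, -0.6]
x_train = list(range(10))
y_train = [true_trend(x) + e for x, e in zip(x_train, noise)]
x_test = [x + 0.5 for x in range(9)]  # unseen points between training points
y_test = [true_trend(x) for x in x_test]

line = fit_line(x_train, y_train)
wiggly = fit_interpolant(x_train, y_train)
for name, model in [("straight line", line), ("degree-9 polynomial", wiggly)]:
    print(f"{name:20s} train MSE {mse(model, x_train, y_train):7.3f}   "
          f"test MSE {mse(model, x_test, y_test):7.3f}")
```

The degree-9 polynomial has zero training error yet swings wildly between points, especially near the edges, while the "worse-fitting" line recovers the trend. A perfect fit to past data is evidence of memorised noise, not of insight.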
Why are human social systems
complex?
2
“Technology amplifies human intent and capacity;
it doesn't substitute for them.”—Kentaro Toyama
Not everything that counts can be counted!
Jay Forrester (MIT)
Omitting structures or variables
known to be important because
numerical data are unavailable is
actually less scientific and less
accurate than using your best
judgment to estimate their values.
To omit such variables is
equivalent to saying they have zero
effect—probably the only value that
is known to be wrong!
Forrester (1961, p. 57).
Fitting reality to the model
A man, after trying on a made-to-order suit, said to the
tailor, “The sleeve is two inches too long!”
The tailor says, “No, just bend your elbow like this.
See, it pulls up the sleeve.”
The man says, “But look at the collar now! When I bend my
elbow, the collar goes halfway up the back of my head.”
The tailor says, “Raise your head up and back. Perfect.”
The man leaves the store wearing the suit, his right elbow crooked and
sticking out.
The only way he can walk is with a herky-jerky, spastic gait.
Just then, two passersby notice him.
The first: “Look at that poor crippled guy. My heart goes out to him.”
The second: “Yeah, but his tailor must be a genius! The suit is a perfect fit!”
Humanities: what use? Isn’t it all obvious?
Sociologists concluded, on the basis of a large and expensive study
(comprising 600,000 WW2 servicemen), that “Men from rural backgrounds were
usually in better spirits during their Army life than soldiers from city backgrounds.”
Most people reacted: “Aha, that makes
perfect sense. Rural men are accustomed
to harsher living standards and more
physical labor than city men, so naturally
they had an easier time adjusting.”
But actually, the finding
was the opposite.
Once told the real finding, people could still intuitively
make sense of it: “Aha, city men are more used
to working in crowded conditions and in
corporations, with chains of command, strict
standards of clothing and social etiquette,
and so on.” Even this is obvious!
When every answer and its opposite
appears equally obvious, then
“something is wrong with the entire
argument of ‘obviousness.’”
Why are social systems complex?
Social systems are complex adaptive systems
Counterintuitive behavior of social systems
Jay Forrester
(MIT)
Most people believe cause and effect are
closely related in time and space, while in
complex dynamic systems cause and effect
are often distant in time and space.
Social systems are not linear but
belong to the class of systems called
multi-loop nonlinear feedback
systems.
Interrelationships in systems are far
more interesting and important than
separate details.
“Nonlinearity means that the act of playing the game
has a way of changing the rules.” (Gleick, Chaos)
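Even a single nonlinear feedback loop can defeat prediction; multi-loop social systems are harder still. As an illustration we chose (not a model of any social system), the logistic map x(t+1) = r·x(t)·(1 − x(t)) feeds each state back into the next. The sketch nudges the starting point by one part in a billion and measures how far the two runs drift apart; r = 2.5 and r = 4.0 are the assumed parameter values.

```python
def logistic_trajectory(x0, r, steps):
    """Iterate the logistic map x -> r * x * (1 - x), a one-loop nonlinear feedback."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def max_divergence(r, x0=0.2, nudge=1e-9, steps=60):
    """Largest gap between two runs whose starting points differ by `nudge`."""
    a = logistic_trajectory(x0, r, steps)
    b = logistic_trajectory(x0 + nudge, r, steps)
    return max(abs(p - q) for p, q in zip(a, b))


for r in (2.5, 4.0):
    print(f"r = {r}: max gap after a 1e-9 nudge = {max_divergence(r):.2e}")
```

At r = 2.5 the feedback is damping and the tiny error dies out; at r = 4.0 the same rule amplifies it until the two futures are unrelated. Identical equations, different feedback strength: the game changes its own rules.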
The whole can be different from the parts
The sectional views can differ qualitatively
from the reality
1) Today’s problems come from yesterday’s solutions.
2) The harder you push, the harder the system pushes back.
3) Behavior grows better before it grows worse.
4) The easy way out usually leads back in.
5) The cure can be worse than the disease.
6) Faster is slower.
7) Cause and effect are not closely related in time and
space.
8) Small changes can produce big results – but the areas of
highest leverage are often the least obvious.
9) You can have your cake and eat it too – but not at once.
10) Dividing an elephant in half does not produce two small
elephants.
11) There is no blame.
The importance of “Systems Thinking”
The law of the unintended
consequences
Social systems act as
“an enigma within a riddle within a mystery”
The Cobra Effect
(unintended consequences)
refers to the way
that measures taken
to improve a
situation can directly
make it worse.
During the British Raj, a bounty
system was devised to counter
the rise of venomous cobras.
At first the system seemed to work well:
a lot of cobras were killed.
Except that entrepreneurs
figured out they could make
money by farming cobras and
killing them for the bounty.
After the government scrapped
the system, there were more
cobras than before.
Campbell’s Law
“The more any
quantitative social
indicator is used for social
decision-making, the
more subject it will be to
corruption pressures and
the more apt it will be to
distort and corrupt the
social processes it is
intended to monitor.”
Goodhart’s Law
“When a measure
becomes a target, it
ceases to be a good
measure.”
“GNP can increase while
human lives shrivel.”
(Mahbub ul Haq)
Preventing big data from becoming a
weapon of math destruction
3
THERE ARE ETHICAL
CHOICES IN EVERY SINGLE
ALGORITHM WE BUILD
Should someone buy expensive medicine for his sick
child, depriving the rest of the family of essential nutrition?
This is an ethico-philosophical question, not an algorithmic one.
When dimensions are
heterogeneous, every
index reflects subjective
preferences.
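The claim that every index over heterogeneous dimensions embeds subjective preferences can be checked directly. The sketch assumes three hypothetical regions with already-normalised (0 to 1) scores and a simple weighted average; composite development indices combine such dimensions, though real ones use their own formulas. Only the weights change between runs, yet each weighting crowns a different "most developed" region.

```python
# Hypothetical, already-normalised (0-1) scores for three regions.
REGIONS = {
    "Region A": {"income": 0.9, "health": 0.4, "education": 0.5},
    "Region B": {"income": 0.5, "health": 0.8, "education": 0.7},
    "Region C": {"income": 0.6, "health": 0.6, "education": 0.9},
}


def composite_index(scores, weights):
    """Weighted average of the dimension scores (weights need not sum to 1)."""
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


def ranking(weights):
    """Region names ordered from highest to lowest composite index."""
    return sorted(REGIONS, key=lambda r: composite_index(REGIONS[r], weights), reverse=True)


# Three defensible-sounding weightings (all hypothetical value judgements).
growth_first = {"income": 0.6, "health": 0.2, "education": 0.2}
health_first = {"income": 0.2, "health": 0.6, "education": 0.2}
equal = {"income": 1, "health": 1, "education": 1}

for name, w in [("growth first", growth_first), ("health first", health_first), ("equal", equal)]:
    print(f"{name:13s} -> {ranking(w)}")
```

The data never changes; the ranking is decided by the weights, and choosing weights is a value judgement, not a computation.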
Machine predictions going awry!
In the movie Minority Report, the cop (played by Tom Cruise) tackles and
handcuffs individuals who have committed no crime
(yet), proclaiming things like:
“By mandate of the District of Columbia
Precrime Division, I’m placing you under
arrest for the future murder of Sarah Marks
and Donald Dubin.”
The arrested person confronts Cruise and asks:
“You ever get any false
positives?”
In fact, it is very easy to design a pre-crime algorithm
that will catch ALL the criminals: just flag everyone!
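The "flag everyone" algorithm achieves perfect recall, which is why recall alone is a meaningless guarantee. This sketch assumes a hypothetical city of 100,000 people of whom 100 will commit a crime, and computes recall (share of criminals caught) and precision (share of flagged people who are actually criminals).

```python
def confusion_counts(predictions, truths):
    """True positives, false positives, and false negatives for boolean labels."""
    tp = sum(p and t for p, t in zip(predictions, truths))
    fp = sum(p and not t for p, t in zip(predictions, truths))
    fn = sum((not p) and t for p, t in zip(predictions, truths))
    return tp, fp, fn


def recall_precision(predictions, truths):
    tp, fp, fn = confusion_counts(predictions, truths)
    recall = tp / (tp + fn)      # share of actual criminals caught
    precision = tp / (tp + fp)   # share of arrests that were justified
    return recall, precision


# Hypothetical city: 100,000 people, the first 100 of whom are future criminals.
population = 100_000
future_criminals = [i < 100 for i in range(population)]
flag_everyone = [True] * population

recall, precision = recall_precision(flag_everyone, future_criminals)
_, false_positives, _ = confusion_counts(flag_everyone, future_criminals)
print(f"recall = {recall:.0%}, precision = {precision:.1%}, "
      f"false positives = {false_positives:,}")
```

Every criminal is caught, and 99,900 innocent people are arrested with them. Any honest evaluation of a predictive system has to report the false positives, not just the hits.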
When models become self-enforcing
The Pygmalion Effect
A famous map of New York created by
cartographers Lindberg and Alpers
Agloe, NY was not a real town. It
was a paper town: a booby trap to
catch plagiarizers.
People who saw Agloe on the map assumed it
must exist, and a general store named after it was built on the spot!
A few years after Lindberg and
Alpers set their map trap, the fake
town appeared on a Rand McNally
map, prompting the two
mapmakers to sue for copyright
infringement.
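A model can enforce itself in the same way the map did. The sketch below is a deliberately crude, hypothetical model of data-driven patrolling (our illustration, not from the talk): the single patrol goes wherever the most incidents have been recorded, but incidents are only recorded where a patrol is present. The district names, rates, and starting records are assumptions.

```python
def simulate_hotspot_patrols(true_daily_incidents, initial_records, days=30):
    """Hypothetical feedback loop: patrol where records are highest, record only where patrolled."""
    recorded = dict(initial_records)
    for _ in range(days):
        # The model sends the single patrol where the *data* says crime is highest...
        target = max(recorded, key=recorded.get)
        # ...but incidents only enter the data where a patrol is present.
        recorded[target] += true_daily_incidents[target]
    return recorded


# District B truly has four times as much crime, but A starts with one extra record.
result = simulate_hotspot_patrols(
    true_daily_incidents={"A": 5, "B": 20},
    initial_records={"A": 1, "B": 0},
)
print(result)  # {'A': 151, 'B': 0}
```

One stray record decides everything: the model's own output determines what data it sees next, so it keeps "confirming" that A is the hotspot and never learns about B.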
Big data can help strengthen stereotypes
Harvard Professor
Latanya Sweeney
Conclusions
1. Big data is a fantastic tool for supplementing
traditional data analysis methods.
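2. But a thinking person realizes that big data do
not automatically solve the problem that has
obsessed statisticians and scientists for centuries:
the problem of insight, of inferring what is going on
and how we might intervene to change a system for the better.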
3. We need AI/ML/big data algorithms that are
ethical (i.e., fair, transparent, and generally
beneficial).
