Peace, Conflict, and Data



Full video at:

We might be able to do better at conflict resolution -- making peace in violent conflicts -- with the help of good data analysis. There have long been data sets about war and violent conflict at the state level, but we now have much more.

There are now extraordinarily detailed, open-source event data streams that can be used for violence prediction. Conflict "microdata" from social media and communications records can be used to visualize the divisions in society. I also suggest a long-term program of conflict data collection to learn, over many cases, what works in conflict resolution and what doesn't.

We're really just at the beginning of all of this. There are huge issues around data collection, interpretation, privacy, security, and politics. But the potential is too great to ignore.


  2. 2. TRADITIONAL CONFLICT DATA SETS Correlates of War (COW): mostly inter-state wars. Lists all instances of "sustained combat, involving organized armed forces, resulting in a minimum of 1,000 battle-related fatalities" per year. Covers 1816-2007. Uppsala Conflict Data Program (UCDP): now includes an extensive database of non-state violence, in which none of the parties can be the government of a state. All conflicts causing at least 25 deaths per year. Covers 1989-2011. Polity data series: classifies government type (e.g. democratic, authoritarian) in a fine-grained, multi-variable way. Covers 1800-2012.
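The inclusion thresholds above can be read as simple filters on a conflict-year record. The following is a minimal sketch, assuming a hypothetical `ConflictYear` record with invented field names; the real COW and UCDP coding rules involve much more than a death count, so this only illustrates how the choice of threshold decides what gets counted.

```python
from dataclasses import dataclass

# Thresholds quoted on the slide (deaths per year).
COW_WAR_THRESHOLD = 1000
UCDP_THRESHOLD = 25


@dataclass
class ConflictYear:
    """Hypothetical record; field names are illustrative, not from either data set."""
    name: str
    year: int
    battle_deaths: int
    state_involved: bool  # True if any party is the government of a state


def datasets_including(record):
    """Return which simplified inclusion rules this conflict-year would meet."""
    included = []
    if record.battle_deaths >= COW_WAR_THRESHOLD:
        included.append("COW")
    if record.battle_deaths >= UCDP_THRESHOLD and not record.state_involved:
        included.append("UCDP non-state")
    return included
```

A clash with 40 deaths between two militias would appear only under the non-state rule, while the same clash with 10 deaths would appear nowhere. How you count changes the answer.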
  3. 3. WHAT KIND OF QUESTIONS CAN WE ANSWER? "Has there ever been a war between two democratic states?" "Have non-state conflicts increased over the last twenty years?" "Is the transition from an authoritarian government to a liberal democracy likely to involve violence?"
  4. 4. "Do not guess, try to count. And if you cannot count, admit that you are guessing." - G. Kitson Clark, Historian
  5. 5. SOME OTHER QUESTIONS WE WOULD LIKE TO ANSWER "Where is violence happening right now?" "Who has influence in this area?" "How does religion relate to ethnicity / geography / class?" "Where should we expect violence next?" "What is most likely to lead to a stable peace in this situation?"
  6. 6. USES FOR CONFLICT DATA 1. Analysis. (What is happening here? Who needs help?) 2. Prediction. (What will happen? Will our plan work?) 3. Learning. (What do we know? Is our theory right?)
  8. 8. GDELT GEO-POLITICAL EVENTS DATABASE GDELT: Global Data on Events, Location and Tone. Open sources, open code, open data. Pipeline: web scraping (AP, Reuters, BBC, local news), then computer classification, then database.
  9. 9. GDELT EVENT CODING Turns English sentences into coded events. Codes for agents, verb, time, location. Example: "Students and police fought in the Egyptian capital" is coded as "EDU fought COP in CAIRO on 2012-07-01". Agent dictionary has 60,000 entries. Includes major organizations from the UN to the LRA. Also 1500 religious denominations, 650 ethnic groups. Now over 230 million events, 1979 to present. Updated daily! Free!
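To make the idea of dictionary-based event coding concrete, here is a toy sketch in Python. It is not GDELT's actual coder: the three small dictionaries, the `Event` record, and `code_sentence` are all invented for illustration, and the matching is far cruder than a real system's parsing.

```python
from dataclasses import dataclass

# Toy dictionaries (assumptions). The slide says the real agent dictionary
# has about 60,000 entries.
AGENTS = {"students": "EDU", "police": "COP"}
LOCATIONS = {"egyptian capital": "CAIRO"}
VERBS = {"fought"}


@dataclass(frozen=True)
class Event:
    source: str
    verb: str
    target: str
    location: str
    date: str


def code_sentence(sentence, date):
    """Return a coded Event, or None if the sentence cannot be fully coded."""
    text = sentence.lower()
    words = text.replace(",", " ").split()
    agents = [AGENTS[w] for w in words if w in AGENTS]
    verb = next((w for w in words if w in VERBS), None)
    location = next(
        (code for phrase, code in LOCATIONS.items() if phrase in text), None
    )
    if len(agents) < 2 or verb is None or location is None:
        return None
    return Event(agents[0], verb, agents[1], location, date)


event = code_sentence(
    "Students and police fought in the Egyptian capital", "2012-07-01"
)
print(event)
```

Anything the dictionaries do not cover is silently dropped, which is one reason the contents of the dictionaries shape what the resulting data can show.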
  10. 10. PREDICTING VIOLENCE USING DATA Predicting future levels of violence in Afghanistan, Yonamine 2013
  11. 11. ASSESSING PREDICTION ACCURACY Just guessing that existing violence continues at the current level almost always gives a close prediction. Does your predictive model do better than this naive conflict persistence model? Yonamine's GDELT model beats naive model in 47 out of 48 months, reduces error in predicted number of violent events by 16%.
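The comparison against a naive persistence model can be sketched as follows. This assumes mean absolute error as the error measure and uses made-up monthly counts; the slide does not say which metric Yonamine used, and the function names are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and observed event counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def persistence_forecast(series):
    """Naive model: predict each month's count as the previous month's count."""
    return series[:-1]


def compare_to_persistence(series, model_forecast):
    """Compare a model's forecasts for months 2..n against the naive baseline."""
    actual = series[1:]
    naive_error = mean_absolute_error(persistence_forecast(series), actual)
    model_error = mean_absolute_error(model_forecast, actual)
    return {
        "naive_mae": naive_error,
        "model_mae": model_error,
        "error_reduction": 1 - model_error / naive_error,
    }


# Hypothetical monthly counts of violent events, and a hypothetical model's
# forecasts for the second month onward.
observed = [100, 110, 105, 120]
forecast = [108, 107, 114]
print(compare_to_persistence(observed, forecast))
```

A model is only interesting if `error_reduction` is positive, that is, if it beats the assumption that next month looks like this month.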
  12. 12. WHY PREDICTION? 1. Operationally useful. 2. Simple success metrics. 3. A critical part of theory validation. An "explanation" is much stronger if you have some way to check if it is correct.
  13. 13. OPERATIONALIZING THEORY We can't build quantitative models without quantities, but: How do we record "ripeness?" Can we decide if there is a "hurting stalemate?" How many "ethnic groups" are there? When is a conflict "religious" in nature? How do we decide if a historical factor is "important?" Has there been a successful "reconciliation?" This is the "counting" step of data collection. This is where we move from qualitative to quantitative.
  14. 14. CRISIS MAPPING Geographical information systems – maps – have long been used in crisis response. But we're seeing a new generation which incorporates data from more sources, and sometimes allows regular citizens to participate.
  17. 17. CROWDSOURCED MAPPING Ushahidi election violence mapping, Kenya, 2008
  18. 18. CROWDSOURCED TRANSLATION Translation during Haiti earthquake emergency response, January 2010. Original message: "Fanm gen tranche pou fè yon pitit nan Delmas 31". Step 1, message translated, categorized & geolocated: "Undergoing children delivery", Delmas 31, 18.495746829274168, 72.31849193572998, Emergency. Step 2, location is refined & actionable items are identified: (18.4957, -72.3185). Average turnaround = 10 mins.
  19. 19. CROWDSOURCED RADIATION MONITORING Post-meltdown radiation, mapped on Pachube, Japan, April 2011
  20. 20. LIBYA CRISIS MAP Standby Task Force for UN OCHA, March 2011
  21. 21. VOLUNTEER REFUGEE CAMP MAPPING Counting structures via satellite imagery, Somalia, August 2011
  22. 22. DATA EXHAUST AND CONFLICT MICRODATA The activities of daily life leave digital traces. This is called "data exhaust." •  phone calls and text messages •  emails and instant messages •  search queries and web site visits •  purchases of any type (except cash) •  social media posts •  your location 24 hours per day (via phone) •  web pages, links, and comments Microdata = data about individuals. Like "microeconomics"
  23. 23. ONLINE SOCIAL NETWORKS Facebook friends
  24. 24. CO-CONSUMPTION US political book sales, August 2008, Valdis Krebs
  25. 25. Map of the Persian blogosphere, Berkman Center 2008
  26. 26. LOCATION TRACES "A week on foursquare," Wall Street Journal
  27. 27. INFORMATION DIFFUSION Kony 2012 Twitter network, SocialFlow
  28. 28. Twitter bio word cloud of Kony 2012 network, SocialFlow
  29. 29. TWITTER POLARIZATION Political polarization on Twitter, Conover et al., 2011
  31. 31. PATTERNS OF ASSOCIATION Homophily (i.e., love of the same) is the tendency of individuals to associate and bond with similar others. ... More than 100 studies that have observed homophily in some form or another and they establish that similarity breeds connection. These include age, gender, class, and organizational role. - Wikipedia
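One simple way to look for homophily in network data is to compare the share of ties that connect people from the same group with the share you would expect if ties were formed without regard to group. The sketch below assumes a hypothetical `group_of` mapping and edge list; it is one of several possible measures, not a standard taken from the slides.

```python
from itertools import combinations


def same_group_share(pairs, group_of):
    """Fraction of pairs whose two members belong to the same group."""
    pairs = list(pairs)
    same = sum(1 for a, b in pairs if group_of[a] == group_of[b])
    return same / len(pairs)


def homophily_summary(edges, group_of):
    """Return (observed same-group share of ties, share over all possible pairs)."""
    observed = same_group_share(edges, group_of)
    baseline = same_group_share(combinations(sorted(group_of), 2), group_of)
    return observed, baseline


# Hypothetical four-person network with two groups.
group_of = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
edges = [("a", "b"), ("c", "d"), ("a", "c")]
observed, baseline = homophily_summary(edges, group_of)
print(observed, baseline)
```

An observed share well above the baseline is consistent with homophily, though with a network this small the difference could easily be chance.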
  32. 32. THE CHALLENGES OF DATA Interpretation •  How is the data collected? •  Statistical significance •  Causality Use •  Privacy •  Security •  Data can cause harm!
  33. 33. DATA DOES NOT SPEAK FOR ITSELF Numbers are not truth. Not everything can be counted. How you count changes the answer. "Raw data" is an oxymoron. Data has politics.
  34. 34. QUESTIONS ABOUT THE COLLECTION PROCESS Where do these numbers come from? Who recorded them? How? For what purpose was this data collected? How do we know it is complete? What are the demographics? Is this the right way to quantify this issue? Who is not included in these figures? Who is going to look bad or lose money as a result of these numbers? What arbitrary choices had to be made to generate the data? Is the data consistent with other sources? Who has already analyzed it? Does it have known flaws? Are there multiple versions? . . .
  35. 35. UNEVEN REPRESENTATION "About 75% of the world now has access to a mobile phone." - World Bank, ICT-4D report, 2012 "Are you mapping election violence or are you mapping the location of phone towers?" - Kenyan data analyst
  36. 36. Mobile phone use, Kenya 2009 Wesolowski et al. 2012
  37. 37. What doesn't a Twitter map show?
  38. 38. NYC population colored by income
  39. 39. STATISTICAL SIGNIFICANCE You see a pattern. Is it "really there"? Or just chance?
  41. 41. HOW OFTEN DOES COINCIDENCE FOOL US? Which charts show real trends, which are random data?
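One way to build intuition for how often coincidence produces a convincing chart is to generate pure noise and count how often it looks like a trend. The sketch below assumes Gaussian noise and uses the correlation between time and value as a stand-in for "looks like a trend"; the function names, the seed, and the 0.5 threshold are illustrative choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def share_with_apparent_trend(n_series, length, min_abs_corr, seed=0):
    """Fraction of pure-noise series whose correlation with time is at least
    min_abs_corr in absolute value."""
    rng = random.Random(seed)
    time_index = list(range(length))
    hits = 0
    for _ in range(n_series):
        noise = [rng.gauss(0, 1) for _ in range(length)]
        if abs(pearson(time_index, noise)) >= min_abs_corr:
            hits += 1
    return hits / n_series


# Share of 10-point random series that show a moderately strong "trend".
print(share_with_apparent_trend(n_series=2000, length=10, min_abs_corr=0.5))
```

Short series are fooled most easily: try raising `length` and watch the share fall.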
  42. 42. DOES YOUR SAMPLE GENERALIZE? How many people in this province are green? You talked to 4 people, 3 were green = 75%. Actually 3 out of 16 people are green = 19%. Your social network isn't random, so your personal experience is probably not representative.
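The arithmetic on this slide can be checked directly, along with how unlikely the 3-out-of-4 result would be if the four people had been picked at random. This is a small sketch using the slide's numbers; the list names and the `share` helper are invented for illustration.

```python
from math import comb

# The slide's numbers: 3 green people in a province of 16,
# and a social circle of 4 that happens to contain all 3.
population = ["green"] * 3 + ["other"] * 13
social_circle = ["green"] * 3 + ["other"] * 1


def share(people, label="green"):
    """Fraction of people carrying the given label."""
    return sum(1 for p in people if p == label) / len(people)


print(f"sample estimate: {share(social_circle):.0%}")
print(f"true share:      {share(population):.0%}")

# Probability that a truly random sample of 4 contains all 3 green people:
# choose the 3 green and 1 of the 13 others, out of all 4-person samples.
p_three_green = comb(3, 3) * comb(13, 1) / comb(16, 4)
print(f"chance of 3 green in a random sample of 4: {p_three_green:.4f}")
```

A random sample would almost never produce the 75% figure, which is a hint that the sample was shaped by who you happen to know.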
  43. 43. CAUSALITY Correlation is not causation
  45. 45. HOW CORRELATION HAPPENS When X and Y are correlated, the possibilities are: X causes Y; Y causes X; random chance!; a hidden variable causes X and Y; Z causes X and Y.
  46. 46. BEAUTY AND RESPONSES Apparent structure (X causes Y): telling a woman she's beautiful makes her not respond. Actual structure (Z causes both X and Y): if a woman is beautiful, 1) she'll respond less, 2) people will tell her that. Beauty is a "confounding variable." The correlation is real, but you've misunderstood the causal structure.
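A small simulation shows how a confounding variable produces a real correlation with no direct causal link. In this sketch Z drives both X and Y, and X has no effect on Y at all; the probabilities, seed, and function names are arbitrary assumptions chosen only to make the pattern visible.

```python
import random


def simulate(n=40000, seed=1):
    """Generate (z, x, y) rows where z influences x and y, but x never influences y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                  # confounder present?
        x = rng.random() < (0.8 if z else 0.2)  # x is more likely when z is true
        y = rng.random() < (0.2 if z else 0.6)  # y is less likely when z is true
        rows.append((z, x, y))
    return rows


def response_rate(rows, x_value, z_value=None):
    """Share of rows with y true, among rows with the given x (and optionally z)."""
    selected = [
        y for z, x, y in rows
        if x == x_value and (z_value is None or z == z_value)
    ]
    return sum(selected) / len(selected)


rows = simulate()
# Ignoring z, x appears to lower y...
print("overall:", response_rate(rows, True), response_rate(rows, False))
# ...but within each level of z the gap all but disappears.
for z in (True, False):
    print("z =", z, response_rate(rows, True, z), response_rate(rows, False, z))
```

Comparing within levels of Z is only possible if Z was recorded, which is why a hidden variable is so much harder to deal with than a known one.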
  47. 47. PRIVACY "I have nothing to hide" is a statement of the privileged. Many private organizations know incredible amounts about you. How would you feel if they ran algorithms to determine: •  where you spend your nights •  if you have cancer •  if you are gay •  if you are a victim of domestic violence •  how many protests you have attended
  48. 48. SECURITY – DATA CAN HARM! We call it conflict data... they call it military intelligence. For example, locations of refugees can be used to plan attacks. Especially a problem for crowdsourced maps. Must keep some data secret. Can selectively release data, to selected people, or at selected times. Public Libya map was delayed 24 hours.
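Delayed release can be as simple as a time filter between the internal data set and the public one. The sketch below assumes each report is a dictionary with a hypothetical `received_at` timestamp; it illustrates the idea of a 24-hour delay and is not a description of how the Libya map was actually operated.

```python
from datetime import datetime, timedelta

RELEASE_DELAY = timedelta(hours=24)


def publicly_releasable(reports, now, delay=RELEASE_DELAY):
    """Return only the reports that are at least `delay` old at time `now`."""
    return [r for r in reports if now - r["received_at"] >= delay]


# Hypothetical reports and clock time.
now = datetime(2011, 3, 10, 12, 0)
reports = [
    {"id": 1, "received_at": datetime(2011, 3, 9, 11, 0)},  # 25 hours old
    {"id": 2, "received_at": datetime(2011, 3, 10, 9, 0)},  # 3 hours old
]
print([r["id"] for r in publicly_releasable(reports, now)])
```

A delay only protects information whose sensitivity fades with time; locations that stay sensitive, such as a refugee camp, need to be withheld or coarsened instead.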
  50. 50. THE EPISTEMOLOGY OF CONFLICT RESOLUTION Or: how do we know what works?
  51. 51. DID CONFLICT RESOLUTION WORK IN THE CASE OF ______? "Very few of the mediations or the interventions I have been involved in, I would say were successful. Probably 99% of the time I don't think I helped." - Dr. Joyce Neu
  52. 52. MEDICINE VS. CONFLICT RESOLUTION Medicine: many similar cases are split into a treatment group and a control group. Good outcomes in the treatment group and bad outcomes in the control group: the treatment works! Conflict resolution: a unique situation. We do something and get a good outcome or a bad outcome... What if we'd done something else? There is never a counter-factual, so we cannot really do experiments.
  53. 53. LEARNING WITHOUT CONTROLLED EXPERIMENTS Our only hope is to systematically collect data on many cases, about: •  the situation •  what was done •  what happened Then, try to find similarities in the situation and what was done, and compare to what happened. See: case-control study design.
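In a case-control design, the basic calculation is to compare how often an action was taken in cases with one outcome versus cases with the other, usually summarized as an odds ratio. The sketch below assumes hypothetical case records with two invented boolean fields, `good_outcome` and `mediation_used`, and fabricated counts chosen purely to show the arithmetic.

```python
def two_by_two(records):
    """Count records by (good_outcome, mediation_used)."""
    counts = {(True, True): 0, (True, False): 0,
              (False, True): 0, (False, False): 0}
    for r in records:
        counts[(r["good_outcome"], r["mediation_used"])] += 1
    return counts


def odds_ratio(counts):
    """Odds of the action among good outcomes divided by odds among bad outcomes.

    Raises ZeroDivisionError if a needed cell is empty.
    """
    a = counts[(True, True)]    # good outcome, action taken
    b = counts[(True, False)]   # good outcome, action not taken
    c = counts[(False, True)]   # bad outcome, action taken
    d = counts[(False, False)]  # bad outcome, action not taken
    return (a * d) / (b * c)


# Hypothetical data set: 100 conflicts with invented outcomes.
records = (
    [{"good_outcome": True, "mediation_used": True}] * 30
    + [{"good_outcome": True, "mediation_used": False}] * 10
    + [{"good_outcome": False, "mediation_used": True}] * 20
    + [{"good_outcome": False, "mediation_used": False}] * 40
)
print(odds_ratio(two_by_two(records)))
```

An odds ratio above 1 says the action is associated with good outcomes in the collected cases. It does not rule out confounding, which is why the situation has to be recorded alongside what was done and what happened.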
  54. 54. SYSTEMATIC QUALITATIVE DATA We don't have to quantize when we collect the data! We can record detailed qualitative documentation. Records, transcripts, journals... as well as all the "hard" conflict data. This is ethnographic data collection. Then choose variables and operational definitions later. But we must be systematic. Collection must include: •  cases where we do nothing •  secret proceedings •  data relevant to theories we disagree with •  hopeless cases and failures
  55. 55. EVIDENCE-BASED CONFLICT RESOLUTION If we're serious about learning what works, we have to collect high-quality data on every case. Many questions: •  What data should be collected? •  Should it be recorded by humans or by machines? •  By whom? •  How can we respect confidentiality and privacy? •  Who will archive and distribute it? •  Will this interfere with the peace process? •  What variables should we operationalize now?
  56. 56. DON'T BE THIS CAT Do the math. Future generations are counting on you.