I’m Sara Terp - Director of Data Projects at Ushahidi, and long-term crisis data nerd. I’ve been asked to talk about how this magic happened, and how we’re creeping up on crisis data science. Mappers each have our own stories - the story of an Information Management Officer inside UNOCHA is different to the story of an academic at Harvard, is different to mine. Five years ago, I saw a data system that was badly broken and decided to dedicate five years of my life to helping fix it. I’ve watched and been part of some of the evolution of humanitarian data management. I’ve been in one tornado, two hurricanes, snowstorms, floods, one nuclear alert, one conflict and one cold war, but I’ve never been an on-the-ground responder. This is what I’ve seen happening over the years, and where I think (or hope) we’re all going. Linked data is about the relationships between things, so let’s see where the relationships in crisis community data are.
Let’s spoil the surprise a bit. This is what I’ve observed over years of handling crisis, development and community data.
Let’s talk through some of what’s happened over the past few years, from the perspective of a volunteer data nerd. I won’t talk about every crisis and every deployment - most of the big ones are well covered already - but I will talk about some of the ways that we got to where we are today. This is development data science before I got involved. From 2004 to 2009 I was designing UAV and intelligence systems for the UK military and running an innovations group to combine new ideas, new technologies and business ideas. I saw tragedies like the Boxing Day Tsunami, but couldn’t see any way that I, as a technologist, could help.
* Sahana is an information management product, designed to be used by disaster response groups.
* Ushahidi is a crowdsourcing tool, designed for easy reporting through SMS and emails, and easy summary through categorized datasets and maps.
I was working on ways for humans and intelligent agents to work together on time-limited tasks. I could sum that up as humans don’t have all the info, machines don’t have all the smarts, but sometimes you have to compromise to get stuff done in time.
I also designed unmanned vehicle systems, where safety means you have to clearly define the control, responsibilities and interactions shared between a vehicle system and its human pilot. Variable autonomy is about sharing the load when needed: when should we use humans, and when should we use machines, as “surge capacity”? How much can we trust the machine to do? I saw a lot of the overload problems in crisis data as a place where we might be able to apply variable autonomy theory.
And working on better ways to manage company knowledge and innovations, looking at things like DKCP which codifies the interactions between concept and knowledge space.
And then the 2010 Haiti earthquake hit, and a call went out across London for anyone who’d recently run a Barcamp, to organize one not in the usual 3-4 months, but in 2-3 days. It all started here for many crisis data nerds.
A bunch of us got together in London. My favorite quote was from one of the disaster relief people who worked with us: just simply “I’ve always wanted to do this”. And that was really the point of the CrisisCamps: not just to produce data and technologies, but also, because we were outside the system, to do the things that NGO information people wanted to do but couldn’t get top-cover or resources for.
Picture: CrisisCamp London during the Haiti response.
And a bunch of people got together and dialed in from all over the world.
The original VTCs were grassroots organizations - some still are. In Haiti and several crises that followed, local people and diaspora connected and were part of each response. And when you process crisis data, you have to remember that you only have the data that could be created or sent, and that local knowledge is often the thing that makes the difference.
Picture: Haitian developers and data nerds, designing and building a data system for gender-based violence counsellors.
This is the skills list we made at the first CrisisCamp London. We had some serious experts in the room, on specific subjects (e.g. UX), platforms (e.g. OpenStreetMap) and uses. CrisisCommons were talking to NGO IMOs, governments, militaries, NGO responders, community responders, digital responders and ronins. Ronins are important: they’re the people outside groups - they can bring new skills and knowledge if carefully connected; they can bring chaos and divided attention if not.
Or, if we think about that in terms of Drew Conway’s data science diagram, we had the hacking skills and substantive expertise, but we weren’t ready to use any of the stats knowledge in the room. Much of the story of development data science has been about pulling these skills together.
Or, if we look at it in terms of data processing, mappers were obtaining datasets, cleaning, interpreting and communicating them, but were missing the “explore” and “try models” skills.
List: OSEMN: Obtain-Scrub-Explore-Model-Interpret model
One of the changes in Web 2.0 was that people stopped being consumers of handed-out information and started producing information and having conversations about it. The Internet did what the Internet did; the large NGOs were still on the old broadcast model. This had the potential to either blow up or be a carefully targeted information source in Haiti.
A team at Tufts and elsewhere set up an Ushahidi instance so that any SMS message sent to 4636 would end up on a categorized map. To get that map, volunteers around the world (e.g. the far table in the London photo) translated, categorized and geolocated those SMS messages. It also had to follow “First, do no harm” - we saw many personal details being sent into the map and assessing their potential risk was hard.
Here’s the basic process. But you can’t geolocate without a map. This was a problem in Haiti, and it’s still a problem for every new crisis: we haven’t mapped every road, building, town or even region in the whole world, and most of those missing areas are in crisis zones. OpenStreetMap fixed that gap - volunteers traced aerial data and used paper maps and local memories to draw and tag Port-au-Prince in days. This triggered the formation of both Humanitarian OpenStreetMap and the Missing Maps project.
When you design LOD for crises, you have to build in that some of your datasets will be created on-the-fly.
Mappers also generated data from images: here, they searched aerial imagery for the tarpaulins over an informal displaced persons’ camp, then marked it as a campsite on OpenStreetMap.
Image: aerial image of Port-au-Prince with informal camp (tarpaulins).
Mappers also had a lot of developer volunteers, and developers like to build stuff. November 2010 CrisisCommons projects list: http://wiki.crisiscommons.eu/wiki/Project_Statuses_November_2010. Mappers also worked on a lot of technologies with RHOK - and learnt that the successful projects were the ones with end-user buy-in.
But more tech wasn’t the problem: the problem was getting better data and collaboration between groups.
Other volunteer projects (NB lots of groups collaborated on projects) included:
* “We Have, We Need”: a Craigslist of self-identified needs and requests by non-profits assisting in Haiti relief operations. Built in days. Biggest moment: getting generator fuel to a hospital 20 minutes after they tweeted for help.
* Haiti Hospital Capacity Finder: listed free beds in field hospitals.
The London camp missed the first few days of the response: by the time they pitched in, there were many development projects already happening. They looked at the projects list, and took a different approach: using their connections to help improve already-existing systems.
Groups built interfaces from paper to GIS maps and back. They also built apps to move data from Google spreadsheets to APIs and back (it isn’t hard to do - just technical and needing a bit of thought - e.g. you can use the top left-hand cell of a spreadsheet as a flag for whether to update it or not).
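The flag-cell trick above can be sketched in a few lines. This is a minimal illustration only, assuming the sheet has already been fetched as a list of rows and that `push` is a stand-in for whatever (hypothetical) API client a deployment uses; it is not the code any of these groups actually ran.

```python
def sync_sheet_to_api(rows, push):
    """Push spreadsheet rows to an API, using the top left-hand cell as
    an update flag.

    rows: list of rows; rows[0][0] is the flag cell ("UPDATE" means push),
          and the rest of the first row holds column names.
    push: callable that sends one record dict to the (hypothetical) API.
    Returns the number of records pushed.
    """
    if not rows or rows[0][0] != "UPDATE":
        return 0          # flag not set: leave the sheet alone
    header = rows[0][1:]  # column names, skipping the flag cell
    count = 0
    for row in rows[1:]:
        record = dict(zip(header, row[1:]))
        push(record)
        count += 1
    rows[0][0] = "SYNCED"  # clear the flag so we don't re-send
    return count
```

The point of the flag cell is that a volunteer editing the sheet, not the sync script, decides when an update goes out.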
Image: walking papers map; these are used to convert OSM maps to paper and back (note the QR code in the corner of the map; this is used to position the uploaded paper map onto the OSM map).
Public Laboratory (PLOTS) formed during the Gulf of Mexico oil spill, to create and use low-cost, community-based sensors and bring measurement science to new communities. They’ve been creating kite-based aerial data for years now, and have designed things like the CD-and-cardboard spectrometer.
Mappers thought about what we meant by data needs: what was needed, and when. Pre-crisis resilience was important, and many efforts to produce pre-crisis datasets (including CrisisWiki) were tried, but often failed from inter-crisis lack of enthusiasm.
2010 CrisisCampLondon slide
Public Lab started designing and using low-cost sensors in the BP oil spill. This was yet another potential data source.
And the types of design question we needed to ask of the data.
And the analysis needed to produce that. Mappers framed these needs in their working environments:
* Physical environment - phone signal, internet
* Political environment - e.g. data as power
* Technical environment
This was the first deployment that worried me. Despite all our good intentions, our ability to track social media and reach out to people, we didn’t have a good way to help thousands of people in trouble. I started wondering how we might start handling data points that moved in time and space. Eventually those thoughts made it into the W3C GIS standards.
This was our vision of an effective crisis information ecosystem:
* Established data gathering technologies (mostly)
* Cooperation as standard
* Open data systems = crisis data systems
* We’ve thought about (almost) everything
* People know where to get information
* People know where to help out
And this is what we lived by. In many ways, Haiti was the start of something very beautiful; in other ways, it was a hot mess of crowds of people all trying to help the same responders.
At the same time, Project EPIC was quietly creating human-tagged Twitter feeds.
SBTF launched in late 2010, with a different focus to the earlier groups: it had specialist teams, and would only activate if asked to by a responding partner (NGO, news agency, local group etc). Here’s an example of the type of workflow the volunteers followed; this workflow is from a later deployment (the 2012 Kenyan election violence monitoring), but pretty much covers the teams: an SMS team handling SMS messages sent to a Ushahidi platform; media monitors searching online for related information; a translation team converting to English; a geolocation team finding lat/longs from addresses; a report team adding categories; and a verification team making sense of groups of reports rather than single ones. Mappers also tested new technologies and listed found datasources.
Here are some of the tasks that human crowdsourcers did. Some of these can be automated; some can be partially automated; others we might need to keep as human activities. How we do this depends partly on the point of the dataset: in 2010, the emphasis shifted from reporting needs to “tell us what you see” - this changed the message-sensitive pressure to catch every relevant message (e.g. cries for help) into a pressure to produce a timely summary of the situation.
We worked on a lot of disasters that year, and in the years to follow. 2010 was special because existing crisis camp leads encouraged camps to form around the world, with local leaders working in local languages. The Chile community managed their own earthquake data with help from neighboring Spanish-speaking countries; the Thailand camp formed during the Pakistan floods then went on to handle floods in Thailand too (which was great, because there are limits to Google Translate). This only stopped when the tensions between a distributed barcamp-style federation model and a centralized hierarchical control one became too strong.
I joined the UN’s big data team (UN Global Pulse), in the hope it would fill in the missing pieces of that Drew Conway diagram. Mostly we concentrated on ways to reduce the time between a development crisis starting and data becoming available on it. At the time there were very few developers in the UN. I discovered that the UN is full of people who succeed despite its politics, and had many opportunities to meet with them and talk about humanitarian GIS and data science.
Here, we started thinking about crises that weren’t going to be over in hours, days or (at a stretch) weeks.
And then all heck broke loose. Two major crises happened at the same time, and mappers found themselves dealing with Fukushima’s radiation data (crowdsourced radiation monitoring) and trying to “do no harm” during a conflict (Libya crisis map).
Here’s a snapshot of the Libya Crisis Map report form. Every report in a Ushahidi platform must be assigned at least one category (unless we’ve disabled some code). Mappers started collecting and listing these categories, and thinking hard about what a standard set would be, before we started having problems comparing datasets.
Internews wrote a lovely report on these category lists too: https://innovation.internews.org/sites/default/files/research/InternewsWPCrowdGlobe_Web.pdf
This was the first cross-check of data generation by both machines and humans. Both were asked to tag buildings (shacks, houses and large buildings) as a proxy for population densities in the Afgooye region of Somalia. The machine (EU) and human results were comparable. This took over my Christmas, and the Christmas of many other volunteers. Tomnod has gone on to run many other satellite-tagging deployments, including tagging the seats of wildfires in Australia.
At some point that year, I got annoyed that data.un.org didn’t have an API, created my own and started looking at the crosswalks between datasets in it. I munged lists of CSV headers, and started looking at variations between the datasets under those headers. It didn’t take long to start running into problems combining datasets. The one that I chose as an example was country names… this is still causing problems today, and is going to be an issue for anyone linking data today. The CrisisNet team has started working on auto-detecting data column types.
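The simplest fix for the country-name problem is a crosswalk table that maps every variant seen in the wild to one canonical code. A minimal sketch, seeded with real DR Congo variants from the data.un.org lists later in this talk (the `to_iso3` helper name is mine, not from any existing library):

```python
# Crosswalk: lowercase country-name variants -> canonical ISO 3166 alpha-3.
# Variants below are real ones seen in data.un.org exports and common
# standards (see the DR Congo slide).
CROSSWALK = {
    "democratic republic of the congo": "COD",
    "congo, democratic republic of the": "COD",
    "congo, the democratic republic of the": "COD",
    "congo, democratic republic of": "COD",
    "congo, dem. rep.": "COD",
    "congo dem. rep.": "COD",
    "dem. rep. of congo": "COD",
    "dem. rep. of the congo": "COD",
}

def to_iso3(name):
    """Normalize a country name to an ISO3 code, or None if unknown."""
    return CROSSWALK.get(name.strip().lower())
```

The hard part isn’t the lookup - it’s maintaining the table, because every new dataset brings a new spelling.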
By 2012, data science was well underway and the open data crowd had lots of experience using algorithms to clean datasets. The USAID dataset was pre-cleaned (automated geolocation) before volunteers coded the locations that couldn’t be found by machines. This wasn’t a sudden-onset deployment, but a useful test of people and machines sharing tasks.
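The machine-first, human-fallback pattern from that USAID deployment can be sketched like this. It’s an illustration under stated assumptions only: the gazetteer is a plain dict standing in for a real geocoder, and the function name is hypothetical.

```python
def split_geocoding_work(records, gazetteer):
    """Machine-first geocoding: look each place name up in a gazetteer,
    and queue anything the machine can't resolve for human volunteers.

    records:   list of dicts, each with a "place" field.
    gazetteer: {lowercase place name: (lat, lon)} - a stand-in for a
               real automated geocoder.
    Returns (machine_done, human_queue).
    """
    machine_done, human_queue = [], []
    for rec in records:
        coords = gazetteer.get(rec["place"].strip().lower())
        if coords:
            machine_done.append({**rec, "lat": coords[0], "lon": coords[1]})
        else:
            human_queue.append(rec)  # humans handle what machines can't
    return machine_done, human_queue
```

The design point is that the machine never guesses: anything it can’t match goes to a person, so accuracy isn’t traded for speed.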
This was the ACAPS DNA deployment: a test to see if volunteers could help gather the standard data for their DNA product, without losing accuracy. Mappers built scrapers and designed an automated crisis data collection system, but we still needed humans to search obscure corners of the web for relevant information. The country name crosswalks were very useful in this.
We still built maps. Lots of maps.
Mappers still spent a lot of time searching the internet and archives for small pieces of data; in this case, a list of operational health facilities in Libya after the crisis there. Much of this data search is an exercise in lateral thinking and connecting to other groups (like the Libyan doctors’ facebook group), and in the frustrations of geolocating buildings that had no street addresses (but were referred to in terms of the journey to them, e.g. left at this mosque etc).
Need useful, actionable data.
And here are some of the issues we saw. Note that almost nobody wanted to work on crisis data in-between crises - it’s not as sexy as “saving lives with data”.
But the whole system (communities, processes, tech and innovation) still needed work.
Slide from 2012 talk
Mappers used some of the wells data from Sudan at the Guardian’s 2012 Development Data Hackathon (http://www.eventbrite.co.uk/e/development-data-challenge-london-tickets-3990385350). Mappers also had long conversations about linking funding and project data, given the variable level of representation in the IATI standard.
Image: Micromappers using PyBossa platform (Python version of Stanford’s Bossa crowdsourcing platform) for Typhoon Yolanda deployment. One of the UN volunteers (Simon?) produced a beautiful choropleth mashup of poverty data and damage estimates - this was the first linked data visualization that I’d seen in this space.
What Micromappers brought was control over the system inputs (e.g. code from people like Hemant could filter messages before passing to volunteers) and cleaner workflows: 2 clicks instead of the 6-8 clicks to tag a message in standard Ushahidi (Ushahidi had workflow code, but it wasn’t widely used).
This still led to maps.
We saw overload in a lot of crises, including the Boston bombing.
This is a t-shirt printed after the 2010 Chile earthquake. The message on it reads “plz send help to 1712 estacion central, santiago chile. im stuck under a building with my child. #hitsunami #chile we have no supplies”.
iHub data wrote a research report on the other 3Vs: viability, verification, validity: http://community.ihub.co.ke/blogs/15644/3vs-crowdsourcing-framework-for-elections-launched http://www.ihub.co.ke/ihubresearch/jb_VsReportpdf2013-8-29-07-38-56.pdf
But… it’s hard to take people from 0 to data scientist. It’s easier to build tools that are easy to install and use.
The OpenCrisis team tried building a humanitarian data and data links store (the Humanitarian Data Project); somewhere to collect all the datasets we’d found over the past few years, and make sure other people could find things like Karen Payne’s wonderful spreadsheet of crisis data links. Mappers searched old deployment folders for data source lists and wrote code to import data from sources including Google spreadsheets into CKAN instances, but it was painful, beyond painful, setting up and managing a CKAN instance for this, and difficult to sustain as a volunteer group without sysadmins.
We were greatly relieved when UNOCHA’s Humanitarian Data Exchange appeared later that year doing the same thing, and pulled the plug on HDP.
We build tools for democratizing information, increasing transparency and lowering the barriers for individuals to share their stories. We’re willing to take risks in the pursuit of changing the traditional way that information flows in the world.
40,000 Deployments; 49,000 Mobile Downloads.
I joined Ushahidi to improve data literacy in one of the Crisismappers’ most-used tools. I still don’t work on crisis data, but I have had long conversations about what a data scientist would need in the main platform. Ushahidi Platform is built on a database. It has datasets embedded in it. In Platform V2 you can access the reports list (the dots on the map) through the API and CSV download, but there’s much much more sitting in each platform, waiting to be claimed.
Or alternatively: community reports in, stories out. And we can already do some basic reasoning with this: for instance, using point-in-polygon methods to check GIS labels like region and country against lat/longs.
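The point-in-polygon check mentioned above is simple enough to sketch. This is a plain ray-casting implementation with a hypothetical `check_report` wrapper; a real deployment would more likely use a GIS library such as Shapely, but the logic is the same.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting point-in-polygon test.
    polygon: list of (lon, lat) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does a ray cast east from the point cross this edge?
        if (yi > lat) != (yj > lat) and \
           lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def check_report(report, boundaries):
    """Flag reports whose stated region doesn't contain their lat/long.
    boundaries: {region_name: polygon} - a hypothetical lookup table."""
    poly = boundaries.get(report["region"])
    if poly is None:
        return "unknown region"
    ok = point_in_polygon(report["lon"], report["lat"], poly)
    return "ok" if ok else "label/coordinate mismatch"
```

This is the kind of cheap sanity check that catches a report geolocated to the wrong country before it reaches the map.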
Ushahidi platform V2 has admin-defined forms. That means that users can create a set of report forms with different fields in them; that in turn makes for a more powerful set of tables. Ushahidi tried using forms to represent sensor data, but in practice it’s easier to add sensor data as attachments to existing reports. We went a little further with this, creating an Ushahidi plugin that connects Ushahidi instances together, to share data in common data fields and categories. This gives individual groups control over their own category lists, without disrupting the central view of all instances.
But we still have the issue of a “dumb” input feed. This is where we need some filtering and intelligence, before data gets to human processors and the map.
We’ve also put in a lot of D3 code, to give non-specialist users access to visualisations of their datasets. And thought about other automations, for instance:
* Named entity recognition using the Umati plugin, and geolocation from those named entities
* Autotranslation through the Google Translate API
* Auto-categorisation from text; auto-tagging using external programs
* Retweet removal using external programs; and, please please, spam removal
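The retweet and spam removal steps above are the easiest to automate. A minimal sketch, assuming crude heuristics (a real deployment would use trained classifiers; the function names and spam word list are mine):

```python
import re

def looks_like_retweet(text):
    """Heuristic retweet check: the classic 'RT @user' prefix."""
    return bool(re.match(r"^\s*RT\s+@\w+", text))

def filter_feed(messages, spam_words=("free followers", "click here")):
    """Drop retweets and crude spam before the feed reaches human
    processors and the map. A sketch only - real filters need more than
    keyword lists."""
    kept = []
    for msg in messages:
        if looks_like_retweet(msg):
            continue
        lower = msg.lower()
        if any(word in lower for word in spam_words):
            continue
        kept.append(msg)
    return kept
```

Even this crude a filter matters: every duplicate or spam message removed here is one less click for a volunteer downstream.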
I’ve been working on the Pheme project, with a consortium working on veracity checking on social media, including detecting contradiction/controversy and tagging rumors as non-verified potential facts, misinformation and disinformation, in multiple languages, going from “about the same thing” to “confirms/contradicts”. Part of this is starting to look at Ushahidi platform data on the word level. Another part is treating Ushahidi reports as a dataset that can be tagged from outside (eg. using categories and the API) to highlight message components and influences.
Typhoon Ruby, like Yolanda, started small - a footnote in an article about a previous supertyphoon. Text: from http://philnews.ph/2014/12/01/pagasa-forecasts-an-lpa-to-enter-philippines-on-friday-bagyong-ruby/
Having decided on a country, the storm was veering around so much that it was hard to tell where it would make landfall. This is important because none of the Filipino coasts are completely mapped, and anywhere the storm hit would need maps very quickly.
In crises, government dataset pages fail (e.g. in Sandy and Ruby). They’re usually back up during the response, but it helps to be prepared for this. There are also still PDF datasets out there - Ruby produced a dataset that was a PDF image of a slightly-tilted spreadsheet that none of the tools I had (including OCR) could scrape. Often these things are better solved politically than technically.
And this itself created an issue: the government groups were reluctant to share with the NGOs operating on their turf.
http://www.gov.ph/crisis-response/typhoon-ruby/#section-1. Micromappers happened, classifying images just after a similar exercise classifying tree damage from Cyclone Yolanda. I started working out if I could get away with filtering out “Pray” and “God” from the tweet dataset.
HDX started to be used.
OSM map changes during Typhoon Ruby. This is important. Most people in the Philippines speak English, but Tagalog is the official language, with about 100 dialects (http://en.wikipedia.org/wiki/Languages_of_the_Philippines)
Mappers already saw data science in 2013 with the choropleth mashups of poverty against typhoon track, but in 2015, data science and development data are really starting to converge, spurred by a combination of available tools, easily-available data science training and curiosity.
At the same time as crisis mapping took off, so did data science. It had enthusiastic commercial sponsors (e.g. O’Reilly Media), conferences (Strata), knowledge aggregators (Data Science Central), MOOCs, meetups and DataKind. Humanitarian data science has been a little slower to take off. Mappers have datastores, mashups and map projects (like Missing Maps). So what are mappers missing?
We still have Ushahidi maps, now acting as data sources for other applications. The Ebola crisis has been a long one, with many groups involved; that length has given them time to clean datasets and try new technologies.
HDX page from the Ebola response. Simon Johnson of the Red Cross has been producing many Tableau visualizations and dashboards from these datasets. Ebola geonode is also available, and the HDX standards group is slowly making headway. Humanitarians have been working with new technologies for a while now (e.g. UNOOSA’s work on UAV systems), including unmanned vehicles, low-cost sensors, wearable technology, data science and AI, but it’s also important to build for what people have available to them: mobile phones, USB sticks, Excel, Googledocs.
But the most important thing that’s happening is that we’ve shifted back to improving crisis resilience (not the same thing as preparedness: resilience should be built into the normal operations of a community, not just remembered in a crisis) at the community level. And that’s where linked data can really help.
Image: Rockefeller Resilience Initiative front page, http://www.100resilientcities.org/#/-_/
Image: Taarifa map of water points in Tanzania. This helps deal with the issues that will always be there in a crisis.
We have things like Karen Payne’s list of data sources, and ontologies (e.g. WWHGD) of basic data needs - why aren’t we automating putting these together?
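One answer to that question is that the matching itself is almost trivial. A naive sketch, pairing a list of basic data needs (from an ontology like WWHGD) against catalogued sources by tag overlap - the `tags` fields and the function name are hypothetical, not part of any real WWHGD or data-source schema:

```python
def match_needs_to_sources(needs, sources):
    """Pair each basic data need with catalogued sources whose tags
    overlap it. Naive set-intersection matching; a real system would
    also rank by freshness, coverage and licence."""
    matches = {}
    for need in needs:
        matches[need["name"]] = [
            src["url"] for src in sources
            if set(need["tags"]) & set(src["tags"])
        ]
    return matches
```

The hard work isn’t this loop - it’s agreeing on the tag vocabulary, which is exactly what linked data is for.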
This lot have structure
Evolution of the Humanitarian Data Ecosystem
Sara Terp, AAAI 2015
SJ’s Stages of Data Use
• Hand-scraping (including lists of where to look), random categories, SMS, maps
• Standards and dataset visualisations
• Mashups and statistical analysis
• Stable datastores and local data scientists
• December 2004: Boxing Day Tsunami kills 230,000 people. Sri Lankan techs create Sahana
• January 2008: Kenyan news blackout during post-election violence. Bloggers create Ushahidi
• June 2009: CrisisCommons forms after a tweet-up
• October 2009: ICCM conference, Cleveland
• 2009: Ushahidi creates CrisisMappers
• 2009: First RHOK hackathon creates PeopleFinder
• 2009: CDAC forms after a discussion in a bar
Humans - good at: complex analysis, translations, creative data finding, sudden onset. Not so good at: high volume, repetitive, 24/7 accuracy.
Machines - good at: high volume, pattern finding, long term. Not so good at: complexity, human foibles.
Unmanned Vehicle Control
PACT locus of authority and computer autonomy, by PACT level (after Sheridan & Verplank):
• 5b - Full (computer monitored by human): computer does everything autonomously
• 5a (computer monitored by human): computer chooses action, performs it & informs the human
• 4b - Action unless revoked (computer backed up by human): computer chooses action & performs it unless human disapproves
• 4a (computer backed up by human): computer chooses action & performs it if human approves
• 3 - Advice, and if authorised, action (human backed up by computer): computer suggests options and proposes one of them
• 2 - Advice (human assisted by computer): computer suggests options to human
• 1 - Advice only if requested (human assisted by computer only when requested): human asks computer to suggest options and human selects
• 0 - None (operator): whole task done by human except for actual operation
“Don’t be Imperial”
• Pro: “Laboratory” = on behalf of
• Per: “Community” =
• Para: “Grassroots” – by and within
Volunteer Skills Used
IT project management
Relief work experience
Communications & PR
Facilitation and admin
• People add features to OpenStreetMap
• Person sends SMS to 4636
• Message goes to CrowdFlower
• Person translates and geolocates message
• Message goes to Ushahidi display
• Message gets to responders, public, aunts, Sahana etc.
• CDAC website review
• Field Voices
• Haiti Amps Network
• Haitian Voices
• Machine Translation System
• Oil Spill Response
• PAP outskirts food relief
• Telecommunications technical project
• Low-bandwidth Ushahidi
• Kapab Medical Facility Capacity Finder
• Disaster Accountability Public Database
• Sync the Sheet
• Testing Crabgrass
• Translators in Action - other translation tools were
• Mining Relief Data
• Automating Aid Request via a Voice Phone Call
• Building A Refugee Camp Cell Phone Early
• Community Tool Box
• CrisisCommons Roledex
• Facebook for ARC Safe and Well site
• Haitian Skilled Workforce Retention
• Post Disaster Child Protection
• CDAC Radio Website
• Disaster Accountability Hotline
• Incident visualisation
• Needs Categorization
• World Academic TeaCHing Hospitals disaster
• ReliefWeb UX redesign
• Ushahidi UX redesign
• OpenStreetMap development at one end of the table; OpenStreetMap users at the other
What’s an appropriate crisis to help?
– Information deluge
– Knowledge drought
– Local infrastructure is overwhelmed
– Existing information channels
User questions for #pkfloods
• Where can I find out who needs my help?
• Where can I find people to help me deliver aid?
• Where can I find out information?
• How do I find out if I'm about to be flooded?
• Who should I alert/give my information to?
• Where can I find general information out about #pkfloods?
• Where can I search for people? (I cannot find my grandmother/relative)
• I have been 'found' - who should I alert/give my status to?
• I need food/water/supplies, how can I tell people I need something?
• I have food/water/supplies, how can I find out where there's a need?
• I want to get to location x, where can I find out about the state of the roads?
• I am observing/know the state of the roads, who should I alert/give my information to and how?
• How can I find out where there are information blackspots/there is no coverage?
• I know where the telecoms/information blackspots are, who should I give my alert/information to and how?
Pkfloods Use Cases
What if the datapoints move?
• Ash cloud from Eyjafjallajökull left planes on the ground and thousands of people stranded
• UK crisis mappers started news and Twitter watches
• Needed a tool that let us track who was stranded and ways for people to get home
• But all the methods we had were static
The 2010 Vision: effective crisis information ecosystems
Droughts, agriculture, food insecurity, conflict, education, disease, employment, shelter, trade, endemic violence, GBV etc.
“Human development is a process of enlarging people’s choices. The most critical ones are to lead a long and healthy life, to be educated and to enjoy a decent standard of living. Additional choices include political freedom, guaranteed human rights and self-respect – what Adam Smith called the ability to mix with others without being ashamed to appear in public” – UNDP Human Development Report
DR Congo in Data.UN.Org:
“Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the Congo”
DR Congo in common standards:
“Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of the” (ISO3166), “Congo, Democratic Republic of the” (FIPS10, Stanag), “180” (UN Stats), “COD” (ISO3166, Stanag), “CG” (FIPS10)
Common Data Needs
• Rolodexes: which response groups to follow, and who’s likely to bring what
• 3Ws: who’s doing what where
• GIS data: knowing where medical facilities, schools, roads etc. are
• Communications: cell tower locations and signal maps
• Technology and social media use by demographic
Commonly Available Data
• Direct messages (SMS etc)
• Social media messages (tweets etc)
• Demographic data (e.g. surveys)
• News reports
• 3Ws, situation reports (both official, via news sources and on social media), field notes
• Photos: ground, aerial, satellite, videos
• CSVs, webpages, PDFs, audio recordings (e.g. radio)
• Massively dispersed and unstructured data (still)
• Named entity and category mismatches between datasets
• Personally Identifiable Information (and risk)
• Crisis response is time-limited
• Crisis data response is resource-limited
• Crisis preparation is attention-limited (if you want resilience, either pay or lead)
(Some of) What’s Broken
• Crisis Data
– Remote vs Ground disconnect
– Crisis vs Development disconnect
– Deployment lead overload
• Development Data
– Broken data formats, access, coverage, standards
– Ignored data sources
– Human vs Data disconnect
– Stovepipes, fiefdoms, imperialism, finding…
My Personal Three Vs
• Variety
– Data all over the place
– CSV, JSON, XML, Excel, PDF, text, webpages, RSS, scanned pages, images, videos, audio files, maps, proprietary formats etc.
• Velocity
– Streams updating too fast for a mapping team (100-200 people) to handle
– Pages updating too frequently to check by hand
• Volume
– Can’t open the data in a spreadsheet
– Can’t fit the data on my laptop
– Maxes out my credit card (thank you Amazon!)
Here are some missing datasets:
• Basic vocabularies, e.g. stopword lists for most languages (including SMS-speak in different languages)
• Pre-crisis datasets for many crisis-prone countries
• Philippines: local response groups set up
• Missing Maps project for GIS data
• What about the rest?
• User datasets in existing tools
• E.g. adding own gazetteers into Ushahidi.