Big Data and Me: experiences from the front line
Sara-Jayne Farmer, Change Assembly, April 22nd 2013
Talk given to Touro College's Leadership in Digital Technology innovation course, April 22nd 2013

Published in: Technology, Travel, Business
Transcript of "Big Data and Me"

  1. Big Data and Me: experiences from the front line
     Sara-Jayne Farmer, Change Assembly, April 22nd 2013
  2. ME
  3. Me
       • Data Scientist
       • Using data to:
           – connect communities
           – improve access to information
           – so people can make better decisions
           – on both small and large scales
       • It’s all about people:
           – Local people: know their needs; need more information
           – Local technologists: have skills; need connections
           – Large organisations: have resources; need guidance
  4. Some of those People (smart, talented, dedicated hackers in Haiti, January 2013)
  5. My Personal Three Vs
       • Variety
           – Data all over the place
           – CSV, JSON, XML, Excel, PDF, text, webpages, RSS, scanned pages, images, videos, audio files, maps, proprietary, etc.
       • Velocity
           – Streams updating too fast for a mapping team (100-200 people) to handle
           – Pages updating too frequently to check by hand
       • Volume
           – Can’t open the data in a spreadsheet
           – Can’t fit the data on my laptop
           – Maxes out my credit card (thank you Amazon!)
  6. VARIETY
  7. “more people have mobile phones than toilets” – UN, March 2013
  8. But… but… there are always data issues…
       • Datasets were difficult to find
       • No data available after 2010
       • Hard to track provenance, e.g. what decisions did the people creating these datasets make? What assumptions?
       • Data was rounded up
       • Country names didn’t match between sets
       • Multiple character sets (e.g. Å, A, Ԇ)
       • Messy formatting (merges, ‘explanations’ etc.)
  9. e.g. Country Names
     DR Congo in Data.UN.Org:
       • “Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the Congo”
     DR Congo in common standards:
       • “Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of the” (ISO3166), “Congo, Democratic Republic of the” (FIPS10, Stanag), “180” (UN Stats), “COD” (ISO3166, Stanag), “CG” (FIPS10)
  10. And coding
  11. And interpretation
       • Hang on… don’t some people have more than one phone?
       • And how do you count the people without toilets?
       • What if the cities have lots of phones and toilets, and the rural areas don’t?
       • Where does my composting toilet fit in this?
       • How big were these surveys?
       • What do we do with the zeros?
       • Etc…
  12. And purpose
  13. And Communication
  14. And Alternative Data Sources
  15. And alternative alternatives…
       • Social media proxies
       • Grassroots maps
       • Etc.
  16. VELOCITY AND VOLUME
  17. 2013 Boston bombings
  18. The Humans+Tools Solution: Crisismapping
  19. Find…
  20. Listen…
  21. Estimate…
  22. Geolocate…
  23. Create maps…
  24. Analyse
  25. Explain
  26. Use
  27. BUT WE NEED MORE DATA SCIENTISTS…
  28. Build and Connect Communities
  29. Train Non-Techies
  30. Create Higher-level Tools
  31. Big Data and Me: experiences from the front line
      Sara-Jayne Farmer
      http://www.changeassembly.com/
      @bodaceacat
  32. MORE REFERENCES
  33. strataconf.com
  34. datasciencecentral.com
  35. analytictalent.com
  36. Tools
  37. Formal (Free) Training
  38. NYC Meetups (see meetup.com)
  39. Volunteering: datakind.org
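
The country-name mismatch on slide 9 is the kind of problem a small lookup table can take the edge off. The sketch below is not from the talk: it is a minimal illustration in Python, assuming that the variants quoted on the slide should all map to the ISO 3166 code "COD" listed there. The names `name_key`, `DR_CONGO_VARIANTS`, `LOOKUP` and `to_iso3` are hypothetical helpers invented for this example, and spaces lost at the slide's line breaks (e.g. "CongoDemocratic") are assumed to be extraction damage.

```python
import re
import unicodedata

# Spellings of DR Congo quoted on slide 9 (Data.UN.Org plus the ISO 3166 form).
DR_CONGO_VARIANTS = [
    "Congo, Democratic Republic of the",
    "Congo Democratic",
    "Democratic Republic of the Congo",
    "Congo (Democratic Republic of the)",
    "Congo, Dem. Rep.",
    "Congo Dem. Rep.",
    "Congo, Democratic Republic of",
    "Dem. Rep. of Congo",
    "Dem. Rep. of the Congo",
    "Congo, The Democratic Republic of the",
]


def name_key(name: str) -> str:
    """Reduce a name to a comparison key (hypothetical helper).

    Strips accents, lowercases, and drops everything except letters and
    digits, so spacing and punctuation differences no longer matter.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "", ascii_only.lower())


# Assumption: every variant above refers to the country coded "COD".
LOOKUP = {name_key(variant): "COD" for variant in DR_CONGO_VARIANTS}


def to_iso3(name: str):
    """Return the ISO 3166 alpha-3 code for a known variant, else None."""
    return LOOKUP.get(name_key(name))


if __name__ == "__main__":
    for raw in ["Congo, Dem. Rep.", "  DEM. REP. OF THE CONGO ", "Congo"]:
        print(repr(raw), "->", to_iso3(raw))
```

Returning `None` for an unknown spelling is deliberate: a bare "Congo" could mean a different country, so unmatched names are better surfaced for a human to review than guessed at.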