Big Data and Me

Talk given to Touro College's Leadership in Digital Technology innovation course, April 22nd 2013

Big Data and Me: Presentation Transcript

  • Big Data and Me: experiences from the front line
    Sara-Jayne Farmer
    Change Assembly
    April 22nd 2013
  • ME
  • Me
    • Data Scientist
    • Using data to:
      – connect communities
      – improve access to information
      – so people can make better decisions
      – on both small and large scales
    • It’s all about people:
      – Local people: know their needs; need more information
      – Local technologists: have skills; need connections
      – Large organisations: have resources; need guidance
  • Some of those People (smart, talented, dedicated hackers in Haiti, January 2013)
  • My Personal Three Vs
    • Variety
      – Data all over the place
      – CSV, JSON, XML, Excel, PDF, text, webpages, RSS, scanned pages, images, videos, audio files, maps, proprietary formats, etc.
    • Velocity
      – Streams updating too fast for a mapping team (100-200 people) to handle
      – Pages updating too frequently to check by hand
    • Volume
      – Can’t open the data in a spreadsheet
      – Can’t fit the data on my laptop
      – Maxes out my credit card (thank you Amazon!)
  • VARIETY
  • “more people have mobile phones than toilets” – UN, March 2013
  • But… but… there are always data issues…
    • Datasets were difficult to find
    • No data available after 2010
    • Hard to track provenance, e.g. what decisions did the people creating these datasets make? What assumptions?
    • Data was rounded up
    • Country names didn’t match between sets
    • Multiple character sets (e.g. Å, A, Ԇ)
    • Messy formatting (merges, ‘explanations’, etc.)
  • e.g. Country Names
    DR Congo in Data.UN.Org:
    • “Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the Congo”
    DR Congo in common standards:
    • “Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of the” (ISO3166), “Congo, Democratic Republic of the” (FIPS10, Stanag), “180” (UN Stats), “COD” (ISO3166, Stanag), “CG” (FIPS10)
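
One way to reconcile name variants like those on the slide above is to reduce each name to a simplified key and look it up in a hand-built table. The sketch below is only an illustration, not the approach used in the talk: the function names `normalise` and `to_iso3`, the lookup table, and the choice of the ISO3166 code "COD" as the target are all assumptions made for this example.

```python
import re
import unicodedata

# Name variants for DR Congo as listed on the slide (hypothetical lookup table).
DR_CONGO_VARIANTS = [
    "Congo, Democratic Republic of the",
    "Congo Democratic",
    "Democratic Republic of the Congo",
    "Congo (Democratic Republic of the)",
    "Congo, Dem. Rep.",
    "Congo, Democratic Republic of",
    "Dem. Rep. of Congo",
    "Dem. Rep. of the Congo",
    "Congo, The Democratic Republic of the",
]


def normalise(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that accented letters match their plain forms.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace every run of non-alphanumeric characters with a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


# Map each simplified key to the code we chose as canonical for this sketch.
LOOKUP = {normalise(variant): "COD" for variant in DR_CONGO_VARIANTS}


def to_iso3(name: str):
    """Return the canonical code for a known variant, or None if unknown."""
    return LOOKUP.get(normalise(name))


print(to_iso3("Congo, Dem. Rep."))
print(to_iso3("Atlantis"))
```

The lookup deliberately returns `None` for anything it has not seen instead of guessing, since a loose match could merge different countries with similar names. Unknown names can then be reviewed by hand and added to the table.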
  • And coding
  • And interpretation
    • Hang on… don’t some people have more than one phone?
    • And how do you count the people without toilets?
    • What if the cities have lots of phones and toilets, and the rural areas don’t?
    • Where does my composting toilet fit in this?
    • How big were these surveys?
    • What do we do with the zeros?
    • Etc…
  • And purpose
  • And Communication
  • And Alternative Data Sources
  • And alternative alternatives…
    • Social media proxies
    • Grassroots maps
    • Etc.
  • VELOCITY AND VOLUME
  • 2013 Boston bombings
  • The Humans+Tools Solution: Crisismapping
  • Find…
  • Listen…
  • Estimate…
  • Geolocate…
  • Create maps…
  • Analyse
  • Explain
  • Use
  • BUT WE NEED MORE DATA SCIENTISTS…
  • Build and Connect Communities
  • Train Non-Techies
  • Create Higher-level Tools
  • Big Data and Me: experiences from the front line
    Sara-Jayne Farmer
    http://www.changeassembly.com/
    @bodaceacat
  • MORE REFERENCES
  • strataconf.com
  • datasciencecentral.com
  • analytictalent.com
  • Tools
  • Formal (Free) Training
  • NYC Meetups (see meetup.com)
  • Volunteering: datakind.org