Big Data and Me: experiences
from the front line
Sara-Jayne Farmer
Change Assembly
April 22nd 2013
ME
Me
• Data Scientist
• Using data to:
– connect communities
– improve access to information
– so people can make better decisions
– on both small and large scales
• It’s all about people:
– Local people: know their needs; need more information
– Local technologists: have skills; need connections
– Large organisations: have resources; need guidance
Some of those People
(smart, talented, dedicated hackers in Haiti, January 2013)
My Personal Three Vs
• Variety
– Data all over the place
– CSV, JSON, XML, Excel, PDF, text, web pages, RSS, scanned
pages, images, videos, audio files, maps, proprietary formats, etc.
• Velocity
– Streams updating too fast for a mapping team (100-200 people)
to handle
– Pages updating too frequently to check by hand
• Volume
– Can’t open the data in a spreadsheet
– Can’t fit the data on my laptop
– Maxes out my credit card (thank you Amazon!)
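When a file is too big to open in a spreadsheet, one common workaround is to stream it row by row instead of loading it whole. The sketch below is only an illustration of that idea, not a tool from the talk: the function name `count_by_column` and the column name in the usage note are hypothetical, and it assumes a UTF-8 CSV with a header row.

```python
import csv
from collections import Counter


def count_by_column(path, column):
    """Count how often each value appears in one column of a CSV file.

    Rows are read one at a time, so memory use depends on the number of
    distinct values in the column, not on the size of the file.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):  # assumes the first row is a header
            counts[row[column]] += 1
    return counts
```

For example, `count_by_column("reports.csv", "country")` (both names are made up here) would return a `Counter` of rows per country without ever holding the full file in memory.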
VARIETY
“more people have mobile phones than toilets”
– UN, March 2013
But… but… there are always data issues…
• Datasets were difficult to find
• No data available after 2010
• Hard to track provenance – e.g. what decisions did
the people creating these datasets make? What
assumptions?
• Data was rounded up
• Country names didn’t match between sets
• Multiple character sets (e.g. Å, A, Ԇ)
• Messy formatting (merged cells, ‘explanations’, etc.)
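One part of the character-set problem is that the same visible letter can be stored as different code point sequences, so two strings that look identical fail to match. A minimal sketch, assuming the text has already been decoded to Unicode (the helper name `same_text` is hypothetical): normalise both sides to NFC before comparing.

```python
import unicodedata


def same_text(a, b):
    """Compare two strings after Unicode NFC normalisation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00c5"      # Å as a single code point
decomposed = "A\u030a"   # A followed by a combining ring above
angstrom = "\u212b"      # Angstrom sign, which looks the same as Å
```

Here `composed == decomposed` is false, while `same_text(composed, decomposed)` and `same_text(composed, angstrom)` are both true. This does not help with files in an unknown legacy encoding; those still have to be decoded correctly first.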
e.g. Country Names
DR Congo in Data.UN.Org:
• “Congo, Democratic Republic of the”, “Congo
Democratic”, “Democratic Republic of the Congo”, “Congo
(Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo
Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep.
of Congo”, “Dem. Rep. of the Congo”
DR Congo in common standards:
• “Democratic Republic of the Congo” (UN
Stats), “Congo, The Democratic Republic of the”
(ISO3166), “Congo, Democratic Republic of the”
(FIPS10, Stanag), “180” (UN Stats), “COD”
(ISO3166, Stanag), “CG” (FIPS10)
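One way to reconcile spellings like these is to reduce every name to a simplified key and look that key up in a hand-built table. The sketch below is an illustration under assumptions, not the method used in the talk: the function names are hypothetical, the table only contains the DR Congo variants listed above, and it maps them to the ISO3166 code "COD". Bare codes such as "CG" are deliberately left out, because the same short code can mean different countries in different standards.

```python
import re
import unicodedata


def normalise(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


# Name variants copied from the slide; the table itself is an assumption.
DRC_VARIANTS = [
    "Congo, Democratic Republic of the",
    "Congo Democratic",
    "Democratic Republic of the Congo",
    "Congo (Democratic Republic of the)",
    "Congo, Dem. Rep.",
    "Congo Dem. Rep.",
    "Congo, Democratic Republic of",
    "Dem. Rep. of Congo",
    "Dem. Rep. of the Congo",
    "Congo, The Democratic Republic of the",
]

LOOKUP = {normalise(variant): "COD" for variant in DRC_VARIANTS}


def to_iso3(name):
    """Return the ISO3166 alpha-3 code for a known spelling, else None."""
    return LOOKUP.get(normalise(name))
```

Unknown spellings return `None` rather than a guess, so they can be reviewed by hand and added to the table. That matters here because "Republic of the Congo" is a different country and must not be matched by accident.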
And coding
And interpretation
• Hang on… don’t some people have more than one
phone?
• And how do you count the people without toilets?
• What if the cities have lots of phones and toilets, and
the rural areas don’t?
• Where does my composting toilet fit in this?
• How big were these surveys?
• What do we do with the zeros?
• Etc…
And purpose
And Communication
And Alternative Data Sources
And alternative alternatives…
• Social media proxies
• Grassroots maps
• Etc.
VELOCITY AND VOLUME
2013 Boston bombings
The Humans+Tools Solution: Crisismapping
Find…
Listen…
Estimate…
Geolocate…
Create maps…
Analyse
Explain
Use
BUT WE NEED MORE DATA
SCIENTISTS…
Build and Connect Communities
Train Non-Techies
Create Higher-level Tools
Big Data and Me: experiences
from the front line
Sara-Jayne Farmer
http://www.changeassembly.com/
@bodaceacat
MORE REFERENCES
strataconf.com
datasciencecentral.com
analytictalent.com
Tools
Formal (Free) Training
NYC Meetups (see meetup.com)
Volunteering: datakind.org

