DATA MINING THE CITY
Weds 7p-9p 200 Buell
Violet Whitney, vw2205@columbia.edu
please submit your attendance
and medium profile:
http://shoutkey.com/beer
New room except…
...Nov 8 in Avery 114
Attendance is
sometime at the end of
class and will expire
Zach White
Windows : (
Next Week!
Location & Accuracy
Bias
D<>D Abstracting Data
Reflection/Attendance
Subjectivity
Python
Project + Hypothesis
subjectivity
¯(°_o)/¯
subjectivity
¯(°_o)/¯
Regional science
Location theory
Regional Policy Analysis
Location modeling
Land-use
Migration analysis
Spatial economics
Transportation
subjectivity
¯(°_o)/¯
Regional science
Location theory
regional policy analysis
Location modeling
Land-use
Migration analysis
Spatial economics
Transportation
God’s Eye View
subaltern
Gayatri Spivak
subjectivity
¯(°_o)/¯
subjectivity
¯(°_o)/¯
objective subjective
subjectivity
¯(°_o)/¯
Skeptics Quantified Self
subjectivity
¯(°_o)/¯
subjectivity
¯(°_o)/¯
Illusory Correlation
subjectivity
¯(°_o)/¯
subjectivity
¯(°_o)/¯
subjectivity
¯(°_o)/¯
Whorf hypothesis
subjectivity
¯(°_o)/¯
subjectivity
¯(°_o)/¯
reducing complexity
subjectivity
¯(°_o)/¯
overfitting
subjectivity
¯(°_o)/¯
underfitting?
subjectivity
¯(°_o)/¯
subjectivity
¯(°_o)/¯
subjectivity
¯(°_o)/¯
Data synthesis
Data Object 1 Data Object 2
attribute
attribute
attribute
location
attribute
attribute
attribute
location
location &
accuracy
ಠ╭╮ಠ
Location & accuracy
ಠ╭╮ಠ
Location & accuracy
ಠ╭╮ಠ
GPS
Location & accuracy
ಠ╭╮ಠ
Wifi Pinging
Location & accuracy
ಠ╭╮ಠ
Wifi Pinging
Location & accuracy
ಠ╭╮ಠ
beacons
Location & accuracy
ಠ╭╮ಠ
Computer Vision
Location & accuracy
ಠ╭╮ಠ
Projection
Location & accuracy
ಠ╭╮ಠ
Projection
Location & accuracy
ಠ╭╮ಠ
Location & accuracy
ಠ╭╮ಠ
Historical models and theory
data records, models and theory
observation
record
theory
model
Location & accuracy
ಠ╭╮ಠ
Location & accuracy
ಠ╭╮ಠ
clusters of cholera cases in the
London epidemic of 1854
Location & accuracy
ಠ╭╮ಠ
d<>d
(☞゚ヮ゚)☞ ☜(゚ヮ゚☜)
bias
(¬_¬)
bias
(¬_¬)
https://vimeo.com/145334736
bias
(¬_¬)
http://www.stephaniedinkins.com/conversations-with-bina48.html
bias
(¬_¬)
the project
(づ ̄ ³ ̄)づ
Data Selection
Pre-processing &
Cleaning
Data Mining
Interpretation/
Evaluation
Feature Selection
The project
(づ ̄ ³ ̄)づ
The project
(づ ̄ ³ ̄)づ
The project
(づ ̄ ³ ̄)づ
What kinds of problems
can spatial data solve?
The project
(づ ̄ ³ ̄)づ
Assignment 1
Case Studies
The project
(づ ̄ ³ ̄)づ
Analytical
The project
(づ ̄ ³ ̄)づ
Analytical
The project
(づ ̄ ³ ̄)づ
Analytical
The project
(づ ̄ ³ ̄)づ
Analytical
The project
(づ ̄ ³ ̄)づ
Analytical
The project
(づ ̄ ³ ̄)づ
Civilian journalism
Eyal Weizman
Forensic Architecture
The project
(づ ̄ ³ ̄)づ
Predictive
The project
(づ ̄ ³ ̄)づ
...Predictive
The project
(づ ̄ ³ ̄)づ
...Predictive
The project
(づ ̄ ³ ̄)づ
Narrative
The project
(づ ̄ ³ ̄)づ
Narrative
The project
(づ ̄ ³ ̄)づ
Narrative
The project
(づ ̄ ³ ̄)づ
Narrative
The project
(づ ̄ ³ ̄)づ
Narrative
The project
(づ ̄ ³ ̄)づ
...Narrative/Analytical?
The project
(づ ̄ ³ ̄)づ
Exploratory
The project
(づ ̄ ³ ̄)づ
...Exploratory
The project
(づ ̄ ³ ̄)づ
...Exploratory
The project
(づ ̄ ³ ̄)づ
Inductive
to examine empirical evidence in
the search for patterns that might
support new theories or general
principles
Deductive
focusing on the testing of known
theories or principles against data
Normative
using spatial analysis to develop
or prescribe new or better designs
Spatial Analysis
The project
(づ ̄ ³ ̄)づ
Where to start?!?!!
The project
(づ ̄ ³ ̄)づ
Data Selection
Pre-processing &
Cleaning
Data Mining
Interpretation/
Evaluation
Feature Selection
The project
(づ ̄ ³ ̄)づ
The project
(づ ̄ ³ ̄)づ
Data Selection
Pre-processing &
Cleaning
Feature Selection
The project
(づ ̄ ³ ̄)づ
Data Mining
Feature Selection
The project
(づ ̄ ³ ̄)づ
Data Mining
Interpretation/
Evaluation
The project
(づ ̄ ³ ̄)づ
The project
(づ ̄ ³ ̄)づ
Where to start?!?!!
The project
(づ ̄ ³ ̄)づ
d<>d
(☞゚ヮ゚)☞ ☜(゚ヮ゚☜)
Fold a paper into 8 sections
8min
1 problem per 1 min
4 min (2 min each)
discuss w/ partner how you
would get data
1 min
decide on most interesting
d<>d
(☞゚ヮ゚)☞ ☜(゚ヮ゚☜)
DATA MINING THE CITY
Weds 7p-9p 200 Buell
Violet Whitney, vw2205@columbia.edu
Week 2 course feedback:
http://shoutkey.com/taro
Location & accuracy
ಠ╭╮ಠ
https://en.wikipedia.org/wiki/Spatial_analysis
https://en.wikipedia.org/wiki/John_Snow
http://dusk.geo.orst.edu/gis/Chapter14_notes.pdf
https://en.wikipedia.org/wiki/Stan_Openshaw
http://www.med.upenn.edu/beat/docs/Openshaw1983.pdf
https://en.wikipedia.org/wiki/Neighbourhood_effect
https://www.amazon.com/Truly-Disadvantaged-Underclass-Public-Policy/dp/0226901262/ref=pd_lpo_sbs_14_img_0?_encoding=UTF8&psc=1&refRID=Z9MJXACHJ610V2PYFSSF
Resources
Hyperlocal
Hyperlocal full book
Google API Explorer
NYC Open Data
Filtering Content
Jaron Lanier
neoteny
Clay Shirky
Paul Currien - an algorithm: step-by-step instructions for carrying out a task
https://medium.com/towards-data-science/background-removal-with-deep-learning-c4f2104b3157
Privacy and location removal
https://medium.com/towards-data-science/create-a-heat-map-from-your-google-location-history-in-3-easy-steps-e66c93925914

Week 2 - Data Mining the City

Editor's Notes

  1. Next week we have guests coming, which is super exciting. We’re going to learn how to solve mapping, routing, and path problems. Allan William Martin is a computational designer and former head of Product at Floored (automated floorplan layouts), and is now a Product Manager in cloud computing. Jeff Tarr is the senior engineer working on indoor localization and understanding paths at Sidewalk Labs. We’re super lucky to have both of them. They are both going to be making content specifically for you, to help you in your own projects, so that’s even more exciting.
  2. In this portion we’ll contextualize the criticisms of spatial analytics, so that we understand how we are operating and can be critical and aware of our own approaches.
  3. Analytics have been applied spatially across many fields. A couple of examples: regional science is a field of the social sciences concerned with analytical approaches to problems that are specifically urban, rural, or regional. Spatial economics deals with what is where, and why.
  4. But there are major critiques of these methods. In his landmark text “Explanation in Geography,” David Harvey critiques these emerging types of analysis for being devoid of theory and ethnographic perspective, and for being co-opted for political purposes. The book could also have been called “The Role of Theory in Scientific Explanation,” as Harvey recognizes that these scientific fields are also deeply interwoven with theoretical issues.
  5. Another critique of spatial analysis is that it looks at problems from a God’s Eye View and is overly reductive of individual experience. In history, Subaltern Studies asks about the history of the masses: what happens at the base levels of society rather than among the elite. It fundamentally questions who speaks for whom when histories of the masses are written by elite historians and not by the masses themselves. When a map is drawn, do the masses speak for themselves, or does an elite planner, economist, or government official speak for them? Spatial analytics borrows objective and subjective tactics, but it is critiqued for a God’s Eye View that can lack first-person understanding, because it tries to reduce multiple occurrences into a model, trend, or behavior pattern.
  6. Objectivity is a central philosophical concept, related to reality and truth. It is concerned with finding an agreed understanding of the natural world, often through measurement. Subjectivity is based on or influenced by personal feelings, tastes, or opinions. The line between subjectivity and objectivity can be a fine one, but data mining can borrow both objective and subjective methods to understand spatial phenomena.
  7. There’s further criticism of the obsession with data. You may have heard the term “quantrapreneurs,” which mocks companies built on data. The quantified self, also known as lifelogging, is a movement to incorporate technology into data acquisition on aspects of a person's daily life in terms of inputs (food consumed, quality of surrounding air), states (mood, arousal, blood oxygen levels), and performance, whether mental or physical. In short, quantified self is self-knowledge through self-tracking with technology. But there are skeptics of the quantified self. In a quote from a skeptic: "Quantified self" practitioners as a group are not necessarily curious about human values or an understanding of what makes us human. They're more interested in anything that can be measured and given a number. They believe the maxim that only the things that are measured can be improved. But I see a lot of measuring, but not much improvement. The skeptic continues: quantifying the number of times we eat, sleep, or tweet doesn’t somehow reveal something more truthful about ourselves than just experiencing it. Are we actually learning something more fundamental about ourselves? Why do we think there’s something more true in the numbers than in how I feel?
  8. .
  9. Coffee argument
  10. It seems somewhat impossible to describe everything. And even when we describe and categorize the world, the boundaries can be somewhat arbitrary. How do you distinguish a cell in the small intestine from one in the descending colon? The cell doesn’t know that it’s part of that system, or even of the digestive system. Only humans delineate it that way.
  11. To this degree, boundaries and categories can be somewhat arbitrary in nature.
  12. Can “reality” be described? For Nietzsche (nee-cha) and nihilists there is no reason to describe it, because there is no objective order or structure in the world except for what we give it. “Every belief, every considering something true, is necessarily false because there is simply no true world.” The perspective is that humans search for and attribute meaning in a meaningless world.
  13. The phenomenon of humans perceiving correlations that don’t exist, such as the faces we imagine in trees or clouds, or in the circle and lines on the screen, is called apophenia.
  14. In data science it is called illusory correlation: the phenomenon of perceiving a relationship between variables (typically people, events, or behaviors) even when no such relationship exists.
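
One way to see how easily an illusory correlation can appear is to look for relationships among variables that are unrelated by construction. The sketch below is only an illustration, not something from the lecture: the number of series, their length, the seed, and the `pearson` helper are all assumptions chosen for the example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_series=200, n_points=10, seed=0):
    """Generate unrelated random series and return the strongest |r| found."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    return max(
        abs(pearson(a, b)) for a, b in itertools.combinations(series, 2)
    )


best = strongest_chance_correlation()
print(f"strongest correlation among unrelated series: {best:.2f}")
```

Every series here is independent noise, yet with enough pairs to compare, some pair will look strongly related. Short series and many comparisons make this more likely.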
  15. So what is the distinction between “reality” and our simulation of it in our models, maps, and data? If we had all of the data points about the feel, color, location, and size of everything in the natural world and modeled it, what would be the difference between reality and our simulation? Simulation theory (as popularized by Neuromancer and The Matrix) is the hypothesis that reality could be simulated, and that we couldn’t tell the difference anyway if we were in a simulation. You may have heard of the Borges map, a map at 1:1 scale, an idea that has been used again and again by other authors and artists. In 1893, Lewis Carroll, author of Alice in Wonderland, imagined a fictional map that had "the scale of a mile to the mile." In his passage: “And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!” “Have you used it much?” I enquired. “It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”
  16. Today the 1:1 map, the blend between reality and simulation, is closer than ever through the Internet and the Internet of Things. A reading from the artist Hito Steyerl.
  17. The Whorf hypothesis holds that reality is embedded in a culture’s language, and that language controls thought and cultural norms. Some languages create the capacity to discuss concepts that don’t exist or can’t be comprehended in other languages.
  18. The fact that we describe colors categorically, or that we think of counting sequentially, shapes how we are able to understand those things. But many things are less clearly classified and often fit into gradient fields. You can imagine a language made up not of categorical words but of songs, rhythm, pitch, and tone. Instead of saying bluish green, would we hum somewhere in between the pitch of blue and green? Could volume indicate intensity of saturation? This would fundamentally shift our understanding of what is. Questions?
  19. Words and language create a shared understanding of what something is; however, a word is inherently reductionist. So we reduce at the level of the word, but we also reduce when we model a concept. In data science we use models, trend lines, and patterns to understand what happened and to predict what will happen.
  20. When we have a set of data points from individual events, we try to understand it by creating a generalized model. In machine learning this is called fitting. When we underfit, the model is too general to be useful. Overfitting hugs the data set too closely, so when the model is applied to another data set it is no longer relevant. The ideal spot is in the middle, where the model is specific enough to be useful but general enough to apply to other data samples. Because big data uses huge data sets, it’s useful to build a model around a sample of the data to represent all of it. However, if the model is overfit to the sample, it won’t apply to the rest of your data. (Example: modeling income in NYC.) Overfitting shows up in everyday life as well: I looked for wall hooks on Amazon once, and now its recommendation engine thinks I love wall hooks, because it is overfitting to the sample data it was given.
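
To make the fitting idea concrete, here is a minimal sketch in Python. It rests on assumptions that are not from the lecture: the data is a made-up noisy sine wave, and the model is a simple k-nearest-neighbour average, chosen only because one number (`k`) moves it from hugging the data to ignoring it.

```python
import math
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, points, k):
    """Average squared prediction error of the k-NN model on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


rng = random.Random(1)


def noisy_sine(x):
    """Hypothetical 'true' pattern (a sine wave) plus measurement noise."""
    return math.sin(x) + rng.gauss(0, 0.3)


step = 2 * math.pi / 40
train = [(i * step, noisy_sine(i * step)) for i in range(40)]
# Test points fall halfway between the training points.
test = [((i + 0.5) * step, noisy_sine((i + 0.5) * step)) for i in range(40)]

errors = {}
for k in (1, 5, len(train)):
    errors[k] = (mean_squared_error(train, train, k), mean_squared_error(train, test, k))
    print(f"k={k:2d}  train error={errors[k][0]:.3f}  test error={errors[k][1]:.3f}")
```

With `k=1` the model memorizes the training points, so its training error is zero while its error on unseen points is not: the overfit case. With `k` equal to the whole training set it predicts one global average everywhere: the underfit case. A middle value such as `k=5` usually does best on the held-out points, though that depends on the noise and the sample.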
  21. You can also think of examples of overfitting and underfitting in architecture. Modernist architecture was intended to universalize design around a standard man. But we often find that we are misfits in the models that are built for us. A modernist chair might be too large for us. Or the seat on a plane is too small for a tall person.
  22. In Goldilocks and the Three Bears, Goldilocks finds that the chairs (let’s call them models) don’t all fit her.
  23. When comparing and modeling data from different sources, we need to find common ground between the data sets. Data synthesis is a method that uses statistical techniques to combine results from different studies and obtain a quantitative estimate of the overall effect of a particular intervention or variable on a defined outcome; that is, it is a statistical process for pooling data from many clinical trials to glean a clear answer. A planner’s first stop in describing the existing conditions of a community is usually the Census Bureau. To protect the privacy of respondents, Census data is delivered at different geographies and across different periods of time. For example, the best estimate of the number of households in a community may be available for each Census block from the Decennial Census (last conducted in 2010), and the best estimate of household income may be the five-year rolling data product from the American Community Survey for each Census tract. Combining these disparate data sets to create a coherent and complete representation of what is happening in a community at any point in time is difficult. It’s a bit like trying to completely understand a subject from photos that are taken from different angles, at different points in time, from different distances. Further complicating the problem, urban planners like to use non-Census data sets, such as school quality, that may introduce yet another set of geographies (e.g., school districts).
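
The block-versus-tract mismatch described above can be sketched in a few lines of Python. All the numbers below are hypothetical, and the function names are made up for the example. The one real-world fact it leans on is that a 15-digit Census block GEOID begins with the 11-digit GEOID of the tract that contains it. The sketch only reconciles geography; it does not address the different time periods of the two sources.

```python
from collections import defaultdict

# Hypothetical household counts per Census block (15-digit GEOID).
households_by_block = {
    "360610001001000": 120,
    "360610001001001": 80,
    "360610002011000": 200,
}

# Hypothetical median household income per Census tract (11-digit GEOID).
median_income_by_tract = {
    "36061000100": 85000,
    "36061000201": 62000,
}


def tract_of(block_geoid):
    """A block GEOID starts with the 11-digit GEOID of its tract."""
    return block_geoid[:11]


def households_by_tract(block_counts):
    """Sum block-level household counts up to the tract level."""
    totals = defaultdict(int)
    for block_geoid, count in block_counts.items():
        totals[tract_of(block_geoid)] += count
    return dict(totals)


def join_on_tract(household_totals, incomes):
    """Combine both data sets for tracts that appear in both."""
    return {
        tract: {"households": household_totals[tract], "median_income": incomes[tract]}
        for tract in household_totals
        if tract in incomes
    }


combined = join_on_tract(households_by_tract(households_by_block), median_income_by_tract)
for tract, row in combined.items():
    print(tract, row)
```

Aggregating up to the coarser geography is the easy direction. Going the other way (splitting tract-level income down to blocks) would require an assumption about how income is distributed within the tract.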
  24. Let’s talk about how location and accuracy affect our data.
  25. So there are all these various words and mechanisms to describe and track objects and their behavior. We can identify an object with a unique identifier number (this is used in 3D modeling programs and is called a GUID), or we can describe its color, length, height, and width, but each of these descriptors changes the way we understand that object.
  26. But it’s not just the words we record about an object that change its definition; its definition is also shaped by the tool or method that we use to describe it. You can think of our senses as tools for measuring the natural environment. You can see, smell, hear, taste, and touch, each of which describes a different aspect of a thing. Likewise, we can use tools with sensors to understand and record events and to sense objects. Each tool or sensor has limited agency in what it can record. A camera, for example, is limited by its resolution, its distance from what it is recording, the range of color it can capture, whether its view is obscured, etc. All of these factors will affect the outcome of what is recorded. A camera placed at a different height with greater resolution would capture very different results. An infrared camera would capture a lower-frequency range of the spectrum. However, if I used a lidar sensor to detect the distance of the object, I would have no information about the object’s color and I would only know its position relative to the position of the lidar receiver.
  27. Take these examples of how location is tracked to understand how the tool impacts the data that is recorded. GPS: the Global Positioning System is a radio navigation system that allows land, sea, and airborne users to determine their geographic location, velocity, and time, 24 hours a day, in all weather conditions, anywhere in the world. GPS is made up of roughly 30 satellites orbiting the earth. The operation of GPS is based on the mathematical principle of trilateration: the position is determined from distance measurements to satellites. Four satellites are used to determine the position of the receiver on the earth. Three satellites are used to trace the location, and a fourth is used to confirm it (in practice, the fourth measurement also corrects for error in the receiver’s clock). GPS consists of satellites, control and monitor stations, and receivers. The GPS receiver takes the information from the satellites and uses trilateration to determine a user’s position. Agency of GPS: it’s accurate to within about 10 meters, which isn’t very useful for indoor positioning. It works best when the line of sight is direct, so it’s much worse in brick buildings or when obscured by foliage.
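Trilateration can be sketched in a few lines of Python. This is a simplified, flat 2D version with three fixed points and perfect distance measurements; real GPS works in 3D and also has to solve for clock error, so treat the function name and the coordinates below as illustrative assumptions.

```python
import math

def trilaterate(anchors, distances):
    """Find the (x, y) point at the given distances from three known points.

    Subtracting the first circle equation from the other two leaves two
    linear equations in x and y, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("anchors lie on one line; the position is ambiguous")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

# Hypothetical "satellites" on a flat plane and a receiver we pretend not to know.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.hypot(true_position[0] - ax, true_position[1] - ay)
             for ax, ay in anchors]

estimate = trilaterate(anchors, distances)
print(estimate)
```

If you add a meter or two of error to any one of the distances, the estimate shifts, which is one way to see how measurement error in the tool becomes position error in the data.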
  28. Wifi pinging - With this method, a signal is sent through a wifi hotspot to a user’s device, such as a smartphone, smart watch, or computer, to determine which IP addresses are within range. The strength of a signal, or its mere presence within a wifi network, can indicate where people are located.
  29. While wifi is usually limited to individual wifi networks, larger connected networks can be used to track the movement of a device through the city, using timestamps of when a unique user’s address shows up at various places. Agency: +/- 10 meters of accuracy. MAC addresses are scrambled every so often, so it’s hard to track who is in range.
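One common way to turn signal strength into a rough distance is the log-distance path loss model. The Python sketch below assumes that model; the reference strength at 1 meter and the path loss exponent are placeholder values that would have to be calibrated for a real space, since walls and bodies change them a lot.

```python
def estimate_distance_m(rssi_dbm, rssi_at_1m_dbm=-40.0, path_loss_exponent=2.0):
    """Rough distance in meters from a received signal strength (in dBm).

    Log-distance path loss model: the signal drops by
    10 * path_loss_exponent dB for every tenfold increase in distance.
    Both default parameters are assumptions, not measured values.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))

# Hypothetical readings from one device, strongest to weakest.
for reading in (-40.0, -60.0, -75.0):
    print(reading, "dBm ->", round(estimate_distance_m(reading), 1), "m")
```

A single reading only gives a distance, not a direction, so locating a device this way still needs several hotspots and something like trilateration.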
  30. Beacons - These are small devices that work similarly to wifi pinging, but a major difference is that they can track who a person is (their unique profile). They can track while not connected to the web (via Bluetooth), and their accuracy is much better: +/- 1 m. Beacons are often used commercially to communicate with a shopper’s smartphone to improve the in-store experience. Beacons use Bluetooth to detect nearby smartphones with the intent of sending them ads, coupons, or product information, or to track how a customer moves through a store. Companies like Apple and McDonald’s have used beacons to deliver in-store deals to customers’ phones. This is also sometimes referred to as geo-fencing.
  31. Computer vision - CV is a method for analyzing imagery, often through pattern and object recognition. Computer vision can be used to identify a particular person using facial recognition, to recognize various objects like cars, trees, or people, or to recognize gestures or even someone’s mood from his or her expression. These methods are often applied to video surveillance, but can also be applied to analyze stock images or video footage. Computer vision can be used to track where pedestrians walk. Agency: while CV is the most accurate, tracking where someone goes down to the inch, it has other major limitations. It’s not always accurate in defining what is a person, and it can often track two people as one or think something non-human is human. It also requires translating a 2D video into a plan view, which is difficult to do accurately.
  32. So data is changed by how it is recorded with a tool, but also by how it is translated when it is communicated and visualized. Maps cannot be created without map projections. All map projections necessarily distort the surface in some fashion. Depending on the purpose of the map, some distortions are acceptable and others are not; therefore, different map projections exist in order to preserve some properties of the sphere-like body at the expense of other properties.
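As one concrete case of projection distortion, here is a Python sketch of the spherical Mercator projection. It treats the earth as a perfect sphere (an assumption; web maps and surveyors use more careful models) and shows how the map stretches distances more and more as you move away from the equator.

```python
import math

EARTH_RADIUS_M = 6371000  # mean radius, treating the earth as a sphere

def mercator(lat_deg, lon_deg):
    """Project latitude/longitude (degrees) to flat x, y coordinates in meters."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """How many times longer a short distance appears on the map than on the ground."""
    return 1 / math.cos(math.radians(lat_deg))

# The same ground distance is drawn larger the farther you get from the equator.
for latitude in (0, 40, 60, 80):
    print(latitude, "degrees:", round(mercator_scale_factor(latitude), 2), "x")
```

Mercator preserves angles and local shapes, which is why it is handy for navigation, but it pays for that with area: regions near the poles look far bigger than they are.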
  33. https://en.wikipedia.org/wiki/List_of_map_projections#pseudocylindrical These are all trying to solve the issue of flattening a spherical surface, which makes it more computable and easier to reproduce.
  34. So accuracy can be distorted when data is translated to a map, and it can be lost via the tool it is tracked with. In images, and especially in satellite photography, this is an issue because one pixel can represent anything from a foot to a mile. https://www.artforum.com/video/mode=large&id=51651
  35. When data is lost or changed by its means of recording, this is similar to history. Think about how history is recorded.
  36. After a series of events such as a presidential election, someone must record this history based on their own observations and interpretations. How might someone’s record of history differ if they record through an aerial camera, through a camera on the ground, or if they hear an event remotely through a radio?
  37. But even if all observations and understanding are subjective, is it worthwhile to start somewhere? History helps us understand change and how the society we live in came to be. The second reason history is inescapable as a subject of serious study follows closely on the first. The past causes the present, and so the future.
  38. Models can be problematic but they can also help us make better decisions
  39. John Snow (not the Game of Thrones character). During the cholera epidemic in London, John Snow used surveys and mapping to show that cholera was spread through germs (feces in water, etc.) rather than through bad air (miasma theory). Before this, germs were not accepted as a cause of sickness, and many didn’t want to believe it. By talking to local residents, he identified the source of the outbreak as the public water pump on Broad Street. Although Snow’s chemical and microscope examination of a water sample from the Broad Street pump did not conclusively prove its danger, his studies of the pattern of the disease were convincing enough to persuade the local council to disable the well pump by removing its handle. This action has been commonly credited as ending the outbreak. He also used statistics to illustrate the connection between the quality of the water source and cholera cases. He showed that the Southwark and Vauxhall Waterworks Company was taking water from sewage-polluted sections of the Thames and delivering the water to homes, thus increasing the incidence of cholera.
  40. Data from open NYC - how it’s reduced or abstracted. 10 min exercise!
  41. Bias has several definitions, and its common usage is decidedly negative. We typically use it to mean systematic favoritism of a group. Generally speaking, “bias” is derived from the ancient Greek word that describes an oblique line (i.e., a deviation from the horizontal). In Data Science, bias is a deviation from expectation in the data. More fundamentally, bias refers to an error in the data. But the error is often subtle or goes unnoticed. So, why does bias occur in the first place? Over the next posts in this series, we will briefly define and describe common statistical and cognitive biases, as listed below: Selection (or Sample) Bias, Seasonal Bias, Linearity Bias, Confirmation Bias, Recall Bias, Survivor Bias, Observer Bias, and Reinforcement Bias.
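Selection bias is the easiest of these to show with numbers. In the Python sketch below, the residents and their incomes are entirely made up; the point is only that a survey which reaches just one group (here, smartphone owners) gives a different answer than one that reaches everyone.

```python
# Hypothetical residents: (monthly_income, owns_smartphone)
residents = [
    (1500, False), (1800, False), (2200, True), (2500, False),
    (3200, True), (4100, True), (5200, True), (7500, True),
]

def average(values):
    """Plain arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

# What we would get if we could ask everyone.
true_average = average(income for income, _ in residents)

# What we get if the survey only runs in a smartphone app.
app_survey_average = average(income for income, owns in residents if owns)

print("everyone:", true_average)
print("app users only:", app_survey_average)
```

Nothing in the app survey's data looks wrong on its own, which is what makes this kind of error subtle: the bias is in who never made it into the sample.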
  42. Math Masters of Destruction: how data is used to lie. Verizon map video.
  43. Regional science is a field of the social sciences concerned with analytical approaches to problems that are specifically urban, rural, or regional.
  44. An overly reductionist model is no longer useful, while an overfit model is so specific that it no longer fits multiple people. I like to think of a sock vs. a toe sock: one may fit many feet, while the other is so formed to one type of foot that it doesn’t fit my stubby toes.
  45. Stephanie Dinkins is an artist focused on artificial intelligence as it intersects race, gender, aging, and our future histories. Her work is an examination of the codification of social, cultural, and future histories.
  46. Exercise on hypothesis...
  47. A quick overview from the last class… We learned what Data Mining is. With Data Selection, you take raw data from websites, a database somewhere, or a website’s API; we got data in our exercise last class by manually scraping Google Maps. With Pre-Processing & Cleaning Data, you clean the data for whatever purposes you need; in our case we had addresses that needed to be turned into latitudes and longitudes so that they were useful to us. With Feature Selection, you select which features are useful for your data mining and visualization; we did this by selecting the location attributes (latitude and longitude data) and deleting attributes that weren’t relevant, such as the open hours of a store or its rating. With Data Mining, we visualized our data set by getting images from Google Street View. There are a number of ways we could begin to sort these images to understand correlations, such as how many people appear in each image or how many windows are in each image. What we graph or analyze in these images depends entirely on what we are trying to understand. Lastly, in data mining we interpret and evaluate our results. We have yet to do this... We also learned about encoding and decoding information, and about what APIs are.
  48. What skills do they need?
  49. Analytical: Based on a history of events, describes something that happened, such as: public and private funding for bus stations in New York City in 2015 was lower in neighborhoods with a lower median income.
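The feature selection step from that recap can be sketched in Python like this. The store records below are invented stand-ins for what we scraped, and the field names are assumptions; the idea is just to keep the attributes we care about and drop the rest.

```python
# Hypothetical cleaned records, one per store.
records = [
    {"name": "Cafe A", "latitude": 40.8075, "longitude": -73.9626,
     "open_hours": "7-19", "rating": 4.5},
    {"name": "Deli B", "latitude": 40.8068, "longitude": -73.9610,
     "open_hours": "6-22", "rating": 3.9},
]

def select_features(records, keep):
    """Return new records that contain only the attributes named in `keep`."""
    return [{key: record[key] for key in keep} for record in records]

# Keep the location attributes; drop open hours and rating.
locations = select_features(records, ["name", "latitude", "longitude"])
print(locations)
```

Which features count as relevant is a choice that depends on the question being asked, so this step is also a place where subjectivity enters the data.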
  50. Analytical…. To understand hurricane damage
  51. Analytical…. To understand how different cultures think about objects or concepts
  52. Analytical…. Forensic to understand an event https://www.nytimes.com/interactive/2016/09/25/us/charlotte-scott-shooting-video.html?mtrref=undefined
  53. Analytical…. Forensic… To understand deaths from a fire
  54. http://www.forensic-architecture.org/case/rafah-black-friday/
  55. Predictive: Uses a historical set of data to predict what might happen in the future. For example, when a McDonald’s becomes a new tenant at any location, foot traffic on the street in the surrounding 5 blocks increases 10%. This can also be used to make arguments or decisions, such as: because foot traffic increases 10% any time a new McDonald’s moves in, a new subway stop should be built here to capture the larger traffic. Historical analysis could also be used to implement dynamic zoning based on what has happened at past locations.
  56. Predictive: continued https://quickdraw.withgoogle.com/data
  57. Predictive: continued why we won’t have jobs anymore… ...so pay attention https://affinelayer.com/pixsrv/index.html
  58. Narrative: Has the intent of telling a story. It can also be argumentative, but it is not intended to draw a scientific conclusion.
  59. Narrative: http://prisonmap.com/
  60. Narrative: https://www.nytimes.com/interactive/2016/02/26/us/race-of-american-power.html
  61. Narrative… https://theintercept.co/condolences/
  62. Narrative… https://flowingdata.com/2015/09/23/years-you-have-left-to-live-probably/
  63. Narrative continued….can be biased
  64. Exploratory: Is intended to explore how data might be related but may not have an end goal of arguing anything in particular. For example, the project On Broadway visualizes Instagram images along Broadway: what colors are used in the images and how many images are posted at each location throughout the day, all visualized side by side with the street view locations. http://www.on-broadway.nyc/
  65. Exploratory continued… You can imagine doing exquisite corpses of buildings or streets, or building interiors with streetview
  66. Exploratory continued… Composite images: pulling together multiple datasets http://mpkelley.photography/?category=airportraits
  67. WHERE DO WE START?!?!
  68. If we come back to our data mining diagram, you’ll notice it starts at the phase of data selection...but before we start collecting our data, we need to know what we’re collecting it for
  69. Data analysis starts with a question phase: a problem you want to solve, such as “What are the characteristics of buildings that are most heavily instagrammed?” or “What are the characteristics of diverse neighborhoods?”
  70. The wrangling phase includes data selection, pre-processing and cleaning data as well as feature selection - this is a moment to investigate and understand data and its attributes and may require going through multiple data sets to find out if they are useful
  71. The next step is to look for patterns and correlations in the data. This is often done by graphing various features of the data to see how they might be correlated. You’ll often need to go back and forth between the wrangling phase of getting and cleaning data and exploring the data to make sure that it’s useful.
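Besides graphing, one simple way to check whether two features move together is the Pearson correlation coefficient. The Python sketch below computes it by hand; the per-image counts are hypothetical numbers, not results from our Street View images.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: +1 rises together, -1 moves opposite, 0 no linear link."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical counts for six street images.
windows_per_image = [4, 10, 6, 14, 8, 12]
people_per_image = [1, 5, 2, 8, 3, 6]

print(round(pearson(windows_per_image, people_per_image), 2))
```

A high value here would only say the two counts tend to rise together in this sample; it would not say that windows cause foot traffic, which ties back to illusory correlation.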
  72. The next step is to draw conclusions or make predictions from that data set - this often includes machine learning and statistics. Making broad and accurate conclusions takes a lot of research and vetting. Your projects will scratch the surface here but will make provocative claims supported by data.
  73. Finally, the analysis is communicated (often via blog posts or reports that include data visualization).
  74. SO AGAIN….WHERE DO WE START?!?!
  75. Exercise to get started
  76. This isn’t the only way to start a project….just one way to get ourselves moving. Example problems in the form of a question: “What are the characteristics of buildings that are most heavily instagrammed?” “What are the characteristics of diverse neighborhoods?”
  77. Regional science is a field of the social sciences concerned with analytical approaches to problems that are specifically urban, rural, or regional.