13. Platform Society
Foursquare 2008
Big Data
Widespread Adoption
Democratization of data
Statistics
Bayes' Theorem (1763)
Regression (1805)
Computer Age
Turing (1936)
Neural Networks (1943)
Evolutionary Computation (1965)
Databases (1970s)
Genetic Algorithms (1975)
Data Mining
KDD, or Knowledge Discovery in Databases (1989)
Supervised machine learning (1992)
Data Science (2001)
35. Machine Learning Pattern Recognition
Algorithms
High-Performance
Computing
Statistics
Database Systems
Data Warehouse
Information Retrieval Applications
Data Mining
Visualization
class overview
(ʘᗩʘ')
72. DATA MINING THE CITY
Weds 7p-9p 200 Buell
Violet Whitney, vw2205@columbia.edu
attendance/reflection:
shoutkey.com/us
Editor's Notes
We're going to do some exercises: this first one will be on getting data, which will kick off the weekly assignment.
D<>D just means paired designers. We're going to pair up with whoever has computers because it's more fun together, and then we can meet each other
I just graduated with my MArch from GSAPP
Aleppo project at CSR
sidewalk
Where we fit into history
King's College practiced statistics through engineering. The world's most powerful computer was at the Watson Lab in 1954.
Paperless studio (CAD)
CBIP - Columbia Building Intelligence Project - data/metric-driven design of the built environment
Columbia also hosted Cities Lab and Network Cities
Center for Spatial Research - humanitarian mapping
This is the best place for technology and architecture
As Professor José van Dijck has described, the computerization of every aspect of life has created a Platform society.
Today most of our social and economic relations take place through platforms like Facebook and Venmo
Tinder’s matching algorithm leads to an increasing number of matches and marriages each year. Ultimately its algorithm will shape the genetic makeup of the human race, as swipes are made, humans are matched and babies are born.
The filters of StreetEasy and Apartment Finder literally filter who lives in what neighborhoods, reprogramming entire city zones.
Where the Nolli map once exposed accessible public space, Yelp is now telling individuals what spaces they should like, but everyone sees a different map. These recommendation systems algorithmically segregate cities, generating spatialized filter bubbles which choreograph pedestrian flows through siloed canals across the city.
From Yelp reviews directing people to preferred restaurants to Airbnb reprogramming homes into vacation rentals, the invisible code that powers a city’s use may have more drastic influence than any physical invention in the last century.
But cities have always operated as platforms, as Manuel Castells states - they are the ‘material interfaces’ that connect individual city dwellers.
Just like networks on the internet, room adjacencies and hallways act as networks too.
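As a toy illustration of that analogy, a floor plan can be encoded as an adjacency list and traversed like any other network (the room names here are hypothetical, just a sketch of the idea):

```python
from collections import deque

# Hypothetical floor plan: rooms are nodes, doors/hallways are edges.
plan = {
    "lobby": ["hallway"],
    "hallway": ["lobby", "studio", "office"],
    "studio": ["hallway"],
    "office": ["hallway"],
}

def hops(plan, start, goal):
    """Breadth-first search: fewest doorways to pass through."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        room, dist = queue.popleft()
        if room == goal:
            return dist
        for nxt in plan[room]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(hops(plan, "lobby", "office"))  # 2: lobby -> hallway -> office
```

The same traversal works whether the edges are hallways, hyperlinks, or social ties.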
Not only have cities operated like platforms; the usage of data in cities isn't new. In the 1930s, surveys and statistics about the makeup of a place were used to justify the redevelopment of "blighted areas" and for racial redlining. So what is so different about data in the city now?
Today it's the quantity and ubiquity of that data which is new. The democratization of data through public APIs allows various apps and lone coders to access giant pools of data dropped by tiny transactions throughout the city.
The interconnectedness and availability of this data give immense power to designers to choreograph the use of cities and speculate creatively about the urban environment.
This course will focus on encoding spatial analytical processes. We will hypothesize about the relationships of tools and space, as well as develop models and simulations so designers can gain a foothold in the changing landscape of the digital city.
We will develop technical training in relevant techniques: Python, public APIs, batch image and video processing, and visualization in Processing.
We will also develop a critical understanding of the social, economic, and political dynamics these technologies cause, such as data bias and privacy issues.
In Session A, we will learn about data types, preprocessing data, about location and accuracy
About mapping Data & Other Visualization techniques,
About defining Spatial Patterns
About recommendation systems
And about Pixels, Images, Video, and computer vision
Session B will be run as workshops tailored to your specific interests (such as sentiment analysis or natural language processing) and will give you the opportunity to dive deep into your own project, which can orient around your studio.
Workshops will include expert guest critics from data, cloud computing and urban analytics.
Set of processes or methods for discovering patterns
We'll do a quick reflection at the end of each class through a Google Form, giving you the opportunity to submit regular feedback on the class as well as mark yourself as present.
Every week there will be a tutorial or an assignment that will develop your Project which you will post on Medium.
Who knows what Medium is?
We’ll get started on the first week’s assignment and you’ll continue it at the end of class.
The course project asks students to use at least 2 NYC datasets to generate a visual argument about change in the city. Projects will be individual; however, students are encouraged to share their datasets and methods with a pair-coding partner.
Super open on what people want to do for midterm and final review.
critics?
Who has computers?
groups
Google Street View is an amazing archive of the city, but it is not yet easily sortable. If we want to see all locations marked as historic in New York City, we would need to look up each location in a database of addresses, copy the address into Google Maps, drop the pegman into each location, screenshot each street scene, and then repeat the steps for every location before being able to compare them all.
Artists like Josh Begley have found smarter ways to sample Google Street View. He uses Google’s API and custom scripts to automate the downloading of street view from various locations. In “Officer Involved”, he uses databases of police brutality (collected by non-governmental and news organizations) to sample Street View scenes at the location of each incident, thus immersing us in “the environment of someone’s last moment”.
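A minimal sketch of that kind of automation (not Begley's own code): build one Street View Static API request URL per location from a list of coordinates. The endpoint and parameters follow Google's documented format; `YOUR_KEY` is a placeholder for a real API key, and the coordinates are arbitrary examples.

```python
from urllib.parse import urlencode

BASE = "https://maps.googleapis.com/maps/api/streetview"

def streetview_url(lat, lon, key="YOUR_KEY", size="640x640"):
    """Build a Street View Static API request URL for one location."""
    params = {"location": f"{lat},{lon}", "size": size, "key": key}
    return f"{BASE}?{urlencode(params)}"

# One URL per incident location; each could then be downloaded with
# urllib.request.urlretrieve(url, filename).
locations = [(40.8075, -73.9626), (40.7061, -73.9969)]
urls = [streetview_url(lat, lon) for lat, lon in locations]
print(urls[0])
```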
Where is data stored? Flat files, databases, websites, and APIs. What's an API?
Google Maps (church, CVS, bridge, bar, etc) ------> google sheets
manually scraping
Each dataset has the same summary statistics (mean, standard deviation, correlation), yet the datasets are clearly different and visually distinct. Anscombe's Quartet is the classic example showing how visualization can trump statistics alone.
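Two of the quartet's datasets make the point with nothing but the standard library (values from Anscombe's 1973 paper):

```python
from statistics import mean, pstdev

x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]  # roughly linear
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]   # a clean parabola

# Identical summary statistics, radically different shapes when plotted.
print(round(mean(y1), 2), round(mean(y2), 2))      # both 7.5
print(round(pstdev(y1), 2), round(pstdev(y2), 2))  # equal to two decimals
```

Only a plot reveals that one relationship is linear with noise and the other is a perfect curve.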
In a paper on the coastline of Britain, Benoit Mandelbrot showed that it is inherently nonsensical to discuss certain spatial concepts (such as the length of the perimeter of the coastline), despite the presumption that discussing the length of a coastline seems valid. Lengths in ecology depend directly on the scale at which they are measured and experienced. So while surveyors commonly measure the length of a river, this length only has meaning in the context of the relevance of the measuring technique to the question under study.
He depicted this idea through fractal geometry: certain forms and branching patterns recur at multiple scales.
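The scale-dependence is easy to demonstrate with arithmetic. In a Koch-style fractal curve, each refinement replaces every segment with four segments a third as long, so a finer "ruler" always reports a longer coastline (a sketch of the math, not Mandelbrot's own calculation):

```python
# Koch curve: each level of detail replaces every segment with 4 segments,
# each 1/3 as long, multiplying the measured length by 4/3.
def koch_length(levels, base=1.0):
    return base * (4 / 3) ** levels

for n in range(5):
    print(n, round(koch_length(n), 3))
# The measured length keeps growing as the measuring scale shrinks:
# there is no single "true" length, only a length at a given scale.
```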
Binary is the way computers store data at their lowest level, as electric charge.
But we rarely read raw ones and zeroes. When working with binary data, we often use hexadecimal instead.
But given the proper context, this hexadecimal string actually represents a color (you've probably used these numbers in Photoshop).
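For instance, a six-digit hex string like the ones in Photoshop decodes into three 8-bit color channels:

```python
def hex_to_rgb(hex_color):
    """Split a hex color like '#1E90FF' into (red, green, blue) bytes."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#1E90FF"))  # (30, 144, 255)
```

The bytes don't change; only the interpretation (a color rather than a number or text) does.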
What you may not know is that internally, most data are held as long, one-dimensional sequences of values, either binary (as hexadecimal) or text (as characters).
In computers, encoding is the process of putting a sequence of characters (letters, numbers, punctuation, and certain symbols) into a specialized format for efficient transmission or storage.
Decoding is the opposite process -- the conversion of an encoded format back into the original sequence of characters.
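In Python this round trip is explicit: `str.encode` turns characters into bytes, and `bytes.decode` recovers the characters:

```python
text = "café"                 # a sequence of characters
data = text.encode("utf-8")   # bytes, ready for storage or transmission
print(data)                   # note that the 'é' takes two bytes
print(data.hex())             # the same bytes written as hexadecimal
print(data.decode("utf-8"))   # decoding recovers the original characters
```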
Now that we know a bit about what data are and how they're stored, let's get into formatting data
We're going to use location data to get Street View images from Google's API (their open data)
We want to clean our data to turn our addresses into latitude and longitude
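A hedged sketch of that cleaning step using Google's Geocoding API. The endpoint and the JSON response shape follow Google's documented format, but `YOUR_KEY` is a placeholder and the sample response below is a hand-made fragment, not a live API result:

```python
import json
from urllib.parse import urlencode
# from urllib.request import urlopen   # uncomment to actually call the API

def geocode_url(address, key="YOUR_KEY"):
    """Build a Geocoding API request that turns an address into coordinates."""
    params = {"address": address, "key": key}
    return "https://maps.googleapis.com/maps/api/geocode/json?" + urlencode(params)

def parse_latlng(response_text):
    """Pull (lat, lng) out of the API's JSON response."""
    result = json.loads(response_text)["results"][0]
    loc = result["geometry"]["location"]
    return loc["lat"], loc["lng"]

# Example fragment shaped like the real API's output:
sample = '{"results": [{"geometry": {"location": {"lat": 40.8075, "lng": -73.9626}}}]}'
print(parse_latlng(sample))  # (40.8075, -73.9626)
```

With real keys, the lat/lng pairs from this step feed directly into the Street View requests above.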
When we’re talking about our data, there are a couple terms to know...