13. Platform Society
Foursquare 2008
Big Data
Widespread Adoption
Democratization of data
Statistics
Bayes' Theorem (1763)
Regression (1805)
Computer Age
Turing (1936)
Neural Networks (1943)
Evolutionary Computation (1965)
Databases (1970s)
Genetic Algorithms (1975)
Data Mining
KDD, or Knowledge Discovery in Databases (1989)
Supervised machine learning (1992)
Data Science (2001)
35. Machine Learning Pattern Recognition
Algorithms
High-Performance
Computing
Statistics
Database Systems
Data Warehouse
Information Retrieval Applications
Data Mining
Visualization
class overview
(ʘᗩʘ')
72. DATA MINING THE CITY
Weds 7p-9p 200 Buell
Violet Whitney, vw2205@columbia.edu
attendance/reflection:
shoutkey.com/us
Editor's Notes
We're going to do some exercises: this first one will be on getting data, which will kick off the weekly assignment.
D<>D just means paired designers. We're going to pair up with whoever has computers because it's more fun together, and then we can meet each other
I just graduated with my MArch from GSAPP
Aleppo project at CSR
sidewalk
Where we fit into history
King's College practiced statistics through engineering. The world's most powerful computer was at the Watson Lab in 1954.
Paperless studio (CAD)
CBIP - Columbia Building Intelligence Project - data/metric-driven design of the built environment
Columbia also hosted Cities Lab and Network Cities
Center for Spatial Research - humanitarian mapping
This is the best place for technology and architecture
As Professor José van Dijck has described, the computerization of every aspect of life has created a Platform society.
Today most of our social and economic relations take place through platforms like Facebook and Venmo
Tinder’s matching algorithm leads to an increasing number of matches and marriages each year. Ultimately its algorithm will shape the genetic makeup of the human race, as swipes are made, humans are matched and babies are born.
The filters of StreetEasy and Apartment Finder literally filter who lives in what neighborhoods, reprogramming entire city zones.
Where the Nolli map once exposed accessible public space, Yelp is now telling individuals what spaces they should like, but everyone sees a different map. These recommendation systems algorithmically segregate cities, generating spatialized filter bubbles which choreograph pedestrian flows through siloed canals across the city.
From Yelp reviews directing people to preferred restaurants to Airbnb reprogramming homes into vacation rentals, the invisible code that powers a city’s use may have more drastic influence than any physical invention in the last century.
But cities have always operated as platforms, as Manuel Castells states - they are the ‘material interfaces’ that connect individual city dwellers.
Just like networks on the internet, room adjacencies and hallways act as networks too.
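As a toy illustration of that analogy, a floor plan can be encoded as an adjacency list and traversed like any other network (the room names here are hypothetical, just a sketch of the idea):

```python
from collections import deque

# Hypothetical floor plan: rooms are nodes, doors/hallways are edges.
plan = {
    "lobby": ["hallway"],
    "hallway": ["lobby", "studio", "office"],
    "studio": ["hallway"],
    "office": ["hallway"],
}

def hops(plan, start, goal):
    """Breadth-first search: fewest doorways to pass through."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        room, dist = queue.popleft()
        if room == goal:
            return dist
        for nxt in plan[room]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(hops(plan, "lobby", "office"))  # 2: lobby -> hallway -> office
```

The same traversal works whether the edges are hallways, hyperlinks, or social ties.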
Not only have cities operated like platforms; the usage of data in cities isn't new. In the 1930s, surveys and statistics about the makeup of a place were used to justify the redevelopment of "blighted areas" and for racial redlining. So what is so different about data in the city now?
Today it's the quantity and ubiquity of that data which is new. The democratization of data through public APIs allows various apps and lone coders to access giant pools of data dropped by tiny transactions throughout the city.
The interconnectedness and availability of this data give immense power to designers to choreograph the use of cities and speculate creatively about the urban environment.
This course will focus on encoding spatial analytical processes. We will hypothesize about the relationships of tools and space, as well as develop models and simulations so designers can gain a foothold in the changing landscape of the digital city.
We will develop technical training in relevant techniques: Python, public APIs, batch image and video processing, and visualization in Processing.
We will also develop a critical understanding of the social, economic, and political dynamics these technologies cause, such as data bias and privacy issues.
In Session A, we will learn about data types, preprocessing data, about location and accuracy
About mapping Data & Other Visualization techniques,
About defining Spatial Patterns
About recommendation systems
And about Pixels, Images, Video, and computer vision
Session B will be run as workshops tailored to your specific interests (such as sentiment analysis or natural language processing) and will give you the opportunity to dive deep into your own project, which can orient around your studio.
Workshops will include expert guest critics from data, cloud computing and urban analytics.
Set of processes or methods for discovering patterns
We'll do a quick reflection at the end of each class through a Google Form, giving you the opportunity to submit regular feedback on the class as well as mark yourself as present.
Every week there will be a tutorial or an assignment that will develop your Project which you will post on Medium.
Who knows what Medium is?
We’ll get started on the first week’s assignment and you’ll continue it at the end of class.
The course project asks students to use at least 2 NYC datasets to generate a visual argument about change in the city. Projects will be individual; however, students are encouraged to share their datasets and methods with a pair-coding partner.
Super open on what people want to do for midterm and final review.
critics?
Who has computers?
groups
Google Street View is an amazing archive of the city, but it is not yet easily sortable. If we want to see all locations marked as historic in New York City, we would need to look up each location in a database of addresses, copy the address into Google Maps, drop the pegman into each location, screenshot each street scene, and then repeat the steps for every location before being able to compare them all.
Artists like Josh Begley have found smarter ways to sample Google Street View. He uses Google’s API and custom scripts to automate the downloading of street view from various locations. In “Officer Involved”, he uses databases of police brutality (collected by non-governmental and news organizations) to sample Street View scenes at the location of each incident, thus immersing us in “the environment of someone’s last moment”.
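A minimal sketch of that kind of automation (not Begley's own code): build one Street View Static API request URL per location from a list of coordinates. The endpoint and parameters follow Google's documented format; `YOUR_KEY` is a placeholder for a real API key, and the coordinates are arbitrary examples.

```python
from urllib.parse import urlencode

BASE = "https://maps.googleapis.com/maps/api/streetview"

def streetview_url(lat, lon, key="YOUR_KEY", size="640x640"):
    """Build a Street View Static API request URL for one location."""
    params = {"location": f"{lat},{lon}", "size": size, "key": key}
    return f"{BASE}?{urlencode(params)}"

# One URL per incident location; each could then be downloaded with
# urllib.request.urlretrieve(url, filename).
locations = [(40.8075, -73.9626), (40.7061, -73.9969)]
urls = [streetview_url(lat, lon) for lat, lon in locations]
print(urls[0])
```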
Where is data stored? Flat files, databases, websites, and APIs. What's an API?
Google Maps (church, CVS, bridge, bar, etc) ------> google sheets
manually scraping
Each dataset has the same summary statistics (mean, standard deviation, correlation), yet the datasets are clearly different and visually distinct. Anscombe's Quartet is the classic example showing how visualization can trump statistics alone.
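Two of the quartet's datasets make the point with nothing but the standard library (values from Anscombe's 1973 paper):

```python
from statistics import mean, pstdev

x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]  # roughly linear
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]   # a clean parabola

# Identical summary statistics, radically different shapes when plotted.
print(round(mean(y1), 2), round(mean(y2), 2))      # both 7.5
print(round(pstdev(y1), 2), round(pstdev(y2), 2))  # equal to two decimals
```

Only a plot reveals that one relationship is linear with noise and the other is a perfect curve.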
In a paper on the coastline of Britain, Benoit Mandelbrot showed that it is inherently nonsensical to discuss certain spatial concepts (such as the length of the perimeter of the coastline), despite the presumption that discussing the length of a coastline seems valid. Lengths in ecology depend directly on the scale at which they are measured and experienced. So while surveyors commonly measure the length of a river, this length only has meaning in the context of the relevance of the measuring technique to the question under study.
He depicted this idea through fractal geometry: certain forms and branching patterns recur at multiple scales.
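The scale-dependence is easy to demonstrate with arithmetic. In a Koch-style fractal curve, each refinement replaces every segment with four segments a third as long, so a finer "ruler" always reports a longer coastline (a sketch of the math, not Mandelbrot's own calculation):

```python
# Koch curve: each level of detail replaces every segment with 4 segments,
# each 1/3 as long, multiplying the measured length by 4/3.
def koch_length(levels, base=1.0):
    return base * (4 / 3) ** levels

for n in range(5):
    print(n, round(koch_length(n), 3))
# The measured length keeps growing as the measuring scale shrinks:
# there is no single "true" length, only a length at a given scale.
```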
Binary is the way computers store data at their lowest level, as electric charge.
But we rarely read raw ones and zeroes. When working with binary data, we often use hexadecimal instead.
But given the proper context, this hexadecimal string actually represents a color (you've probably used these numbers in Photoshop).
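For instance, a six-digit hex string like the ones in Photoshop decodes into three 8-bit color channels:

```python
def hex_to_rgb(hex_color):
    """Split a hex color like '#1E90FF' into (red, green, blue) bytes."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#1E90FF"))  # (30, 144, 255)
```

The bytes don't change; only the interpretation (a color rather than a number or text) does.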
What you may not know is that internally, most data are held as long, one-dimensional sequences of values, either binary (as hexadecimal) or text (as characters).
In computers, encoding is the process of putting a sequence of characters (letters, numbers, punctuation, and certain symbols) into a specialized format for efficient transmission or storage.
Decoding is the opposite process -- the conversion of an encoded format back into the original sequence of characters.
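In Python this round trip is explicit: `str.encode` turns characters into bytes, and `bytes.decode` recovers the characters:

```python
text = "café"                 # a sequence of characters
data = text.encode("utf-8")   # bytes, ready for storage or transmission
print(data)                   # note that the 'é' takes two bytes
print(data.hex())             # the same bytes written as hexadecimal
print(data.decode("utf-8"))   # decoding recovers the original characters
```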
Now that we know a bit about what data are and how they're stored, let's get into formatting data
We're going to use location data to get Street View images from Google's API (their open data)
We want to clean our data to turn our addresses into latitude and longitude
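A hedged sketch of that cleaning step using Google's Geocoding API. The endpoint and the JSON response shape follow Google's documented format, but `YOUR_KEY` is a placeholder and the sample response below is a hand-made fragment, not a live API result:

```python
import json
from urllib.parse import urlencode
# from urllib.request import urlopen   # uncomment to actually call the API

def geocode_url(address, key="YOUR_KEY"):
    """Build a Geocoding API request that turns an address into coordinates."""
    params = {"address": address, "key": key}
    return "https://maps.googleapis.com/maps/api/geocode/json?" + urlencode(params)

def parse_latlng(response_text):
    """Pull (lat, lng) out of the API's JSON response."""
    result = json.loads(response_text)["results"][0]
    loc = result["geometry"]["location"]
    return loc["lat"], loc["lng"]

# Example fragment shaped like the real API's output:
sample = '{"results": [{"geometry": {"location": {"lat": 40.8075, "lng": -73.9626}}}]}'
print(parse_latlng(sample))  # (40.8075, -73.9626)
```

With real keys, the lat/lng pairs from this step feed directly into the Street View requests above.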
When we’re talking about our data, there are a couple terms to know...