Week 1 - Data Mining the City

Columbia University - Graduate School of Architecture, Planning and Preservation - Data Mining the City - Week 1: Data, Getting Data, Cleaning Data

DATA MINING THE CITY
Weds 7p-9p 200 Buell
Violet Whitney, vw2205@columbia.edu
please take a moment to
say why you’re here:
shoutkey.com/carrot

Class Overview
D<>D Getting Data
What are data?
D<>D Cleaning Data
Reflection/Attendance

Platform Society
Foursquare 2008
Big Data
Widespread Adoption
Democratization of data
Statistics
Bayes' Theorem (1763)
Regression (1805)
Computer Age
Turing (1936)
Neural Networks (1943)
Evolutionary Computation (1965)
Databases (1970s)
Genetic Algorithms (1975)
Data Mining
KDD, or Knowledge Discovery in Databases (1989)
Supervised machine learning (1992)
Data Science (2001)

Platform Society, Statistics, Computer Age, Data Mining

redevelopment of “blighted” areas
racial redlining
city data isn’t new

Python
APIs
Processing
Batch Processes
After Effects scripts
Sorting Excel
formulas
Geocoding
Recommendation
Systems

Machine Learning Pattern Recognition
Algorithms
High-Performance
Computing
Statistics
Database Systems
Data Warehouse
Information Retrieval Applications
Data Mining
Visualization
class overview
(ʘᗩʘ')

Data Selection
Pre-processing & Cleaning
Feature Selection
Data Mining
Interpretation/Evaluation
class overview
(ʘᗩʘ')

Attendance: 40%
Work: Wkly Posts 30%, Final Project 30%
class overview
(ʘᗩʘ')

class overview
(ʘᗩʘ')
Work: Wkly Posts 30%, Final Project 30%
YOUR EPIC PROJECT!
Are Airbnb prices higher in
neighborhoods that are more
diverse?
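
One way a question like this could be turned into a computation is sketched below. This is only an illustration under stated assumptions: the Simpson diversity index is one possible definition of "diverse" (the slides do not say which measure to use), the field names `group_counts` and `median_price` are hypothetical, and the four rows are placeholder numbers, not real Airbnb or census data.

```python
from math import sqrt


def simpson_diversity(group_counts):
    """Return 1 - sum(p_i^2): 0 for a single group, higher for more even mixes."""
    total = sum(group_counts)
    if total == 0:
        raise ValueError("group_counts must contain at least one person")
    return 1.0 - sum((count / total) ** 2 for count in group_counts)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder rows for illustration only; replace with real, cleaned data.
neighborhoods = [
    {"name": "A", "group_counts": [900, 50, 50], "median_price": 120.0},
    {"name": "B", "group_counts": [400, 300, 300], "median_price": 150.0},
    {"name": "C", "group_counts": [250, 250, 250, 250], "median_price": 95.0},
    {"name": "D", "group_counts": [600, 400], "median_price": 180.0},
]

diversity = [simpson_diversity(n["group_counts"]) for n in neighborhoods]
prices = [n["median_price"] for n in neighborhoods]
r = pearson(diversity, prices)
print(f"correlation between diversity and price: {r:.2f}")
```

A correlation like this would only describe an association in the sample; it would not by itself show that diversity causes higher or lower prices.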

class overview
(ʘᗩʘ')
Work: Wkly Posts 30%, Final Project 30%
party!!!

d<>d
(☞ﾟヮﾟ)☞ ☜(ﾟヮﾟ☜)
Best of Luck with the Wall
Officer Involved

What are data?
¯_(ツ)_/¯
Why visualization matters

What are data?
¯_(ツ)_/¯
Anscombe’s Quartet
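
The point of Anscombe's Quartet can be checked numerically. The sketch below uses the four published datasets (Anscombe, 1973) and computes the mean of x, the mean of y, and the Pearson correlation for each; the helper name `pearson` and the `QUARTET` dictionary are just illustrative choices. The summaries come out nearly identical even though the four scatter plots look nothing alike, which is why plotting matters.

```python
from math import sqrt
from statistics import mean

# Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (mean of x, mean of y, correlation) per dataset
summary = {
    name: (round(mean(xs), 2), round(mean(ys), 2), round(pearson(xs, ys), 3))
    for name, (xs, ys) in QUARTET.items()
}

for name, row in summary.items():
    print(name, row)
```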

What are data?
¯_(ツ)_/¯
Why size matters

What are data?
¯_(ツ)_/¯
How are data
represented?

What is data?
¯_(ツ)_/¯
11111111 (binary) = FF (hex), red = 255
01010011 (binary) = 53 (hex), green = 83
00001000 (binary) = 08 (hex), blue = 8
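
The same three bytes can be read back in a few lines of Python. This is a minimal sketch: the function name `decode_rgb` and the channel order red, green, blue are assumptions taken from the slide's labels. Each 8-bit chunk is parsed as a base-2 integer, and the same values written in hexadecimal are FF, 53 and 08.

```python
BITS = "11111111 01010011 00001000"
CHANNELS = ("red", "green", "blue")


def decode_rgb(bit_string):
    """Parse three space-separated 8-bit chunks into named channel values."""
    values = [int(chunk, 2) for chunk in bit_string.split()]
    return dict(zip(CHANNELS, values))


rgb = decode_rgb(BITS)

# Two hex digits per channel, the form used for colors on the web.
hex_code = "#" + "".join(f"{value:02X}" for value in rgb.values())

print(rgb)
print(hex_code)
```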

What is data?
¯_(ツ)_/¯
binary <----------------- human readable
“encoding”

What is data?
¯_(ツ)_/¯
binary -----------------> human readable
“decoding”
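
Encoding and decoding can be tried directly on text. The sketch below assumes UTF-8 as the character encoding (the slides do not name one) and uses hypothetical helper names `encode` and `decode`: the first turns a human-readable string into space-separated bytes written in binary, the second reverses it.

```python
def encode(text):
    """Human readable -> binary: UTF-8 bytes written as 8-bit chunks."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def decode(bit_string):
    """Binary -> human readable: parse 8-bit chunks and decode as UTF-8."""
    return bytes(int(chunk, 2) for chunk in bit_string.split()).decode("utf-8")


encoded = encode("NYC")
decoded = decode(encoded)

print(encoded)
print(decoded)
```

Decoding only recovers the original text when the reader assumes the same encoding the writer used; reading the same bytes as color channels, as in the RGB example, gives a different meaning.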

d<>d
(☞ﾟヮﾟ)☞ ☜(ﾟヮﾟ☜)
Pre-processing &
Cleaning

Cleaning Data
(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧
Pre-processing &
Cleaning

Cleaning Data
(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧
Pre-processing & Cleaning
Attributes: rating

Cleaning Data
(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧
Pre-processing & Cleaning
Attributes: address

Cleaning Data
(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧
Pre-processing & Cleaning
Attributes: open hours

Cleaning Data
(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧
Pre-processing &
Cleaning
Objects
Restaurant 1
Restaurant 2
Restaurant 3
Restaurant 4
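
The objects-and-attributes picture maps naturally onto a list of records, one per restaurant, each with a rating, an address and open hours. The sketch below is a hedged illustration of a cleaning pass: the raw values are invented placeholders, and the rules (unparseable ratings become `None`, extra whitespace is collapsed, empty strings become `None`) are assumptions about what "clean" means here, not rules stated in the slides.

```python
import re

# Placeholder records for illustration only; not real restaurant data.
raw_restaurants = [
    {"name": "Restaurant 1", "rating": "4.5", "address": " 200  Example St ", "open hours": "9-17"},
    {"name": "Restaurant 2", "rating": "n/a", "address": "15 Sample Ave", "open hours": ""},
    {"name": "Restaurant 3", "rating": " 3 ", "address": "7 Placeholder Rd", "open hours": "11-23"},
    {"name": "Restaurant 4", "rating": None, "address": "", "open hours": "8-14"},
]


def clean_rating(value):
    """Convert a rating to float, or None when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_text(value):
    """Collapse runs of whitespace, trim the ends, map empty text to None."""
    cleaned = re.sub(r"\s+", " ", value or "").strip()
    return cleaned or None


def clean_restaurant(raw):
    """Return a new record with every attribute normalized."""
    return {
        "name": clean_text(raw.get("name")),
        "rating": clean_rating(raw.get("rating")),
        "address": clean_text(raw.get("address")),
        "open hours": clean_text(raw.get("open hours")),
    }


restaurants = [clean_restaurant(raw) for raw in raw_restaurants]

for restaurant in restaurants:
    print(restaurant)
```

Using `None` for missing values keeps "unknown" distinct from a real value such as a rating of 0, so later steps can decide whether to drop or fill those records.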

Cleaning Data
(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧
Feature Selection: address
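
Feature selection, in the sense shown here, means keeping only the attributes needed for the question at hand. A minimal sketch, assuming each object is a dictionary and using a hypothetical helper `select_features`; the records are placeholders, not real data.

```python
def select_features(objects, features):
    """Keep only the named attributes of each object, in the given order."""
    return [{feature: obj.get(feature) for feature in features} for obj in objects]


# Placeholder records for illustration only.
restaurants = [
    {"name": "Restaurant 1", "rating": 4.5, "address": "200 Example St", "open hours": "9-17"},
    {"name": "Restaurant 2", "rating": 3.0, "address": "15 Sample Ave", "open hours": "11-23"},
]

addresses_only = select_features(restaurants, ["address"])
print(addresses_only)
```

Selecting just `address` would be a natural first step before geocoding, since the other attributes are not needed to place each restaurant on a map.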

DATA MINING THE CITY
Weds 7p-9p 200 Buell
Violet Whitney, vw2205@columbia.edu
attendance/reflection:
shoutkey.com/us


ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake

Walaa Eldin Moustafa

Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines. #SQL #Views #Privacy #Compliance #DataLake

4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...

Social Samosa

End-to-end pipeline agility - Berlin Buzzwords 2024

Lars Albertsson

We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines. A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more. A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream. Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.

Predictably Improve Your B2B Tech Company's Performance by Leveraging Data

Kiwi Creative

Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts. Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!). From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing. - - - This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA. Watch the video recording at https://youtu.be/5vjwGfPN9lw Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/

My burning issue is homelessness K.C.M.O.

rwarrenll

Recently uploaded (20)

Population Growth in Bataan: The effects of population growth around rural pl...

一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理

一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理

一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理

University of New South Wales degree offer diploma Transcript

办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样

Everything you wanted to know about LIHTC

一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理

Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf

Learn SQL from basic queries to Advance queries

Global Situational Awareness of A.I. and where its headed

一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理

一比一原版(UofS毕业证书)萨省大学毕业证如何办理

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...

一比一原版(Chester毕业证书)切斯特大学毕业证如何办理

ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake

4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...

End-to-end pipeline agility - Berlin Buzzwords 2024

Predictably Improve Your B2B Tech Company's Performance by Leveraging Data

My burning issue is homelessness K.C.M.O.

Week 1 - Data Mining the City

1. DATA MINING THE CITY Weds 7p-9p 200 Buell Violet Whitney, vw2205@columbia.edu please take a moment to say why you’re here: shoutkey.com/carrot

2. No computers please

3. Except when we need them

4. Class Overview

5. Class Overview D<>D Getting Data

6. Class Overview D<>D Getting Data What are data?

7. Class Overview D<>D Getting Data What are data? D<>D Cleaning Data

8. Class Overview D<>D Getting Data What are data? D<>D Cleaning Data Reflection/Attendance

9. hi! ( ﾟヮﾟ)

10.

11.

12.

13. Platform Society Foursquare 2008 Big Data Widespread Adoption Democratization of data Statistics Bayes Theorem (1763) Regression (1805) Computer Age Turing (1936) Neural Networks (1943) Evolutionary Computation (1965) Databases (1970s) Genetic Algorithms (1975) Data Mining KDD or Knowledge Discovery from Databases (1989) Supervised machine learning (1992) Data Science (2001)

14. Platform Society Statistics Computer Age Data Mining

15. class overview (ʘᗩʘ')

16.

17.

18.

19.

20.

21.

22. cities are platforms

23. cities are networks

24. redevelopment of “blighted” areas racial redlining city data isn’t new

25. quantity and ubiquity

26. data for designers

27. develop hypothesis

28. Python APIs Processing Batch Processes After Effects scripts Sorting Excel formulas Geocoding Recommendation Systems

29. critical data usage

30.

31.

32.

33.

34.

35. Machine Learning Pattern Recognition Algorithms High-Performance Computing Statistics Database Systems Data Warehouse Information Retrieval Applications Data Mining Visualization class overview (ʘᗩʘ')

36. class overview (ʘᗩʘ')

37. Data Selection Pre-processing & Cleaning Data Mining Interpretation/ Evaluation Feature Selection class overview (ʘᗩʘ')

38. 40% 30% 30% Attendance Work Wkly Posts Final Project class overview (ʘᗩʘ')

39. 40% Attendance class overview (ʘᗩʘ')

40. 30% 30% Work Wkly Posts Final Project class overview (ʘᗩʘ')

41. 30% 30% Work Wkly Posts Final Project class overview (ʘᗩʘ')

42. class overview (ʘᗩʘ') 30% 30% Work Wkly Posts Final Project YOUR EPIC PROJECT! Are Airbnb prices higher in neighborhoods that are more diverse?

43. class overview (ʘᗩʘ') 30% 30% Work Wkly Posts Final Project party!!!

44. d<>d (☞ﾟヮﾟ)☞ ☜(ﾟヮﾟ☜)

45. d<>d (☞ﾟヮﾟ)☞ ☜(ﾟヮﾟ☜)

46. d<>d (☞ﾟヮﾟ)☞ ☜(ﾟヮﾟ☜) Best of Luck with the Wall Officer Involved

47. d<>d (☞ﾟヮﾟ)☞ ☜(ﾟヮﾟ☜) Data Selection API

48. What are data? ¯_(ツ)_/¯

49. What are data? ¯_(ツ)_/¯ Why visualization matters

50. What are data? ¯_(ツ)_/¯ Anscombe’s Quartet

51. What are data? ¯_(ツ)_/¯ Anscombe’s Quartet

52. What are data? ¯_(ツ)_/¯ Why size matters

53. What are data? ¯_(ツ)_/¯ Why size matters

54. What are data? ¯_(ツ)_/¯ How are data represented?

55. What is data? ¯_(ツ)_/¯ 11111111 01010011 00001000

56. What is data? ¯_(ツ)_/¯ 11111111 FF 01010011 53 00001000 08

57. What is data? ¯_(ツ)_/¯ 11111111 FF red = 255 01010011 53 green = 83 00001000 08 blue = 8

58. What is data? ¯_(ツ)_/¯

59. What is data? ¯_(ツ)_/¯ binary <----------------- human readable “encoding”

60. What is data? ¯_(ツ)_/¯ binary -----------------> human readable “decoding”

61. d<>d (☞ﾟヮﾟ)☞ ☜(ﾟヮﾟ☜)

62. Cleaning data (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧

63. d<>d (☞ﾟヮﾟ)☞ ☜(ﾟヮﾟ☜) Pre-processing & Cleaning

64. Cleaning Data (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ Pre-processing & Cleaning

65. Cleaning Data (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ Pre-processing & Cleaning

66. Cleaning Data (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ Attributes Pre-processing & Cleaning rating rating rating rating

67. Cleaning Data (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ Attributes Pre-processing & Cleaning address address address address

68. Cleaning Data (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ Attributes Pre-processing & Cleaning open hours open hours open hours

69. Cleaning Data (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ Pre-processing & Cleaning Objects Restaurant 1 Restaurant 2 Restaurant 3 Restaurant 4

70. Cleaning Data (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ Feature Selection address address address address

71. First Assignment

72. DATA MINING THE CITY Weds 7p-9p 200 Buell Violet Whitney, vw2205@columbia.edu attendance/reflection: shoutkey.com/us

Editor's Notes

We’re going to do some exercises. This first one will be on getting data, which will start the weekly assignment. D<>D just means paired designers: we’re going to pair up with whoever has computers, because it’s more fun together and it lets us meet each other.
I just graduated with my MArch from GSAPP
Aleppo project at CSR
sidewalk
Where we fit into history
King’s College practiced statistics through engineering. The world’s most powerful computer at Watson Lab, 1954. Paperless studio (CAD). CBIP (Columbia Building Intelligence Project): data/metric-driven design of the built environment. Columbia also hosted Cities Lab and Network Cities. Center for Spatial Research: humanitarian mapping. This is the best place for technology and architecture.
As Professor José van Dijck has described, the computerization of every aspect of life has created a Platform society.
Today most of our social and economic relations take place through platforms like Facebook and Venmo
Tinder’s matching algorithm leads to an increasing number of matches and marriages each year. Ultimately its algorithm will shape the genetic makeup of the human race, as swipes are made, humans are matched and babies are born.
The filters of StreetEasy and Apartment Finder literally filter the makeup of who lives in what neighborhoods, reprogramming entire city zones.
Where the Nolli map once exposed accessible public space, Yelp is now telling individuals what spaces they should like, but everyone sees a different map. These recommendation systems algorithmically segregate cities, generating spatialized filter bubbles which choreograph pedestrian flows through siloed canals across the city.
From Yelp reviews directing people to preferred restaurants to Airbnb reprogramming homes into vacation rentals, the invisible code that powers a city’s use may have more drastic influence than any physical invention in the last century.
But cities have always operated as platforms, as Manuel Castells states - they are the ‘material interfaces’ that connect individual city dwellers.
Just like the networks on the internet, room adjacencies and hallways too act like networks.
Not only have cities operated like platforms; the use of data in cities isn’t new either. In the 1930s, surveys and statistics about the makeup of a place were used to justify the redevelopment of “blighted” areas and racial redlining. So what is so different about data in the city now?
Today it’s the quantity and ubiquity of that data that is new. The democratization of data through public APIs allows apps and lone coders to access giant pools of data dropped by tiny transactions throughout the city.
The interconnectedness and availability of this data give immense power to designers to choreograph the use of cities and speculate creatively about the urban environment.
This course will focus on encoding spatial analytical processes. We will hypothesize about the relationships of tools and space, as well as develop models and simulations so designers can gain a foothold in the changing landscape of the digital city.
We will develop a technical training in relevant techniques: using Python, public APIs, batch image and video processes, and visualization techniques in Processing
We will also develop a critical understanding of the social, economic, and political dynamics caused by these technologies, such as data bias and privacy issues.
In Session A, we will learn about data types, preprocessing data, location, and accuracy.
About mapping data and other visualization techniques, defining spatial patterns, and recommendation systems.
And about pixels, images, video, and computer vision.
Session B will be run as workshops tailored to your specific interests (such as sentiment analysis or natural language processing) and will give you the opportunity to deep dive into your own project which can orient around your studio.
Workshops will include expert guest critics from data, cloud computing and urban analytics.
Set of processes or methods for discovering patterns
We’ll do a quick reflection at the end of each class through a Google Form, to give you the opportunity to submit regular feedback on the class as well as mark yourself as here.
Every week there will be a tutorial or an assignment that will develop your Project which you will post on Medium. Who knows what Medium is?
Every week there will be a tutorial or an assignment that will develop your Project which you will post on Medium. We’ll get started on the first week’s assignment and you’ll continue it at the end of class.
The course project asks students to use at least 2 NYC datasets to generate a visual argument about change in the city. Projects will be individual, however students are encouraged to share their data sets and methods with a pair coding partner.
Super open on what people want to do for midterm and final review. critics?
Who has computers? groups
Google Street View is an amazing archive of the city but has yet to be easily sortable. If we want to see all locations that are marked as historic in New York City, we would need to look up each location in a database of addresses, copy the address into Google Maps, drop the pegman onto the location, and screenshot the street scene, repeating these steps for every location before being able to compare them all.
Artists like Josh Begley have found smarter ways to sample Google Street View. He uses Google’s API and custom scripts to automate the downloading of street view from various locations. In “Officer Involved”, he uses databases of police brutality (collected by non-governmental and news organizations) to sample Street View scenes at the location of each incident, thus immersing us in “the environment of someone’s last moment”.
Where is data stored? Flat files, databases and websites, APIs. What’s an API? Google Maps (church, CVS, bridge, bar, etc.) -> Google Sheets, manually scraping.
Each dataset has nearly the same summary statistics (mean, standard deviation, correlation)...
...and yet the datasets are clearly different and visually distinct. Anscombe’s Quartet is the classic example showing how visualization can trump statistics alone.
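A minimal Python sketch of that comparison, using the four published Anscombe datasets. The helper name `pearson` and the dictionary layout are illustrative choices, not part of the course materials.

```python
from statistics import mean, stdev

# The four (x, y) datasets of Anscombe's Quartet; I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# One row of summary statistics per dataset: mean x, mean y, std dev of y, correlation.
summary = {
    name: (mean(xs), mean(ys), stdev(ys), pearson(xs, ys))
    for name, (xs, ys) in QUARTET.items()
}

for name, (mx, my, sy, r) in summary.items():
    print(f"{name:>3}: mean x={mx:.2f}  mean y={my:.2f}  sd y={sy:.2f}  r={r:.3f}")
```

The printed rows are nearly identical, so the differences between the datasets only show up once the points are plotted.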
In a paper on the coastline of Britain, Benoit Mandelbrot showed that it is inherently nonsensical to discuss certain spatial concepts (such as the length of the perimeter of a coastline), despite the inherent presumption that discussing the length of a coastline is valid. Lengths in ecology depend directly on the scale at which they are measured and experienced. So while surveyors commonly measure the length of a river, this length only has meaning in the context of the relevance of the measuring technique to the question under study.
He depicted this idea behind fractal geometry: certain forms and branching patterns can be seen at multiple scales.
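The dependence of length on measurement scale can be sketched with a Koch curve. The Koch curve is a stand-in chosen here for illustration, not an example from the lecture: each refinement step reveals finer detail, and the measured length grows by a factor of 4/3 every time. The function names are hypothetical.

```python
import cmath
import math


def koch_points(depth):
    """Vertices of a Koch curve from 0 to 1, as complex numbers, refined `depth` times."""
    points = [0 + 0j, 1 + 0j]
    for _ in range(depth):
        refined = [points[0]]
        for a, b in zip(points, points[1:]):
            step = (b - a) / 3
            # The peak sits one third along the segment, turned 60 degrees outward.
            peak = a + step + step * cmath.exp(1j * math.pi / 3)
            refined += [a + step, peak, a + 2 * step, b]
        points = refined
    return points


def path_length(points):
    """Total length of the polyline through `points`."""
    return sum(abs(b - a) for a, b in zip(points, points[1:]))


# The finer the detail we measure, the longer the same "coastline" becomes.
for depth in range(5):
    print(depth, round(path_length(koch_points(depth)), 4))
```

The endpoints never move, yet the length keeps increasing as the ruler gets shorter, which is why a single "length" only makes sense alongside the scale used to measure it.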
Binary is the way computers store data at their lowest level, as electric charge.
We don’t use ones and zeroes. When working with binary data, we often use hexadecimal instead.
But given the proper context, this hexadecimal string actually represents color (you’ve probably used these numbers in Photoshop).
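A short Python sketch of the three views of those bytes (binary, hexadecimal, and decimal color channels), using the three bit patterns from the slides. The variable names are illustrative.

```python
# The three bytes from the slides, written as binary strings.
bits = ["11111111", "01010011", "00001000"]

# Read each byte as a base-2 integer: these are the red, green, and blue channels.
rgb = tuple(int(b, 2) for b in bits)

# Write each channel as a two-digit uppercase hexadecimal pair.
hex_pairs = [format(value, "02X") for value in rgb]

# Joined together, the pairs form the color code used in image editors and on the web.
hex_color = "#" + "".join(hex_pairs)

for b, h, v in zip(bits, hex_pairs, rgb):
    print(b, h, v)
print(hex_color)
```

The same eight bits can be read as a binary string, a hex pair, or a number from 0 to 255; only the context says that three of them together mean a color.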
What you may not know is that internally, most data are held as long, one-dimensional sequences of values, either binary (as hexadecimal) or text (as characters).
In computers, encoding is the process of putting a sequence of characters (letters, numbers, punctuation, and certain symbols) into a specialized format for efficient transmission or storage.
Decoding is the opposite process -- the conversion of an encoded format back into the original sequence of characters.
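As a small illustration in Python, here is a round trip through UTF-8, one common encoding. The choice of UTF-8 and the sample text are assumptions made for this sketch.

```python
text = "Buell"

# Encoding: characters -> bytes, in a format suited to storage or transmission.
encoded = text.encode("utf-8")

# The same bytes viewed as hexadecimal and as binary.
as_hex = encoded.hex()
as_bits = " ".join(format(byte, "08b") for byte in encoded)

# Decoding: bytes -> the original sequence of characters.
decoded = encoded.decode("utf-8")

print(as_hex)
print(as_bits)
print(decoded)
```

Decoding with a different encoding than the one used to encode is a common source of garbled text, so it helps to record which encoding a file uses.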
Now that we know a bit about what data are and how they’re stored, let’s get into formatting data.
We’re going to use location data to get streetview images from Google’s API (their open data)
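A hedged sketch of what such a request could look like, assuming the Google Street View Static API endpoint and its `size`, `location`, `heading`, and `key` parameters. The coordinates and the API key below are placeholders, and the function name is hypothetical. The sketch only builds the URL; it does not download anything.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Street View Static API.
STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def streetview_url(location, api_key, size="640x640", heading=None):
    """Build an image request URL for one location ("lat,lng" or a street address)."""
    params = {"size": size, "location": location, "key": api_key}
    if heading is not None:
        params["heading"] = heading  # compass direction of the camera, in degrees
    return f"{STREETVIEW_ENDPOINT}?{urlencode(params)}"


# Placeholder coordinates and key, for illustration only.
url = streetview_url("40.8075,-73.9626", api_key="YOUR_API_KEY")
print(url)
```

Looping this function over a list of locations and saving each response is the scripted version of dropping the pegman and taking a screenshot by hand. Check the current API terms and key requirements before relying on it.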
We want to clean our data to turn our addresses into latitude and longitude.
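One way this step might look in Python, assuming the Google Geocoding API and its JSON response layout (`results[0].geometry.location`). The sample response below is a hand-written placeholder with illustrative coordinates, not real API output, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Geocoding API.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key):
    """Build the request URL that asks for the coordinates of one address."""
    return f"{GEOCODE_ENDPOINT}?{urlencode({'address': address, 'key': api_key})}"


def extract_lat_lng(response_text):
    """Pull (lat, lng) out of a geocoding response, or None if nothing matched."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written placeholder response, shaped like the assumed API output.
sample_response = json.dumps({
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 40.8075, "lng": -73.9626}}}],
})

print(geocode_url("200 Buell Hall, New York, NY", "YOUR_API_KEY"))
print(extract_lat_lng(sample_response))
```

Returning `None` for unmatched addresses keeps bad rows visible, so they can be cleaned by hand instead of silently dropped.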
When we’re talking about our data, there are a couple terms to know...
When we’re talking about our data, there are a couple terms to know...
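The terms on the slides (objects, attributes, feature selection) can be sketched as a small Python table. The restaurant values below are invented placeholders, and `select_features` is a hypothetical helper.

```python
# Each dictionary is one object (a row); each key is an attribute (a column).
# All values are invented placeholders.
restaurants = [
    {"name": "Restaurant 1", "rating": 4.5, "address": "1 Example St", "open_hours": "9-17"},
    {"name": "Restaurant 2", "rating": 3.8, "address": "2 Example St", "open_hours": "11-22"},
    {"name": "Restaurant 3", "rating": 4.1, "address": "3 Example St", "open_hours": "8-15"},
    {"name": "Restaurant 4", "rating": 4.9, "address": "4 Example St", "open_hours": None},
]


def select_features(objects, features):
    """Feature selection: keep only the named attributes of every object."""
    return [{key: obj[key] for key in features} for obj in objects]


# For geocoding we only need to know which place it is and where it is.
addresses = select_features(restaurants, ["name", "address"])

for row in addresses:
    print(row)
```

Restaurant 4 is given no open hours here to mirror the slide that lists only three "open hours" entries; missing attributes like this are part of what cleaning has to deal with.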
...
...
