Social Science-Conscious Analysis
Case Study: The Cost of Public School
Riley H
Why The Cost of Public School?
New York City has some
of the best and worst
schools in all of the state
as well as the country.
Sometimes these are
right next to each other.
A Closer Look at Adjacent Schools
P.S. 11 in Midtown
West performed worse
than 60% of all schools
in New York State.
P.S. 59 in Midtown
East, a 10 minute walk
away, is the 19th best
elementary school in
the state.
The Problem
In a perfect world, how would you answer
your question?
For us, the perfect solution involved selling identical
houses directly across a school zone boundary from each other.
We’d then measure the price
difference. It was important to make
sure that other factors of a
neighborhood that drive price are as
stable as possible between the two,
allowing us to collect only the price
difference associated with the school.
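The ideal experiment above can be sketched in a few lines.
This is a toy illustration, not the project's code: the
`Sale` record, its fields, the zone labels, and every price
below are made-up assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sale:
    pair_id: int      # hypothetical: links two matched, near-identical homes
    school_zone: str  # hypothetical zone label, e.g. "A" or "B"
    price: float      # sale price in dollars (made-up numbers below)


def mean_price_gap(sales, zone_a, zone_b):
    """Average within-pair price difference (zone_a minus zone_b)."""
    by_pair = {}
    for s in sales:
        by_pair.setdefault(s.pair_id, {})[s.school_zone] = s.price
    gaps = [
        prices[zone_a] - prices[zone_b]
        for prices in by_pair.values()
        if zone_a in prices and zone_b in prices  # skip incomplete pairs
    ]
    return mean(gaps) if gaps else None


sales = [
    Sale(1, "A", 900_000.0), Sale(1, "B", 820_000.0),
    Sale(2, "A", 1_100_000.0), Sale(2, "B", 1_040_000.0),
    Sale(3, "A", 750_000.0),  # no matching home in zone B, so it is ignored
]
gap = mean_price_gap(sales, "A", "B")
```

Comparing within matched pairs is what keeps the other
neighborhood factors as stable as possible; averaging all
of zone A against all of zone B would mix them back in.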
With unlimited data, how would you
demonstrate your hypothesis was true?
Identifying an exact method to nail down the problem
we want to solve is sometimes the hardest step.
Start by detailing your “ideal experiment”; what you
would do with all the data you could ever want.
From there, you can break it down into pieces that are
possible.
What can you actually acquire?
High-quality data and computational time are in
extremely short supply, with few exceptions!
Cut down your question based on what data you can
acquire, but make sure you remain true to the core
social issue!
For The Cost of Public School Project
We focused on the following:
● What data do we need on housing?
● What types of housing can we acquire, and how will
the data we can't get affect the impact of the
experiment?
● What factors other than housing could affect the cost
of housing, and how can we grab accurate data for
them and quantify them?
The Data
Community Data Sites
Community sites are great if they’re available. They
can be a godsend for projects like these if the
community in question has been diligent in upgrading
their processes.
Unfortunately, most cities are still using handwritten
forms for a lot of their workings, leaving details
scanned into the system in the dreaded PDF format
with barely readable text. In other words, useless.
NOOOO! NOT HANDWRITTEN PDFS!
OUR PRECIOUS
DATA…
LOST TO THE ETHER!
D:
Caveats of Third Party Sites
● May not be free and clear to use, even just for
research purposes. Make sure you check the
terms!
● Limits on how much data you can get in a period of
time.
● May require a sign up and approval process before
allowing API usage.
● API may be slow.
● Pulling data in general moves slowly.
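One way to live with rate limits and slow APIs is to space
out requests and retry on failure. The sketch below assumes
a hypothetical `fetch_page` callable standing in for
whatever client the site provides; the interval and retry
count are placeholders, so check the site's terms for the
real limits.

```python
import time


def fetch_all(fetch_page, pages, min_interval=1.0, max_retries=3,
              sleep=time.sleep, clock=time.monotonic):
    """Call fetch_page(page) for each page, spacing calls and retrying failures.

    fetch_page is a hypothetical callable, not a real API client.
    """
    results = []
    last_call = None
    for page in pages:
        for attempt in range(max_retries):
            # Keep calls at least min_interval seconds apart.
            if last_call is not None:
                wait = min_interval - (clock() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = clock()
            try:
                results.append(fetch_page(page))
                break
            except OSError:
                if attempt == max_retries - 1:
                    raise  # give up after the last allowed attempt
                sleep(min_interval * 2 ** attempt)  # back off before retrying
    return results


# Demo with a fake client that fails once on page 2.
# The fake sleep and clock mean no real network calls or waiting happen.
attempts = []


def flaky_fetch(page):
    attempts.append(page)
    if page == 2 and attempts.count(2) == 1:
        raise OSError("temporary failure")
    return {"page": page}


waits = []
rows = fetch_all(flaky_fetch, [1, 2, 3], min_interval=0.5,
                 sleep=waits.append, clock=lambda: 0.0)
```

Passing `sleep` and `clock` in as parameters is only there
so the pacing logic can be tried out without waiting.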
Fixing the Data:
Sometimes Your Research Needs Researching
Preliminary data exploration is important to make sure
what you have makes sense.
But what does “sense” refer to?
In some cases, it will be obvious, but not in all of
them. Cross-referencing what you have with other
sources of information may save you trouble later!
Well, the data looks okay...
Cursory summaries of the data (means, medians,
quartiles and ranges) may not show anything
particularly strange...even when it is there.
Check for duplicate records and for wrong values that
look realistic enough to pass! Both are common side
effects of pulling from a third-party API, and neither
is easy to spot!
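Summary statistics will not reveal duplicates, so it helps
to count records by a normalized key. A minimal sketch,
assuming listings arrive as dictionaries; the field names
(`address`, `unit`, `price`) and the sample rows are
invented for illustration.

```python
from collections import Counter


def normalize(listing):
    """Build a comparison key; the field names here are hypothetical."""
    return (
        listing["address"].strip().lower(),
        listing["unit"].strip().lower(),
        listing["price"],
    )


def find_duplicates(listings):
    """Return the keys that occur more than once, with their counts."""
    counts = Counter(normalize(item) for item in listings)
    return {key: n for key, n in counts.items() if n > 1}


listings = [
    {"address": "123 Example St", "unit": "4B", "price": 650_000},
    # Same listing again, differing only in case and whitespace.
    {"address": "123 example st ", "unit": "4b", "price": 650_000},
    {"address": "456 Sample Ave", "unit": "2A", "price": 720_000},
]
dupes = find_duplicates(listings)
```

Which fields make two records "the same" is itself a
judgement call that depends on the data source.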
Feature or Flat Wrong?
After coming up with odd results in our regression
models, we looked back to the data and found many
listings with very small square footage listed. Some
were clearly wrong, like listings with 10 square feet.
Others were dubious but not impossible, given how
tiny NYC apartments can be.
Where should we have drawn the line? You may find
yourself making this sort of judgement, and that’s
where your community research comes in handy!
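One way to make that line explicit is to sort values into
three buckets instead of two, so the dubious ones get a
human look. Both thresholds below are illustrative
assumptions, not the cutoffs used in the project.

```python
def classify_sqft(sqft, hard_min=50, review_below=250):
    """Sort a listing's square footage into three buckets.

    hard_min and review_below are made-up thresholds for illustration.
    """
    if sqft is None or sqft < hard_min:
        return "invalid"   # clearly wrong, e.g. 10 square feet
    if sqft < review_below:
        return "review"    # dubious: check against local knowledge
    return "ok"


sizes = [10, 180, 400, 950, None]
labels = [classify_sqft(s) for s in sizes]
```

The "review" bucket is where community research earns its
keep: someone who knows the neighborhood can say whether a
180 square foot listing is a typo or a real studio.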
Reasonable results don’t always
mean you have good data.
The Model
Yay! It’s a Clean Dataset!
After a lot of hard work, we finally have what we need to
proceed: a beautiful, clean dataset.
At this point, you probably notice that your clean data is
substantially smaller than what you originally had, maybe
too small to run your original experiment idea.
You can try to find more data, or use a model!
Modeling For a New Purpose
We did not use our model's predictions as results in
their own right. Instead, the model helped us create
the data we were missing so that we could actually
complete the experiment.
With our secondary experiment in mind, we constructed a
set of “fake” housing data to give us price averages in
areas of New York City that our third-party site
largely ignored.
Problems with this Approach
The Actual Model
Ours was a linear regression model including the
following features. Make sure that the type of model
and the features involved work for your project.
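For readers who have not fit one before, here is a minimal
sketch of a linear regression with a single feature. It is
not the project's model: square footage as the only
feature and all of the numbers are assumptions made up for
the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data lying exactly on price = 100_000 + 800 * sqft.
sqft = [500, 750, 1000, 1250]
price = [500_000, 700_000, 900_000, 1_100_000]

intercept, slope = fit_line(sqft, price)

# Use the fitted line to estimate a price for an unseen 900 sq ft home.
predicted = intercept + slope * 900
```

Predictions like `predicted` are the kind of filled-in
values a model can supply where real listings are missing,
which is why their quality depends entirely on the
features and data behind the fit.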
The Analysis
Variety Helps Catch Errors
Analysis can be one of the most intense parts of a social
science project. It's more than just getting averages and
crunching numbers; you have to know not only what the
numbers mean, but also what they are defining
SOCIALLY.
This is where a diverse team comes in handy! Personal
experience may be an indication of where to go next and
what you've missed.
Don’t Forget the People Aspect
We specifically brought in people who know a lot about
certain areas of NYC, former realtors who are now
researchers, and people who own property in the areas
we were examining closely.
We also used our own experiences as residents of the
city to guide our choices.
We found that our numbers were in fact reflecting
lived experiences.
Don’t forget the community
you want to serve.
They should be driving your
research direction.
Look For the Reasons Why
If it turns out that your research doesn’t reflect lived
experience, examine why!
It could mean a drastic error in either your question, its
framing, the data set, or your analysis of the results!
Work with the community rather than against it.
Back to Square One
Thank You
To my team at Microsoft, Glenda Ascencio, Anastassiya
Neznanova, and Thomas Patino, and our leads, Jake
Hofman, Amit Sharma, and Jenn Wortman Vaughn.
To Microsoft's Data Science Summer School, headed by
Jennifer Chayes at Microsoft Research.
And to everyone who encouraged me to give a data
science talk!
More about Myself
I am a student at CUNY Queens College graduating in
May with a BS in Computer Science and BA in
Mathematics.
If you have questions, comments, or want to recruit,
please contact me!
techiecheckie@gmail.com
https://github.com/techiecheckie
https://www.linkedin.com/in/techiecheckie
Bibliography
1. NYCOpenData, nycopendata.socrata.com
2. GreatSchools, greatschools.org
3. StreetEasy, streeteasy.com
4. NYC GeoClient API,
developer.cityofnewyork.us/api/geoclient-api
5. Microsoft Data Science Summer School,
ds3.research.microsoft.com
