This presentation covers our capstone project analyzing growth and decline in Washington, D.C. neighborhoods. Inspired by NextCity, a nonprofit in Philadelphia, our team applied the same statistical methods and took the analysis one step further in D.C. by including monthly data and more features. We concluded that a machine learning implementation was more informative and accurate than the purely statistical model.
GeorgeTown CCPE Data Science Capstone Final Presentation, Cohort 10, The Data Extractors - Progress In Washington D.C.
1. GeorgetownCCPE Data Science Capstone Project Dec 16th, 2017
Mapping Progress in Washington D.C.
Presented by: Team Data Extractors
Tony Sanchez, Jay Huang, Ken Shuart, Jason Coffey
15. • Amount of variance retained from the original data set
• Chose 2 components with 75% explained variance
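As a rough illustration of how a component count can be read off an explained-variance curve, the sketch below picks the smallest number of components whose cumulative explained variance reaches a target. The eigenvalues are made-up placeholders rather than the project's data, and the function name `components_for_variance` is hypothetical.

```python
def components_for_variance(eigenvalues, target):
    """Return (k, cumulative_ratio) for the smallest k whose top-k
    eigenvalues explain at least `target` of the total variance."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= target:
            return k, cumulative
    # Target never reached (or empty input): return everything we have.
    return len(ordered), cumulative


# Hypothetical covariance-matrix eigenvalues, for illustration only.
example_eigenvalues = [5.0, 2.5, 1.0, 0.75, 0.5, 0.25]
k, ratio = components_for_variance(example_eigenvalues, target=0.75)
print(k, ratio)
```

With these placeholder values, two components reach the 75% threshold; real data would give a different curve.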
16. • Coefficients of features inside components
• First component: HS max % and poverty
• Second component: population and total crime
17. • Silhouette score = cohesion and separability of clusters
• Balance between high silhouette score and neighborhood movement between clusters
• 30 clusters chosen
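For readers unfamiliar with the silhouette score, here is a minimal sketch of the standard definition: for each point, a is the mean distance to the other points in its own cluster (cohesion), b is the lowest mean distance to any other cluster (separability), and the score is (b - a) / max(a, b). The toy points and the function name `silhouette_scores` are assumptions for illustration; this is not the team's code.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette scores; requires at least two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy data: two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(toy_points, [0, 0, 1, 1])
bad = silhouette_scores(toy_points, [0, 1, 0, 1])
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
print(mean_good, mean_bad)
```

A labeling that matches the natural groups scores close to 1, while a labeling that splits them scores below 0, which is the sense in which the mean score summarizes cohesion and separability.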
18. • Clusters score above the mean silhouette score
• Acceptable cohesion and separability
19. • Clustering did not converge into an optimal solution
• Iterated 1000 times
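The slide does not say which clustering algorithm or library was used. Assuming a k-means style procedure, the sketch below shows one way an iteration cap and a convergence check can interact: the loop stops early when assignments stop changing, and otherwise reports non-convergence after `max_iter` passes. The function name, toy data, and seed are all hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=1000, seed=0):
    """Plain Lloyd's algorithm on tuples of coordinates.

    Returns (labels, centroids, iterations_used, converged).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            # No assignment changed, so the solution is stable.
            return labels, centroids, iteration, True
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Hit the cap without the assignments settling.
    return labels, centroids, max_iter, False


# Toy data: two obvious groups (illustrative only).
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, used, converged = kmeans(toy, k=2, max_iter=1000)
print(labels, used, converged)
```

On clean toy data the loop settles in a few passes; on noisy data with many clusters, assignments can keep shifting, and even a converged run is only a local optimum.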
20. Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
http://arcg.is/2C4zGNp
21.
22. Lessons Learned
• Education
• Unemployment
• New Construction
• Foreclosures
• Parks and Infrastructure
• Jobs added/lost
• Transportation
• Homeless Shelters
Hypothesis only, not challenges. This should say overview not hypothesis.
Race, age, gender, education, native born, violent crimes, naturalized, non-citizen. List our data sources vs. the Philly sources. Talk about ACS and everything included.
Dropbox was our WORM (write once, read many) store.
Move score information to computation and analysis slides.
Philly model:
Because each neighborhood's score was based primarily on DC mean growth over the 5-year period, one might conclude that the results were fairly accurate. Right away, it makes sense that the southeast, where city development is concentrated, would be hotter than the upper west side of the District. Hot areas: Kalorama, Adams Morgan, Foggy Bottom, Anacostia.
TDE:
However, looking at our model, the scores are more evenly distributed. Since we factored every single month into our analysis and added variables such as rent, violent crime, and demographics, there is more detail in the outcomes. Hot areas include Navy Yard, which makes sense given the growth around the ballpark, and Union Station, which had higher scores. Foggy Bottom and Columbia Heights scored similarly: crime was unusually high in Columbia Heights, and real estate/income was very high in Foggy Bottom.