Data Works MD February 2023 - https://www.meetup.com/dataworks/events/290813196/
Video -
-------------------------------------------------
Data Journalism at The Baltimore Banner
In this presentation, data journalist Nick Thieme will present what data journalism looks like at The Baltimore Banner. Nick will discuss how data journalism dovetails with local news and will highlight several recent projects. Several of The Baltimore Banner's recent articles featuring data journalism can be found here.
-------------------------------------------------
Nick Thieme creates rigorous data journalism with the goal of exposing and undoing systemic inequities by using the tools of statistics to discover reliable information about Baltimore. He grew up in the D.C. area, moving to Baltimore in the 2010s. After a time creating data journalism for Atlanta at the Atlanta Journal-Constitution, he's excited to return home to use his work to make the city a more equitable place. Nick can be reached on Twitter.
3. Let’s dig in
• Choosing a story
• Questions
• Learning
• Contextualizing
2/23/2023
4. Choosing this story
• “I think... if it is true that there are as many minds as there are heads, then there are as many kinds of love as there are hearts.”
• In 2017, my house was almost sold at tax sale
• Baltimore City’s property tax rate is nearly double the second highest (~2.2 vs. ~1.4)
• Prior work on this topic:
– Abell Foundation
– MVLS
– Baltimore Brew
– But no comprehensive data analysis
5. What questions do we want to answer? What data is needed?
• How much money do investors make off the property tax system?
• Who is affected?
• What is the geography of property tax?
• Baltimore City property tax sale data
– Records request in: 6/21
– Responsive documents: 8/29
– MPIA (Maryland Public Information Act) violation, one of many
• State land records
– Records request in: 8/8
– Responsive documents: 9/1
• Census data
6. Learning / data work
• SDAT (State Department of Assessments and Taxation) records:
– 24 jurisdictions in Maryland (23 counties plus Baltimore City), one file from each
– 277 columns
• Everything you could want about a property
• Location
• Value
• Owner transfers
• City of Baltimore tax sale data
– 21 columns, most either uninformative or incomplete
– Importantly: property address, bidder information, lien information
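Stacking the per-jurisdiction SDAT files into one table comes first. The sketch below is a minimal Python illustration, not the code used for this project; it assumes each file is a CSV with a header row, and the column names in `KEEP` and the function `stack_sdat` are hypothetical.

```python
import csv
from typing import Iterable, Mapping

# Hypothetical column names; the real SDAT files have 277 columns with their own names.
KEEP = ["ACCTID", "ADDRESS", "ASSESSED_VALUE"]


def stack_sdat(files: Mapping[str, Iterable[str]], keep: list = KEEP) -> list:
    """Combine one CSV per jurisdiction into a single list of rows.

    `files` maps a jurisdiction name to any iterable of CSV lines
    (an open file object works). Only the `keep` columns are retained,
    and a `jurisdiction` field records which file each row came from.
    """
    rows = []
    for jurisdiction, lines in files.items():
        for record in csv.DictReader(lines):
            row = {col: record.get(col, "") for col in keep}
            row["jurisdiction"] = jurisdiction
            rows.append(row)
    return rows
```

Passing open file objects (opened with `newline=""`) keeps memory use reasonable even with 277 columns per row, since only the kept fields are copied.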
7. Learning / data work (2)
• Need to match the SDAT data with the property lien data
– Property addresses are recorded differently in different places
– Block/lot/ward doesn’t work because of multi-unit buildings
– One unit may be liened but another isn’t
• Use postmastr in R to standardize addresses between SDAT and CoB
• Join the matched SDAT/CoB data to shapefiles on block/lot/ward, since the footprint is the same for all units in a building
• Lots of back-and-forth to make sure the match was complete
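The address matching described above was done with postmastr in R. As a rough Python stand-in, the sketch below normalizes case, punctuation, a few street suffixes and directions, and a trailing unit designator, then joins on (street, unit) so that different units in the same building stay distinct. The abbreviation tables, regex, the `address` field, and the function names are all hypothetical and far simpler than postmastr.

```python
import re

# Hypothetical abbreviation tables; postmastr's real dictionaries are much larger.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
# A trailing unit designator such as "APT 2B", "UNIT 3", or "#2B".
UNIT_RE = re.compile(r"\s+(?:#\s*|(?:APT|UNIT)\s+)([A-Z0-9-]+)$")


def standardize(address: str) -> tuple:
    """Return (street, unit) in a canonical upper-case form."""
    addr = re.sub(r"[.,]", " ", address.upper())
    addr = re.sub(r"\s+", " ", addr).strip()
    unit = ""
    match = UNIT_RE.search(addr)
    if match:
        unit = match.group(1)
        addr = addr[: match.start()]
    words = [SUFFIXES.get(w, DIRECTIONS.get(w, w)) for w in addr.split(" ")]
    return " ".join(words), unit


def match_records(sdat: list, liens: list):
    """Join lien rows to SDAT rows on (street, unit); return matches and leftovers."""
    index = {standardize(row["address"]): row for row in sdat}
    matched, unmatched = [], []
    for lien in liens:
        key = standardize(lien["address"])
        if key in index:
            matched.append((index[key], lien))
        else:
            unmatched.append(lien)
    return matched, unmatched
```

Records that land in `unmatched` are the ones to inspect by hand and feed back into the normalization rules; that loop is the back-and-forth mentioned on the slide.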
8. Learning / data work (3)
• Investors make money in 3 ways
– Flipping homes
– Interest payments
– Attorney’s fees
• 2 of 3 are tractable
– Court records not granular/dependable enough for attorney’s fees
• Flipping homes
– Find the next sale after the tax sale; take the difference as profit
– Most homes sold soon after tax sale
– Assessment values rarely change after tax sale
• Interest
– Interest accrues immediately after tax sale and until redemption
– Different rates for homeowners and non-homeowners
– Errors in CoB data
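The two tractable revenue streams can be sketched as simple calculations. This is a minimal illustration under stated assumptions, not the analysis itself: flip profit is taken as the first resale price after the tax sale minus the investor's cost, and interest is treated as simple (non-compounding) annual interest prorated by day. The rates in `ANNUAL_RATES` are placeholders, not the city's actual schedule, and all names are hypothetical.

```python
from datetime import date
from typing import Optional

# Placeholder annual rates keyed by owner occupancy, for illustration only;
# these are NOT Baltimore City's actual tax-sale interest rates.
ANNUAL_RATES = {True: 0.12, False: 0.18}


def flip_profit(tax_sale_date: date, tax_sale_cost: float,
                transfers: list) -> Optional[float]:
    """Profit from the first recorded sale after the tax sale, or None if unsold.

    `transfers` is a list of (sale_date, price) pairs for one property.
    """
    later = sorted(t for t in transfers if t[0] > tax_sale_date)
    if not later:
        return None
    _, resale_price = later[0]
    return resale_price - tax_sale_cost


def lien_interest(lien_amount: float, tax_sale_date: date, redemption_date: date,
                  owner_occupied: bool, rates: dict = ANNUAL_RATES) -> float:
    """Simple interest from the tax sale date until redemption, prorated by day."""
    days = (redemption_date - tax_sale_date).days
    return lien_amount * rates[owner_occupied] * days / 365
```

Because the rate depends on `owner_occupied`, any misclassification in the source data flows straight into the interest estimate.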
9. What did we learn?
• Enormous racial disparities in Baltimore tax sale
– 46% of buildings in SW Baltimore and 42% of Sandtown-Winchester liened
– The more white residents in a tract, the less likely homes are to be liened
– Logistic mixed-effects GAM supports this
• $37m in total income from tax sale over 6 years
• $27m in flips
• $10m in interest
– $8m in non-owner-occupied
– $2m in owner-occupied
• Misclassification a huge issue
– Of 10,000 liened homes where the owner lives in the house, 6.6k listed as non-owner-occupied
– Has implications for tax rate, foreclosure time, protections
10. Contextualizing
• Arnita Owens-Phillips almost lost her house through tax sale
– Lien purchased by Stonefield Investments
– Tangled title / “heirs property”
– Liens ballooned through interest and tax sale process
– Helped by legal aid fund
• Edmondson Community Center sold through tax sale
– $5,000 on $2,500 of liens
– Investor flipped center for $140,000
• Legal changes
– HPP
– Judicial in rem
– Changing misclassification
11. Choosing this story
• Reporter Jessica Calefati reached out on 8/30/22 about Johns Hopkins creating a new police force
• Pre-reporting already finished
– Hopkins paused the plans in 2020 for two years
– Originally proposed because of a “crime wave” on Hopkins campus
– Shapefiles of proposed jurisdictions
12. What questions do we want to answer? What data is needed?
• What do crime trends in the proposed jurisdictions look like?
• Do they depend on the campus? Year? Crime type?
• How do those trends fit with Hopkins’ stated rationale for creating a police force?
• Crime data from Baltimore Police Department / Open Baltimore
• Shapefiles of proposed jurisdictions
– Non-existent
– Need to be created from the maps in Hopkins’ reports
• https://publicsafety.jhu.edu/assets/uploads/sites/9/2022/08/8_JHPD_PoliceDept_Maps-Homewood-8.4.22.pdf
• https://publicsafety.jhu.edu/assets/uploads/sites/9/2022/08/9_JHPD_PoliceDept_Maps-East-Baltimore-8.4.22.pdf
• https://publicsafety.jhu.edu/assets/uploads/sites/9/2022/08/7_JHPD_PoliceDept_Maps-Peabody-8.4.22.pdf
• Traced by hand in QGIS
13. Learning/data work
• BPD crime data
– Has geolocation information for crimes
– Is victim-level; we want incident-level!
• Shapefiles in an arbitrary coordinate system
– Need to convert to WGS-84
• Group BPD data by crime type, location, time, inside/outside,… to reduce victim-level to incident-level
• Use sf to convert shapefiles to the right CRS and join with BPD data
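The victim-to-incident reduction and the spatial join can be sketched in plain Python. In R, sf handles the CRS conversion (`st_transform`) and the join (`st_join`); this sketch skips reprojection and assumes coordinates are already WGS-84 longitude/latitude. The grouping fields in `INCIDENT_KEY` are hypothetical, not BPD's actual column names, and the ray-casting point-in-polygon test treats lon/lat as planar, a reasonable approximation for campus-sized areas.

```python
# Hypothetical field names; the real BPD/Open Baltimore columns may differ.
INCIDENT_KEY = ("crime_type", "datetime", "location", "inside_outside")


def to_incidents(victim_rows: list) -> list:
    """Collapse victim-level rows into incident-level rows with a victim count."""
    incidents = {}
    for row in victim_rows:
        key = tuple(row[field] for field in INCIDENT_KEY)
        if key in incidents:
            incidents[key]["victims"] += 1
        else:
            incidents[key] = {**row, "victims": 1}
    return list(incidents.values())


def point_in_polygon(lon: float, lat: float, polygon: list) -> bool:
    """Ray-casting test; `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point toward +lon cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_jurisdiction(incidents: list, jurisdictions: dict) -> list:
    """Label each incident with the first jurisdiction polygon containing it, else None."""
    for inc in incidents:
        inc["jurisdiction"] = next(
            (name for name, poly in jurisdictions.items()
             if point_in_polygon(inc["lon"], inc["lat"], poly)),
            None,
        )
    return incidents
```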
14. Learning/data work (2)
• Care more about crime rates than raw crime numbers
• Need to combine with ACS (American Community Survey) data about populations
• Linear interpolation
– Not perfect, but can check whether the resulting population counts on campus make sense
• A lot of checking
– Do the yearly victim numbers in the jurisdictions agree with the BPD data? Sample and check by hand
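The slide doesn't say exactly which linear interpolation was used. The sketch below assumes areal-weighted interpolation: each census tract's ACS population is split in proportion to the share of its area inside a jurisdiction, which assumes people are spread evenly within the tract. It also draws a reproducible random sample for hand-checking. Function names and inputs are hypothetical.

```python
import random


def interpolate_population(tract_pops: dict, overlap_share: dict) -> float:
    """Area-weighted population estimate for one jurisdiction.

    `overlap_share[t]` is the fraction of tract t's area inside the jurisdiction;
    population is assumed to be spread evenly across each tract.
    """
    return sum(tract_pops[t] * share for t, share in overlap_share.items())


def rate_per_1000(count: int, population: float) -> float:
    """Crimes per 1,000 residents."""
    return 1000 * count / population


def audit_sample(rows: list, k: int, seed: int = 0) -> list:
    """Reproducible random sample of rows to check by hand against BPD data."""
    return random.Random(seed).sample(rows, min(k, len(rows)))
```

Fixing the seed means a colleague can pull the same audit sample and check the same records.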
15. What did we learn?
• Property crimes down across all three campuses
• Violent crime steady or down
• No “crime wave” since the plans were paused
16. Contextualizing
• “We are still besieged by violence, and it’s unacceptable” – Branville Bard Jr., Hopkins’ Vice President for Public Safety
• No other private universities in Maryland have private police forces
• Student and faculty opposition
• JHU police can only police strictly within the bounds of the jurisdiction
17. Defending your work
• An essential part of data journalism is defending your own work
• We create facts in a way journalists typically do not
• Our work is only as good as the science behind it
• When the science is sound but challenged, we should respond substantively to critiques
18. Choosing this story
• Experimenting with pure data stories
• NYT advanced visual journalism, teaching readers a new visual language
• COVID taught readers about models and exponential growth
• The next decade of journalism will teach readers a new statistical language
19. What questions do we want to answer? / What data is needed?
• How did voting trends in Maryland and in Baltimore City / County change between 2018 and 2022?
• Did Black and white voters differ in their preference for Wes Moore?
• Did Black and white voters differ in their preference for cannabis legalization?
• Interesting answers to these questions require more granular data than county-level results
– 2022 precinct-level voting results from the MD State Board of Elections
– 2018 precinct-level results from the Metric Geometry and Gerrymandering Group at MIT
– Precinct shapefiles from the Maryland Department of Planning
• Census data
20. Learning / data work
• Precinct-level results come as XML, which requires parsing with xml2 in R
• Precinct names in the voting data and shapefile data often differ and need to be matched
– No obvious way to do this; need to examine by hand (needed to remove two 0s in the middle of the names in some cases, just one in others. Automatable, but needs to be discovered manually)
– Much, much worse in Georgia
• Missing data from Montgomery County and Kent County
• This is the fun but dangerous part: data analysis
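A Python sketch of the parsing and name-matching steps, using the standard library's `xml.etree.ElementTree` in place of xml2. The XML layout and vote counts in `SAMPLE_XML` are invented for illustration and do not reflect the State Board of Elections' actual schema. Rather than deleting zeros position by position, the hypothetical `precinct_key` normalizes both sides by stripping leading zeros from each hyphen-separated part.

```python
import xml.etree.ElementTree as ET

# Invented layout and numbers, for illustration only.
SAMPLE_XML = """<results>
  <precinct name="001-005">
    <choice candidate="A" votes="10"/>
    <choice candidate="B" votes="5"/>
  </precinct>
</results>"""


def parse_results(xml_text: str) -> dict:
    """Map precinct name -> {candidate: votes}."""
    root = ET.fromstring(xml_text)
    results = {}
    for precinct in root.iter("precinct"):
        results[precinct.get("name")] = {
            choice.get("candidate"): int(choice.get("votes"))
            for choice in precinct.iter("choice")
        }
    return results


def precinct_key(name: str) -> str:
    """Canonical key: strip leading zeros from each hyphen-separated part."""
    return "-".join(part.lstrip("0") or "0" for part in name.strip().split("-"))


def match_precincts(results: dict, shapefile_names: list):
    """Join results to shapefile precinct names; return matches and leftovers."""
    by_key = {precinct_key(n): n for n in shapefile_names}
    matched, unmatched = {}, []
    for name, votes in results.items():
        shape_name = by_key.get(precinct_key(name))
        if shape_name is None:
            unmatched.append(name)
        else:
            matched[shape_name] = votes
    return matched, unmatched
```

Anything left in `unmatched` still needs a manual look, and it is worth checking that no two distinct precincts collapse to the same key.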
21. What did we learn?
• Moore outperformed Jealous / Cox performed worse than Hogan almost everywhere
• Precinct-level data lets us see:
– The incredible expansion of blue territory in the western parts of the County and City from 2018 to 2022
– The expansion of blue between D.C. and Baltimore and the lightening of red support all around the state
22. What did we learn? (2)
• Moore preference related to race
– Census tracts under 20% Black had fully mixed opinions on Moore
– Tracts over 80% Black were nearly all for Moore
– Logarithmic relationship
• Strong majority of voters wanted legal weed
– More support in the Black community
– Driven by:
• A long left tail in the white distribution
• Less variation in the Black distribution
– Violin plot!
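One way to read “log relationship” is a model of the form Moore share ≈ a + b·ln(Black share) across census tracts. A minimal least-squares sketch of fitting that curve follows; the model form is an assumption about the slide, not the author's stated specification, and tracts with a Black share of exactly zero would need to be dropped or offset before taking logs.

```python
import math


def fit_log(xs: list, ys: list) -> tuple:
    """Ordinary least squares for y = a + b * ln(x); every x must be positive."""
    lx = [math.log(x) for x in xs]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_y = sum(ys) / n
    # Closed-form slope and intercept for simple linear regression on ln(x).
    sxx = sum((v - mean_lx) ** 2 for v in lx)
    sxy = sum((v - mean_lx) * (y - mean_y) for v, y in zip(lx, ys))
    b = sxy / sxx
    a = mean_y - b * mean_lx
    return a, b
```

A positive `b` with diminishing gains at high Black share would match the pattern on the slide: large differences among low-share tracts, near-uniform support above 80%.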