Open data is enabling journalists, watchdog groups and investors to gain more insight than ever before into the finances of state and local governments. Unfortunately, much of it remains trapped inside bulky PDFs, lagging annual reports and other data-rich documents that are challenging to collect and analyze. At this session, ProPublica finance reporter Cezary Podkul will walk through an example of the obstacles he faced analyzing bond documents for a series of stories he did on municipal borrowing. The second half of the session will feature Sunlight Foundation OpenGov grant recipient Marc Joffe, who will discuss opportunities for liberating municipal finance data for journalists, watchdogs and investors alike.
3. Quick Word About ProPublica
• We are a non-profit investigative newsroom focused on accountability journalism
• We publish stories, develop news apps, tools and open source a lot of our code at: github.com/propublica
5/31/2015 ODSC 2015 | Boston 3
4. Accountability Journalism
• There is a growing need for it in general, and in public finance in particular:
- Detroit Free Press, April 5, 1993
- Chicago Tribune, Nov. 1, 2013
- The Bond Buyer, Feb. 12, 2014
- ProPublica, Aug. 7, 2014
- Boston Herald, June 10, 2012
- BenefitsPro, Feb. 12, 2015
- USA Today, Dec. 3, 2013
- Wall Street Journal, Jan. 26, 2010
- Voice of San Diego, Aug. 6, 2012
5. The Good News
• A lot of data already exists on the finances of state and local governments:
– Governments that borrow money from investors provide bond offering documents and other disclosures on EMMA
– They must also produce annual filings called “Comprehensive Annual Financial Reports” which detail all of their financials
6. The Good News: EMMA
• What is EMMA?
– Electronic Municipal Market Access
• Since 2009, the official repository for muni bond offering documents and continuing disclosures
• Run by the Municipal Securities Rulemaking Board (MSRB)
3/7/2015 NICAR 2015 | Atlanta 6
7. The Good News: EMMA
• What’s in EMMA?
– Data on more than 1.2 million muni bonds:
• Official statements; ongoing financial disclosures; advance refunding documents; event notices, voluntary disclosures, and more
– Real-time trade data for nearly every municipal bond bought and sold
– Political contribution disclosures
– Documents, documents, more documents
8. The Bad News
• EMMA is a great repository of info, but little of it is easily accessible:
– PDFs, PDFs and more PDFs
• Sell a bond? Submit a PDF
• Material event happened? Tell us via PDF
• File financials? File a PDF
– No standardized reporting templates
• Important info scattered in different places
– No machine-readable bulk download
• XBRL? You wish
9. Things Could Be Better
• The SEC’s EDGAR database makes a wealth of info available about corporations:
– Bulk download of filings available via FTP:
• http://datahub.io/dataset/edgar
• ftp://ftp.sec.gov/
– The agency is also moving away from text-based submissions to XBRL filings:
• http://www.sec.gov/info/edgar/edgartaxonomies.shtml
– No PDFs … seriously:
• “Only documents submitted to the EDGAR system in either plain text or HTML are official filings. PDF documents are unofficial copies of filings. Filers may not use the unofficial PDF copies instead of plain text or HTML documents to meet filing requirements.”
10. The Result
• When IBM files its annual form 10-K, you get this:
– XBRL:
• http://www.sec.gov/Archives/edgar/data/51143/000104746915001106/ibm-20141231_pre.xml
– Text:
• http://www.sec.gov/Archives/edgar/data/51143/0001047469-15-001106.txt
– Even an interactive data explorer, with Excel download:
11. The Result
• When Detroit files its Comprehensive Annual Financial Report with EMMA, you get this:
– http://emma.msrb.org/ER789294-ER614016-ER1015978.pdf
12. Happy Hunting
• So how do you spot anomalies like these and write about them in a systematic way?
[Chart: Ohio Series 2007B Tobacco Settlement Bonds. Amount owed over time, 10/2007 through 10/2046, on a $0 to $3,500,000,000 scale. Series: Principal, Accreted Interest.]
$191.3m borrowed, with $3.2bn due at maturity in 2047. Interest accrues at 7.25% interest rate, compounded. No option to redeem until 2017.
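The arithmetic behind the chart can be checked with a few lines of Python. This is only a rough sketch: the slide gives the amount borrowed, the rate and the maturity year, but not the compounding frequency or the exact issue and maturity dates, so the 40-year term and the two compounding schedules below are assumptions, and `accreted_value` is a hypothetical helper name.

```python
def accreted_value(principal, annual_rate, years, periods_per_year=2):
    """Value of a bond whose interest compounds instead of being paid out."""
    periods = years * periods_per_year
    return principal * (1 + annual_rate / periods_per_year) ** periods

principal = 191.3e6  # amount borrowed, from the slide
rate = 0.0725        # stated interest rate, from the slide
years = 40           # assumption: roughly 2007 to 2047

# Assumption: the slide does not say how often interest compounds,
# so try both annual and semiannual compounding.
annual = accreted_value(principal, rate, years, periods_per_year=1)
semiannual = accreted_value(principal, rate, years, periods_per_year=2)

print(f"annual compounding:     ${annual / 1e9:.2f}bn")
print(f"semiannual compounding: ${semiannual / 1e9:.2f}bn")
```

Under either assumption the result lands in the neighborhood of the $3.2bn on the slide, which is the point: a $191.3m loan grows more than sixteenfold when nothing is paid for four decades.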
13. Example: Tobacco Bonds
• That’s what I wanted to do for my series on tobacco bonds – state and local debts backed by payments from the 1998 legal settlement with Big Tobacco
14. Example: Tobacco Bonds
• Problem: How do you define the sample universe?
– How many bonds are there, which ones are the anomalies?
– Searching on EMMA wasn’t much help; just links to PDFs
• Solution: Asked a data vendor, Thomson Reuters SDC, for their list:
Source: Thomson Reuters SDC
15. Example: Tobacco Bonds
• Problem: How do you vet the data?
– Need to ensure completeness and accuracy
• Solution: Lots and lots of reading
– Re-created the Thomson Reuters database from paper filings, zeroing in on 38 deals that included the anomalous bonds
– Logged all the terms and conditions we needed to calculate the amounts owed on the debt
16. Example: Tobacco Bonds
• Why not do it programmatically? Wish we could have, but:
– Data often buried in scanned PDFs like this ->
– Even if you OCR, data do not appear in the same place across documents
– Different labels, different conventions for reporting
– Sometimes, repayment amounts not reported at all
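To make the labeling problem concrete, here is a minimal sketch of what a programmatic attempt might look like once a document has been OCR'd to text. Everything in it is hypothetical: the label variants, the sample strings and the `find_maturity_amount` helper are illustrations, not terms taken from the bond filings we read.

```python
import re

# Hypothetical label variants; real offering documents may use others.
LABEL_PATTERNS = [
    r"maturity\s+value",
    r"accreted\s+value\s+at\s+maturity",
    r"amount\s+due\s+at\s+maturity",
]
# A number such as 1,250,000.00, optionally preceded by a dollar sign.
AMOUNT = r"\$?\s*(\d[\d,]*(?:\.\d+)?)"

def find_maturity_amount(text):
    """Return the first amount that follows a known label, or None."""
    for label in LABEL_PATTERNS:
        # Allow up to 20 non-digit filler characters (colons, dot leaders).
        pattern = label + r"\D{0,20}?" + AMOUNT
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return float(match.group(1).replace(",", ""))
    return None

# Hypothetical OCR snippets showing three different reporting conventions.
doc_a = "Maturity Value: $1,250,000.00"
doc_b = "Accreted Value at Maturity ......... 3,400,000"
doc_c = "Final payment per schedule in Appendix C"

for doc in (doc_a, doc_b, doc_c):
    print(find_maturity_amount(doc))
```

The third snippet returns `None` because the amount is simply not stated under any label the code knows about. Each new convention means another pattern, and a filing that omits the amount defeats the approach entirely, which is why we ended up reading the documents by hand.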
18. Next Steps
• The Financial Transparency Act of 2015 has some helpful provisions in it:
• But for now it’s up to us to liberate the data
Source: Data Transparency Coalition
19. Example: Treasury.io
• API for daily spending, revenue and debt operations data for U.S. Treasury
Developed by csv soundsystem with a Knight-Mozilla OpenNews Code Sprint Grant
20. Example: Treasury.io
• Turns text:
• Into structured csv:
• Parser code available at:
https://github.com/csvsoundsystem/federal-treasury-api
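The general idea of turning a column-aligned text table into csv can be sketched in a few lines. This is not the parser from the repository above, which may work quite differently; the sample text, the column names and the `text_table_to_csv` helper are all made up for illustration, and the sketch assumes columns are separated by at least two spaces.

```python
import csv
import io
import re

# Hypothetical sample of a text table; not taken from an actual Treasury file.
SAMPLE_TEXT = """\
Item                      Today    This month
Example deposits A        1,200        15,300
Example deposits B          450         6,075
"""

def text_table_to_csv(text):
    """Split each line on runs of 2+ spaces and write the cells as csv."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = re.split(r"\s{2,}", line.strip())
        # Drop thousands separators from cells made only of digits and commas.
        cells = [c.replace(",", "") if re.fullmatch(r"[\d,]+", c) else c
                 for c in cells]
        writer.writerow(cells)
    return out.getvalue()

csv_text = text_table_to_csv(SAMPLE_TEXT)
print(csv_text)
```

Splitting on two or more spaces keeps multi-word labels such as "This month" intact. Real statements tend to add footnotes, wrapped labels and section headers, so a production parser needs far more special cases than this.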
21. Next Challenge
• The U.S. Treasury publishes even more useful data in its monthly statement:
– http://www.fiscal.treasury.gov/fsreports/rpt/mthTreasStmt/backissues.htm
• I am looking for developers interested in helping liberate the data
– Is that you? Code repo available here: https://github.com/csvsoundsystem/monthly-treasury-statements