2. CAN YOU PREDICT THE OUTCOME OF LEGISLATIVE BILLS?
WHERE DOES THE MONEY COME FROM FOR THE BILL?
3. DATA PUBLICLY AVAILABLE (~10 GB)
1) Bill data comes from govtrack.us
• JSON files, needed some adjustments.
2) Contribution data comes from senate.gov
• XML files, very messy.
4. THE PREDICTIVE MODEL
• Random Forest
• Reduced >2500 features to ~1300 features
• Bill Topics
• Sponsors
• Committees
• >30,000 training samples
• ~30:1 Fail:Pass class imbalance.
• Optimized against F1-Score.
Statistic    Govtrack.us    Legislatr
Precision    54%            51%
Recall       22%            51%

                    Reality: Pass    Reality: Fail
Prediction: Pass    110              105
Prediction: Fail    106              5707
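The Legislatr precision and recall can be recomputed from the confusion matrix. The sketch below is a worked check rather than the original evaluation code. The variable names are illustrative, and it reads the table with predictions as rows and reality as columns.

```python
# Confusion-matrix counts from the slide (rows = prediction, columns = reality).
true_pass = 110    # predicted Pass, actually Pass
false_pass = 105   # predicted Pass, actually Fail
missed_pass = 106  # predicted Fail, actually Pass
true_fail = 5707   # predicted Fail, actually Fail

total = true_pass + false_pass + missed_pass + true_fail

# Precision: of the bills predicted to pass, how many actually passed.
precision = true_pass / (true_pass + false_pass)
# Recall: of the bills that actually passed, how many were predicted to pass.
recall = true_pass / (true_pass + missed_pass)
# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of the model versus always guessing "Fail".
accuracy = (true_pass + true_fail) / total
all_fail_accuracy = (false_pass + true_fail) / total

print(f"precision={precision:.1%} recall={recall:.1%} f1={f1:.1%}")
print(f"accuracy={accuracy:.1%} all-fail baseline={all_fail_accuracy:.1%}")
```

The model's accuracy is barely above the "all Fail" baseline, which is why the F1-score is the more informative target at this level of class imbalance.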
6. KEY TAKEAWAYS
1) You have the best prediction.
2) You know what interests resulted in the creation of the bill.
7. ABOUT ME
• Ph.D. in Atmospheric and Oceanic Science
• Space weather effects on climate
• Meteorology
• Hobbies:
• Karate, Hiking
9. WHY A RANDOM FOREST
• First attempt used a logistic regression model.
• Accuracy was 78%, compared with 96.5% for an “all Fail” guess.
• Also tried an SVM (linear SVC), but accuracy was still too low.
• Govtrack.us uses a logistic regression, but with very complex features.
• Example feature: a certain senator wrote the bill and also chairs the committee the bill is sent to.
• A random forest can capture these complex interactions without needing such complex features.
• My model can report the importance of different features.
• This provides insights that a NN or SVM would not.
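A minimal sketch of the idea, assuming scikit-learn and purely synthetic data. The feature names, the pass rule, and the hyperparameters are all hypothetical and are not the actual Legislatr features or settings. A bill here can pass only when two simple binary features are both true, and the forest picks up that interaction without a hand-built combined feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_bills = 6000

# Hypothetical simple binary features (not the real feature set).
sponsor_is_chair = rng.integers(0, 2, n_bills)
sent_to_sponsor_committee = rng.integers(0, 2, n_bills)
unrelated_flag = rng.integers(0, 2, n_bills)

feature_names = ["sponsor_is_chair", "sent_to_sponsor_committee", "unrelated_flag"]
X = np.column_stack([sponsor_is_chair, sent_to_sponsor_committee, unrelated_flag])

# Synthetic label: a bill can only pass when BOTH conditions hold (an interaction),
# and even then only 40% of the time, so "Pass" is the minority class.
both = (sponsor_is_chair == 1) & (sent_to_sponsor_committee == 1)
y = (both & (rng.random(n_bills) < 0.4)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" is one common way to handle class imbalance;
# it is an assumption here, not necessarily what the original model used.
forest = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
forest.fit(X_train, y_train)

test_f1 = f1_score(y_test, forest.predict(X_test))
importances = dict(zip(feature_names, forest.feature_importances_))

for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:28s} {score:.3f}")
print(f"F1 on held-out bills: {test_f1:.2f}")
```

The printed importances are the kind of per-feature insight the slide refers to. On real data they would rank bill topics, sponsors, and committees.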
10. REDUCING THE FEATURE SPACE
• Used the Python networkx package with the Girvan-Newman algorithm.
• First removed all edges with weak connections.
• Next applied the Girvan-Newman algorithm: remove the edges with the highest betweenness centrality to break apart the hairball.
• Essentially this just showed that everything was one hairball, with no separate clusters to be found.
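The two steps above can be sketched on a toy graph. The node names, edge weights, and pruning threshold below are made up for illustration. Unlike the real data, this toy graph is built so that it does split into two clean clusters.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Toy weighted graph: two tight triangles joined by one bridge,
# plus one weak edge that should be pruned first. All values are hypothetical.
weighted_edges = [
    ("A", "B", 5), ("B", "C", 4), ("A", "C", 6),
    ("D", "E", 5), ("E", "F", 4), ("D", "F", 6),
    ("C", "D", 3),  # bridge between the two groups
    ("A", "F", 1),  # weak connection
]
G = nx.Graph()
G.add_weighted_edges_from(weighted_edges)

# Step 1: remove all edges with weak connections (threshold is an assumption).
MIN_WEIGHT = 2
weak_edges = [(u, v) for u, v, w in G.edges(data="weight") if w < MIN_WEIGHT]
G.remove_edges_from(weak_edges)

# Step 2: Girvan-Newman repeatedly removes the edge with the highest
# betweenness centrality; the first item yielded is the first split found.
first_split = next(girvan_newman(G))
communities = sorted(sorted(group) for group in first_split)

print(communities)
```

On a true hairball, the successive splits tend to peel off single nodes rather than meaningful groups, which matches the outcome described above.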
11. THE MONEY TRAIL
1) Link legislators to lobbyists (pre-processing and Levenshtein distance).
2) Identify influential lobbyists for the bill.
3) Present breakdown of funding.
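Step 1 can be sketched as follows, using only the standard library. The normalization rules, the distance threshold, and the sample names are assumptions for illustration, not the actual pre-processing used on the senate.gov data.

```python
import re

def normalize(name):
    """Lowercase, drop titles and punctuation, and sort tokens so word order is ignored."""
    name = name.lower()
    name = re.sub(r"\b(sen|rep|hon|mr|mrs|ms|dr)\b\.?", " ", name)
    name = re.sub(r"[^a-z\s]", " ", name)
    return " ".join(sorted(name.split()))

def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def best_match(raw_name, legislators, max_distance=3):
    """Return the closest legislator name, or None if nothing is within max_distance."""
    query = normalize(raw_name)
    scored = [(levenshtein(query, normalize(leg)), leg) for leg in legislators]
    distance, legislator = min(scored)
    return legislator if distance <= max_distance else None

# Hypothetical names, for illustration only.
legislators = ["John Q. Smith", "Jane Doe"]
print(best_match("Sen. SMITH, JOHN", legislators))
print(best_match("Acme Corp", legislators))
```

Sorting the tokens lets "SMITH, JOHN" line up with "John Q. Smith"; the threshold then decides how much spelling noise is tolerated before a record is left unmatched.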
13. WHY NOT PREDICT USING MONEY?
• Money is one step removed from the bill.
• So you could predict using money, but that information is already included by using the sponsors of the bill.
• Predicting with money would be an advantage if you had a new legislator and thus no other way to make a prediction.