FundAClassroom.today
Inspire your supporters on DonorsChoose.org
Bobby Kim, Fellow at Insight Data Science
What is the problem?
• In 2013-2014, US teachers spent on average $513 out-of-pocket
for their classrooms.1
• DonorsChoose.org, an online crowdfunded charity, is
helping reduce the burden.
• Can we predict whether a project will be funded
or not based simply on a teacher’s essay?
1. http://www.forbes.com/sites/nicoleleinbachreyhle/2014/08/19/teachers-spend-own-money-school-supplies/
FundAClassroom.Today
Supervised learning – binary classification
• Data: CSV dataset from DonorsChoose.org
• ~200,000 essays going back to 2012 – 75/25 Funded/Not Funded
• Vectorize DonorsChoose essays using tf-idf, build vocabulary of
4000 words
• Model: L2-regularized logistic regression
• Validation:
• 5-fold cross validation for model tuning using training set (90%)
• ROC AUC using test set (10%)
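The pipeline on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code behind the project: the tokenizer, the smoothed idf formula, the choice to keep the most widespread words, the gradient-descent settings (`lam`, `lr`, `epochs`), and the four toy essays are all made up for the example. A real run would shuffle the ~200,000 essays, hold out 10% as a test set, and use the fold helper on the remaining 90% to tune the regularization strength.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(docs, max_words=4000):
    # Document frequency: in how many essays each word appears.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Keep the max_words most widespread words; ties broken alphabetically.
    kept = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:max_words]
    index = {word: i for i, (word, _) in enumerate(kept)}
    n = len(docs)
    # Smoothed inverse document frequency.
    idf = {word: math.log((1 + n) / (1 + count)) + 1.0 for word, count in kept}
    return index, idf

def tfidf_vector(doc, index, idf):
    counts = Counter(t for t in tokenize(doc) if t in index)
    vec = [0.0] * len(index)
    for word, count in counts.items():
        vec[index[word]] = count * idf[word]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec  # unit length per essay

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg_l2(X, y, lam=0.01, lr=0.5, epochs=200):
    # Batch gradient descent on mean log loss + (lam / 2) * ||w||^2.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def roc_auc(y_true, scores):
    # Chance that a random funded essay outscores a random unfunded one.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def kfold_indices(n, k=5):
    # Yield (train_idx, val_idx) for k contiguous folds; shuffle the data first.
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        val = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, val
        start += size

# Toy data, invented for illustration only (1 = funded, 0 = not funded).
essays = [
    "my students need books for reading",
    "students need new books to love reading",
    "we want a trip",
    "a trip would be fun",
]
funded = [1, 1, 0, 0]

index, idf = build_vocabulary(essays, max_words=4000)
X = [tfidf_vector(e, index, idf) for e in essays]
w, b = train_logreg_l2(X, funded)
scores = [predict_proba(x, w, b) for x in X]
print(roc_auc(funded, scores))
```

The AUC printed here is computed on the training essays themselves, so it only shows that the pieces fit together. The number that matters is the AUC on the held-out 10%.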
About me
• PhD in Computational Biophysics, Rice University
• Built models for protein folding simulations using experimental
protein structures and sequence data
Future Directions
• Feature Engineering using NLP
• TextBlob – sentiment analysis, lemmatization, part-of-speech
tagging, misspelling detection
• TextSTAT – reading level, subjectivity
• Supervised learning
• SVMs
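As an illustration of the reading-level idea, the Flesch-Kincaid grade level can be computed directly from word, sentence, and syllable counts. The sketch below is a hand-rolled approximation, not the TextSTAT or TextBlob API: the function names and the vowel-run syllable heuristic are assumptions made for this example, and a dedicated library would count syllables more carefully.

```python
import re

def count_syllables(word):
    # Rough heuristic: one syllable per run of vowels, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    # Grade = 0.39 * words/sentences + 11.8 * syllables/words - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

grade = flesch_kincaid_grade("My students need books.")
print(round(grade, 2))
```

A score like this could be appended to each essay's tf-idf vector as one extra numeric feature.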
Essay Format
• Paragraph 1 – Open with the challenge facing your
students.
• Paragraph 2 – Tell us more about your students.
• Paragraph 3 – Inspire your potential donors with an
overview of the resources you’re requesting
• Paragraph 4 – Close by sharing why your project is so
important
Data Story

FundAClassroomToday Demo
