Easybib Open Analytics NYC


Published in: Technology, Education

  1. Using data to improve student research
  2. EasyBib is an automatic bibliography composer. Students use it to cite sources for their research.
  3. We teach information literacy.
     • 18% of all student papers include plagiarism (1)
     • 50% likelihood of using a credible vs. non-credible source (1)
     • 4% increase in the use of paper mills and cheating sites (1)
     • ~16% of students are adequately prepared for college (2)
     Sources: (1) TurnItIn; (2) Both Sides Now: Librarians Looking at Information Literacy from High School and College.
  4. That’s how we felt too.
  5. The problem is becoming bigger.
  6. Unprepared students make for unprepared adults. It’s not just students who plagiarize:
     • Pal Schmitt, former president of Hungary
     • German education minister
     • Jayson Blair (former New York Times writer)
     • Jonah Lehrer, journalist and author
     • Fareed Zakaria (reporter, author, host)
  7. We are in the right place to figure it out.
     • Over half of all students in the US (40M)
     • Over half a billion citations
  8. We asked ourselves the following questions:
     • What are students using in their research?
     • How good are their sources?
     • How can we help them?
  9. We started with the basics: _gaq.push([ 'citations._trackEvent', citationTitle, citationPublisher, citationId ]);
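The call on slide 9 uses the classic Google Analytics (ga.js) asynchronous queue, where each command is an array: the tracker method first, then its arguments. For `_trackEvent` those arguments are category, action, and an optional label. A minimal sketch of how such a command might be assembled is below. The helper name `buildCitationEvent`, the citation fields, and the stand-in `queue` array are assumptions for illustration, not EasyBib's code.

```javascript
// Hypothetical helper: builds one command for the classic Google Analytics
// async queue. Argument order after the method name: category, action, label.
function buildCitationEvent(citation) {
  return [
    'citations._trackEvent',      // "citations" tracker, _trackEvent method
    String(citation.title),       // event category
    String(citation.publisher),   // event action
    String(citation.id),          // event label (ga.js expects a string)
  ];
}

// Stand-in for the page's global _gaq array, which the GA snippet normally defines.
const queue = [];
queue.push(buildCitationEvent({
  title: 'CIA World Factbook',
  publisher: 'Central Intelligence Agency',
  id: 12345,
}));

console.log(queue[0]);
```

Converting every field to a string keeps the label argument valid even when the citation ID is stored as a number.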
  10. Here’s what we found. Top sources, 2010:
      • Wikipedia
      • Google
      1. The New York Times
      2. CIA World Factbook
      3. Oracle Thinkquest
      4. Buzzle
      5. US BLS
      6. Dictionary.com
      7. CDC
      8. PBS
      9. eHow
      Source: EasyBib Google Analytics data, Oct 2010-Nov 2010.
  11. What could we do?
      • Warn them when their source’s credibility is in question
      • Analyze the quality of their full bibliography
      • Make it easier to not plagiarize
      • Suggest better sources
  12. Define credibility.
  13. Improve citation quality
  14. Gave students access to their own analytics
  15. To combat plagiarism, we built an audit trail for notes
  16. So after all this... Does it blend (tm)?
      1. Wikipedia
      2. Bio.com
      3. History.com
      4. PBS
      5. Mayo Clinic
      6. CDC
      7. The New York Times
      8. BBC
      9. CNN
      10. WebMD
      11. US BLS
      • Wikipedia still on top, but...
      • No content farms, no Google.
      • WebMD is questionable, but a case can be made for its credibility.
      Source: Google Analytics data, Apr-May 2013.
  17. We have to admit, it’s getting better...
  18. Help students find better sources
  19. How does the Research engine currently work? Cloudant (CouchDB), MySQL, Lucene/Solr. Slow, asynchronous, lots of moving parts.
  20. Starting to do a bit more: StatsD::increment($metrics); $response = $rediska->publish( array('realtime'), $citation );
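The PHP on slide 20 does two things per citation: it bumps a StatsD counter and publishes the citation on a Redis channel named `realtime`. The sketch below mirrors that flow without any network access. StatsD counters travel as plain-text datagrams of the form `name:value|c`; the metric name `citations.created`, the `Channel` class (an in-memory stand-in for Redis pub/sub), and the sample citation are all assumptions.

```javascript
// Format a StatsD counter increment: "<metric>:<value>|c".
function statsdCounter(metric, value = 1) {
  return `${metric}:${value}|c`;
}

// In-memory stand-in for a Redis pub/sub channel (hypothetical, for illustration).
class Channel {
  constructor() {
    this.subscribers = [];
  }
  subscribe(handler) {
    this.subscribers.push(handler);
  }
  // Like Redis PUBLISH, returns how many subscribers received the message.
  publish(message) {
    this.subscribers.forEach((handler) => handler(message));
    return this.subscribers.length;
  }
}

const realtime = new Channel();
const received = [];
realtime.subscribe((message) => received.push(JSON.parse(message)));

const citation = { id: 1, title: 'Mayo Clinic' };
const datagram = statsdCounter('citations.created'); // would be sent over UDP
const delivered = realtime.publish(JSON.stringify(citation));

console.log(datagram, delivered, received);
```

Keeping the counter and the publish step separate means a metrics outage would not have to block the realtime feed, which fits the slide's "a bit more" framing of incremental instrumentation.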
  21. There’s a lot more we can do, and data will help us.
  22. Cloudant Search
      • Full-text search integrated into Cloudant
      • Lucene syntax
      • Indexing is easy:
        function(doc){ index("title", doc.title, {"store": "yes"}); }
      • Grouping of sources via chained map-reduce:
        map: function(doc){ if (doc.title){ emit({"title": doc.title}, 1); } }
        reduce: _sum
        dbcopy: citationGroup
        ------
        map: function(doc){ if (doc.key && doc.key.title){ emit(doc.value, doc.key.title); } }
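The chained map-reduce on slide 22 can be traced in plain JavaScript: the first stage counts citations per title, the reduced rows are copied into a second database as documents with `key` and `value` fields, and the second stage re-keys those rows by count so the most-cited titles sort together. This is a local simulation under stated assumptions, not Cloudant's engine. CouchDB map functions call a global `emit`; here it is passed in as a parameter so the sketch runs anywhere. The helpers `mapDocs` and `sumByKey` and the sample documents are hypothetical.

```javascript
// Run a map function over docs, collecting every emit(key, value) call.
function mapDocs(docs, mapFn) {
  const rows = [];
  docs.forEach((doc) => mapFn(doc, (key, value) => rows.push({ key, value })));
  return rows;
}

// Group rows by their JSON-encoded key and sum the values,
// imitating a grouped _sum reduce.
function sumByKey(rows) {
  const totals = new Map();
  for (const { key, value } of rows) {
    const encoded = JSON.stringify(key);
    totals.set(encoded, (totals.get(encoded) || 0) + value);
  }
  return [...totals].map(([encoded, value]) => ({ key: JSON.parse(encoded), value }));
}

// Made-up citation documents; one has no title and is skipped by the map.
const citations = [
  { title: 'Wikipedia' }, { title: 'CDC' }, { title: 'Wikipedia' },
  { notes: 'no title' }, { title: 'PBS' }, { title: 'CDC' }, { title: 'Wikipedia' },
];

// Stage 1: count citations per title (these rows play the role of citationGroup).
const citationGroup = sumByKey(mapDocs(citations, (doc, emit) => {
  if (doc.title) emit({ title: doc.title }, 1);
}));

// Stage 2: re-key each grouped row by its count, then sort highest first.
const byCount = mapDocs(citationGroup, (doc, emit) => {
  if (doc.key && doc.key.title) emit(doc.value, doc.key.title);
}).sort((a, b) => b.key - a.key);

console.log(byCount);
```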
  23. Live data analysis. Crowdsourcing.
      • Use Cloudant Search to power feedback on sources (number of times cited in real time, quality of the bibliographies that cite them)
      • Allow users to submit their own credibility evaluations and aggregate results
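Slide 23's second bullet, aggregating user-submitted credibility evaluations, could look roughly like the sketch below. The deck does not say how evaluations are recorded or combined, so the yes/no vote shape, the `aggregate` function, and the sample data are all assumptions chosen to keep the example small.

```javascript
// Hypothetical evaluation records: one per user judgment of a source.
const evaluations = [
  { source: 'cdc.gov', credible: true },
  { source: 'ehow.com', credible: false },
  { source: 'cdc.gov', credible: true },
  { source: 'ehow.com', credible: true },
  { source: 'ehow.com', credible: false },
  { source: 'cdc.gov', credible: true },
  { source: 'ehow.com', credible: false },
];

// Tally votes per source and compute the share of "credible" votes.
function aggregate(evals) {
  const bySource = new Map();
  for (const { source, credible } of evals) {
    const tally = bySource.get(source) || { votes: 0, credibleVotes: 0 };
    tally.votes += 1;
    if (credible) tally.credibleVotes += 1;
    bySource.set(source, tally);
  }
  const result = {};
  for (const [source, tally] of bySource) {
    result[source] = {
      votes: tally.votes,
      credibleShare: tally.credibleVotes / tally.votes,
    };
  }
  return result;
}

const summary = aggregate(evaluations);
console.log(summary);
```

Reporting the vote count next to the share lets a reader discount sources that only a handful of users have rated.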
  24. SourceRank!
      • Credibility weighting + crowdsourcing
      • Synchronous & realtime via Cloudant Search
      • Value nodes based on nearest neighbors
      • And other things...
  25. Driving growth. We have the largest UGC citation set. Making this searchable creates a “moat.” The more people who use EasyBib, the better the tool becomes.
  26. What about other data analytics tools?
      • Too stretched to learn more complex tools (looking for easy answers)
      • Costs (GA is free!)
      • EMR, Hadoop, Redshift, Cloudant Search: this is what’s next.
  27. Questions?
  28. Darshan Somashekar, @darshan, darshan@imagineeasy.com