Hack Kid Con - Learn to be a Data Scientist for $1

An attempt to inspire kids to pay attention in math and science classes so they can get a good job and help fill the skills gap in the years to come.

Published in: Technology, Education

  1. LEARN TO BE A DATA SCIENTIST FOR $1: Hack Kid Conference - April 2014, by Adrian Cockcroft, Battery Ventures
  2. A BIG new problem for a new generation
  3. Now A BIG new problem for a new generation
  4. Now A BIG new problem for a new generation. Your future job as a Data Scientist
  5. WHAT DOES A DATA SCIENTIST DO?
  6. The hive mind map shows popular Twitter hashtags for the last 7 days and how they are connected http://hivemindmap.com/?#
  7. HIVE MIND MAP: A mind-map of what’s happening on Twitter. Thanks to Mark Harwood for these slides and the Hive Mind Map http://www.infoq.com/presentations/elasticsearch-revealing-uncommonly-common
  8. Connections: The thickness of a line between hashtags is based on the strength of connection. Tip: Strength of connection is the number of tweets with both tags relative to the number with at least one of them - see “Jaccard similarity coefficient”
  9. Top tweets: The most popular tweets for a tag are sorted based on the number of “retweets”
  10. When? The rise and fall of each hashtag’s popularity can be shown over time
  11. Calendar summary: Tags that “peak” together are grouped into events on a calendar. Tip: Peaks are detected using standard deviations. Only tags with a single peak are chosen as events. Tip: Tags that rise and fall in popularity at the same time are detected using Pearson’s Correlation
  12. What makes this possible?
      • Free software (Lucene, Java, Eclipse, Gephi, Tomcat, d3, Google Analytics…)
      • Free data (millions of users’ tweets from Twitter’s 1% sample feed)
      • “Cloud” computing (rented server)
      • Smarter web browsers (visualizations using HTML5’s SVG/Canvas)
      • All the friendly folks on the internet (e.g. http://stackoverflow.com/questions/14799842)
      • Some imagination…
  13. Opportunities in Data Science
      • We are all generating volumes of data never seen before
      • You can recycle the behaviors of billions of people into more intelligent systems
        • Customer purchases can be used for product recommendations
        • User searches can be used for spelling corrections
        • Reader clicks can influence the trending news
        • Spotify activity is used to make music recommendations
      • The tools have never been cheaper
      • It has never been easier to find help in developing systems
  14. …one more thing… I’m writing these slides for you while on my annual snowboarding trip to Canada. Data science pays well ;-) Wish you were here…
  15. HOW CAN A KID LEARN BIG DATA FOR $1?
  16. BIG DATA IN THE CLOUD WITH AMAZON EMR: https://www.youtube.com/watch?v=S6Ja55n-o0M
  17. LESS THAN $1: The cost after running two of the EMR examples, which created 6 computers in the cloud to do the analysis for up to an hour each
  18. GOOGLE BIGQUERY: https://demobigquery.appspot.com/
  19. BAY AREA WEATHER: https://demobigquery.appspot.com/
  20. WHY THE FLINTSTONES? https://demobigquery.appspot.com/
  21. MEASURING KIDS: How good are you at Math and Science? Is it getting better or worse?
  22. SCHOOL DATA: https://www.data.gov/ http://eddataexpress.ed.gov/state-report.cfm/state/CA/
  23. ACHIEVEMENT SCORES: Download results into Excel to analyze and draw graphs
  24. DOWNLOADED DATA: Needed some clean-up. Made sure grade was consistent (4, 8, HS) for all results, and created a short Subject column
  25. SCORES 2004-2012: Elementary - 4th Grade, Middle School - 8th Grade, High School
  26. SCORES 2004-2012: Elementary - 4th Grade, Middle School - 8th Grade, High School. About half of high school students in California are proficient at Math and Science
  27. CALIFORNIA SCHOOLS: Science and Math Scores at Elementary, Middle and High School Level
  28. CALIFORNIA SCHOOLS: Science and Math Scores at Elementary, Middle and High School Level. Scores have been getting better. Good!
  29. CALIFORNIA SCHOOLS: Science and Math Scores at Elementary, Middle and High School Level. Scores have been getting better. Good! Maybe the Math tests were harder for everyone that year?
  30. CALIFORNIA SCHOOLS: Science and Math Scores at Elementary, Middle and High School Level. Scores have been getting better. Good! 4th Grade “cohort” in 2004 was 8th Grade in 2008. Maybe the Math tests were harder for everyone that year?
  31. DATA SCIENCE WITH EXCEL: Pivot tables let you rearrange data, and trend lines measure the slope
  32. LEARN TO BE A DATA SCIENTIST FOR $1
      • Everything is being measured
      • The latest data science tools are available to anyone for pennies
      • There is lots of freely available data
      • Pay attention in math and science class, play around with EMR and BigQuery and get an interesting and well paid job as a data scientist!
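
The Connections slide points to the Jaccard similarity coefficient as the measure of how strongly two hashtags are linked. Here is a minimal Python sketch of that idea, assuming each hashtag is represented by the set of IDs of the tweets that use it. The sample tweets and the helper name `tweets_with` are made up for illustration and are not taken from the Hive Mind Map itself.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 0.0  # two empty sets: treat as "no connection"
    return len(a & b) / len(union)


# Hypothetical sample: tweet ID -> hashtags used in that tweet.
tweets = {
    1: {"#bigdata", "#cloud"},
    2: {"#bigdata", "#cloud"},
    3: {"#bigdata"},
    4: {"#cloud", "#weather"},
    5: {"#weather"},
}


def tweets_with(tag: str) -> set:
    """Return the IDs of all sample tweets that contain the given hashtag."""
    return {tweet_id for tweet_id, tags in tweets.items() if tag in tags}


bigdata = tweets_with("#bigdata")
cloud = tweets_with("#cloud")
weather = tweets_with("#weather")

# A thicker line on the map would correspond to a larger value here.
print("#bigdata / #cloud  :", jaccard(bigdata, cloud))
print("#cloud / #weather  :", jaccard(cloud, weather))
print("#bigdata / #weather:", jaccard(bigdata, weather))
```

A value of 1 means two tags always appear together, and 0 means they never share a tweet.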
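
The Calendar summary slide mentions two techniques: peaks found with standard deviations, and Pearson's correlation for tags that rise and fall together. The slide does not give the exact rules, so the sketch below makes assumptions: a day counts as a peak when its tweet count is more than `threshold` standard deviations above that tag's average, and the threshold of 1.5 and all daily counts are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def peak_days(counts, threshold=1.5):
    """Indexes of days whose count is more than `threshold` standard
    deviations above the mean (an assumed rule, not the author's exact one)."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # a flat line has no peaks
    return [day for day, c in enumerate(counts) if (c - mu) / sigma > threshold]


def pearson(xs, ys):
    """Pearson correlation: +1 means the two series move together exactly."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a flat series
    return cov / (sx * sy)


# Hypothetical tweets-per-day counts for three tags over ten days.
tag_a = [5, 6, 5, 4, 40, 6, 5, 5, 6, 4]
tag_b = [2, 3, 2, 2, 30, 4, 2, 3, 2, 2]
tag_c = [3, 30, 3, 3, 3, 3, 3, 30, 3, 3]

for name, counts in [("tag_a", tag_a), ("tag_b", tag_b), ("tag_c", tag_c)]:
    peaks = peak_days(counts)
    # Only tags with exactly one peak would be kept as calendar events.
    print(name, "peaks on days", peaks, "| single peak:", len(peaks) == 1)

print("correlation of tag_a and tag_b:", round(pearson(tag_a, tag_b), 3))
```

In this made-up data, `tag_a` and `tag_b` each have one peak on the same day and are strongly correlated, so they would be grouped into one event, while `tag_c` has two peaks and would be left out.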
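
The Excel slide says trend lines measure the slope. The talk uses Excel for this; as a rough equivalent, the sketch below fits a straight line by ordinary least squares in Python. The percentages are invented placeholder values, not the California results shown in the slides, and the function name `fit_trend_line` is hypothetical.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: percent of students rated proficient in each test year.
years = [2004, 2006, 2008, 2010, 2012]
percent_proficient = [40.0, 43.0, 46.0, 49.0, 52.0]

slope, intercept = fit_trend_line(years, percent_proficient)

# A positive slope means scores are improving; the units are points per year.
print(f"Trend: {slope:+.2f} percentage points per year")
```

A positive slope supports a claim like "scores have been getting better", and comparing slopes across grades or subjects shows which are improving fastest.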
