Data: A Cautionary Tale by Daniel Katz

Published in: Education

  1. 1. A Cautionary Tale
  2. 5. The Big Picture
       • Collect
       • Clean
       • Model
       • Store
       • Present
  3. 6. {
          "classes": [
            {
              "name": "Fundamental Process of Design",
              "professor": "Joo Youn Paek",
              "year": "2010",
              "semester": "fall",
              "students": [
                {
                  "student": {
                    "name": "Joe Student",
                    "email": "itp4life@gmail.com",
                    "twitter_name": "@itp4life",
                    "blog_url": "http://itp4life.blogspot.com"
                  }
                }
              ]
            }
          ]
        }
  4. 7. <classes>
          <class>
            <name>Fundamental Process of Design</name>
            <professor>Joo Youn Paek</professor>
            <year>2010</year>
            <semester>Fall</semester>
            <students>
              <student>
                <name>Joe Student</name>
                <email>itp4life@gmail.com</email>
                <twitter_name>@itp4life</twitter_name>
                <blog_url>http://itp4life.blogspot.com</blog_url>
              </student>
            </students>
          </class>
        </classes>
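
Slides 6 and 7 model the same class roster twice, once as JSON and once as XML. The following is a minimal sketch, not part of the original deck, showing that both forms can be read into the same rows with Python's standard library. The sample values are adapted from the two slides; the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Sample data adapted from slides 6 and 7 (trimmed to the fields used below).
JSON_TEXT = """
{"classes": [{"name": "Fundamental Process of Design",
              "professor": "Joo Youn Paek",
              "students": [{"student": {"name": "Joe Student",
                                        "email": "itp4life@gmail.com"}}]}]}
"""

XML_TEXT = """
<classes>
  <class>
    <name>Fundamental Process of Design</name>
    <professor>Joo Youn Paek</professor>
    <students>
      <student>
        <name>Joe Student</name>
        <email>itp4life@gmail.com</email>
      </student>
    </students>
  </class>
</classes>
"""


def students_from_json(text):
    """Return (class name, student name, email) tuples from the JSON form."""
    rows = []
    for cls in json.loads(text)["classes"]:
        for entry in cls["students"]:
            student = entry["student"]
            rows.append((cls["name"], student["name"], student["email"]))
    return rows


def students_from_xml(text):
    """Return the same tuples from the XML form."""
    rows = []
    root = ET.fromstring(text.strip())
    for cls in root.findall("class"):
        for student in cls.findall("students/student"):
            rows.append((cls.findtext("name"),
                         student.findtext("name"),
                         student.findtext("email")))
    return rows


json_rows = students_from_json(JSON_TEXT)
xml_rows = students_from_xml(XML_TEXT)
```

Once both formats are reduced to the same tuples, the choice between JSON and XML only affects the parsing step, not the later cleaning or storage steps.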
  5. 10. The Open Data Movement is in Full Swing
       • Governments
       • Institutions
       • Scientists
       • Enthusiasts
       • http://vimeo.com/2598878
  6. 11. Commercial tools and open source are starting to converge
  7. 12. There will always be assumptions
  8. 13. Bring it down
  9. 14.
       • Freebase – Entity Graph
       • Infochimps
       • Twitter
       • Facebook
  10. 15.
       • Data.gov
       • MTA
  11. 16.
       • Arduino
       • Smartphone
       • Other sensors
  12. 18. Don’t be intimidated by data from disparate sources
  13. 21.
       • Clean up messy data
       • Inconsistent data points
       • Identify patterns
       • Combine data from disparate sources
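
As an illustration of the cleaning step on slide 21, here is a minimal sketch that normalizes inconsistent values and then combines records from two sources. The two sources, their field names, and the choice of email as the matching key are all assumptions made for this example; the deck does not prescribe a method.

```python
def normalize(record):
    """Trim whitespace and unify casing so equivalent values compare equal."""
    return {
        "email": record["email"].strip().lower(),
        "semester": record["semester"].strip().capitalize(),
        "year": int(str(record["year"]).strip()),
    }


def combine(*sources):
    """Merge records from several sources, keyed by normalized email."""
    merged = {}
    for source in sources:
        for record in source:
            clean = normalize(record)
            merged.setdefault(clean["email"], {}).update(clean)
    return merged


# Hypothetical records describing the same student in two messy sources.
source_a = [{"email": " ITP4Life@gmail.com", "semester": "fall", "year": " 2010 "}]
source_b = [{"email": "itp4life@gmail.com", "semester": "Fall ", "year": 2010}]

combined = combine(source_a, source_b)
```

After normalization the two records collapse into a single entry, which is the point of cleaning before combining.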
  14. 22. Collection of Twitter responses from the API: value.parseJson().user.screen_name
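
The expression on slide 22 parses each stored response as JSON and reads the `screen_name` field of its `user` object. A rough Python equivalent might look like the sketch below. The sample responses are invented for illustration, and skipping unparseable rows is an assumption of this example, not something the slide specifies.

```python
import json


def screen_names(raw_responses):
    """Extract user.screen_name from each raw JSON response string."""
    names = []
    for raw in raw_responses:
        try:
            tweet = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: silently skip rows that are not valid JSON
        if not isinstance(tweet, dict):
            continue
        user = tweet.get("user") or {}
        if isinstance(user, dict) and "screen_name" in user:
            names.append(user["screen_name"])
    return names


# Hypothetical responses: one well-formed, one malformed, one without a user.
responses = [
    '{"text": "hello", "user": {"screen_name": "itp4life"}}',
    "not json",
    '{"text": "no user field"}',
]

names = screen_names(responses)
```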
  15. 24. Depending on the type of data you are collecting, there are appropriate places to store it
  16. 25.
       • Non-programmers
           • Google Fusion Tables
       • For programmers
           • Geo database and programming tools
               • PostGIS (PostgreSQL)
               • GeoTools (Java)
  17. 26.
       • Non-programmers
           • Google Docs (read into Processing)
           • Microsoft Excel (internal charting tool)
           • Text-based formatting (visualize with Google Chart API)
       • For programmers
           • Any relational database
               • MySQL
               • PostgreSQL
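
To make the "any relational database" option concrete, the sketch below stores the class roster from slides 6 and 7 in two related tables. It uses SQLite through Python's built-in `sqlite3` module as a stand-in for MySQL or PostgreSQL, so that it runs without a server; the table and column names are assumptions for this example.

```python
import sqlite3

# In-memory SQLite database, used here in place of MySQL or PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classes (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    professor TEXT,
    year      INTEGER,
    semester  TEXT
);
CREATE TABLE students (
    id       INTEGER PRIMARY KEY,
    class_id INTEGER NOT NULL REFERENCES classes(id),
    name     TEXT NOT NULL,
    email    TEXT
);
""")

# Insert one class and one enrolled student, using parameterized queries.
cur = conn.execute(
    "INSERT INTO classes (name, professor, year, semester) VALUES (?, ?, ?, ?)",
    ("Fundamental Process of Design", "Joo Youn Paek", 2010, "Fall"),
)
class_id = cur.lastrowid
conn.execute(
    "INSERT INTO students (class_id, name, email) VALUES (?, ?, ?)",
    (class_id, "Joe Student", "itp4life@gmail.com"),
)
conn.commit()

# Join the two tables to list each student with their class.
rows = conn.execute(
    "SELECT c.name, s.name, s.email "
    "FROM students AS s JOIN classes AS c ON c.id = s.class_id"
).fetchall()
```

Splitting classes and students into separate tables mirrors the nesting in the JSON and XML forms while letting a student list grow without repeating the class details.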
  18. 27. Graph Database
  19. 28.
       • http://blog.blprnt.com/blog/blprnt/your-random-numbers-getting-started-with-processing-and-data-visualization
       • http://code.google.com/p/gdocjdbc/
  20. 29.
       • http://www.infochimps.com/datasets/tweets-during-state-of-the-union-address
       • http://code.google.com/p/google-refine/
       • http://dev.twitter.com/doc/get/geo/search
       • http://flowingdata.com/2009/07/14/how-does-the-average-consumer-spend-his-money/
       • http://www.bls.gov/cex/
       • http://www.google.com/fusiontables/Home
