  • I am not a journalist, but it seems to me that a large part of your work, and indeed a large part of the work of a scientist or an analyst, lies in asking the right questions of a source and knowing how to frame those questions. The data journalist knows how to ask questions of data.
  • Also – high incidence of crime around police stations (no location, so police station used as default location); Russell Square as a murder hotspot.
  • Another nice example of this, and one used by many advocates of data visualisation, is Anscombe’s quartet: four sets of two-dimensional data with some interesting properties.
  • For example, many of the “classic” summary statistics for the corresponding columns in these data sets are to all intents and purposes the same.
  • But when we look at the datasets as a set of scatterplots, we see how the data tells very different stories.
  • People learn the skills they need, as they need them.
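
The point made in the notes about Anscombe’s quartet can be checked with a few lines of code. The sketch below is illustrative rather than part of the original talk: it uses the standard published values of the quartet, and the helper names (`pearson`, `summarise`) are made up for this example. It computes the “classic” summary statistics for each of the four datasets so they can be compared side by side.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: four (x, y) datasets with near-identical summary statistics.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


def summarise(xs, ys):
    """Return the summary statistics usually quoted for the quartet."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),  # sample variance
        "var_y": round(variance(ys), 2),
        "corr": round(pearson(xs, ys), 3),
    }


summaries = {name: summarise(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
for name, stats in summaries.items():
    print(name, stats)
```

The printed rows agree to within rounding, which is why the scatterplots, not the summary table, are what reveal the four different stories.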

Transcript

  • 1. Data wrangling with open source tools Tony Hirst Dept of Communication & Systems The Open University, UK
  • 2. Premises
  • 3. “I take data from wherever I can get it” 1
  • 4. “Appropriate everything” 2
  • 5. Conversations with data 3
  • 6. Visual Conversations with data 3
  • 7. (Accession Plot) @mediaczar
  • 8. If a picture’s worth a thousand words, maybe it should take as long to read?
  • 9. Most learning analytics won’t be performed by learning analytics researchers
  • 10. How can we help people fashion their own tools to support data conversations?
  • 11. Recipes
  • 12. site:open.ac.uk
  • 13. Have a conversation with the data…
  • 14. Ask the right questions…
  • 15. xkcd.com/1138
  • 16. Sometimes a question makes most sense in the context of questions previously asked and answers previously received
  • 17. DATA USERS Educators Learners Planners Marketers Policymakers Researchers Press NGOs “ D E V E L O P E R S ”
  • 18. Have dashboard, so what?
  • 19. A tools and issues based view
  • 20. DATA TOOLS USERS PROBLEMS
  • 21. Example – Google Fusion Tables Fusion Table https://www.google.com/fusiontables/DataSource?docid=1VKG7iCbFlsEYJzTuQppf4xoIqq1ABxWTdW6O_7o#rows:id=1 http://is.gd/qhuaoA Walkthrough http://blog.ouseful.info/2012/11/16/a-quick-look-at-gcsealevel-certificate-awards-market-share-by-examination-board/ http://is.gd/f9YAbG
  • 22. DATA TOOLS USERS PROBLEMS
  • 23. Access/obtain data Make sense of data Ask specific questions of data Communicate in a data-centric way
  • 24. Load data Clean data Merge/enrich data
  • 25. DATA Issues TOOLS
  • 26. DATA Other TOOLS Issues TOOLS
  • 27. “Tool based programming”
  • 28. A barrier to access (for the tool user) is data format
  • 29. JSON XML CSV XLS TSV .db HTML PDF DOC TXT
  • 30. GLUE LOGIC (Glue code)
  • 31. =importHTML(URL, “table”, N) HTML QUERYABLE DATA
  • 32. Try it… Example Page http://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_the_United_States_by_endowment http://is.gd/7Vbg6n
  • 33. Google Spreadsheets as a database Explorer https://views.scraperwiki.com/run/google_spreadsheet_query/ http://is.gd/jiMJoh Walkthrough http://schoolofdata.org/2013/05/24/asking-questions-of-data-garment-factories-data-expedition/ http://is.gd/qJHihu
  • 34. =importCSV(URL, N) HTML INTERACTIVE DASHBOARD Google Charts
  • 35. Google Chart Visualization API https://code.google.com/apis/ajax/playground/ http://is.gd/TTHIUh
  • 36. Google Visualisation API
  • 37. googleVis (R)
  • 38. https://developers.facebook.com/docs/reference/api/examples/ http://is.gd/7cRnvS
  • 39. A barrier to access (for the tool user) is data shape
  • 40. A barrier to access (for the tool user) is data cleanliness
  • 41. Questions of identity
  • 42. The Open University Open University OU Open Uni Open University, UK NORMALISATION/RECONCILIATION
  • 43. Reconciliation to a canonical name and/or to a unique identifier
  • 44. A stumbling block (for the data user) is data enrichment
  • 45. A stumbling block (for the data user) is joining datasets
  • 46. A stumbling block (for the data user) is joining partially matched data
  • 47. Rolling your own interactive data exploration tools
  • 48. R Shiny Apps
  • 49. ui.R server.R
  • 50. RCharts
  • 51. Many chart tools do the work for you if the data is in the right shape
  • 52. DATA TOOLS USERS PROBLEMS
  • 53. Just ask… ask.SchoolOfData.org
  • 54. blog.ouseful.info @psychemedia
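
Slides 41 to 46 (questions of identity, reconciliation to a canonical name, and joining partially matched data) can be illustrated with a small sketch. This is not the approach shown in the talk, only one minimal way to do it under stated assumptions: the alias table, the function names (`normalise`, `reconcile`, `join_on_reconciled`) and the two tiny datasets are all hypothetical and made up for illustration. The name variants themselves come from slide 42.

```python
import re

CANONICAL = "The Open University"

# Hypothetical alias table: normalised forms of the variants listed on slide 42.
ALIASES = {
    "the open university",
    "open university",
    "ou",
    "open uni",
    "open university uk",
}


def normalise(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def reconcile(name):
    """Map a known variant to the canonical name; leave other names as they are."""
    return CANONICAL if normalise(name) in ALIASES else name.strip()


def join_on_reconciled(left, right, key="institution"):
    """Inner-join two lists of dicts on the reconciled value of `key`."""
    index = {reconcile(row[key]): row for row in right}
    joined = []
    for row in left:
        canonical = reconcile(row[key])
        match = index.get(canonical)
        if match is not None:
            joined.append({**match, **row, key: canonical})
    return joined


# Made-up rows, purely for illustration.
left = [
    {"institution": "OU", "value_a": 1},
    {"institution": "Some Other College", "value_a": 2},
]
right = [{"institution": "Open University, UK", "value_b": 10}]

joined = join_on_reconciled(left, right)
print(joined)
```

Without the reconciliation step, "OU" and "Open University, UK" would never match on a plain string comparison, so the join would return nothing. A lookup table like this only covers variants someone has already seen; unseen spellings still need a human decision or a fuzzier matching step.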