Lasi datawrangling
Published in: Technology, Education
  • I am not a journalist, but it seems to me that a large part of your work, and indeed a large part of the work of a scientist or an analyst, lies in asking the right questions of a source and knowing how to frame those questions. The data journalist knows how to ask questions of data.
  • Also – high incidence of crime around police stations (no location, so police station used as default location); Russell Square as a murder hotspot.
  • Another nice example of this, and one used by many advocates of data visualisation, is the famous example of Anscombe’s quartet: four sets of two-dimensional data with some interesting properties.
  • For example, many of the “classic” summary statistics for the corresponding columns in these data sets are to all intents and purposes the same.
  • But when we look at the datasets as a set of scatterplots, we see how the data tells very different stories.
  • People learn the skills they need, as they need them.
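
The Anscombe's quartet notes above can be checked with a few lines of code. The following is a minimal sketch, not something taken from the slides: the data values are the quartet as commonly reproduced from Anscombe's 1973 paper, and the names `ANSCOMBE`, `pearson` and `summarise` are made up for this illustration. It computes the "classic" summary statistics for each of the four datasets so they can be compared side by side.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet as commonly reproduced (an assumption: the slides do not list the values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarise(xs, ys):
    """Return the usual summary statistics for one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


summaries = {name: summarise(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

for name, s in summaries.items():
    print(name, " ".join(f"{key}={value:.2f}" for key, value in s.items()))
```

The printed rows should be nearly indistinguishable from one another, which is the point of the example: only plotting each dataset as a scatterplot reveals how different the four really are.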
  • Transcript

    • 1. Data wrangling with open source tools Tony Hirst Dept of Communication & Systems The Open University, UK
    • 2. Premises
    • 3. “I take data from wherever I can get it” 1
    • 4. “Appropriate everything” 2
    • 5. Conversations with data 3
    • 6. Visual Conversations with data 3
    • 7. (Accession Plot) @mediaczar
    • 8. If a picture’s worth a thousand words, maybe it should take as long to read?
    • 9. Most learning analytics won’t be performed by learning analytics researchers
    • 10. How can we help people fashion their own tools to support data conversations?
    • 11. Recipes
    • 12. site:open.ac.uk
    • 13. Have a conversation with the data…
    • 14. Ask the right questions…
    • 15. xkcd.com/1138
    • 16. Sometimes a question makes most sense in the context of questions previously asked and answers previously received
    • 17. DATA USERS Educators Learners Planners Marketers Policymakers Researchers Press NGOs “ D E V E L O P E R S ”
    • 18. Have dashboard, so what?
    • 19. A tools and issues based view
    • 20. DATA TOOLS USERS PROBLEMS
    • 21. Example – Google Fusion Tables Fusion Table https://www.google.com/fusiontables/DataSource?docid=1VKG7iCbFlsEYJzTuQppf4xoIqq1ABxWTdW6O_7o#rows:id=1 http://is.gd/qhuaoA Walkthrough http://blog.ouseful.info/2012/11/16/a-quick-look-at-gcsealevel-certificate-awards-market-share-by-examination-board/ http://is.gd/f9YAbG
    • 22. DATA TOOLS USERS PROBLEMS
    • 23. Access/obtain data Make sense of data Ask specific questions of data Communicate in a data-centric way
    • 24. Load data Clean data Merge/enrich data
    • 25. DATA Issues TOOLS
    • 26. DATA Other TOOLS Issues TOOLS
    • 27. “Tool based programming”
    • 28. A barrier to access (for the tool user) is data format
    • 29. JSON XML CSV XLS TSV .db HTML PDF DOC TXT
    • 30. GLUE LOGIC (Glue code)
    • 31. =importHTML(URL, “table”, N) HTML QUERYABLE DATA
    • 32. Try it… Example Page http://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_the_United_States_by_endowment http://is.gd/7Vbg6n
    • 33. Google Spreadsheets as a database Explorer https://views.scraperwiki.com/run/google_spreadsheet_query/ http://is.gd/jiMJoh Walkthrough http://schoolofdata.org/2013/05/24/asking-questions-of-data-garment-factories-data-expedition/ http://is.gd/qJHihu
    • 34. =importCSV(URL, N) HTML INTERACTIVE DASHBOARD Google Charts
    • 35. Google Chart Visualization API https://code.google.com/apis/ajax/playground/ http://is.gd/TTHIUh
    • 36. Google Visualisation API
    • 37. googleVis (R)
    • 38. https://developers.facebook.com/ docs/reference/api/examples/ http://is.gd/7cRnvS
    • 39. A barrier to access (for the tool user) is data shape
    • 40. A barrier to access (for the tool user) is data cleanliness
    • 41. Questions of identity
    • 42. The Open University Open University OU Open Uni Open University, UK NORMALISATION/RECONCILIATION
    • 43. Reconciliation to a canonical name and/or to a unique identifier
    • 44. A stumbling block (for the data user) is data enrichment
    • 45. A stumbling block (for the data user) is joining datasets
    • 46. A stumbling block (for the data user) is joining partially matched data
    • 47. Rolling your own interactive data exploration tools
    • 48. R Shiny Apps
    • 49. ui.R server.R
    • 50. RCharts
    • 51. Many chart tools do the work for you if the data is in the right shape
    • 52. DATA TOOLS USERS PROBLEMS
    • 53. Just ask… ask.SchoolOfData.org
    • 54. blog.ouseful.info @psychemedia
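
Slides 42 to 46 above name reconciliation and the joining of partially matched data as stumbling blocks. The following is a minimal sketch of one way to approach them, and not the method used in the talk: the name variants come from slide 42, while the alias table, the `normalise` and `reconcile` helpers, the 0.8 similarity cutoff and the sample counts are all hypothetical. Fuzzy matching here uses Python's standard `difflib` module as a stand-in for a dedicated reconciliation tool.

```python
import difflib
import re

CANONICAL = "The Open University"

# Hypothetical alias table: normalised variant -> canonical name.
ALIASES = {
    "the open university": CANONICAL,
    "open university": CANONICAL,
    "open university uk": CANONICAL,
    "open uni": CANONICAL,
    "ou": CANONICAL,
}


def normalise(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def reconcile(name, cutoff=0.8):
    """Map a name variant to its canonical form, or None if nothing is close.

    Exact alias matches are tried first; otherwise fall back to fuzzy
    matching against the known aliases (the cutoff is an arbitrary choice).
    """
    key = normalise(name)
    if key in ALIASES:
        return ALIASES[key]
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None


# Made-up records keyed by inconsistent names, merged on the canonical name.
records = [("OU", 3), ("Open Uni", 2)]
totals = {}
for raw_name, count in records:
    canonical = reconcile(raw_name)
    if canonical is not None:
        totals[canonical] = totals.get(canonical, 0) + count

for variant in ["Open University, UK", "Open Universty", "University of Oxford"]:
    print(repr(variant), "->", reconcile(variant))
print(totals)
```

Reconciling every row to a canonical name (or a unique identifier) before joining means the join itself can be an ordinary exact-match merge; unmatched names come back as `None` so they can be reviewed by hand rather than silently dropped or mismatched.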
