Lasi datawrangling
Published in: Technology, Education
  • I am not a journalist, but it seems to me that a large part of your work, and indeed a large part of the work of a scientist or an analyst, is in asking the right questions of a source, and knowing how to frame those questions. The data journalist knows how to ask questions of data.
  • Also – high incidence of crime around police stations (no location, so police station used as default location); Russell Square as a murder hotspot.
  • Another nice example of this, and one used by many advocates of data visualisation, is the famous example of Anscombe’s quartet: four sets of two-dimensional data with some interesting properties.
  • For example, many of the “classic” summary statistics for the corresponding columns in these data sets are to all intents and purposes the same.
  • But when we look at the datasets as a set of scatterplots, we see how the data tells very different stories.
  • People learn the skills they need, as they need them.
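
The point about Anscombe’s quartet in the notes above can be checked with a few lines of code. The following is a minimal sketch in Python using only the standard library; the helper names (`pearson`, `summary`, `QUARTET`) are illustrative and not part of the original slides. It computes the "classic" summary statistics for each of the four datasets, which come out close to identical (mean of x is 9, mean of y is about 7.5, correlation is about 0.82), even though the scatterplots look nothing alike.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summary(xs, ys):
    """Return (mean x, mean y, sample variance of x, correlation)."""
    return mean(xs), mean(ys), variance(xs), pearson(xs, ys)


if __name__ == "__main__":
    for label, (xs, ys) in QUARTET.items():
        mx, my, vx, r = summary(xs, ys)
        print(f"{label:>3}: mean_x={mx:.2f} mean_y={my:.2f} var_x={vx:.2f} r={r:.3f}")
```

Plotting each pair as a scatterplot (for example with a charting library of your choice) is what reveals the differences that these numbers hide.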
  • Transcript

    1. Data wrangling with open source tools Tony Hirst Dept of Communication & Systems The Open University, UK
    2. Premises
    3. “I take data from wherever I can get it” 1
    4. “Appropriate everything” 2
    5. Conversations with data 3
    6. Visual Conversations with data 3
    7. (Accession Plot) @mediaczar
    8. If a picture’s worth a thousand words, maybe it should take as long to read?
    9. Most learning analytics won’t be performed by learning analytics researchers
    10. How can we help people fashion their own tools to support data conversations?
    11. Recipes
    12. site:open.ac.uk
    13. Have a conversation with the data…
    14. Ask the right questions…
    15. xkcd.com/1138
    16. Sometimes a question makes most sense in the context of questions previously asked and answers previously received
    17. DATA USERS Educators Learners Planners Marketers Policymakers Researchers Press NGOs “ D E V E L O P E R S ”
    18. Have dashboard, so what?
    19. A tools and issues based view
    20. DATA TOOLS USERS PROBLEMS
    21. Example – Google Fusion Tables Fusion Table https://www.google.com/fusiontables/DataSource?docid=1VKG7iCbFlsEYJzTuQppf4xoIqq1ABxWTdW6O_7o#rows:id=1 http://is.gd/qhuaoA Walkthrough http://blog.ouseful.info/2012/11/16/a-quick-look-at-gcsealevel-certificate-awards-market-share-by-examination-board/ http://is.gd/f9YAbG
    22. DATA TOOLS USERS PROBLEMS
    23. Access/obtain data Make sense of data Ask specific questions of data Communicate in a data-centric way
    24. Load data Clean data Merge/enrich data
    25. DATA Issues TOOLS
    26. DATA Other TOOLS Issues TOOLS
    27. “Tool based programming”
    28. A barrier to access (for the tool user) is data format
    29. JSON XML CSV XLS TSV .db HTML PDF DOC TXT
    30. GLUE LOGIC (Glue code)
    31. =importHTML(URL, “table”, N) HTML QUERYABLE DATA
    32. Try it… Example Page http://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_the_United_States_by_endowment http://is.gd/7Vbg6n
    33. Google Spreadsheets as a database Explorer https://views.scraperwiki.com/run/google_spreadsheet_query/ http://is.gd/jiMJoh Walkthrough http://schoolofdata.org/2013/05/24/asking-questions-of-data-garment-factories-data-expedition/ http://is.gd/qJHihu
    34. =importCSV(URL, N) HTML INTERACTIVE DASHBOARD Google Charts
    35. Google Chart Visualization API https://code.google.com/apis/ajax/playground/ http://is.gd/TTHIUh
    36. Google Visualisation API
    37. googleVis (R)
    38. https://developers.facebook.com/docs/reference/api/examples/ http://is.gd/7cRnvS
    39. A barrier to access (for the tool user) is data shape
    40. A barrier to access (for the tool user) is data cleanliness
    41. Questions of identity
    42. The Open University Open University OU Open Uni Open University, UK NORMALISATION/RECONCILIATION
    43. Reconciliation to a canonical name and/or to a unique identifier
    44. A stumbling block (for the data user) is data enrichment
    45. A stumbling block (for the data user) is joining datasets
    46. A stumbling block (for the data user) is joining partially matched data
    47. Rolling your own interactive data exploration tools
    48. R Shiny Apps
    49. ui.R server.R
    50. RCharts
    51. Many chart tools do the work for you if the data is in the right shape
    52. DATA TOOLS USERS PROBLEMS
    53. Just ask… ask.SchoolOfData.org
    54. blog.ouseful.info @psychemedia
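
The slides on normalisation/reconciliation and on joining partially matched data describe a problem that can be sketched in a few lines. What follows is a rough illustration in Python, not the approach used in the talk: the alias table, the `reconcile` and `join_on_reconciled` helpers, and the 0.8 similarity cutoff are all assumptions made for the example. It maps name variants such as "OU" or "Open Uni" to one canonical name, falls back to fuzzy matching from the standard library's `difflib` for near misses, and then joins two datasets on the reconciled names.

```python
import difflib

# Hypothetical alias table: lower-cased variant -> canonical name.
ALIASES = {
    "the open university": "The Open University",
    "open university": "The Open University",
    "ou": "The Open University",
    "open uni": "The Open University",
    "open university, uk": "The Open University",
}


def reconcile(name, cutoff=0.8):
    """Return the canonical name for `name`, or None if nothing matches well."""
    key = " ".join(name.lower().split())  # normalise case and whitespace
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to fuzzy matching against the known variants.
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None


def join_on_reconciled(left, right):
    """Inner-join two {name: value} mappings on their reconciled names."""
    right_by_canon = {reconcile(k): v for k, v in right.items()}
    joined = {}
    for name, left_value in left.items():
        canon = reconcile(name)
        # Skip rows that could not be reconciled or have no partner.
        if canon is not None and canon in right_by_canon:
            joined[canon] = (left_value, right_by_canon[canon])
    return joined


if __name__ == "__main__":
    # Made-up values, for illustration only.
    left = {"Open Uni": 1}
    right = {"Open University, UK": 2}
    print(join_on_reconciled(left, right))
```

An explicit alias table is checked first because abbreviations like "OU" are too short for string similarity to catch; the fuzzy fallback is there only for typos, and its cutoff would need tuning against real data.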
