Soton2013 opendata


  • A great example of timely data is data about roadworks. It is often released in an impenetrable form: screeds of text detailing road names nobody uses, and describing in arcane language where roadworks are to take place and what diversions have been put in place. Why is it so hard to just publish the data as KML, which can be rendered trivially on an online map?!
  • Google Spreadsheets offers another example of how CSV can be used to help data flow. The =importData formula lets a user specify a source data URL and pull the CSV data found at that location into the spreadsheet. Unlike Many Eyes Wikified, if the source data at the URL is updated, the updates will (eventually) be pulled into the spreadsheet automatically.
  • One of the really good reasons for getting data into a data processing environment such as a spreadsheet is that you can start to work with it. Google Spreadsheets can also be used as a database environment: we can treat one or more data-containing sheets in a spreadsheet as a database, generate new views over the data, and run queries over it.
  • Another way of using a Google Spreadsheet as a database is via the Google Spreadsheets API. The Google Visualisation API provides a way of passing queries, written in the Google Visualisation API query language, from an arbitrary web page or web application, and receiving the resulting data in a standard JSON-based format that also happens to play nicely with the rest of the Google Visualisation API. The Guardian Datastore explorer, a crude demonstration from around 2009, shows how data from the Guardian Datastore, stored across a range of Google spreadsheets, can be explored, queried and visualised via these APIs. Users can select a dataset from a drop-down menu, fed from a Delicious account to which various Datastore spreadsheets have been bookmarked using a particular set of tags, or paste in the URL of an arbitrary (public) Google spreadsheet. The first row (the column headings) of the data can then be previewed; a simple spreadsheet is assumed, in which the column headings appear in the first row.
  • A series of list boxes is then populated with the column labels and their names, providing some help with creating a query over the spreadsheet data. A range of output formats can also be selected, from simple HTML data tables to a range of charts. URLs are also generated for HTML and CSV representations of the data returned by the query.
  • One of the nice things about the data table widget (a standard Google Visualisation API component in this case, though similar examples exist for YUI, the Yahoo User Interface Library, or frameworks such as jQuery) is that it supports things like sorting rows by column (for free, no programming required!), allowing even further manipulation of the data, albeit at a simplistic level. (When datasets are published, it may also be worth providing a preview of the column headings and the first few rows, or a sample of random rows, so that users can see what sort of data is on offer without having to download the whole dataset.)
  • If you’re in the business of selling information as data, you are under threat wherever that same information is published under an open licence.
  • Do we have a hashtag for the workshop?
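The roadworks note asks why such data isn't simply published as KML. As a minimal sketch of how little is involved, assuming the roadworks have already been geocoded to longitude/latitude points, the following Python writes a bare-bones KML document with one Placemark per roadwork. The records and the function name `roadworks_to_kml` are hypothetical; a real feed would also need dates, diversion routes and so on.

```python
from xml.sax.saxutils import escape

# Hypothetical roadworks records: (description, longitude, latitude).
# These values are illustrative only, not taken from any real feed.
ROADWORKS = [
    ("Resurfacing, lane closure", -1.4044, 50.9097),
    ("Gas main repair, diversion via side road", -1.3970, 50.9350),
]


def roadworks_to_kml(records):
    """Render (description, lon, lat) tuples as a minimal KML document."""
    placemarks = []
    for description, lon, lat in records:
        # KML coordinates are written as longitude,latitude[,altitude].
        placemarks.append(
            "    <Placemark>\n"
            f"      <name>{escape(description)}</name>\n"
            f"      <Point><coordinates>{lon},{lat}</coordinates></Point>\n"
            "    </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Document>\n"
        + "\n".join(placemarks)
        + "\n  </Document>\n</kml>\n"
    )


if __name__ == "__main__":
    print(roadworks_to_kml(ROADWORKS))
```

The output can be saved as a `.kml` file and loaded into most online map tools without further processing.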
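The `=importData` note describes pulling CSV from a URL into a spreadsheet. A rough Python analogue, offered as a sketch rather than a description of how Google Spreadsheets works internally, fetches the CSV with the standard library and parses it into rows keyed by the header line. The helper names `import_data` and `parse_csv` are hypothetical, and the code assumes UTF-8 encoded CSV with a single header row.

```python
import csv
import io
import urllib.request


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def import_data(url, timeout=10):
    """Fetch CSV from a URL and parse it (a rough analogue of =importData).

    Unlike the spreadsheet formula, this does not refresh itself:
    call it again to pick up changes to the source data.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumes the server returns UTF-8 encoded CSV.
        text = response.read().decode("utf-8")
    return parse_csv(text)


if __name__ == "__main__":
    sample = "region,count\nNorth,12\nSouth,7\n"
    for row in parse_csv(sample):
        print(row["region"], row["count"])
```

Keeping parsing separate from fetching makes it easy to test the parsing step without a network connection.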
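The note on treating a spreadsheet as a database can be illustrated outside the spreadsheet too. This sketch swaps in Python's built-in `sqlite3` module as a stand-in for the spreadsheet's own query features: it loads sheet-like rows into an in-memory table (hypothetically named `sheet`) and runs an SQL query that generates a new, aggregated view over the data. The column names and figures are invented for illustration.

```python
import sqlite3


def query_sheet(headers, rows, sql):
    """Load sheet-like rows into an in-memory SQLite table named `sheet`,
    then run `sql` over it and return the result rows as tuples.

    Assumes header names contain no double-quote characters.
    """
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{h}"' for h in headers)
        placeholders = ", ".join("?" for _ in headers)
        conn.execute(f"CREATE TABLE sheet ({columns})")
        conn.executemany(f"INSERT INTO sheet VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    headers = ["region", "year", "spend"]
    rows = [("North", 2012, 10), ("North", 2013, 20), ("South", 2013, 5)]
    # A new "view" over the data: total spend per region.
    print(query_sheet(
        headers, rows,
        "SELECT region, SUM(spend) FROM sheet GROUP BY region ORDER BY region",
    ))
```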
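The Datastore explorer notes describe building a query from list boxes and generating HTML and CSV URLs for the results. A minimal sketch of that URL-building step follows. It assumes the present-day `https://docs.google.com/spreadsheets/d/<key>/gviz/tq` endpoint, with a `tq` parameter carrying the query and `tqx=out:csv` or `tqx=out:html` selecting the output format; the 2009 explorer may well have used a different URL pattern. `build_query`, `gviz_query_url` and the spreadsheet key are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_query(columns, where=None, order_by=None, limit=None):
    """Assemble a query in the Google Visualisation API query language.

    Spreadsheet columns are referred to by letter (A, B, C, ...).
    The select, where, order by and limit clauses are emitted in that order.
    """
    parts = ["select " + ", ".join(columns)]
    if where:
        parts.append("where " + where)
    if order_by:
        parts.append("order by " + order_by)
    if limit is not None:
        parts.append(f"limit {limit}")
    return " ".join(parts)


def gviz_query_url(sheet_key, query, out="csv"):
    """Build a query URL for a public Google spreadsheet.

    ASSUMPTION: the docs.google.com/spreadsheets/d/<key>/gviz/tq endpoint,
    with `tq` holding the query and `tqx=out:<format>` choosing the output.
    """
    base = f"https://docs.google.com/spreadsheets/d/{sheet_key}/gviz/tq"
    params = {"tqx": f"out:{out}", "tq": query}
    # quote (rather than urlencode's default quote_plus) encodes spaces as %20.
    return base + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    q = build_query(["A", "C"], where="C > 100", order_by="C desc", limit=10)
    for fmt in ("html", "csv"):
        print(gviz_query_url("SPREADSHEET_KEY", q, out=fmt))
```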

    1. Get Started With Open Data. Tony Hirst, Dept of Communication and Systems, The Open University
    2. So what do we mean by “OPEN DATA”?
    3. You are free to:
       - copy, publish, distribute and transmit the Information;
       - adapt the Information;
       - exploit the Information commercially, for example by combining it with other Information, or by including it in your own product or application.
    4. You must:
       - acknowledge the source of the Information by including any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence;
       - ensure that you do not use the Information in a way that suggests any official status;
       - ensure that you do not mislead others or misrepresent the Information or its source;
       - ensure that your use of the Information does not breach the Data Protection Act 1998 or the Privacy and Electronic Communications (EC Directive) Regulations 2003.
    5. Exemptions:
       - personal data;
       - Information that has neither been published nor disclosed under information access legislation (FOI) by or with the consent of the Information Provider;
       - departmental or public sector organisation logos, crests etc.;
       - third party rights the Information Provider is not authorised to license;
       - Information subject to other IPR.
    6. Availability and Access; Reuse and Redistribution; Universal Participation (The Open Knowledge Foundation)
    7. Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form. (The Open Knowledge Foundation)
    8. Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. (The Open Knowledge Foundation)
    9. Universal Participation: everyone must be able to use, reuse and redistribute; there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed. (The Open Knowledge Foundation)
    10. /via
    11. [Diagram: DATA surrounded by FOI, Licensing exemptions, Data Protection Act, Paywalls, Authentication, “Privacy”, Crappy spreadsheets, Closed standards, PDFs, Messy Data]
    12. Right to access data
    13. So where’s the data?
    14. “First” generation: data catalogues
    15. Breathing life into data…
    16. =importData("CSV_URL")
    17. the spreadsheet becomes A DATABASE
    18. “Second” generation: data management systems
    19. Digging for data…
    20. There’s lots more data that’s locked up in web pages…
    21. Scraping…
    22. “grabbing web content in a machine readable format and then processing it for your own purposes”
    23. [Diagram: Original HTML web page -> Extract Information -> Accessible web page / data]
    24. Recreating the database that was used to populate a (templated) page
    25. “Creating” Data
    26. [Disruptive Innovation?]
    27. [Diagram: a network of Company and Director nodes]
    28. Barriers to Use
    29. Character string dates; erratic whitespace; arbitrary separators; Excel dates. Also: month overflows at week end; year overflows.
    30. Open is as open does… DATA
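The scraping slides describe pulling data out of an HTML page and, in effect, recreating the database behind a templated page. As a sketch using only Python's standard `html.parser` module, the following collects the text of every table cell, row by row. It assumes the data sits in a single, non-nested `<table>` with explicit closing `</td>`/`</tr>` tags; real pages are often messier, and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. links) still counts as cell text.
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html):
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    page = ("<table><tr><th>Name</th><th>Role</th></tr>"
            "<tr><td><a href='#'>Alice</a></td><td>Director</td></tr></table>")
    print(scrape_table(page))
```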
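The diagram on slide 27 links companies and directors. One way to “create” data from such records, sketched here under the assumption that you already have a list of (director, company) appointment pairs from some register, is to link any two companies that share a director. The function name and sample pairs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def companies_linked_by_directors(appointments):
    """Given (director, company) pairs, return a dict mapping
    (company_a, company_b) to the set of directors they share."""
    companies_by_director = defaultdict(set)
    for director, company in appointments:
        companies_by_director[director].add(company)

    links = defaultdict(set)
    for director, companies in companies_by_director.items():
        # Every pair of companies sharing this director gets a link;
        # sorting keeps each pair in a consistent order.
        for a, b in combinations(sorted(companies), 2):
            links[(a, b)].add(director)
    return dict(links)


if __name__ == "__main__":
    pairs = [("D1", "C1"), ("D1", "C2"), ("D2", "C2"), ("D2", "C3")]
    for (a, b), shared in companies_linked_by_directors(pairs).items():
        print(a, "<->", b, "via", sorted(shared))
```

The resulting company-to-company links are new data that appears in none of the individual source records.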
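Slide 29 lists recurring problems with messy date columns. The sketch below tackles the first four: it collapses erratic whitespace, unifies arbitrary separators, tries a short list of assumed string formats, and treats numbers as Excel serial dates (assuming Excel's default 1900 date system, whose day count effectively starts at 1899-12-30 for modern dates). Day-first ordering is assumed for ambiguous dates such as 12/03/2013, as is usual in UK data; the month and year overflow problems are not handled, and `clean_date` is a hypothetical name.

```python
import re
from datetime import date, datetime, timedelta

# Excel's 1900 date system wrongly treats 1900 as a leap year, so for
# serial numbers from 61 onwards the effective epoch is 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)

# Assumed input formats, tried in order (day-first for numeric dates).
FORMATS = ["%Y-%m-%d", "%d-%m-%Y", "%d %B %Y", "%d %b %Y"]


def clean_date(value):
    """Normalise a messy date value to a datetime.date, or return None."""
    if isinstance(value, (int, float)):
        # Treat numbers as Excel serial dates; any time fraction is dropped.
        return EXCEL_EPOCH + timedelta(days=int(value))
    text = " ".join(str(value).split())           # collapse erratic whitespace
    text = re.sub(r"\s*[/.\-]\s*", "-", text)      # unify arbitrary separators
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    for raw in [" 12/03/2013 ", "2013.03.12", "3   March  2013", 41275, "n/a"]:
        print(repr(raw), "->", clean_date(raw))
```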