Soton2013 opendata
  • A great example of timely data is data relating to roadworks. This data is often released in an impenetrable form: screeds of text detailing road names nobody uses, and identifying in arcane language where roadworks are to take place and what diversions have been put in place. Why is it so hard to just publish the data as KML that can be rendered trivially in an online map?!
  • Another example of how CSV can help data flow is provided by Google Spreadsheets. The =importData formula allows a user to specify a source data URL and pull the CSV data found at that location into the spreadsheet. Unlike Many Eyes Wikified, if the source data at the URL is updated, the update will (eventually) be pulled into the spreadsheet automatically.
  • One of the really good reasons for getting data into a data processing environment such as a spreadsheet is that you can start to work it. In the case of Google Spreadsheets, the spreadsheet environment can also be used as a database environment. That is, we can treat one or more data-containing sheets in a spreadsheet as a database, generating new views over the data as well as running queries over it.
  • Another way of using a Google Spreadsheet as a database is via the Google Spreadsheets API. The Google Visualisation API provides a way of passing queries, written in the Google Visualisation API query language, from an arbitrary web page or web application, and receiving the resulting data in a standard JSON-based format, which also happens to play nicely with the Google Visualisation API. The Guardian Datastore explorer is a crude demonstration, from 2009 or thereabouts, of how data from the Guardian datastore, which is stored across a range of Google spreadsheets, can be explored, queried and visualised via these APIs. Users can select a dataset from a drop-down menu, fed from a delicious account to which various datastore spreadsheets have been bookmarked using a particular set of tags, or paste in the URL of an arbitrary (public) Google spreadsheet. The first row/headings of the data can then be previewed (a simple spreadsheet is assumed, in which column headings appear in the first row of the spreadsheet).
  • A series of list boxes is then populated with the column labels and their names, providing a certain amount of help for the creation of a query over the spreadsheet data. A range of output formats can also be selected, from simple HTML data tables to a range of charts. URLs are also generated for HTML and CSV representations of the data returned from the query.
  • One of the nice things about the data table widget (a standard Google Visualisation API component in this case, though similar examples exist for YUI, the Yahoo User Interface Library, or frameworks such as jQuery) is that it supports things like row sorting by column (for free, no programming required!), allowing even further manipulation of the data, albeit at a simplistic level. (It’s probably worth pointing out here that it may be worth providing a preview of the column headings and first few rows (or a sample of random rows) of data when datasets are published, just so that users can see what sort of data is on offer without having to download the whole data set?)
  • If you’re in the business of selling information as data, you are under threat where that information is published in an openly licensed way.
  • Do we have a hashtag for the workshop?
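The notes above describe pulling CSV data into a spreadsheet and then treating the sheet as a database that can be queried. As a rough, offline analogue of that idea (not the Google Spreadsheets or Google Visualisation APIs themselves), the sketch below loads CSV text into an in-memory SQLite table using only the Python standard library and runs a query over it. The function name `csv_to_table`, the table name `sheet` and the sample data are all hypothetical; in practice the CSV text would be fetched from the source data URL.

```python
import csv
import io
import sqlite3


def csv_to_table(csv_text, table="sheet"):
    """Load CSV text into an in-memory SQLite table.

    Assumes a simple layout: column headings in the first row,
    one record per following row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    headings, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    # Quote the column names so that headings containing spaces still work.
    cols = ", ".join('"{}"'.format(h.replace('"', '""')) for h in headings)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
    marks = ", ".join("?" for _ in headings)
    conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, marks), data)
    return conn


# Hypothetical sample data standing in for the CSV found at a source URL.
SAMPLE_CSV = "area,spend\nA,120\nB,340\nC,90\n"

conn = csv_to_table(SAMPLE_CSV)
# CSV values arrive as text, so cast before comparing numerically.
result = conn.execute(
    "SELECT area, spend FROM sheet "
    "WHERE CAST(spend AS INTEGER) > 100 ORDER BY area"
).fetchall()
print(result)
```

The query plays the same role as a query over the spreadsheet data: it selects columns by heading, filters rows and returns a new view without altering the source.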
  • Transcript

    • 1. Get Started With Open Data. Tony Hirst, Dept of Communication and Systems, The Open University
    • 2. So what do we mean by “OPEN DATA”
    • 3. You are free to: - copy, publish, distribute and transmit the Information; - adapt the Information; - exploit the Information commercially, for example, by combining it with other Information, or by including it in your own product or application
    • 4. You must: - acknowledge the source of the Information by including any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence; - ensure that you do not use the Information in a way that suggests any official status; - ensure that you do not mislead others or misrepresent the Information or its source; - ensure that your use of the Information does not breach the Data Protection Act 1998 or the Privacy and Electronic Communications (EC Directive) Regs 2003.
    • 5. Exemptions: - personal data; - Information that has neither been published nor disclosed under information access legislation (FOI) by or with the consent of the Information Provider; - departmental or public sector organisation logos, crests etc; - third party rights the Information Provider is not authorised to license; - Information subject to other IPR
    • 6. Availability and Access; Reuse and Redistribution; Universal Participation. The Open Knowledge Foundation
    • 7. Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form. The Open Knowledge Foundation
    • 8. Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The Open Knowledge Foundation
    • 9. Universal Participation: everyone must be able to use, reuse and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed. The Open Knowledge Foundation
    • 10. /via
    • 11. DATA: FOI exemptions; Licensing; Data Protection Act; Paywalls; Authentication; “Privacy”; Crappy spreadsheets; Closed standards; PDFs; Messy Data
    • 12. Right to access data
    • 13. So where’s the data?
    • 14. “First” generation: data catalogues
    • 15. Breathing life into data…
    • 16. =importData("CSV_URL")
    • 17. the spreadsheet becomes A DATABASE
    • 18. “Second” generation: data management systems
    • 19. Digging for data…
    • 20. There’s lots more data that’s locked up in web pages…
    • 21. Scraping…
    • 22. “grabbing web content in a machine readable format and then processing it for your own purposes”
    • 23. Original HTML web page; Extract Information; Accessible web page -> data
    • 24. Recreating the database that was used to populate a (templated) page
    • 25. “Creating” Data
    • 26. [Disruptive Innovation?]
    • 27. [Diagram labels: Company; Director (×4); Company (×4)]
    • 28. Barriers to Use
    • 29. - Character string dates - Erratic whitespace - Arbitrary separators - Excel Dates. Also: - month overflows at week end - year overflows
    • 30. Open is as open does… DATA
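The scraping slides describe recreating the data behind a templated web page, and the barriers slide lists erratic whitespace among the clean-up problems. The following is a minimal sketch of that idea, assuming the page holds its data in a plain HTML table; it uses only the standard library `html.parser` module. The class name `TableScraper` and the sample page (road names and dates) are made up for illustration.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and trim the ends of the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical templated page; the road names and dates are invented.
PAGE = """
<table>
  <tr><th>Road</th><th>Starts</th></tr>
  <tr><td> High   Street </td><td>2013-03-01</td></tr>
  <tr><td>Mill Lane</td><td>2013-03-04</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(PAGE)
rows = scraper.rows
print(rows)
```

Each row comes back as a list of cleaned strings, with the headings in the first row, so the result can be written straight out as CSV. Real pages are usually messier (nested tables, missing closing tags, data spread across several pages), so this should be read as a starting point only.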