Data All the Way Down
Jeni Tennison
@JeniT
http://www.jenitennison.com/blog/
Data All the Way Down
• challenges of complex open data
• layered approach to data publishing
• essential steps
• benefits
Complex Datasets
• too much for a single spreadsheet
• need to navigate
 • browse through data
 • look at slices of larger dataset
 • view summary statistics

• need to explain
 • definitions of terms, provisos & disclaimers
User Challenge
• complex datasets have a range of users
 • different hardware / platforms
 • different tasks / goals
 • different ability / understanding

• no one interface satisfies everyone
• data owners cannot satisfy everyone
• create ecosystem around open data
visualisation / data gap   end user vs reuser
Visualisations
• approachable for real people
• necessary for stakeholder buy-in
• beauty is in what's left out
 • advertisement or taster of rich datasets
 • often not possible in official data

• leaves questions unanswered
 • what if we looked at the data in a different way?
Raw Data
• importable into own data store
 • often only interested in particular slice
 • dataset may be massive / changing

• run whatever analysis you want
 • requires at least some programming skills
 • analysis might not be appropriate for the data

• documentation probably lacking
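
A minimal sketch of importing only one slice of a raw dataset into your own data store. The column names (`region`, `year`, `value`), the sample rows, and the `observations` table are all hypothetical, invented for illustration. The file is streamed row by row, so the approach should still hold when the dataset is too large to load at once.

```python
import csv
import io
import sqlite3

# Hypothetical raw data: these columns and values are invented for illustration.
SAMPLE_CSV = "region,year,value\nnorth,2010,5\nsouth,2010,7\nnorth,2011,6\n"


def import_slice(csv_file, region, conn):
    """Stream a CSV file and store only the rows for one region.

    Returns the number of rows now held in the local table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(region TEXT, year INTEGER, value REAL)"
    )
    reader = csv.DictReader(csv_file)
    # Generator expression: rows are filtered and converted one at a time.
    rows = (
        (r["region"], int(r["year"]), float(r["value"]))
        for r in reader
        if r["region"] == region
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(import_slice(io.StringIO(SAMPLE_CSV), "north", connection))
```

Note that the code has to know what `region` means and how `value` is typed, which is exactly where missing documentation hurts.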
bridging the gap   layered data access

Photo by Nikita Kravchuk http://www.flickr.com/photos/mi55er/3845619153/
Layered Architecture
• user interface
 • navigation and global understanding

• API
 • curated, targeted, programmable access

• query
 • free-form programmable access

• raw data
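
The four layers above can be sketched in a few lines, assuming a tiny in-memory dataset. The records, field names, and function names are hypothetical. The point is only the shape: each layer is built on the one below it, and each remains usable on its own.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Raw data layer: hypothetical records, invented for illustration.
RAW_DATA: List[Record] = [
    {"department": "A", "grade": "SCS1", "posts": 3},
    {"department": "A", "grade": "SCS2", "posts": 1},
    {"department": "B", "grade": "SCS1", "posts": 2},
]


def query(predicate: Callable[[Record], bool]) -> List[Record]:
    """Query layer: free-form access using any predicate over the raw records."""
    return [record for record in RAW_DATA if predicate(record)]


def posts_by_department(department: str) -> int:
    """API layer: one curated, targeted question, answered via the query layer."""
    matching = query(lambda record: record["department"] == department)
    return sum(int(record["posts"]) for record in matching)


def render_summary(departments: Iterable[str]) -> str:
    """User interface layer: presentation built only from API calls."""
    return "\n".join(
        f"{name}: {posts_by_department(name)} posts" for name in departments
    )


if __name__ == "__main__":
    print(render_summary(["A", "B"]))
```

A reuser who dislikes `render_summary` can call `posts_by_department` directly, and one who finds the API too narrow can drop down to `query` or to `RAW_DATA` itself.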
legislation.gov.uk   lists as Atom feeds
legislation.gov.uk   content as XML
legislation.gov.uk   layer other views
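
A minimal sketch of how a reuser might consume a list published as an Atom feed. The feed below is an invented sample on `example.org`, not real legislation.gov.uk output, and the entry titles and link types are assumptions. Only the Atom namespace and the standard-library calls are real.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed: titles and URLs are invented for illustration.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example list</title>
  <entry>
    <title>Example Act</title>
    <link rel="alternate" type="text/html" href="http://example.org/id/act/1"/>
    <link rel="alternate" type="application/xml" href="http://example.org/id/act/1/data.xml"/>
  </entry>
  <entry>
    <title>Example Order</title>
    <link rel="alternate" type="text/html" href="http://example.org/id/order/2"/>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return each entry's title and its links, keyed by media type."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        links = {
            link.get("type", ""): link.get("href", "")
            for link in entry.findall("atom:link", ATOM_NS)
        }
        entries.append({"title": title, "links": links})
    return entries


if __name__ == "__main__":
    for item in list_entries(SAMPLE_FEED):
        print(item["title"], item["links"])
```

Each entry carries links to alternative representations, so the same list serves as a page for people and as a starting point for programs.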
organograms   navigable visualisation
organograms   JSON data
organograms   RDF / XML / HTML
organograms   SPARQL query
organograms   raw data
Key Techniques
• resource-driven design (good URIs)
• every page built based on API calls
• explicit links to API access
 • for bonus points, link to your transformation code

• consistent terminology
 • clear mapping from UI to API

• caching & access control at each level
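
One way to get a clear mapping from UI to API is to derive every API address mechanically from the page address. The sketch below assumes a hypothetical convention of appending `/data.<ext>` to the page URI. The convention, the format list, and the `example.org` address are illustrative assumptions, not something the slides specify.

```python
import html
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of formats offered for every page.
FORMATS = {
    "json": "application/json",
    "xml": "application/xml",
    "rdf": "application/rdf+xml",
}


def api_links(page_uri):
    """Map a page URI to its data URIs, keyed by media type."""
    parts = urlsplit(page_uri)
    path = parts.path.rstrip("/")
    return {
        media_type: urlunsplit(
            (parts.scheme, parts.netloc, f"{path}/data.{ext}", "", "")
        )
        for ext, media_type in FORMATS.items()
    }


def link_elements(page_uri):
    """Build the explicit <link> elements a page could carry in its head."""
    return [
        f'<link rel="alternate" type="{html.escape(media_type)}" '
        f'href="{html.escape(href)}">'
        for media_type, href in api_links(page_uri).items()
    ]


if __name__ == "__main__":
    for element in link_elements("http://example.org/id/department/x/"):
        print(element)
```

Because the mapping is a pure function of the URI, anyone looking at a page can predict where its data lives without reading documentation.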
Benefits
• fork at any point
 • don't like the visualisation / API? create your own!

• everyone is human
 • reusers gain understanding from user interface

• visualisation benefits the stack
 • API oriented towards achieving a goal
 • visual validation of data improves quality
Questions?
