Technical standards & the RDTF Vision: some considerations

Paul Walk
p.walk@ukoln.ac.uk

                                                     UKOLN is supported by:




      www.ukoln.ac.uk
     A centre of expertise in digital information management
general approach



no need
to take
pioneering
risks!

we're not building a system

we're creating an
environment that enables
resource-providing systems to
interoperate

(...but we might build some
   stuff to help this along)



standards, schmandards
• standards are not the whole story
 •   lessons for funders, providers & users

• use (open) technical standards where
  possible

• require standards only where necessary
 •   avoid pushing standards to create adoption

• establish/understand high-level principles
  and ‘explain the workings out’ - support
  deeper understanding

• foster/adopt conventions, based on open
  standards, born of community engagement
  and practice

[Diagram labels: shared principles; technical standards; community/domain conventions; interoperability]

technical foundations -
       safe bets


roasoadoa
• Service Oriented Architecture
 •   we learned the danger of mandating standards too early....

• Resource Oriented Architecture
 •   actually, ROA is still ‘service oriented’
 •   adopts universal conventions for the service part
 •   emphasises the resources
 •   It works!!!

• Data Oriented Architecture?
 •   potentially different as current trends are showing a tendency to
     ignore the service....

 •   data dumps

identify persistently
• give global & public identities to your high-order
  entities
 •   metadata records

 •   actual resources

• choose an existing scheme and stick to it - don’t
  invent a new one
 •   e.g. use DOI for scholarly communications

• HTTP URIs are a sensible default, but use existing
  schemes where they exist for your domain/use-case

• be pragmatic - the persistent identification religious
  wars are over!

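The identifier advice above can be sketched in a few lines of Python. This is only an illustration: the `https://doi.org/` resolver prefix is the public DOI proxy, but the repository base URI, the function names and the sample DOI are assumptions made up for the example.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"                 # public DOI proxy
BASE = "http://repository.example.org/id"         # hypothetical repository base URI


def doi_to_http_uri(doi):
    """Reuse an existing scheme (DOI) and expose it as an HTTP URI."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):            # accept the common "doi:" prefix
        doi = doi[4:].strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode anything unsafe in a URI path, but keep the "/" separator.
    return DOI_RESOLVER + quote(doi, safe="/")


def local_uri(kind, local_id):
    """Fallback: mint an HTTP URI when no domain scheme exists.

    `kind` keeps metadata records and the resources they describe apart,
    so each high-order entity gets its own global identity.
    """
    return f"{BASE}/{kind}/{quote(str(local_id), safe='')}"


if __name__ == "__main__":
    print(doi_to_http_uri("doi:10.1234/example.5678"))   # made-up DOI
    print(local_uri("record", 42))
    print(local_uri("resource", 42))
```

The point of the sketch is the order of preference: an established scheme first, a plain HTTP URI as the default, and never a newly invented scheme.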
expose metadata
• Powell & Johnstone - technical guidance on exposing
  metadata

• use persistent identifiers to identify metadata
  records

• use persistent identifiers to identify the
  resources the metadata records describe

• expose collections of resources
• use persistent identifiers to point to the
  collections

• lists are useful in all computer systems
• lists are collections; feeds are lists

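As a rough sketch of ‘feeds are lists’, the snippet below builds a small Atom feed for a collection using only the Python standard library. The function name, the dictionary keys and all URIs are assumptions for illustration, and using `rel="related"` to point from a metadata record to the resource it describes is one possible choice rather than an established convention.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def collection_feed(collection_uri, title, updated, author, items):
    """Serialise a collection as an Atom feed (a sketch, not a validated document).

    The feed id is the collection's persistent identifier; each entry id is a
    metadata record's identifier, and its link points at the described resource.
    """
    ET.register_namespace("", ATOM)               # emit Atom as the default namespace
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = collection_uri
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = updated
    author_el = ET.SubElement(feed, f"{{{ATOM}}}author")
    ET.SubElement(author_el, f"{{{ATOM}}}name").text = author
    for item in items:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = item["record_uri"]
        ET.SubElement(entry, f"{{{ATOM}}}title").text = item["title"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = item["updated"]
        ET.SubElement(entry, f"{{{ATOM}}}link", rel="related", href=item["resource_uri"])
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    items = [
        {"record_uri": "http://repository.example.org/id/record/1",
         "resource_uri": "http://repository.example.org/id/resource/1",
         "title": "First example item", "updated": "2011-03-01T00:00:00Z"},
        {"record_uri": "http://repository.example.org/id/record/2",
         "resource_uri": "http://repository.example.org/id/resource/2",
         "title": "Second example item", "updated": "2011-03-02T00:00:00Z"},
    ]
    print(collection_feed("http://repository.example.org/id/collection/demo",
                          "Demo collection", "2011-03-02T00:00:00Z",
                          "Example Repository", items))
```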
use HTTP & REST
• embrace constraints!
• be resource-oriented where possible
 •   relax and embrace the constraints of HTTP & REST

• REST is complicated, but:
 •   you only have to understand it once!

 •   being more RESTful is achievable and often worthwhile

• REST & HTTP together give you a common, practically
  universal interface
 •   people have a fighting chance of being able to work with you to
     consume your data, or to build on top of it

 •   you automatically get benefit of things like caching

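To make the caching point concrete, here is a minimal sketch of how a resource-oriented service might answer a conditional GET. The function names and the one-hour `max-age` are assumptions, and the `If-None-Match` handling is deliberately simplified (a real header may carry several entity tags or `*`).

```python
import hashlib


def make_etag(body):
    """Derive a strong entity tag from the bytes of a representation."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, request_headers):
    """Return (status, headers, body) for a GET on a single resource.

    If the client already holds the current representation (its
    If-None-Match value equals our ETag), answer 304 with no body.
    """
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=3600"}  # assumed policy
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    body = b'{"title": "An example record"}'
    status, headers, _ = respond(body, {})
    print(status, headers["ETag"])
    status, _, payload = respond(body, {"If-None-Match": headers["ETag"]})
    print(status, len(payload))
```

Because the interface is plain HTTP, any standard client, proxy or browser cache can take advantage of these headers without knowing anything about the service.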
aggregation
• aggregation is a corner-stone of the RDTF vision
• make your resources a target for aggregation:
 •   use persistent identifiers for everything - aggregations work
     much better if the inputs have globally unique identifiers

 •   adopt appropriate licensing
 •   in data aggregation, ‘share alike’ is easier than
     ‘attribution’

 •   CC0 is gaining popularity

• use aggregations ‘tactically’ (Peter Burnhill)
 •   they are a means to an end

 •   it’s the underlying resources which matter

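A small sketch of why globally unique identifiers help aggregation: when every input record carries one, merging sources is a dictionary lookup rather than a fuzzy-matching problem. The record layout, the source names and the ‘first value seen wins’ rule are assumptions chosen to keep the example short.

```python
def aggregate(*sources):
    """Merge records from several providers, keyed on their global identifier.

    Each source is a (name, records) pair; each record is a dict with an "id".
    """
    merged = {}
    for source_name, records in sources:
        for record in records:
            entry = merged.setdefault(
                record["id"], {"id": record["id"], "sources": [], "fields": {}}
            )
            entry["sources"].append(source_name)      # remember where it came from
            for key, value in record.items():
                if key != "id":
                    entry["fields"].setdefault(key, value)  # first value seen wins
    return merged


if __name__ == "__main__":
    repo_a = ("repo-a", [{"id": "https://doi.org/10.1234/example.1", "title": "Example"}])
    repo_b = ("repo-b", [
        {"id": "https://doi.org/10.1234/example.1", "year": 2011},
        {"id": "http://repository.example.org/id/resource/7", "title": "Other"},
    ])
    for identifier, entry in aggregate(repo_a, repo_b).items():
        print(identifier, entry["sources"], entry["fields"])
```

Keeping the list of sources per record also makes attribution tractable, which is the part that licensing choices can otherwise make awkward.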
technical foundations -
   more of a gamble


Linked Data?
• as defined by W3C (RDF etc.)
• elegant, seductively so
 •   may be the future.... but it has been the future for 10 years
     now....

• difficult to see evidence of value through the hype
 •   works in curated contexts (Mike Bergman)

 •   not proven to work on the wide-open Web

• not yet (mainstream) developer-friendly
• be sensible! be critical!
• nothing I have advocated precludes this - so we can
  proceed carefully

in particular pay attention
            to...


“build for normal users, developers and machines”
Tom Coates
http://www.plasticbag.org/archives/2006/02/my_future_of_web_apps_slides/

service (anti)patterns
• design your API to be
  developer-friendly

• be aware of what works, and
  of what appears to work
  but actually might not...

• share this understanding

Paul Walk, An infrastructure service anti-pattern
http://blog.paulwalk.net/2009/12/07/an-infrastructure-service-anti-pattern/

expect & enable users to filter - give them feeds (RSS/Atom)

concentrate on making your resources available

[Image: http://www.flickr.com/photos/httpwwwflickrcompeoplenadar/3349883/ (CC BY-NC-ND 2.0)]

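From the consumer's side, a feed makes filtering cheap. The sketch below parses a tiny hand-written Atom feed and keeps only the entries whose title mentions a keyword; the feed content, the URIs and the function name are all invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# A made-up feed standing in for whatever a provider exposes.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://repository.example.org/id/collection/demo</id>
  <title>Demo collection</title>
  <entry>
    <id>http://repository.example.org/id/record/1</id>
    <title>Maps of Bath</title>
  </entry>
  <entry>
    <id>http://repository.example.org/id/record/2</id>
    <title>Photographs of Bristol</title>
  </entry>
</feed>"""


def filter_entries(feed_xml, keyword):
    """Return (id, title) pairs for entries whose title contains the keyword."""
    root = ET.fromstring(feed_xml)
    keyword = keyword.lower()
    matches = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        if keyword in title.lower():
            matches.append((entry.findtext("atom:id", namespaces=NS), title))
    return matches


if __name__ == "__main__":
    print(filter_entries(SAMPLE_FEED, "maps"))
```

The provider only has to make the resources available; the choice of what is relevant stays with the user.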
openness and usability
• ‘open’ in danger of becoming synonymous with
  ‘permissively licensed’

• can be open and impossible/very difficult to use
 •   this can be a sin of omission or even commission!

 •   remember all those SOAP interfaces....

 •   a well supported API might be more open than a completely
     freely available dump of gigabytes (or more) of data in the sense
     that it might allow open engagement from more people

• we need a richer understanding of openness - don’t let
  the discussion be dominated by the hippies ;-)

• however, open technical standards are intrinsic
  to openness in any case

in other words...


           be open, usefully


developer-friendly formats
• XML has a lot going for it:
 •   well understood

 •   very well supported with tools, libraries etc.

 •   often fits the information models we’re used to

• but it has some issues:
 •   validation is a pain and is very often ignored

 •   it’s verbose - it takes up a lot of bandwidth

 •   not everything is a tree!

• JSON has gained rapid adoption
 •   less verbose - simple - ideal for simple client-side manipulation

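The verbosity trade-off can be seen by serialising one record both ways. This is a sketch with an invented record and invented element names; it says nothing about which format a given service should choose.

```python
import json
import xml.etree.ElementTree as ET

# A made-up record used only for comparison.
RECORD = {
    "id": "http://repository.example.org/id/record/42",
    "title": "An example title",
    "creators": ["A. Author", "B. Author"],
}


def to_json(record):
    """Serialise the record as JSON text."""
    return json.dumps(record)


def to_xml(record):
    """Serialise the same record as a small XML tree."""
    root = ET.Element("record")
    ET.SubElement(root, "id").text = record["id"]
    ET.SubElement(root, "title").text = record["title"]
    creators = ET.SubElement(root, "creators")
    for name in record["creators"]:
        ET.SubElement(creators, "creator").text = name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    as_json, as_xml = to_json(RECORD), to_xml(RECORD)
    print(len(as_json), "characters of JSON")
    print(len(as_xml), "characters of XML")
    # JSON maps straight back onto native lists and dicts.
    print(json.loads(as_json)["creators"])
```

For this record the XML form comes out longer, mostly because every element name is written twice; the JSON form also loads directly into the data structures a client-side script already uses.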
character encodings....
• huge number of XML records
  from UK IRs are invalid due
  to character encoding
  issues....

• UTF-8 is de facto default for
  many systems

• there is a special
  place in hell for
  developers who
  ignored character
  encodings...

[Image: http://www.flickr.com/photos/10661825@N07/]

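A short sketch of how such records end up invalid, and how to avoid it. The element names and sample title are invented; `xml_declaration=True` on `ET.tostring` assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def serialise_record(title):
    """Write a record as UTF-8 bytes with a matching XML declaration."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = title
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def is_well_formed(data):
    """Report whether a byte string parses as XML at all."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    good = serialise_record("Café culture")
    print(is_well_formed(good))

    # The classic mistake: Latin-1 bytes behind a declaration that claims UTF-8.
    bad = (b"<?xml version='1.0' encoding='utf-8'?>"
           b"<record><title>Caf\xe9 culture</title></record>")
    print(is_well_formed(bad))

    # Reading correct UTF-8 with the wrong codec does not fail, it just garbles.
    print(good.decode("latin-1"))
```

The fix is unglamorous: pick one encoding (UTF-8 is the obvious default), encode explicitly at the point of output, and make sure the declaration matches the bytes.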
next steps



Technical Foundations
• articulate the principles behind adoption of various
  paradigms, standards and technologies
 •   Technical Foundations website
     •   ETA - June 2011

• gather evidence of ‘good use’ of technical standards
  and related technologies in our sector:
 •   JISC Observatory
     •   observatory.jisc.ac.uk

 •   ISKB being developed at UKOLN
     •   ETA - April/May 2011

• understand federated aggregation better

Recipes
• produce ‘recipes’ from the wealth of good
  practice and technical guidelines - e.g. the
  Technical Guidance on Metadata
  Standards (Powell & Johnstone)

• establish a glossary of terms to enable
  productive discussion in this space

• create an RDTF filter across the
  ‘Technical Foundations’ website being
  prepared for JISC by UKOLN

• create an RDTF ‘view’ of a subset of the
  resources and annotations in the ISKB
  being developed at UKOLN

[Image: http://www.flickr.com/photos/bigcrow/3381550945/ (CC BY-NC-SA 2.0)]

the big question facing
data providers:

do you want to provide a
data service, or just data?

