Technical Challenges in Resource Discovery

Presentation given to the Discovery Summit, British Library, 21/02/2013

Usage rights: CC Attribution License

    Presentation Transcript

    • Technical challenges in resource discovery
      Paul Walk, paul@paulwalk.net, @paulwalk, http://www.paulwalk.net
    • Contents
      1. a general consideration:
         • open or closed
      2. a particular challenge:
         • synchronisation in an open world
      3. the ‘nothing new’, but doing it better:
         • APIs that work and can be trusted
    • a healthy(?) state of tension between open and closed
    • open and closed worlds
      • I’m not talking about licensing or access to data
      • open
        • unbounded - like the Web
      • closed
        • bounded - like most collections management systems, aggregations etc.
      • formally, much of what we do is underpinned by ‘open/closed world’ assumptions:
        • open world assumption: any statement not known to be true is unknown
        • closed world assumption: any statement not known to be true is false
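The difference between the two assumptions appears as soon as you ask about a fact you have no record of. A minimal sketch, using a hypothetical set of known facts: the same lookup answers "false" under the closed world assumption and "unknown" under the open world assumption.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical facts we happen to know are true, as (subject, predicate, object).
KNOWN_FACTS = {
    ("record-1", "heldBy", "British Library"),
    ("record-2", "heldBy", "Example Archive"),
}


def query_closed_world(fact: tuple[str, str, str], known: set = KNOWN_FACTS) -> Truth:
    """Closed world: any statement not known to be true is taken to be false."""
    return Truth.TRUE if fact in known else Truth.FALSE


def query_open_world(fact: tuple[str, str, str], known: set = KNOWN_FACTS) -> Truth:
    """Open world: any statement not known to be true is merely unknown."""
    return Truth.TRUE if fact in known else Truth.UNKNOWN


if __name__ == "__main__":
    missing = ("record-1", "heldBy", "Example Archive")
    print(query_closed_world(missing).value)  # false
    print(query_open_world(missing).value)    # unknown
```

A bounded system can answer "false" because it assumes it holds everything relevant; a system drawing on the open Web generally cannot make that assumption.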
    • characteristics of an open world
    • characteristics of a closed/bounded world
    • judging where to apply each
      • we need our infrastructure (especially integration technology between systems) to be open and relatively unbounded
        • the Web is still the best available foundation for this
      • however, we still need to manage our resources, maintain quality and honour complex rights management commitments
      • we probably need to recognise that users’ experience is often enhanced through the application of a more focussed, targeted and context-aware approach
    • a particular challenge
    • synchronisation
      • how is the state of the resource maintained across an infrastructure of ‘federated’ repositories?
      • if a resource is changed or deleted, how does the right-hand side aggregation know?
      • note - this is based on our existing ‘harvesting’ or ‘pull’ approach
      [Diagram: resources held in collections are harvested into several aggregations: multiple harvest routes, multiple copies]
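To make the deletion problem concrete: under a pure ‘pull’ approach, an aggregator only learns that something changed by comparing what it harvested last time with what it harvests now. A minimal sketch, assuming each harvest returns a complete map of hypothetical resource ids to content fingerprints (`diff_harvests` and `fingerprint` are illustrative names, not part of any harvesting protocol):

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to spot changes between harvests."""
    return hashlib.sha256(content).hexdigest()


def diff_harvests(previous: dict[str, str], current: dict[str, str]) -> dict[str, set[str]]:
    """Compare two full harvests (resource id -> fingerprint).

    Only valid if each harvest lists *every* resource the source holds;
    otherwise a missing id could mean 'deleted' or merely 'not returned'.
    """
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "created": curr_ids - prev_ids,
        "deleted": prev_ids - curr_ids,
        "updated": {rid for rid in prev_ids & curr_ids if previous[rid] != current[rid]},
    }


if __name__ == "__main__":
    yesterday = {"r1": fingerprint(b"v1"), "r2": fingerprint(b"v1")}
    today = {"r1": fingerprint(b"v2"), "r3": fingerprint(b"v1")}
    print(diff_harvests(yesterday, today))
    # {'created': {'r3'}, 'deleted': {'r2'}, 'updated': {'r1'}}
```

Note the assumption in the docstring: a deletion only becomes visible if the aggregator re-harvests everything, and every aggregation on every harvest route has to repeat that work for its own copy.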
    • ResourceSync
      • a joint project of NISO and OAI, led by Herbert Van de Sompel of Los Alamos
      • a light-weight mechanism to allow the state of web resources to be communicated between web systems
      • developing a spec which builds on the sitemap specification, allowing content providers to publish changesets
      • draft: http://bit.ly/WYhTz2
      • Jisc have funded UK participation in this
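Because ResourceSync builds on sitemaps, a source can publish a list of changes instead of forcing every aggregator to re-harvest. The sketch below reads a change list modelled on that idea. The `urlset`/`url`/`loc` elements come from the sitemap format; the `rs` namespace URI and the `rs:md` element with a `change` attribute (created, updated, deleted) reflect our reading of the ResourceSync specification and should be checked against the published version. The example URLs and the `parse_changelist`/`apply_changes` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sitemap namespace, plus the ResourceSync namespace as we understand it.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hypothetical change list published by a content provider.
EXAMPLE_CHANGELIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def parse_changelist(xml_text: str) -> list[tuple[str, str]]:
    """Return (resource URI, change type) pairs in document order."""
    root = ET.fromstring(xml_text)
    changes = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        changes.append((loc.strip(), change))
    return changes


def apply_changes(holdings: set[str], changes) -> tuple[set[str], set[str]]:
    """Return (new holdings, URIs the aggregator must re-fetch from the source)."""
    holdings, to_fetch = set(holdings), set()
    for loc, change in changes:
        if change == "deleted":
            holdings.discard(loc)
            to_fetch.discard(loc)
        elif change in ("created", "updated"):
            holdings.add(loc)
            to_fetch.add(loc)
    return holdings, to_fetch
```

The aggregator now learns about the deletion of `res2` directly from the source, rather than inferring it from its absence in a full re-harvest.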
    • “The sun shone, having no alternative, on the nothing new.” Samuel Beckett, Murphy
    • “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” Leslie Lamport
    • a common ‘anti-pattern’
      • as a developer, I have no reason to trust that these APIs are any good.
      • after all, the service provider doesn’t seem to trust them for their own application...
      [Diagram: end-users, UIs and APIs for ‘future 3rd-party devs’ around some aggregated data of broad interest and potential usefulness; legend: certainty, belief, speculation]
    • a better pattern
      • as a developer, I’m more likely to trust this pattern.
      • the content provider is using their own API to deliver their own application.
      • they have a vested interest!
      [Diagram: end-users, UIs, a 3rd-party app and a focussed app, all using the API over some aggregated data of broad interest and potential usefulness; legend: certainty, belief, speculation]
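In code, the ‘better pattern’ amounts to the provider’s own application having no private route to the data: it calls the same API that third parties use. A deliberately tiny sketch, in which the store, the records and all three functions are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    id: str
    title: str


# Hypothetical store standing in for "some aggregated data of broad interest".
_STORE = {
    "r1": Record("r1", "Map of London"),
    "r2": Record("r2", "Survey of Kent"),
}


def api_search(term: str) -> list[dict]:
    """The public API: the single route to the data, for everyone."""
    needle = term.lower()
    return [
        {"id": r.id, "title": r.title}
        for r in sorted(_STORE.values(), key=lambda rec: rec.id)
        if needle in r.title.lower()
    ]


def provider_ui(term: str) -> str:
    """The provider's own application is just another client of the API."""
    return "\n".join(f"<li>{hit['title']}</li>" for hit in api_search(term))


def third_party_app(term: str) -> list[str]:
    """A third-party app gets exactly the same data through the same call."""
    return [hit["id"] for hit in api_search(term)]
```

Because `provider_ui` breaks as soon as `api_search` does, the provider has the vested interest the slide describes in keeping the API working.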
    • APIs are not best thought of as machine-to-machine interfaces
      APIs are interfaces for developers
    • messages from developers to content-providers
      • These are from yesterday’s developer day held here at the BL in support of this summit:
      • please don’t build elaborate APIs which do not allow us to see all of the data, or its extent. It’s not that we simply want to download all the data - but we do need to see what we’re dealing with
      • if you give us access to incomplete data (perhaps because you’re worried about revealing poor data quality), then we will tend to either abandon our attempts to use it or we will ‘fill in the gaps’ with data from elsewhere. So offering an API which delivers incomplete data is usually self-defeating
      • the implicit bargain, made explicit:
        • give us access to the data as soon as possible and we will do some of the work to process it so it is fit for some new purpose - and we will happily share this code with you
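One way to let developers see the extent of the data without forcing a bulk download is for every page of results to state the size of the whole collection and where to continue. A sketch under stated assumptions: `list_records`, `Page`, `total` and `next_offset` are hypothetical names, and the in-memory `RECORDS` list stands in for a real collection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    items: list[dict]
    total: int                  # size of the whole collection, not just this page
    next_offset: Optional[int]  # where to continue, or None when finished


# Hypothetical collection of 25 records.
RECORDS = [{"id": f"rec-{i:03d}", "title": f"Item {i}"} for i in range(25)]


def list_records(offset: int = 0, limit: int = 10) -> Page:
    """Return one page of records, always reporting the collection's extent."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit > 0")
    items = RECORDS[offset:offset + limit]
    end = offset + len(items)
    return Page(items=items, total=len(RECORDS),
                next_offset=end if end < len(RECORDS) else None)


def harvest_all() -> list[dict]:
    """A client can walk every page and check that it received everything."""
    out: list[dict] = []
    offset: Optional[int] = 0
    total = 0
    while offset is not None:
        page = list_records(offset)
        out.extend(page.items)
        total, offset = page.total, page.next_offset
    if len(out) != total:
        raise RuntimeError("harvest incomplete")
    return out
```

Reporting `total` on every page lets a developer check completeness, which speaks directly to the complaint about incomplete data above.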
    • Questions for the parallel sessions
      1. Which emerging technologies do we need to focus on in 2013?
      2. Do we still need to aggregate?
      3. What does data quality stop us doing?
    • Which emerging technologies do we need to focus on in 2013?
      • Graphs: context (not just content) is king
      • both Facebook and Google are betting heavily on graph technologies
      • closer to home - so are content providers like the BBC
      • linking these is an interesting challenge
      • databases based on a graph model give the potential for a richer understanding about entities (users!)
      • instrumentation in personal devices makes more context available (e.g. geo-location).
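As a rough illustration of why a graph model helps with context: once users, resources and places are all nodes, "what do we know around this user?" becomes a traversal. A minimal in-memory sketch, not a graph database; the node names, predicates and the breadth-first `within` helper are hypothetical:

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory directed graph of (subject, predicate, object) edges."""

    def __init__(self) -> None:
        self._out = defaultdict(list)  # subject -> [(predicate, object), ...]

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._out[subject].append((predicate, obj))

    def edges(self, node: str) -> list[tuple[str, str]]:
        """Direct (predicate, object) edges leaving node."""
        return list(self._out.get(node, []))

    def within(self, start: str, max_hops: int) -> set[str]:
        """Entities reachable from start in at most max_hops edges (breadth-first)."""
        seen = {start}
        frontier = [start]
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for _predicate, obj in self._out.get(node, []):
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        seen.discard(start)
        return seen


# Hypothetical data: a user, something they used, and context from a device.
g = Graph()
g.add("user:alice", "borrowed", "book:b1")
g.add("book:b1", "about", "place:London")
g.add("user:alice", "locatedIn", "place:London")  # e.g. from device geo-location
g.add("place:London", "partOf", "place:England")
```

Here `g.within("user:alice", 2)` links the user's borrowing, the book's subject and their current location into one neighbourhood, which is the kind of context the slide points to.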
    • Do we still need to aggregate? yes.
      • to address systems/network latency - provide a cache
      • to showcase!
      • for ‘Web Scale concentration’
      • network effects if user-facing services are also developed
      • to create middleman business opportunities
      • as infrastructure to support locally developed services
      • as an approach to preservation
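The first reason given, using an aggregation as a cache to hide systems and network latency, can be sketched as a local copy that is only refreshed from the source once it has aged past a time-to-live. This is a minimal in-memory sketch; `TTLCache`, the `fetch` callable and the `ttl` value are assumptions for illustration, and the injectable `clock` exists so the behaviour can be tested without waiting.

```python
import time
from typing import Callable


class TTLCache:
    """Serve local copies of remote records, refetching after `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical call to the remote source
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (fetched_at, value)

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # fresh local copy: no network round trip
        value = self._fetch(key)     # stale or missing: go back to the source
        self._entries[key] = (now, value)
        return value
```

The choice of `ttl` is the same trade-off as the synchronisation problem earlier in the talk: a longer TTL means fewer trips to the source, but a greater chance of serving a copy that has since been changed or deleted.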
    • What does data quality stop us doing?
      • interpreted as: “what does a concern for data quality stop us doing?”
        • it stops us from releasing data early
      • interpreted as: “what does poor/uncertain data quality stop us doing?”
        • it erodes trust, which impacts the likelihood of someone doing something worthwhile with our data
      • reconciling these concerns is a major challenge for us.
    • thank you!
      Paul Walk, paul@paulwalk.net, @paulwalk, http://www.paulwalk.net