Dan Needham & Phil Cross (mimas) – Names Project


Pecha Kucha presented at Repository Fringe 2011.

Published in: Technology, Business
  • My name is Dan and this is Phil. We are from Mimas and we’re going to be giving an overview of the Names project and the work we’ve been doing with it in the repository area.
  • Names is a JISC-funded project, in partnership with the British Library, working to uniquely identify individuals and institutions within UK academia and investigating requirements for a name authority service. As part of this we have developed a software prototype that we are moving towards a live service.
  • For the prototype we needed to build our own record store. We have done this by pulling in data from a variety of sources and attempting to deduplicate and disambiguate the individuals they contain to form our own records.
  • Using a famous case study we can see that the information associated with Brian Cox can be used to help deduplicate and disambiguate him from individuals with similar names across a number of different data sources. In our case we’re looking at things like name form, fields of interest, institutional affiliation and collaborative relationships.
  • Having built an initial set of records derived from external data, we built a RESTful API which can be used to query and retrieve data in a variety of formats, including JSON, RDF, MARCXML and default HTML.
  • So here we have Brian’s Names record represented using HTML. His identifier is a resolvable URL which brings back the information associated with his record. You can see that it can be returned in the other formats mentioned, and can be filtered by different attributes.
  • Individuals may use their identifiers to identify themselves for paper submission. External applications or repositories might use the API to retrieve data for their own purposes. Researchers, other academics or reporters may search the service to find details on an individual. Libraries may use it for cataloguing purposes. All of these sources might feed data back into the service.
  • We currently have 46840 permanent records derived from RAE merit data and a submission for RGU. We are processing ~30 million records from Zetoc, UKPMC and Southampton Eprints. Thousands more are on the way from different repositories.
  • There are several next steps for Names: 1 – Increase and flesh out records within Names. 2 – Develop interoperability with other identifier systems. 3 – Continue to develop connections with other applications that might use Names data. 4 – Develop facilities for researcher ownership over their own data.
  • So that’s a brief overview of the Names prototype and where we’re up to with it. Thanks to Brian for unwittingly participating in my presentation. I’ll now pass you over to Phil who will discuss the work we’ve been doing with Names in the repository area.
  • I’ll be talking about how Names is working with repositories. We are mostly looking at EPrints software at the moment, but the work should be extensible to other platforms. So I’ll be talking about an EPrints plugin we’re writing, about how we are able to extract researcher information from EPrints repositories, and about how you can manually submit researcher information to us.
  • The plugin augments the existing auto-complete on name entry, searching our API for matching names in addition to searching the local repository. It adds a preferred form for the name together with the unique person identifier supplied by Names.
  • The first screenshot shows the auto-complete drop-down, now showing results from Names, including the unique identifiers. It’s Brian again!
  • You need to differentiate between people with the same name, so a mouseover shows disambiguating information from Names. At present this is only the Field of Interest information. It will include other information such as example publications, co-authors and possibly a homepage.
  • Here we see another Roberts, A but with a different Field of Interest. Note that the URI is currently going into the Email field. This is the EPrints creator id field, but we intend to generate a new Names URI field with the plugin to avoid overwriting any existing internal identifiers.
  • Finally, you can see that the plugin still searches across the local database: see the entries for Cross, Phil. We’ll be demoing the software if you want to have a better look.
  • Complementary plugins that we are considering for the future are: a means for researchers to export info about themselves to Names via our API to create an entry and identifier; and an admin plugin to enable global editing of existing researcher names to the Names format, adding unique identifiers. The latter would depend on there already being creator ids assigned internally.
  • Other ways we are interoperating with repositories: EPrints has the ability to export its entire set of metadata as an RDF graph. We have created a module that imports this data, where an internal identifier has been used, to automatically add researcher information to our database.
  • Finally, for institutions who wish to send us data about their researchers to generate Names URIs, we have published a spreadsheet format that people can email to us for bulk upload.
  • We are very interested in receiving input from the repository community about what you would consider useful for our service to provide and from anyone who would be interested in working with us.
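The notes above name four signals used for disambiguation: name form, fields of interest, institutional affiliation and collaborative relationships. As a rough illustration only, the sketch below scores how likely two records are to describe the same person. It is not the Names project's actual algorithm: the `Record` class, the `name_key`, `jaccard` and `match_score` helpers, the equal weighting of the three signals, and the field values given to the two "Roberts, A" records are all assumptions made for this example. The Cox values are taken from the Disambiguation slide.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical, simplified researcher record (not the Names schema)."""
    name: str                                   # e.g. "Cox, B.E."
    fields: set = field(default_factory=set)    # fields of interest
    affiliations: set = field(default_factory=set)
    coauthors: set = field(default_factory=set)


def name_key(name):
    """Reduce 'Surname, Forenames' to (surname, first initial), lower-cased."""
    surname, _, forenames = name.partition(",")
    return surname.strip().lower(), forenames.strip()[:1].lower()


def jaccard(a, b):
    """Overlap of two sets as a value in [0, 1]; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(a, b):
    """Return 0.0 if the name forms are incompatible, else mean signal overlap."""
    if name_key(a.name) != name_key(b.name):
        return 0.0
    signals = [
        jaccard(a.fields, b.fields),
        jaccard(a.affiliations, b.affiliations),
        jaccard(a.coauthors, b.coauthors),
    ]
    return sum(signals) / len(signals)


# Two name forms for the same person, using the attributes shown on the slide.
cox_a = Record("Cox, B.E.", {"physics"}, {"manchester"}, {"maxfield, s.j."})
cox_b = Record("Cox, Brian", {"physics"}, {"manchester"}, {"maxfield, s.j."})

# Same name form, different people: these field values are invented.
roberts_a = Record("Roberts, A", {"chemistry"}, {"institution x"})
roberts_b = Record("Roberts, A", {"history"}, {"institution y"})

print(match_score(cox_a, cox_b))          # identical evidence
print(match_score(roberts_a, roberts_b))  # no shared evidence
```

A real service would need to handle name forms without a comma (such as "J C Smith"), changes of surname, and weighting between signals, none of which this sketch attempts.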

    1. The Names Project. Dan Needham & Phil Cross. 04-08-2011
    2. Background: John Smith; Smith, J.; J C Smith; J Hall née Smith; ???; Smith, J. B.; Smith, Joanne; Joanne Clare Hall
    3. Data In (diagram labels): RAE; Zetoc; EPrints; Disambiguation; Names Data
    4. Disambiguation: Cox, B.E.; Physics; “Double diffractive Higgs…”; Manchester; Prof, OBE; Maxfield, S.J. (co-author). Image: Paul Clarke@flickr
    5. Data Out (diagram labels): Names Data; API; JSON; RDF; MARCXML; HTML
    6. Image: Bob Lee@flickr
    7. Use cases: Names
    8. Current records (How many records permanent and where from?): 46840 records made permanent; ~30 million records in progress; thousands more on the way
    9. What’s next: Processing the zetoc data! Other IDs (ISNI, ORCID, …); More data!; Names; Usage!; Data Ownership
    10. Thanks Brian! http://names.mimas.ac.uk/individual/35219 Image: unimancschools@flickr
    11. Names for Repositories: EPrints plugin; Extracting researcher data from EPrints; Manual submission of researcher data to Names
    12. EPrints Plugin: Augments name auto-completion with Names API; Adds Person URI to repository. Diagram labels: Cox, B.T.; http://names.mimas.ac.uk/individual/38855; Repository
    13. Plugin in Action
    14. Plugin in Action
    15. Plugin in Action
    16. Plugin in Action
    17. Future Plugins: Researcher: export your own information to Names. Admin: global editing of existing info to Names format. Diagram labels: Roberts, L.R.; http://names.mimas.ac.uk/individual/38855; http://names.mimas.ac.uk/individual/55665; Repository
    18. Importing data from EPrints: RDF
    19. Submission to Names: Disambiguation; Names Data
    20. Info: names.mimas.ac.uk; amanda.hill@manchester.ac.uk; daniel.needham@manchester.ac.uk; philip.cross@manchester.ac.uk
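
The "Data Out" slide and its notes describe a resolvable identifier URL whose record can be returned as HTML, JSON, RDF or MARCXML. The talk does not say how a client selects the format, so the sketch below assumes, purely for illustration, that standard HTTP content negotiation via the `Accept` header is used. The `MEDIA_TYPES` table and the `build_request` helper are hypothetical names, and the request is only constructed, never sent.

```python
from urllib.request import Request

# Standard media types for the four formats named in the talk.
# Whether the Names API honours these Accept values is an assumption.
MEDIA_TYPES = {
    "html": "text/html",
    "json": "application/json",
    "rdf": "application/rdf+xml",
    "marcxml": "application/marcxml+xml",
}


def build_request(identifier_url, fmt="html"):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(identifier_url, headers={"Accept": MEDIA_TYPES[fmt]})


# The identifier URL is the one shown on the "Thanks Brian!" slide.
req = build_request("http://names.mimas.ac.uk/individual/35219", "json")
print(req.full_url, req.get_header("Accept"))
```

If the service instead selects formats through a URL suffix or query parameter, only `build_request` would need to change; the rest of a client could stay the same.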