Thomas Krichel (Long Island University) – AuthorClaim


This is the presentation that accompanied Thomas Krichel's talk on AuthorClaim which was presented via Skype at Repository Fringe 2011.


  1. 1. IR repository data in AuthorClaim. Wolfram Horstmann (1), Thomas Krichel (2, 3, 4). (1) Bielefeld University, (2) Long Island University, (3) Novosibirsk State University, (4) Open Library Society. Repository Fringe, 2011-08-03.
  2. 2. thanks: to the organizers; to the BASE contributors Bernd Fehling, Marek Imialek, Mathias Loesch, Renata Mitrenga, Dirk Pieper, Jochen Schirrwagen, Friedrich Summann, Sebastian Wolf.
  3. 3. structure: background; 3lib; author claiming and AuthorClaim; BASE; AuthorClaim and BASE.
  4. 4. background: Wolfram is chief information officer (scholarly information) at Bielefeld University. They have run the BASE search engine since 2004. They are doing long-run work.
  5. 5. background: Thomas is the founder of RePEc. Thomas started this in the early 90s. Thomas is doing long-run work.
  6. 6. motivation: Make (economics) papers freely available. Make information about the papers freely available. Have a self-sustaining infrastructure for this; don't rely on external sources.
  7. 7. RePEc: RePEc is misunderstood as a repository. In fact it is a collection of 1300+ institutional (subject) repositories: pre-date OAI; reduced business model; more tightly interoperable.
  8. 8. RePEc sources of success: There are a lot of sources of success. The reasons can be classified as: business case; technical matters. Both are linked.
  9. 9. RePEc business case: RePEc tries to decentralize as much as it can. RePEc runs essentially on volunteer power. RePEc encourages reuse of RePEc data.
  10. 10. RePEc technical case: RePEc registers authors with the RePEc Author Service (RAS). RePEc registers institutions. RePEc provides evaluative data for authors and institutions.
  11. 11. RePEc and IRs: RePEc is not a repository. RePEc is a bibliographic layer over repositories. IRs can/will benefit from a similar bibliographic layer.
  12. 12. requirements for such a layer: Not dependent on external funding. Freely reusable instantaneously. Must be there for the long run.
  13. 13. a RePEc for all disciplines: RePEc bibliographic data → 3lib; RePEc Author Service → AuthorClaim; EDIRC → ARIW.
  14. 14. 3lib: 3lib is an initial attempt at building an aggregate of freely available bibliographic data. It's a project by OLS sponsored by OKFN. About 35 million records from the usual suspects: PubMed, OpenLibrary, DBLP, RePEc.
  15. 15. 3lib elements: The data elements in 3lib are very simple: title; author name expressions; link to item page on provider site; identifier. 3lib is meant to serve AuthorClaim.
  16. 16. AuthorClaim: AuthorClaim is an authorship claiming service for 3lib data. It lives at [URL missing]. It uses the same software as the RePEc Author Service, called ACIS. It has been running since early 2008.
  17. 17. author claiming history I: Thomas started the first author claiming system, the RePEc Author Service, in 1999. The system was written by Markus J.R. Klink.
  18. 18. author claiming history II: ISI created ResearcherID in 2006 (?). arXiv has had an author claiming system since 2009. NIH and Google Scholar are working on it. The ORCID initiative has been looking into author identification since 2009.
  19. 19. claiming vs identification: Author claiming records are NOT author identification records. The difference is called "Klink's problem". A person can claim to be an author of a paper. If there are several authors, we don't know which author (s)he is.
  20. 20. Klink's problem example: Jane and John Smith write a paper. The author list says "J. Smith and J. Smith".
  21. 21. AuthorClaim data: ftp://ftp.authorclaim.org; CC0; more than 100 profiles, growing slowly.
  22. 22. an example: id: pbi1; name variations: Geoffrey Bilder / G. Bilder / Bilder, G.; isauthorof: info:lib/elis:856; hasnoconnectionto: info:lib/pubmed:11127885 / info:lib/pubmed:7482633.
  23. 23. more on the example: The refused papers are there for services to build learning models for author names. Actually, learning is an integral part of the way AuthorClaim works. Records also contain the 3lib data for papers, and they have ARIW-based affiliation data.
  24. 24. IRs and author identification: IRs are generally too large for author identification by IR staff. Only registration of contributors is usually required.
  25. 25. IRs and author claiming: IRs are too small to make it meaningful for authors to claim papers in them directly.
  26. 26. benefits of author claiming to IRs: All papers by an author can be put together. The task can be completely automated once an AuthorClaim record claims a paper in the IR.
  27. 27. to get it done: Of the four elements in a 3lib record, only the link to a web page describing the item is problematic. But it is cumbersome to customize for close to 2k IRs.
  28. 28. partnership with BASE: We need a centralized collection. BASE is already doing this job. BASE can deliver all data to AuthorClaim regularly.
  29. 29. BASE aggregation services: constant monitoring of the OAI-PMH world; configuration of harvesting for each new and erroneous repository; metadata stores (raw and normalized).
  30. 30. BASE normalization: highly heterogeneous use of OAI-DC requires cleaning and enrichment, e.g. dc:type, dc:date, dc:language; also enrichment with (missing) subject classifications.
  31. 31. BASE search services: builds index (Solr); end user interfaces (VuFind); iPhone app; API for index usage by third parties (HTTP or SOAP).
  32. 32. BASE data services: repository profile service (REST); raw metadata store access (HTTP or SOAP); rsync for AuthorClaim.
  33. 33. BASE data in AuthorClaim: selection of records that have the required data (author, title, link, identifier); incremental updates.
  34. 34. repository exclusion: From the BASE profile, AuthorClaim discards some IRs that contain: student work; digitized old material; link collections; primary research data. There are some minor manual exclusions.
  35. 35. results so far: 1930 repositories, 12740116 records. 534 records claimed. The documentation at [URL missing] needs some debugging. The collection is not yet announced because it is being read.
  36. 36. the end: Contact krichel@openlib.org for more information.
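
The record selection on slide 33 and the profile example on slide 22 can be pictured with a minimal Python sketch. It is not the ACIS or BASE implementation. The class names, field names, dictionary keys, titles and example.org links are all assumptions made for illustration. Only the identifiers and name variations come from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Record:
    """The four 3lib elements from slide 15 (field names are assumed)."""
    identifier: str
    title: str
    authors: tuple
    link: str


def select_record(raw: dict) -> Optional[Record]:
    """Keep a raw record only if author, title, link and identifier are all present."""
    identifier = (raw.get("identifier") or "").strip()
    title = (raw.get("title") or "").strip()
    link = (raw.get("link") or "").strip()
    authors = tuple(a.strip() for a in (raw.get("authors") or []) if a and a.strip())
    if not (identifier and title and link and authors):
        return None
    return Record(identifier, title, authors, link)


@dataclass
class Profile:
    """An author profile shaped like the slide 22 example (layout is assumed)."""
    id: str
    name_variations: list
    isauthorof: set = field(default_factory=set)
    hasnoconnectionto: set = field(default_factory=set)


def papers_of(profile: Profile, records: list) -> list:
    """Return the records the profile claims, leaving out any it has refused."""
    return [
        r for r in records
        if r.identifier in profile.isauthorof
        and r.identifier not in profile.hasnoconnectionto
    ]


# Hypothetical raw input: titles and links are placeholders, not real data.
raw_records = [
    {"identifier": "info:lib/elis:856", "title": "Placeholder title A",
     "authors": ["G. Bilder"], "link": "http://example.org/elis/856"},
    {"identifier": "info:lib/pubmed:11127885", "title": "Placeholder title B",
     "authors": ["G. Bilder"], "link": "http://example.org/pubmed/11127885"},
    {"identifier": "info:lib/example:1", "title": "Placeholder title C",
     "authors": ["J. Smith"], "link": ""},  # no link, so it is not selected
]

records = [r for r in map(select_record, raw_records) if r is not None]

profile = Profile(
    id="pbi1",
    name_variations=["Geoffrey Bilder", "G. Bilder", "Bilder, G."],
    isauthorof={"info:lib/elis:856"},
    hasnoconnectionto={"info:lib/pubmed:11127885", "info:lib/pubmed:7482633"},
)

claimed = papers_of(profile, records)
print([r.identifier for r in claimed])
```

Matching claims on the identifier instead of the author name string sidesteps name ambiguity when grouping one person's papers. It does not resolve Klink's problem: the claim still does not say which position in the author list belongs to the claimant.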