1. The Invisible Scientist
Personal Digital Identity
on the Web:
Problems + Solutions
Duncan Hull
The University of Manchester
Science Foo Camp 2009
The Googleplex
Mountain View, California
2. The Invisible Scientist: Digital Identity
• I am not an identity or security expert but…
• Introduction: Personal Identity historically and currently
• The Problem:
– The way we identify scientists on the Web is inefficient and badly broken
– Which can make much of their work “invisible”
• Some solutions:
– URIs
– OpenIDs
– Contributor-ID (www.crossref.org)
• ~15 minutes of slides, ~45 minutes for discussion
• Conclusions + What might better digital identities allow?
3. Tools for sharing data on the Web < 10 yrs old
All these social software tools are reliant on digital identity of some form
http://tinyurl.com/myscience
These tools are good but…
4. Unfortunately
• Many biomedical scientists don’t use these tools for serious work
– (if at all)
• Why?
• It’s complicated but…
5. Scientific publishing has worked this way for centuries
• Publishing is the main (perhaps only) way of formally identifying people and their work
• “Publish or Perish”
7. Identity is different on the Web:
We use URIs to identify people, e.g. Julian Gough:
• http://www.cs.bris.ac.uk/~gough/
• http://en.wikipedia.org/wiki/Julian_Gough
• http://twitter.com/SUPERFAMILY
• http://www.juliangough.com/
• http://www.linkedin.com/pub/julian-gough/b/25b/3b3
• http://www.citeulike.org/tag/julian-gough
• http://dblp.uni-trier.de/db/indices/a-tree/g/Gough:Julian.html
• http://pubmed.gov/?Term=Julian+Gough[author]
• http://pubmed.gov/?Term=Gough+J[author]
• http://www.facebook.com/julian.gough
But do all these URIs identify the same person?
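One way to make that question answerable by machine is to record explicit "same person" assertions between URIs and then group the URIs they connect. The sketch below is only an illustration of that idea using a small union-find structure; the function name `group_identities` and the links themselves are hypothetical, not verified facts about these pages.

```python
from collections import defaultdict


def group_identities(uris, same_as_links):
    """Group URIs into clusters connected by 'same person' links (union-find)."""
    parent = {uri: uri for uri in uris}

    def find(uri):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as_links:
        # Merge the two clusters that a and b belong to.
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for uri in uris:
        clusters[find(uri)].add(uri)
    return list(clusters.values())


uris = [
    "http://www.cs.bris.ac.uk/~gough/",
    "http://twitter.com/SUPERFAMILY",
    "http://en.wikipedia.org/wiki/Julian_Gough",
    "http://www.juliangough.com/",
]
# Hypothetical assertions, for illustration only.
links = [
    (uris[0], uris[1]),
    (uris[2], uris[3]),
]
groups = group_identities(uris, links)
print(len(groups))  # number of distinct people implied by the links
```

Without such assertions every URI stays in its own cluster, which is roughly the situation the slide describes.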
8. Science is increasingly Digital
• Science is increasingly digital
– Not just digital publications in electronic journals…
– Wiki edits (e.g. Rfam and Pfam in Wikipedia, Robert Hoffmann's WikiGenes)
– Software development, workflows
– Development of databases and ontologies - “data driven science” + “open data”
– blog posts
• Traditional journal publishing is often inadequate for sharing this kind of data
and attributing it to individual people
– See “Defrosting the Digital Library” in PLoS Computational Biology for details
– Hull et al. (2008) http://pubmed.gov/18974831
• No good incentives to make digital contributions (besides traditional publishing)
• “Micro-attribution” - a large number of small contributions go unrewarded
10. Misattribution (part 2)
• “Forgotten Password”, “Already Registered”, “Please Login”, “Access Denied”
are all recognised as “authors” in Google Scholar
http://tinyurl.com/phantom-user
11. Digital attribution
“Attribution would seem to be a simple process and yet it represents a major, unsolved problem for information science.”
Neil Smalheiser and Vetle Torvik, “Author name disambiguation”
Chapter published in Volume 43 (2009) of the Annual Review of Information Science and Technology (ARIST), edited by B. Cronin, available from the publisher Information Today, Inc.
http://www.hbs.edu/units/tom/seminars/2007/docs/Author%20Name%20Disambiguation.pdf
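A small sketch of why this is hard: bibliographic indexes often store authors as surname plus initials (as in the `Gough J[author]` PubMed query), so different people collapse onto the same key. The helper `index_key` is hypothetical, and apart from "Julian Gough" and "Duncan Hull" the names below are invented for illustration.

```python
from collections import defaultdict


def index_key(full_name):
    """Reduce 'Given [Middle] Surname' to 'Surname Initials', e.g. 'Gough J'."""
    parts = full_name.split()
    surname = parts[-1]
    initials = "".join(p[0].upper() for p in parts[:-1])
    return f"{surname} {initials}"


# "John Gough" and "Jane Gough" are made-up names, used only to show a collision.
authors = ["Julian Gough", "John Gough", "Jane Gough", "Duncan Hull"]

by_key = defaultdict(list)
for name in authors:
    by_key[index_key(name)].append(name)

# Keys shared by more than one person cannot be attributed from the name alone.
ambiguous = {key: names for key, names in by_key.items() if len(names) > 1}
print(ambiguous)
```

Real disambiguation systems use much more evidence (affiliations, co-authors, topics); this only shows where the ambiguity comes from.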
12. Digital identity is currently a mess
• As well as identifying and attributing with URIs, we also need to:
– Attribution: Julian AuthorOf IncrediblyImportantThing
– Authentication: is Julian who he says he is? Or a fake?
– Authorisation: is Julian authorised to do stuff?
Currently done through a combination of username and password
“The average user has [at least] 18 user accounts and 3.49 passwords”
Simon Willison (The Guardian) http://tinyurl.com/too-many-passwords
13. Digital Identity Really Matters
• Digital Identity is a pre-requisite for the following to be recorded and quantified:
– Attribution
– Contribution
– Publication
• Important decisions made on digital identity
– Hiring, funding, promotion, collaboration
– Selecting appropriate reviewers for grants and publications
– Attributing published data in an increasingly web-based world
• This is the environment which social software / Web 2.0 operates in:
– Reliant on accurate and efficient digital identities
14. What is myExperiment? http://www.myexperiment.org
• Facebook for Scientists?
• Collaborative software for sharing and finding experimental protocols on the web
15. Who is involved in myExperiment?
• Carole Goble, David De Roure
• Small team of developers (2-3 full time)
• 1500 users have uploaded 560 workflows, 150 files and 40 packs in 130 groups
18. OpenID is quickly becoming widespread
“42,235 sites are now enabled to accept OpenID logins”
Source: http://blog.janrain.com/2009/05/relying-party-stats-as-of-may-1-2009.html
19. But there are usability “issues”
• einstein@uzh.ch + mcsquared (84%)
OR
• http://einstein.myopenid.com/ (16%)
– Password handled by third-party OpenID provider
– Unless you hide it (e.g. Gmail, WordPress)
20. Crossref solution: DOIs for people
Geoffrey Bilder
• Crossref has solved a similar problem, identifying publications across different
publishers, with “Digital Object Identifiers (DOI)”
– DOI:10.1371/journal.pcbi.1000204
– http://dx.doi.org/10.1371/journal.pcbi.1000204
– They are working on something similar for people
• DOIs for scientists: “Contributor ID”
– Watch this space…
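The two forms of the publication DOI shown on this slide differ only by a prefix, so converting one to the other is mechanical. A minimal sketch, assuming the `http://dx.doi.org/` resolver prefix from the slide; the function name `doi_to_url` is hypothetical.

```python
RESOLVER = "http://dx.doi.org/"  # resolver prefix as shown on the slide


def doi_to_url(doi):
    """Turn 'DOI:10.1371/...' or a bare '10.1371/...' into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # DOI names begin with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return RESOLVER + doi


print(doi_to_url("DOI:10.1371/journal.pcbi.1000204"))
# prints http://dx.doi.org/10.1371/journal.pcbi.1000204
```

An identifier for people could in principle be resolved the same way, but the slides do not say what form Contributor ID will take.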
21. Conclusions
• Digital Identity is broken (many biomedical scientists don’t realise)
– Important contributions are not properly attributed
– Misattribution can lead to invisibility
– This can discourage scientists from using the web more
• Fixing digital identities could make science more efficient
– Recognise digital contributions
– Motivate people to make non-publication contributions
• Technical problem mostly solved
• Discussion: The Good, The Bad and The Ugly Things identity might allow…
– Over to you!