
Designing with algorithms


A case study presented at UX Cambridge 2016.

For hundreds of years, discoveries in science have been discussed, debated and advanced within the scientific literature. Finding evidence in the literature to test a hypothesis is fundamental to scientific research.

But finding evidence in scientific literature can be time-consuming and difficult, especially as the number of published articles increases significantly each year. Advances in text mining technology offer the potential to make this task easier and quicker. Text miners are software engineers and subject experts who write algorithms to find useful information in vast amounts of unstructured text content. Deciding what information is useful to end users, and presenting it intuitively at the right point in time, is where UX can help.

This is a case study about annotating scientific terms and concepts in millions of research articles, with the goal of helping life science researchers identify relevant information in articles quickly and easily. We explain how text miners, UX practitioners and developers collaborated; what we discovered about user needs; the challenges and constraints we faced; and the iterative improvements we have made to the design.



  1. Designing with algorithms: how text miners and UX can work together. @micheleidesmith @j_h_kim. Michele Ide-Smith, Product Manager; Jee-Hyub Kim, Text Miner. European Bioinformatics Institute (EMBL-EBI)
  2. European Bioinformatics Institute: the home for big data in biology
  3. What we’ll cover • Context - finding evidence in research literature • What are annotations? • What is text mining? • Research insights and our design process • Summary - what we learnt
  4. Research scientists are expected to publish several articles every year
  5. Finding evidence in scientific literature is a challenge
  6. “I was looking for the cellular location (cytoplasm or nucleus) of ribonucleotide reductase. It’s like a needle in a haystack.”
  9. Abstracts: 31.4 million • Agricola records: 631,222 • Full text articles: 3.8 million • NHS guidelines: 780 • Patents: 4.2 million
  10. Full text is free to read and share, but a CC-BY license allows reuse
  11. Researchers still like to read PDFs
  12. “Sometimes it’s nicer to scan a PDF, in my opinion... less scrolling and the figures are more prominent. I really don’t like to read on the screen.” “I can search in the PDF a little bit more easily than in the full text article.” “This [full text] is fairly clear but sometimes PDFs are slightly easier to read, slightly easier on the eye.”
  13. Younger researchers prefer reading online
  14. “I almost never look at PDFs, they are a bit of a pain.” “I never go to the publisher site - I like to see all the articles in the same format. I don’t go to the PDF unless I want to print it out.”
  15. Our users • Life sciences researchers - find evidence for their research questions, learn new methods and find all available literature on a topic • Curators - find evidence, e.g. for a gene function, so that they can curate a page in a database
  16. “If I notice it’s really important then I’ll print it, so I can highlight it with a pen.”
  17. GOAL: To help researchers find useful information in articles quickly, and link to related data resources
  18. annotation (noun): a note by way of explanation or comment added to a text or diagram.
  19. An annotation is metadata (e.g. a comment, explanation or presentational markup) attached to text, an image or other data.
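A span-plus-metadata structure like the one defined above can be modelled concretely. The following Python sketch is purely illustrative (the field names are hypothetical, not the actual EMBL-EBI data model): an annotation records character offsets into the text, the annotated surface string, a type, and an ontology identifier.

```python
# Illustrative annotation record; field names are assumptions, not
# the real EMBL-EBI schema.
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int        # character offset where the annotated span begins
    end: int          # character offset where it ends (exclusive)
    text: str         # the surface text that was annotated
    type: str         # e.g. "gene", "disease", "organism"
    ontology_id: str  # identifier in a reference ontology, if resolved

sentence = "Mutations in BRCA1 are linked to breast cancer."
ann = Annotation(start=13, end=18, text="BRCA1",
                 type="gene", ontology_id="HGNC:1100")

# The offsets let a reader application re-locate and highlight the span.
assert sentence[ann.start:ann.end] == ann.text
```

Storing offsets rather than only the matched string is what lets a viewer highlight the exact occurrence, even when the same term appears several times in an article.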
  20. Annotations can be private
  24. Annotations can be public
  28. Annotations can be created by humans
  29. Or by machines… (Image by TedColes, own work, CC BY-SA 4.0)
  30. Human curation of annotations is valuable, but hard to sustain
  32. “Text mining… refers to the process of deriving high-quality information from text.”
  33. Text miners find useful information in unstructured text using algorithms (sets of rules) and build data pipelines. (Image by W Gossett)
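One of the simplest rule-based techniques a text miner might start with is dictionary matching: scan the text for known terms and record where each occurs. The toy sketch below is only an illustration of that idea - the term list and the `annotate` helper are hypothetical, and real pipelines match against ontologies with millions of terms using far more sophisticated methods.

```python
# Toy dictionary-based annotator: scan text against a small term list
# and emit (start, end, type) matches. Illustrative only.
import re

TERM_TYPES = {  # hypothetical term list, not a real ontology
    "ribonucleotide reductase": "protein",
    "cytoplasm": "cellular component",
    "nucleus": "cellular component",
}

def annotate(text):
    matches = []
    for term, term_type in TERM_TYPES.items():
        # Case-insensitive exact match; record character offsets so a
        # viewer can highlight the exact span later.
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), term_type))
    return sorted(matches)

print(annotate("Ribonucleotide reductase is found in the cytoplasm."))
```

Even this naive approach surfaces the granularity and false-positive problems discussed later in the talk: an overly general term list highlights everything, and near-miss strings get mislabelled.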
  34. Scientific literature • Biological terms, e.g. diseases, organisms, genes, proteins and chemicals (using ontologies) • Biological processes and functions, e.g. gene-disease relationships, protein-protein interactions or gene function (from the proximity of words in the text and their position in the article)
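The proximity idea on the slide above can be sketched very naively: if a gene term and a disease term co-occur in the same sentence, propose a candidate gene-disease relationship. The term sets and the `candidate_relations` helper below are hypothetical; production pipelines use ontologies, trained models and richer positional features rather than plain co-occurrence.

```python
# Naive co-occurrence relation mining: a gene and a disease mentioned
# in the same sentence become a candidate relationship. Illustrative only.
GENES = {"BRCA1", "TP53"}               # hypothetical term sets
DISEASES = {"breast cancer", "sarcoma"}

def candidate_relations(text):
    relations = []
    # Crude sentence split; real pipelines use proper sentence detection.
    for sentence in text.split(". "):
        genes = [g for g in GENES if g in sentence]
        diseases = [d for d in DISEASES if d in sentence]
        relations += [(g, d) for g in genes for d in diseases]
    return relations

abstract = ("Mutations in BRCA1 increase the risk of breast cancer. "
            "TP53 is frequently studied.")
print(candidate_relations(abstract))  # -> [('BRCA1', 'breast cancer')]
```

Co-occurrence produces plenty of false positives, which is exactly why the user feedback loop described later in the talk matters for tuning these pipelines.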
  35. RESEARCH GOAL 1: To understand how researchers and curators find literature and make decisions about what to read
  36. Interviews with 8 researchers and 2 curators
  37. Find evidence to inform research
  38. Skim read abstracts • Look at figures • Skim read results • Ctrl+F to find keywords in text • Check for data files • Prioritise what to read. Researchers prioritise what they want to read, as their time is limited. They use different strategies to identify articles which are worth reading in full.
  39. RESEARCH GOAL 2: To find out how well the prototype worked for users
  40. Usability tests with 17 users, in 2 iterations
  41. Research questions • Do participants discover/use the feature? • How easy is it to use/navigate through annotations? • Do they trust the information? • How do they feel about inaccurate annotations? • Would they provide feedback if they had the opportunity?
  42. I used an InVision clickable prototype for the first few sessions
  43. But we got the best results using a prototype with real data behind it
  44. Text miners and developers took turns to observe usability tests
  45. Issues were logged in a spreadsheet
  46. We worked in sprints to address UX, technical and text mining issues
  47. “If it’s not specific enough, I end up with a lot of things being highlighted.”
  48. Granularity • Some terms appeared too frequently, or were too general to be useful, e.g. “cell” or “formation” • Participants expected us to split Gene Ontology (GO) terms into 3 separate categories: biological process, molecular function and cellular component
  49. “I guess false positives automatically make me anxious about whether to believe…”
  50. Trust • Users lost trust in the information if there were false positives, e.g. • “oxide” is not an organism, but “oxidae” is • “ubiquitin” is a process, not a gene/protein
  51. Feedback • Machine annotations are not perfect, so we offered users a way to provide feedback
  52. “If you do disease, could you do variation? That would be a killer.”
  53. Annotation types • Users made suggestions for other types of annotation that would be useful to them • Our platform enables other text mining groups to provide annotations, as we don’t have the capacity
  54. “I thought that if it [annotations control panel] had something to tell me about those things, it would already be there”
  57. Discoverability • We can only show annotations on articles with a CC-BY, CC-BY-NC or CC0 license • We can’t display numbers in brackets due to the performance impact on page loading • Participants didn’t want highlights on by default • Some people claim to ignore the right column
  59. “It would look like a Christmas tree! ... For me it would be quite disturbing in terms of reading”
  60. “I think it’s good that you can click more than one. Because you can more easily associate proteins or genes with GO, or the organism. Which is very good. I would look for yellow close to blue or orange.”
  62. “The details one is an extra level of clicking that’s frustrating. This [structure diagram] is great.”
  63. Engagement • Once annotations were highlighted in the text, participants didn’t necessarily realise they could interact with them • They expected to see something useful that makes clicking on the annotation worthwhile
  64. “Maybe I’m trying to be too lazy.” “With my curator hat on, accession numbers are exciting, but I wouldn’t want to have to scroll through the article to see if there was one.” “If you click on organisms I’d expect it to expand out and see the unique items, e.g. zebrafish”
  65. Navigation • Participants wanted to jump straight to highlighted terms in the text • They also wanted to navigate through highlights • Some expected to see a list of the terms that appear in the text under the checkbox
  68. So did we meet our goal?
  69. “I really think this is amazingly useful to have all the names of the genes highlighted because you can get a quick overview, which is much better than trying to read the text quickly.”
  70. “I do like it, it’s clever! ... It makes life much faster, rather than going in and out… It makes information and searching much faster”
  71. It’s early days and we still have many improvements to make, but early indications are positive
  72. Two important lessons from working together
  73. 1. User research and feedback are essential for improving text mining pipelines
  74. 2. Compromises between technical/performance constraints and user needs are inevitable - but make decisions together
  75. Thank you for listening!