How taxonomies and facets
bring end-users closer to big data


                  Anna Divoli
                  @annadivoli




Boston Oct 2012
Taxonomies
• τάξις/τάξη + νομία (arrangement/class + method/rule/law)
• hierarchical classification
• formal nomenclature
• varied dimensions
• evaluation/measures/metrics
• types: manually constructed, social, auto-generated
• purposes: auto-indexing, search facilitation, navigation,
  knowledge management, organization, …
• it is OK to change the classification system itself to reflect new
  knowledge, not just to add new concepts
• the data have become “big” and available but not accessible
• many “end users”
User Studies Types

Specialized domain studies:
1. Facets (HCIR): Biomedical Scientists
    Anna Divoli and Alyona Medelyan
    Search interface feature evaluation in biosciences, HCIR 2011, Google, Mountain View, CA

2. Expert needs (media group)

UI preferred features studies:
3. Existing popular systems (EuroHCIR)
    Matthew Pike, Max L. Wilson, Anna Divoli and Alyona Medelyan
    CUES: Cognitive Usability Evaluation System, EuroHCIR 2012, Nijmegen, Netherlands

4. Mock ups of specific features (survey)

Our studies




  1. Facets (HCIR): Biomedical Scientists
Facets – favorite feature for search systems




    Anna Divoli and Alyona Medelyan, Search interface feature evaluation in
    biosciences, HCIR 2011, Google, Mountain View, CA, USA


Facets (in search systems)
  Example query: animal models huntington disease

Bio-Facets
  Example query: animal models huntington disease
  Panels: Most liked | Least liked

Facets as search features for biomedical scientists: Findings

• Faceted search is the most important standalone feature in a search
  interface for bioscientists.
• A small number of query-oriented facets, presented as checkboxes,
  work best.
• Overly simple aesthetics, although not desirable, do not hurt the
  overall UI score.
• Complex aesthetics turn users away from the systems.
• Bioscientists prefer tools that help them narrow their search, not
  expand it.
• For generic search: doc-based facets.
  For domain-specific search: query-based facets.
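
As a rough illustration of the findings above (a few facets, shown as checkboxes, used to narrow rather than expand a result set), here is a minimal Python sketch. The result set, the facet names (`organism`, `type`) and the helpers `facet_counts` and `narrow` are hypothetical; they are not taken from the study or from any of the systems evaluated.

```python
from collections import Counter

# Hypothetical result set for the query "animal models huntington disease".
RESULTS = [
    {"title": "Paper A", "organism": "mouse", "type": "review"},
    {"title": "Paper B", "organism": "mouse", "type": "article"},
    {"title": "Paper C", "organism": "rat", "type": "article"},
    {"title": "Paper D", "organism": "fly", "type": "review"},
]


def facet_counts(results, facet, top_n=3):
    """Count the values of one facet over the current results, most frequent first."""
    counts = Counter(doc[facet] for doc in results if facet in doc)
    return counts.most_common(top_n)


def narrow(results, selected):
    """Keep only the documents that match every ticked checkbox group.

    `selected` maps a facet name to the set of values ticked for it:
    values within one facet are OR-ed, different facets are AND-ed.
    """
    return [
        doc for doc in results
        if all(doc.get(facet) in values for facet, values in selected.items())
    ]


narrowed = narrow(RESULTS, {"organism": {"mouse"}, "type": {"review"}})
print(facet_counts(RESULTS, "organism"))
print([doc["title"] for doc in narrowed])
```

Limiting `top_n` keeps the number of checkboxes small, and recomputing `facet_counts` on the narrowed list is one way to keep the facets tied to the current query rather than to the whole collection.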



Facets as search feature: likes & dislikes

Features compared: Search expansions★, Faceted refinement,
Related searches, Results preview★
Legend: + positive comment, - negative comment

Faceted refinement
  Semedico: + useful categories; - slow functionality;
            - too complex/busy; - too many colors
  PubMed:   + “reviews” category; - limited functionality; - poor design
  Solr:     + quick paper access; + simple; + vertical list;
            - nothing special
  Go…:      + “top…; - too…

Related searches
  Bing:     - not scientific; + colors; - too small; - too busy
  PubMed:   + relevant; - poor context; - no variety

Likes:
• Useful categories
• Simple
• Vertical list

Dislikes:
• Too complex/busy
• Too many colors
• Poor design
• Limited functionality
• Too many symbols
• Not special / Colorless

Our studies




  2. Expert needs (media group)




Case Study: Media Group

They have a system/“taxonomy” in place that nobody
maintains or uses…

~10,000 articles/week, ~5 million in their archives
~21 years, 10,000 authors
Handful of top categories

Main reasons/uses:
- Advertisement
- Packaging stories and selling them
- Readers finding stories & related stories
- Journalists finding related stories

Expert content needs - Case Study: Media Group


• Ideally update the taxonomy daily/weekly
• Must be dynamic & handle new cases/concepts
• Deep nesting is OK
• With multiple inheritance, need to disambiguate where a
  particular article belongs
• Must be editable (verify automated output in case of
  anomalies & move nodes around)
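
The requirements above (new concepts at any time, concepts with more than one parent, an unambiguous home for each article, and manual moves) can be sketched as a small data structure. This is only an illustration under stated assumptions: the `Taxonomy` class, its method names and the example concepts are hypothetical and do not describe the media group's system or any Pingar product.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Hypothetical editable taxonomy in which a concept may have several parents."""
    parents: dict = field(default_factory=dict)  # concept -> set of parent concepts
    primary: dict = field(default_factory=dict)  # article id -> (concept, parent)

    def add_concept(self, concept, parent=None):
        """Add a new concept, or give an existing one an additional parent."""
        self.parents.setdefault(concept, set())
        if parent is not None:
            self.parents.setdefault(parent, set())
            self.parents[concept].add(parent)

    def move(self, concept, old_parent, new_parent):
        """Manual edit: detach a concept from one parent and attach it to another."""
        self.parents[concept].discard(old_parent)
        self.add_concept(concept, new_parent)

    def file_article(self, article_id, concept, parent=None):
        """Record the single place an article belongs to.

        If the concept has several parents, the caller must name one of them.
        """
        options = self.parents[concept]
        if parent is None:
            if len(options) > 1:
                raise ValueError(f"{concept!r} is ambiguous: choose one of {sorted(options)}")
            parent = next(iter(options), None)
        elif parent not in options:
            raise ValueError(f"{parent!r} is not a parent of {concept!r}")
        self.primary[article_id] = (concept, parent)


# Hypothetical concepts, for illustration only.
tax = Taxonomy()
tax.add_concept("Sponsorship", "Sport")
tax.add_concept("Sponsorship", "Business")      # multiple inheritance
tax.file_article("article-1", "Sponsorship", parent="Business")
tax.move("Sponsorship", "Sport", "Media")       # manual correction of a node
print(sorted(tax.parents["Sponsorship"]))
```

Storing parents as a set per concept keeps deep nesting and multiple inheritance cheap, while `file_article` forces the disambiguation step whenever a concept sits under more than one parent.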




Our studies




  3. Existing popular systems (EuroHCIR)
Exploring UI features - Systems Tested: Yippy, Carrot, MeSH, ESD




Exploring UI features (Yippy, Carrot, MeSH, ESD): likes & dislikes

      Likes:
      •   Menu highlighting
      •   Hierarchical folder layout
      •   Expand hierarchy with “+” and “–”
      •   Dual view (tree on left, results on right)
      •   Ability to change visualisations of taxonomy
      •   Search function is important
      •   Familiar interface with folders

      Dislikes:
      •   Too simple or too much text; color would be nice
      •   Lots of scrolling
      •   Dots in Carrot circle – confusing
      •   Double-clicking on the foam tree is unintuitive
      •   Taxonomies that are too broad
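
Two of the liked features above, the hierarchical folder layout and expanding the hierarchy with “+” and “–”, can be sketched in a few lines of Python. The `render_tree` helper and the example tree are hypothetical and are not taken from Yippy, Carrot, MeSH or ESD; the sketch uses a plain ASCII "-" as the collapse marker.

```python
def render_tree(node, expanded, depth=0):
    """Return text lines for a taxonomy tree with "+"/"-" expansion markers.

    `node` is a (label, children) pair; `expanded` is the set of labels
    whose children are currently shown.
    """
    label, children = node
    if not children:
        marker = " "   # leaf: nothing to expand
    elif label in expanded:
        marker = "-"   # open folder: can be collapsed
    else:
        marker = "+"   # closed folder: can be expanded
    lines = [f"{'  ' * depth}{marker} {label}"]
    if label in expanded:
        for child in children:
            lines.extend(render_tree(child, expanded, depth + 1))
    return lines


# Hypothetical taxonomy fragment, for illustration only.
TREE = ("Science", [
    ("Biology", [("Genetics", [])]),
    ("Physics", []),
])

print("\n".join(render_tree(TREE, expanded={"Science"})))
```

Keeping the expansion state in a separate set means the same tree can be shown fully collapsed by default, which is one way to limit the scrolling that participants disliked.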


Our studies




  4. Mock ups of specific features (survey)




Taxonomy UI preferences (ongoing survey): The (51) participants

Age:
  25 or younger          27.3%
  26-40                  60.0%
  41-60                  12.7%
  61 or older             0%

How comfortable are you with computers?
  Somewhat                5.5%
  Very                   47.3%
  Second nature          47.3%

Highest level of education:
  High School             3.6%
  College/University     52.7%
  Graduate School        43.6%

Do you have experience using taxonomies?
  No                     30.9%
  Yes, but very little   47.3%
  Yes                    21.8%


                                       bit.ly/pingar_taxonomies

Concept sorting
  popularity (A)         44.2%
  alphabetically (B)     42.3%
  no preference          13.5%

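Since the two sorting options drew almost equal support, an interface could reasonably offer both. The sketch below shows one way to switch between them; the concept names, the document counts and the helpers `sort_concepts` and `render` are hypothetical and are not taken from the survey mock-ups.

```python
# Hypothetical concepts with the number of documents tagged with each.
CONCEPTS = {"Politics": 120, "Arts": 45, "Sport": 300, "Business": 120}


def sort_concepts(counts, by="popularity"):
    """Return concept names ordered by popularity (A) or alphabetically (B)."""
    if by == "popularity":
        # Most documents first; ties fall back to alphabetical order.
        return sorted(counts, key=lambda name: (-counts[name], name.lower()))
    if by == "alphabetical":
        return sorted(counts, key=str.lower)
    raise ValueError(f"unknown sort order: {by!r}")


def render(counts, by="popularity", show_counts=True):
    """Render one entry per concept, optionally followed by its document count."""
    return [
        f"{name} ({counts[name]})" if show_counts else name
        for name in sort_concepts(counts, by)
    ]


print(render(CONCEPTS, by="popularity"))
print(render(CONCEPTS, by="alphabetical", show_counts=False))
```

The `show_counts` flag is an assumption added for illustration; it simply makes the count display optional.
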
Displaying Counts
  A                      42.3%
  B                      51.9%
  no preference           5.8%

Using Labels
  in frames (A)          72.5%
  with labels (B)        23.5%
  no preference           3.9%

Plus/minus signs or arrows
  A                      47.1%
  B                      37.3%
  no preference          15.7%

Search Results Display
  A                      13.7%
  B                      11.8%
  C                      70.6%
  no preference           3.9%

Search Functionality
  partial                74.5%
  hidden                 64.7%
  no preference           2.0%

Where we stand
Our team works on automatically generated taxonomies, but we have
realized that they need customization for specific needs

Taxonomy

 “Taxonomy is described sometimes as a science and
 sometimes as an art, but really it’s a battleground.”


           Bill Bryson, A Short History of Nearly Everything




T echnology
                 A rt
              a X iomatic
           phil O sophy
          desig N
               l O gic
             hu M anities
           lingu I stics
                 E thnonology
                 S cience


Summary

• There is a place for manually, socially and automatically
  generated taxonomies (as well as hybrids).
• Text is “big” and in many fields dynamic.
• “End-users” (not Information Management experts) need
  access to “big text”.
• Auto-generated taxonomies with manual editing facilities
  are now possible & make sense.
• Domain-specific background knowledge is vital for the
  quality and detail required per solution.
• User-friendly systems are very important for end users.


Acknowledgements

             Alyona Medelyan (Pingar)
             Max L. Wilson (Swansea/Nottingham)
             Matthew Pike (Swansea/Pingar)

             Pingar Brains
             All 65+ anonymous study participants!

             pingar.com






Anna Divoli (Pingar Research) "How taxonomies and facets bring end-users closer to big data" TAW2012


Editor's Notes

  1. Based on our current knowledge, experience and the results of our user studies the direction our research team is taking