Testing Taxonomies: Beyond Card Sorting

@albertatrebla | @saturdave
Alberta Soranzo & Dave Cooksey
IA Summit, Minneapolis
Friday, April 24th 2015
CC image: flickr.com/photos/cannedtuna/1423599488/
NOTE: The text provided here is taken from speaker notes and is not a complete transcript

Today’s Presentation
Taxonomy Is Not IA
Traditional Approaches to Testing
Testing Methods
Case Studies
Q&A

Your Hosts
Dave Cooksey
UX consultant based in Philadelphia with 10 years of
experience in information architecture & taxonomy.
Alberta Soranzo
UX director based in London, focused on making the
complex simple for over 20 years.
@saturdave
@albertatrebla
AS: I am Alberta Soranzo. Like the slide says, I’ve been doing IA work since the dinosaurs were roaming the earth, and I worked with Mad Men. That means that I really did start
in an advertising agency, working with technology clients, went on to work on large builds for the UC system and the CDC, and then went back to agency work in the UK.
DC: And I’m Dave Cooksey. I’ve been doing taxonomy and IA work for about 10 years now. I grew up working in e-commerce, which was great because it gave me practical
experience in taxonomy through the design and testing of large online product catalogs.

Brands We’ve Worked on
DC: Here are some of the brands we’ve worked on during our careers, which span various domains and industries.
AS: All of these projects included a significant focus on taxonomy. And about half included direct testing with users.

Our Goal
Demonstrate useful testing methods
…and clarify
taxonomy versus IA
evaluation versus testing
DC: So our goal for today’s session is to demonstrate really useful testing techniques that are not just card sorts.
But to begin our discussion today, we want to clarify a couple of points. One, that taxonomy and IA are not the same thing and how this impacts study design and execution.
And two, the difference between evaluating taxonomies and testing them.

Taxonomy ≠ IA
image: http://commons.wikimedia.org/
DC: So, let’s start with that first point about taxonomy and information architecture…
Often, folks in and out of user experience talk about taxonomy as if it is a practice within IA—you’ll hear them speak about taxonomy as if it is just navigation or search.
But as practitioners who craft taxonomies, we need to keep in mind that taxonomy and information architecture are not the same. While both are concerned with organizing
things, as practices they are more different than alike.
Taxonomy is concerned with the practice of organization, both physical and digital, and with governance processes. Information architecture is concerned with the organization and
use of informational spaces, and with UI and interaction design.
Where they converge, however, is around organizing information to support user needs. And measuring this sweet spot is what we’re going to talk about today.

Taxonomy Is an Expert Activity
DC: But why do we need to test taxonomies with real users in the first place? If we have done our homework by researching the subject domain through interviewing key
stakeholders, and applying best practices for information organization, won’t our taxonomic structures be user-friendly?
Maybe, maybe not.
And this relates to the second point we wanted to clarify today, the difference between taxonomy evaluation and testing… All the techniques that we do to research a domain
when crafting a taxonomy are evaluation techniques: search log analysis, Web analytics review, competitive analysis, review of organizing standards, subject matter expert
interviews, and so on. These are techniques that review the taxonomy’s terms and structure from the taxonomist’s own point of view. But these activities do not tell us how
the actual users of the system will experience the taxonomy. Thus the need for testing with real users.

Photo taken from Dave Daring (http://dave-daring.deviantart.com/art/Leonard-Nimoy-Spock-384975793) on November 5, 2014.
DC: When working with clients, I often try to frame our discussions about taxonomy by referencing Mr. Spock, which usually gets a laugh, but once in a while a client will say,
“Aha, got it.” I refer to Mr. Spock to make the point that taxonomy in a user experience project is all about logic. And not in the sense of being logical, but in figuring out
the abstract logic of how something can be put together. We single out concepts driving an experience and the terms we use to describe these concepts.
Now, these discussions can be very abstract and difficult for stakeholders. So we often discuss them using practical ways, such as how content can be retrieved by users through
navigation and search. But this muddies the waters because we are mixing taxonomy and IA together as if they are the same thing. So we need to keep in mind that when we do
this we are really talking about two things—taxonomy terms and the information architecture that manifests the terms for users.
The same goes for testing taxonomy. When you are testing with users, you need to be clear on whether you are vetting concepts and terms of the taxonomy or their
implementation in the UI. Most of the time, you’re looking at both. It’s your responsibility to make sure that this is clear to stakeholders. It’s also your responsibility to make
sure that this distinction is kept in mind when designing a study. Because if you don’t, your study will be confusing to participants and will not deliver the helpful results you
need.

Taxonomy
terms ☛ relationships
controlled vocabulary
hierarchies
value-attribute pairs
IA
search
navigation
dynamic content
UI & interactions
DC: If we compare the focus areas of taxonomy and IA in simple terms, we see the following.
Taxonomy is concerned with identifying terms to describe concepts, which it then organizes into relationships.
The terms and relationships are then used to build controlled vocabularies, hierarchies, and attribution metadata.
We could say that IA builds on taxonomy. The terms identified by taxonomy are used to drive the experience through its UI and interactions. So, the taxonomy comes to life
through search, navigation, and content.

Photo taken from Rob Boudon’s photo stream on Flickr (http://tinyurl.com/lgf6t4v) on November 5, 2014.
There is no magical method.
And before we start talking about methods, let me state this now: There is no one magical method for testing taxonomies—no one method works in all cases or is superior to
others.
It all depends on what question you are trying to answer.
When planning a study, steer the conversation toward the question(s) you are trying to answer. Once you have agreement on that, you can then go to your methodological toolbox.
And a piece of advice on answering questions with research: if you are able, use multiple methods to get at that question from different angles. Alberta will talk about this in a
moment.

Testing Is Not Difficult
image: http://commons.wikimedia.org/
AS: First of all, let’s make a distinction between evaluating and testing taxonomies. As “The Accidental Taxonomist” wisely states, the evaluation of a taxonomy by a
taxonomist is needed when a taxonomy is created by non-taxonomists (such as by subject-matter experts instead).
Testing of a taxonomy, on the other hand, like Dave said, is recommended in all cases, no matter who created the taxonomy, to ensure the finished product is usable and logical.
And testing isn’t difficult, but different tests are for different purposes and fit into different stages of the taxonomy process. An inappropriate test or inappropriately timed test
can be a waste of time and money.
Taxonomy testing allows you to test how topics are organized, independently of everything else.

When to Test
AS: Taxonomy testing allows you to validate your site architecture before you get into design.
It’s important to test and validate during every step of a big project like a website redesign. Once you’re done with taxonomy testing and have a final site map, you can move into
wireframing and design with confidence, because you know there are no information architecture issues you’ll have to compensate for.
If you skip taxonomy testing and wait until prototype testing, how will you know if users have issues because of a faulty layout/design, or because pages aren’t located
where they would expect to find them?
If there’s anything we should have all learned in middle school, it’s to test one thing at a time.

What do we look at?
• Search logs
• Analytics
• Google AdWords suggestions
• Comparative analysis
• Keyword research
• User research and vernacular
• Existing taxonomies
AS: This is what we look at when we do evaluations, and why:
• Search logs: to understand what words visitors use, what they’re looking for, and alternative spellings/typos.
• Analytics: to highlight paths, understand what seems to work and what doesn’t, where users may get lost, and how the site is seen by search engines.
• Keywords: to understand and expand the cognitive domain.
• Google AdWords suggestions: the more variations per concept we can provide, the easier our content will be to discover.
• Comparative analysis: to understand the language competing sites use and to build better domain expertise in the site’s topic.
• User research and vernacular: real humans will use our product, and they will find it and engage with it using their own language and not necessarily the stakeholder’s.
Understanding what terms will be used to get to the site and to navigate it will allow us to create controlled vocabularies and tables of synonyms.
• Existing taxonomies: this should go without saying.
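As a rough illustration of the search-log point above, here is a minimal sketch of counting query variants and folding them into preferred terms. The sample queries, the `synonyms` table, and the helper names are all made up for illustration; a real log would come from your own analytics export.

```python
from collections import Counter

# Made-up sample of raw site-search queries (not data from the talk).
search_log = [
    "Muffin", "muffins", "coffee", "cofee", "Coffee ",
    "catering", "office catering", "coffee", "muffin",
]

# Hypothetical hand-built synonym table: variant -> preferred term.
synonyms = {"cofee": "coffee", "muffins": "muffin"}

def normalize(query):
    """Lowercase and collapse whitespace so trivial variants count together."""
    return " ".join(query.lower().split())

def preferred(query):
    """Map a raw query to its preferred term, if the table knows the variant."""
    term = normalize(query)
    return synonyms.get(term, term)

# Counts before and after applying the synonym table.
raw_counts = Counter(normalize(q) for q in search_log)
merged_counts = Counter(preferred(q) for q in search_log)

print(raw_counts.most_common(3))
print(merged_counts.most_common(3))
```

Comparing the two counts shows which misspellings and plurals are worth adding to a table of synonyms.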

Traditional Card Sorting
AS: Let’s take a quick look at card sorting. Card sorting is a method used to help design or evaluate the information architecture of a site. In a card sorting session,
participants organize topics into categories that make sense to them and they may also help you label these groups. To conduct a card sort, you can use actual cards, pieces of
paper, or one of several online card-sorting software tools. I tend to use Optimal Workshop’s OptimalSort because of its ease of use in running both online and offline sorts, and
because of the analysis it provides.

Card Sorting - the basics
• Planning
• Preparing the cards
• Sorting
• Analysis
AS: We said we would go beyond card sorting, but in order to do so, we’d like to quickly show you how card sorting is done.
A typical card sorting exercise is conducted in 4 stages:
• Planning: deciding what to test (for large sites we recommend testing one section at a time) and how to recruit participants.
• Preparing the cards: write one concept/term per card. If an explanation is needed, it can be included, or printed on the back of the card.
• Sorting: I’ll get to the different types of card sorts in a minute, but the idea is that participants organize cards/concepts into logical groups that make sense to them.
• Analysis: the results are collected and analyzed. This can be done manually (using Joe Lamantia’s spreadsheet will help you greatly; you can find a link in the resources
section of the slide deck) or automatically by software.
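To make the analysis stage a little more concrete, here is a minimal sketch of one common way to summarize sort results: counting how often each pair of cards lands in the same group. This is not the spreadsheet or the tool mentioned above; the sample data and function names are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical results: one dict per participant, group label -> cards in it.
sorts = [
    {"Food": ["Muffin", "Sandwich"], "Drinks": ["Coffee", "Milk", "Water"],
     "Services": ["Catering"]},
    {"Dessert": ["Muffin"], "Food": ["Sandwich"],
     "Drinks": ["Coffee", "Milk", "Water"], "Services": ["Catering"]},
    {"Snacks": ["Muffin", "Sandwich", "Coffee"], "Cold drinks": ["Milk", "Water"],
     "Other": ["Catering"]},
]

def co_occurrence(sorts):
    """Count, for every pair of cards, how many participants grouped them together."""
    pairs = Counter()
    for groups in sorts:
        for cards in groups.values():
            # sorted() gives each pair one canonical (alphabetical) key.
            for a, b in combinations(sorted(cards), 2):
                pairs[(a, b)] += 1
    return pairs

def similarity(sorts):
    """Share of participants (0.0 to 1.0) who placed each pair in the same group."""
    n = len(sorts)
    return {pair: count / n for pair, count in co_occurrence(sorts).items()}

for pair, share in sorted(similarity(sorts).items(), key=lambda kv: -kv[1]):
    print(pair, round(share, 2))
```

Pairs with a high share are candidates for living in the same category; pairs that split participants are worth a follow-up question.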

Open Card Sorting
Muffin
Coffee
Milk
Water
Catering
Sandwich
AS: In an open card sort, participants create their own names for the categories.
This helps reveal not only how they mentally classify the cards, but also what terms they use for the categories.
There are no preset categories and participants get to create their own labels/groups.
Open sorting is generative; it is typically used to discover patterns in how participants classify concepts, which in turn helps generate ideas for organizing information.

Open Card Sorting
Cards: Muffin, Coffee, Milk, Water, Catering, Sandwich
Dessert: Muffin
Food: Sandwich
Drinks: Coffee, Milk, Water
Services: Catering
AS: An open card sort looks a bit like this:
On the left, the participants are presented with a group of cards in random order.
The participants organize them into groups that they create (shown on the right) and give a name of their choosing to the groups.

Closed Card Sorting
Muffin
Coffee
Milk
Water
Catering
Sandwich
Categories: Menu, Beverages, For the office
AS: In a closed card sort, participants are provided with a predetermined set of category names. They then assign the index cards to these fixed categories.
This helps reveal the degree to which the participants agree on which cards belong under each category.
Closed sorting is evaluative; it is typically used to judge whether a given set of category names provides an effective way to organize a given collection of content.
The categories/groups are provided and participants simply distribute the cards among the predefined groups.
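The "degree to which the participants agree" can be sketched as a simple per-card tally. The placements below are invented, and dividing by all participants (so unsorted cards lower the score) is one assumed convention among several.

```python
from collections import Counter, defaultdict

# Hypothetical placements: one dict per participant, card -> chosen category.
placements = [
    {"Muffin": "Menu", "Coffee": "Beverages", "Catering": "For the office"},
    {"Muffin": "Menu", "Coffee": "Beverages", "Catering": "Menu"},
    {"Muffin": "Menu", "Coffee": "Menu", "Catering": "For the office"},
    {"Muffin": "Menu", "Coffee": "Beverages"},  # Catering left unsorted
]

def agreement(placements):
    """For each card, return (most chosen category, share of all participants)."""
    votes = defaultdict(Counter)
    for placement in placements:
        for card, category in placement.items():
            votes[card][category] += 1
    n = len(placements)
    result = {}
    for card, counter in votes.items():
        category, count = counter.most_common(1)[0]
        result[card] = (category, count / n)
    return result

for card, (category, share) in agreement(placements).items():
    print(f"{card}: {category} ({share:.0%})")
```

Cards with a low share, or cards that many participants leave unsorted, point at category names that may be too technical or proprietary.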

Closed Card Sorting
Categories: Menu, Beverages, For the office
Cards: Muffin, Coffee, Milk, Water, Catering, Sandwich
AS: A closed card sort looks a bit like this:
The participants move the cards into predefined groups (shown on the right).
One of the things that may happen is that you end up with a bunch of unsorted cards that users don’t know where to place, especially if the group names reflect very technical or
proprietary language.
I particularly like to use this type of sort to evaluate existing structures, and I found this method very helpful in showing clients the differences between their own mental model
and that of their users. It’s the old tension between business jargon and natural language.

Hybrid Card Sorting
Muffin
Coffee
Milk
Water
Catering
Sandwich
Nibbles
AS: In a hybrid card sort, one or more groups are provided, and the other groups are created by the participants.
This type of test is helpful in situations where certain categories are non-negotiable because of regulatory constraints or other reasons.

Hybrid Card Sorting
Groups: Nibbles, Hot drinks, Cold drinks, What we do
Cards: Muffin, Coffee, Milk, Water, Catering, Sandwich
AS: A hybrid card sort looks a bit like this:
The participants move the cards into the group or groups (shown on the right) that are predetermined, and create new groups of their choosing which they also get to name.
How do we analyze this data, and what do the results of this test mean? We’ll get to that in a moment.

Why go beyond?
image: http://commons.wikimedia.org/
AS: Card sorting gives us an insight into how people think and allows us to classify and organize information into a logical structure. But, as the title of this talk states, we want to
move beyond card sorting. Why?
Because people!
People are different, and they look for information in different ways, and it is our duty and responsibility to test and make sure we’ve covered all the bases.
No user interacts with a taxonomy in a vacuum. We want to look at taxonomies as conversations, and at how people experience them in context, from the perspective of
information architecture.

Delphi-method Card Sorting
DC: Okay. Let’s kick off our discussion of methods with Delphi-method card sorting, which is by far my favorite method of testing taxonomies.
And why do I say that? Because Delphi gives you an entire taxonomy vetted by users ready to go after 1 day of testing. No sitting down and analyzing session notes or
comparing card orderings. And you will get rich, descriptive data on how users think about the taxonomy terms and structure. That’s the logic we discussed earlier.
Now, I’m sure most of you are familiar with card sorting techniques. Delphi-method card sorting is based on the same principles with a few really important tweaks that result in:
1. Focused testing sessions
2. Lower costs in terms of time and money

• Hierarchy laid out in cards
• 8 - 10 participants one after the other go through cards and
modify as they see fit
• Participants can add, delete, move, and re-label cards
• Watch and interview for details
• Also test navigation schema, facets, images for labeling
DC: Delphi-method card sorting was introduced by Professor Kathryn Summers and Celeste Lyn Paul. Celeste, incidentally, gave a talk on Delphi-method card sorting here at the
IA Summit in 2007. Why it is not used by every IA practitioner out there, I do not know. Srsly. Try this once and you will be in love.
So, how do we do it?
- You lay out the hierarchy in index cards. We’re testing for both categorization (placement of cards) and labels (terms on the cards).
- Perform with 8 - 10 participants who work through the deck of cards one participant after another until the hierarchy and labels “stabilize”.
By stabilize we mean that after a while you will have noticed the parts of the structure that most participants accept and you will see the specific areas that there was not an
overall consensus on. In e-commerce, this generally means areas where polyhierarchy (or cross-merchandising) is needed. And you’ll see which labels are problematic.
- Participants modify the deck by adding new cards, deleting cards (flipping them over), moving cards (flipping over one and adding a duplicate card in another spot on the
table), and re-labeling cards (flipping over the existing card and placing a new card on top of it).
- Watch and learn – Interview for detail into why participants placed things where they did and why they labeled the cards the way they did
- Can also present navigation schema, images for labeling, filters or facets to get more feedback
That’s it. At the end of a testing day, you’ll have a tested taxonomy ready to go.
Record with your iPhone or digital camera (simply point the camera at the table so you can see the cards) so you can check details later if you need to. But your written notes should
suffice.

1. Decide whether to seed the deck
2. Put categories on cards
3. Interview participant
4. Explain exercise
5. Allow participant to organize and comment
6. Watch and ask questions
7. Repeat with participants until you are satisfied
8. Analyze data (optional)
DC: So, step by step, how do we do Delphi-method card sorting?
1. Decide whether to seed the deck or let the first user create the seed. I recommend seeding the deck yourself, especially if you have built and evaluated the taxonomy. If you do go with letting
the first participant seed the deck, be very careful. That person will set the stage for all the following participants.
Also, think of other pertinent test points: navigation, facets, images for labeling. These are effective ways of letting participants understand the taxonomy. Remember,
taxonomy is difficult for most people given how abstract it is. Concrete examples from the collection of items being classified will help them think through their decisions.
2. Put categories on cards. Go through and write down all the categories or have the first participant do so. If you are seeding the card deck, printing the cards is an easy way for
you to distinguish the seed from participant-added cards. Representative item images are optional.
3. Perform an initial interview so you understand the background of the participant and any of her particular needs and wants. This will help you in interpreting her comments.
You can also discuss the participant’s experience with the brand. This will help get her in the frame of mind to discuss the taxonomy in relation to that brand.
4. Then explain the exercise. Most people get this right off the bat. You may have to help some participants with when to turn over a card, when to write a new one, and so on.
But this is okay and will actually allow you to build up a rapport with them. But be careful not to explain too much, especially if the participant wants explanations of what labels
mean or why some things are in one place and not another.
5. Allow the participant to organize the deck and comment. Let the participant work through at her own pace and style. Most people will go from left to right, top to
bottom, since in our culture that is the way we read. But a few may want to jump around. Keep track of areas not addressed in your notes so you can make sure that the
participant covers all the categories during the session.
6. Watch and interview. While the session is underway, encourage dialogue so you get the rich, descriptive detail in what the participant is thinking. The main strength of a
qualitative method like this is the ability to get inside the mind of the user by allowing participants to explain their thought processes and mental models. This kind of
information is invaluable to your process as an IA.
7. Repeat with participants until you are satisfied. Typically you will need 8 - 10 sessions before you see the same issues popping up over and over. There will certainly be
some items that there were strong opposing opinions on. Note them as they will be important in finalizing the taxonomy. Most of the time, you will not be surprised by these,
especially if testing is following an evaluation phase.
8. And finally, if you need to, analyze the data. This might be necessary if you need detail on mental models or preferred terms. But for the most part, at the end of the day, no
analysis is needed because the cards themselves and their arrangement will be all the data you need.
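If you do want a rough numeric check on whether the deck has "stabilized", one option is to tally how many changes each participant made and watch the totals fall. The sketch below is only an illustration: the tallies are invented, and the `window` and `threshold` values are arbitrary assumptions, not part of the Delphi method as described here, where stabilization is a judgment call.

```python
# Hypothetical tally of deck changes per participant, in session order.
sessions = [
    {"added": 4, "deleted": 2, "moved": 5, "relabeled": 3},
    {"added": 2, "deleted": 1, "moved": 4, "relabeled": 2},
    {"added": 1, "deleted": 1, "moved": 3, "relabeled": 1},
    {"added": 1, "deleted": 0, "moved": 2, "relabeled": 1},
    {"added": 0, "deleted": 0, "moved": 2, "relabeled": 1},
    {"added": 0, "deleted": 0, "moved": 1, "relabeled": 1},
    {"added": 0, "deleted": 0, "moved": 1, "relabeled": 0},
    {"added": 0, "deleted": 0, "moved": 0, "relabeled": 1},
]

def changes_per_session(sessions):
    """Total number of modifications each participant made to the deck."""
    return [sum(tally.values()) for tally in sessions]

def stabilized(sessions, window=3, threshold=2):
    """True when each of the last `window` sessions had at most `threshold` changes.

    Both parameters are assumed cut-offs chosen for this example.
    """
    totals = changes_per_session(sessions)
    return len(totals) >= window and all(t <= threshold for t in totals[-window:])

print(changes_per_session(sessions))
print(stabilized(sessions))
```

The cards that are still being moved or relabeled in the late sessions are the contested areas worth noting when you finalize the taxonomy.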

Usability Studies
DC: Okay, let’s talk about usability testing.
While Delphi-method card sorting is a good way to get at understanding how users interpret a taxonomy, usability studies are excellent at measuring how well they interact with
the implementation of a taxonomy in a UI. And though they are interacting with a representation of a system and not the terms and hierarchy directly, this is a very good way
to determine if your taxonomy is effectively supporting the user experience and vice versa.

• Piggyback existing study or create new study
• Interactive UI (testing IA not taxonomy)
• Step-by-step tasks
• Interaction of terms in system
• Demonstrates value of taxonomy and reassures the team
DC: Now when it comes to usability testing, you’re not creating a taxonomy test per se. Often what you’re doing is piggybacking on user acceptance testing that is already being
done. By inserting a few questions that are related to wayfinding or search, you can rest assured that your taxonomy design is what it needs to be to support user needs.
Now, having said that, you can also create a usability test just for testing the taxonomy, which is something I have done for e-commerce projects. In both cases, your test will
focus on the implementation of the taxonomy in the UI, not specifically on taxonomy terms. So you’ll build a UI to represent the taxonomy and put it in front of users. I’ve
done this with clickable PDFs. You can use HTML prototypes, paper prototypes, or start with existing systems to inform your taxonomy design.
What you’ll want to do is have the user perform a task step-by-step, which will give you the granularity you need to test out the taxonomy’s terms in the context of the system.
You can do this with tasks associated to navigating, searching, or tagging content.
What the tests will give you is an understanding of how users think about specific terms used in the taxonomy within the context of the system, which is difficult to
get by having users only look at terms on index cards.
And a side benefit to testing taxonomy during usability testing is that it explicitly demonstrates the value of taxonomy to your stakeholders. As we all know, taxonomy is
abstract and often difficult for folks to understand. So a usability test will make concrete all that hard work you’ve been doing and show how it is essential for the user
experience.
It also reassures the business and the design team that we are all headed in the right direction with the design. And peace of mind is something we all could use more of.

Piggybacking Existing Study
1. Identify goals
2. Craft tasks
3. Attend the sessions and ask questions
4. Explain to stakeholders what you are looking for
5. Offer to create a summary
DC: So, how do you do a usability study for taxonomy?
1. If someone else is testing, simply explain your goals and explain what questions you are looking to answer from the test.
2. Offer example tasks to be inserted into the study. Remember to keep your tasks simple. You want to make sure that you are allowing the user to explore the interface and
do things the way she would. Making the tasks too detailed will inhibit natural exploration and will influence the results of the test.
Which tasks should you pick? Pick tasks that are most important from a business perspective to ensure you are getting the most significant results for the test.
And also make sure to test areas that you have questions about. These generally will line up with the areas important to the business.
3. Attend the study sessions in person and take notes.
Pay attention to when users discuss their understanding of concepts and terms and ask follow-up questions during Q&A. This will help you understand their mental models
and will help you choose preferred and alternate terms for the taxonomy.
4. During the tests, take the opportunity to show to other stakeholders how the data is driving the experience. This is an invaluable opportunity given that what we do as
practitioners of taxonomy is not always obvious to others on our teams or to clients. But it is still the bedrock of the user experience and very important.
5. And finally, offer to create an overview for the study organizer if you are piggybacking someone else’s study. You need to create an overview for yourself anyway so why
not share what you’ve done and ease the burden of someone else on the team?

Creating a New Study
1. Create an interactive UI (no lorem ipsum)
2. Craft simple tasks
3. Run the sessions
4. Explain to stakeholders what you are looking for
5. Create and circulate a summary
DC: So, how do you craft a usability study for testing a taxonomy?
1. First, you’ll need to have an artifact that represents the translation of the taxonomy terms into a UI. If part of the project is the creation of wireframes, you can easily create
a clickable PDF or whip up an HTML prototype for the study using them.
One caveat: unlike general usability tests, lorem ipsum and other placeholder copy that is not actual content will cause problems with the test and will dilute the reliability of the
results. Users need real terms and real copy in order to make sense of the UI and to give you good feedback on the design of the taxonomy. So make sure you plan for the
creation of real copy where it is needed on the page and real terms in navigation, search, and other areas that are built on the taxonomy.
2. Just as you would if someone else were running the tests, create simple tasks that focus on areas that are most important from a business perspective or that you have
questions about.
3. Run the sessions and make sure to probe into the user’s understanding of concepts and terms in order to understand their mental models. You’ll want that detail to interpret
the results of how the participants fared using the UI.
4. and 5. Again, show to other stakeholders how the data is driving the experience and create a summary that explains what was discovered in the sessions. Focus on explaining
how the UI is an implementation of the taxonomy and how it fared under testing. Also be sure to talk about how participants thought of the taxonomy terms and structure in
the context of the user experience.

Other Tests with Optimal Workshop
image: participatorypolitics.org/
AS: As I mentioned earlier, when I need to conduct online tests, I like to use the Optimal Workshop suite of products, because they are easy to use and provide a very detailed
level of analysis quickly and in a format that’s interactive and easy to understand.
If you wonder if this section is going to turn into some kind of infomercial, fear not. The lovely folks at Optimal Workshop don’t even know about this talk.
In the next few minutes, I will show you how to conduct:
a click path test
a Treejack test
and a Mixed Card Sort
and I will guide you through the results and insights you can gain from these activities.

Click Path Studies
• Get reliable metrics for benchmarking
• Define a click path for analysis
• Look for the drops to discover the cause
• Correct and analyze results
AS: A click path (clickstream) is the sequence of hyperlinks one or more website visitors follow on a given site, presented in the order viewed. A visitor's click path may start
within the site or elsewhere, often from a search engine results page or a link that someone forwarded, and it continues as a sequence of successive webpages visited by the user.
Click paths take call data and can match it to ad sources, keywords, and/or referring domains, in order to capture data.
I actually prefer not to conduct pure click path studies in isolation because if, as is bound to happen, it turns out that visitors don’t follow the path that WE want them to
follow, then it’s back to the drawing board to redesign the site structure/architecture in order to adapt it to what we think it should be.
Or, worse, it may mean that we spend hours trying to “understand” what they were thinking when they clicked on a specific button or went to a specific page, which cannot be
determined by simply looking at a path.
The issue with this kind of test is that we cannot tell what a user was thinking when they were clicking on a series of links. But a click path test in conjunction with other kinds of
tests can give us further insight into the issues with our structure, so please hang on.
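The "look for the drops" bullet can be sketched as a small funnel calculation over a defined click path. The sessions, the expected path, and the function names below are invented for illustration, and, as noted above, the numbers only show where visitors left the path, not why.

```python
# Hypothetical sessions: the ordered pages each visitor viewed.
sessions = [
    ["Home", "Menu", "Beverages", "Coffee"],
    ["Home", "Menu", "Beverages"],
    ["Home", "Menu", "Beverages", "Muffin"],
    ["Home", "Catering"],
    ["Home", "Menu", "Beverages", "Coffee"],
    ["Home", "Menu", "Beverages"],
]

# The click path we defined for analysis (an assumption for this example).
expected_path = ["Home", "Menu", "Beverages", "Coffee"]

def funnel(sessions, path):
    """Count visitors still on the expected path after each step."""
    return [
        sum(1 for s in sessions if s[:i] == path[:i])
        for i in range(1, len(path) + 1)
    ]

def biggest_drop(sessions, path):
    """Return (from_page, to_page, visitors_lost) for the largest drop."""
    counts = funnel(sessions, path)
    drops = [counts[i] - counts[i + 1] for i in range(len(counts) - 1)]
    i = max(range(len(drops)), key=drops.__getitem__)
    return path[i], path[i + 1], drops[i]

print(funnel(sessions, expected_path))
print(biggest_drop(sessions, expected_path))
```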

Treejack testing
• Create navigation tree
• Set up tasks
• Analyze results
AS: Treejack is sometimes called a reverse card sort.
In a reverse card sort, Treejack test, or card-based classification test, an existing structure of categories and sub-categories is tested.
Users are given specific tasks and are asked to complete them navigating a collection of cards. Each card contains the names of subcategories related to a category, and the user
should find the card most relevant to the given task starting from the main card with the top-level categories.
This ensures that the structure is evaluated in isolation, and that the effects of navigational aids and visual design, to name a few, are neutralized.
Again, this is an evaluative type of test and it is used to judge whether a predetermined hierarchy is conducive to finding information efficiently.

Treejack
AS: I am about to show you a video that will illustrate how a Treejack test works, using Optimal Workshop.
The test asks for an email address. When you set up the test you can decide whether you’d like to share results with your participants, so having email addresses is important. You
may also want to compensate participants, in which case you will need the addresses to verify whether people have completed the test.
The users are then shown instructions on how to complete the test and then presented with tasks. Using clear and concise language when defining a task is crucial, as you
don’t want to be (mis)leading the users.
In this test, which was created specifically for this talk, I simulate the activities of a user who sometimes is able to retrieve information easily, and other times takes a wrong turn,
so to speak.
The user was asked to complete three tasks, which entailed indicating where they would find a specific piece of information needed to complete the task (for example, finding information
on how to contact a fictional business.)

AS: I spoke earlier of the level of analysis and the results we can expect to get from our research. Let’s take a look.
Optimal Workshop provides all sorts of insights, beginning with a task analysis.
This page shows a breakdown of success, directness, time taken, and an overall score calculated for each of the tasks set. You can also drill down into each specific task. For example, for question number 1, which has 1 possible correct destination, you can see that everyone found a correct answer: 1 person found it indirectly (had to backtrack), everyone else found it directly, and 0 people failed.
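As a rough illustration of how success and directness can be tallied from raw click paths, here is a minimal Python sketch. It is not Optimal Workshop's implementation: the function name `summarize_task`, the node labels, and the rule "a path is direct if no node is visited twice" are all assumptions made for this example, and failed attempts are not split into direct and indirect.

```python
def summarize_task(paths, correct_destinations):
    """Tally outcomes for one task.

    paths: one list per participant of the nodes they visited, in order,
           ending at the node they chose as their answer ([] = skipped).
    """
    counts = {"direct": 0, "indirect": 0, "failed": 0, "skipped": 0}
    for path in paths:
        if not path:
            counts["skipped"] += 1
        elif path[-1] not in correct_destinations:
            counts["failed"] += 1
        elif len(set(path)) == len(path):
            # No node was visited twice, so the participant never backtracked.
            counts["direct"] += 1
        else:
            counts["indirect"] += 1
    return counts


# Hypothetical data for a "how do I contact the business?" task.
paths = [
    ["Home", "About", "Contact us"],
    ["Home", "About", "Contact us"],
    ["Home", "Bakery", "Home", "About", "Contact us"],  # wrong turn, then back
]
result = summarize_task(paths, correct_destinations={"Contact us"})
print(result)
```

With this made-up data, two participants succeed directly and one succeeds indirectly. As with the real report, the tally says nothing about why the third participant took the wrong turn.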
And remember when I said that I didn’t find click path studies helpful in isolation? If I click on the button that says “View Pietree”…

AS: I can get to my click path. And here we can see an exact breakdown of how the user who reached the right destination first navigated down the wrong path and then went back to the right one. However, what a click path doesn’t tell me is why the users went down the wrong path, just that they did.
What we can learn from this result is that some of our labels/categories aren’t very clear and we may want to revise them.
If you want to dig even deeper, you can also explore the pietree for each question, which illustrates the exact paths your users took.

AS: This page shows which branches were clicked first for each task, and what percentage of participants did so.
We can also compare the percentage of first clicks on a branch with the total percentage of participants who visited that branch at any point during the task.
If you like numbers, this is a neat view into what kind of terminology users seem to relate to.
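The first-click comparison described above can be approximated with a few lines of Python. This is only a sketch under stated assumptions: `first_click_table` is a hypothetical helper, the branch labels are invented, and each path lists the branches a participant clicked (excluding the home node).

```python
from collections import Counter


def first_click_table(paths):
    """Return {branch: (first_click_pct, visited_pct)} for one task."""
    total = len(paths)
    first = Counter(path[0] for path in paths if path)
    # set(path) counts each branch once per participant, however often clicked.
    visited = Counter(branch for path in paths for branch in set(path))
    return {
        branch: (100 * first[branch] / total, 100 * visited[branch] / total)
        for branch in visited
    }


# Hypothetical click paths from four participants for a single task.
paths = [
    ["About", "Contact us"],
    ["Bakery", "About", "Contact us"],
    ["About", "Contact us"],
    ["Bakery", "Order online"],
]
table = first_click_table(paths)
for branch, (first_pct, visited_pct) in sorted(table.items()):
    print(f"{branch}: first click {first_pct:.0f}%, visited {visited_pct:.0f}%")
```

A branch whose visited percentage is much higher than its first-click percentage is one that people tend to reach only after trying something else first.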

AS: This table visualizes the paths your participants took for each task in a linear way. You can see where they went wrong or skipped the task entirely.
Interestingly, you can use this table to get a sense of what your breadcrumb trail might look like and understand where users might get confused, but not why. In this case, was
user no. 5 thinking of ordering baked goods when they clicked on “Bakery” first? This is the kind of question that only a moderated test (or thorough follow-up) can answer.

AS: And finally, you can view a breakdown of your results in a comprehensive table that displays where all users ended up (the site map is along the left, and the task number is
along the top).
If any boxes are marked in red, you should probably explore why that is and whether it necessitates a change.

Mixed card sort
AS: The video I am about to show is a simulation of a mixed card sort conducted using Optimal Workshop’s OptimalSort.
This is a real survey that Jessica DuVerneay (one of this year’s co-chairs) and I launched last year in preparation for a workshop on taxonomy that we hosted at last year’s IA
Summit.
The survey was for a fictional travel business that sold vacation packages to fictional worlds and related services.
Like Treejack, the test begins by asking for an email address. When you set up the test you can decide whether you’d like to share results with your participants, so having email addresses is important. You may also want to compensate participants, in which case you will need the addresses to verify who has completed the test.
The users were then shown instructions on how to complete the test and were presented with the cards they had to organize into predetermined groups or into new ones of their own making. Because the topic was rather specific, most of the cards contained instructions, which were accessible by moving the mouse over the blue triangle in each card’s upper right corner.
In this video I simulate the activities of a user who sometimes is able to move the cards easily and without hesitation, and other times struggles a bit.
I didn’t complete the entire survey because it was very long (about 100 cards), but this should be sufficient to understand how the test works.
One of the things that I really like about this product is that the cards are printable — so, if I wanted to conduct a moderated session, I could print my cards, ask a user to sort
them and then I could capture the results of the test by scanning the unique barcode (which is printed on each card) back into OptimalSort, effectively collating the results of
moderated and unmoderated sessions. Neat, eh?

AS: But let’s look at the results. I find the overview page to be one of the most important pages in the results section.
It tells us how long users took completing the card sort, and it allows us to, in a way, test the test. After the test is launched, it’s worth coming back to this page often to
understand whether it’s taking participants too long a time to complete the survey and whether it may make sense to revise the test at all.
The page also tells us what percentage of participants completed the survey (in this case 68% for a 100-card, uncompensated survey) and gives us another measure of what I call
“participant fatigue”, or failure to design a sensible experiment.

AS: Next in the analysis is a view of the categories the cards were organized into.
I can expand each category to see exactly what cards were placed into it and how many times, but I can also choose to standardize my categories and group together labels with
(perhaps) different spelling, or similar concepts (e.g. “About” and “About Us”) to make data analysis simpler.
To standardize a set of categories, I selected each label containing the word “About” (or “About Us” or similar) and then clicked on the 'Standardize selected categories' button
above the table. I can choose to rename the new category anything I want or pick one of the most commonly used labels.
The page will reload, and the separate categories will now be treated as one category. Clicking on the + next to the new category “About” will show you which cards were sorted
into this category, and the number of participants out of the 4 that did this (in brackets next to the card name). You'll also see the original category labels.
Optimal Workshop recommends assessing standardized categories to make sure they give valid data, and to check for any cards that seem out of place.
The “Agreement” column of the categories table gives a figure that represents the agreement between participants on the cards that belong in that category. A perfect
agreement score is 1.0 (and will be the score if a category has only been created by one person.) In the example above, the agreement score for the standardized category
“About” is 0.38.
When you see an agreement score this low, participants are probably thinking about the categories in different ways, or may have placed cards in categories that don't seem to
make sense.
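To make the idea of an agreement score concrete, here is a small Python sketch. The exact formula Optimal Workshop uses is not given in these notes, so the calculation below is an assumption: total card placements divided by (unique cards in the category × participants who created it). It was chosen only because it yields 1.0 when a single participant created the category, as described above. The function name, card names, and labels are hypothetical.

```python
def agreement(sorts, label_map, category):
    """Agreement for one standardized category (assumed formula, see lead-in).

    sorts:     {participant: {raw_label: set_of_cards}}
    label_map: {raw_label: standardized_label}; unmapped labels stay as-is.
    """
    groups = []
    for groups_by_label in sorts.values():
        merged = set()
        for label, cards in groups_by_label.items():
            if label_map.get(label, label) == category:
                merged |= cards
        if merged:
            groups.append(merged)
    if not groups:
        return None  # nobody created this category
    unique_cards = set().union(*groups)
    placements = sum(len(group) for group in groups)
    return placements / (len(unique_cards) * len(groups))


# Hypothetical sorts from three participants, with differently spelled labels.
sorts = {
    "p1": {"About": {"Our story", "Contact", "Careers"}},
    "p2": {"About Us": {"Our story", "Contact"}},
    "p3": {"about": {"Our story"}, "Help": {"Contact"}},
}
label_map = {"About Us": "About", "about": "About"}  # standardize the labels

score = agreement(sorts, label_map, "About")
solo = agreement({"p1": sorts["p1"]}, {}, "About")
print(round(score, 2), solo)
```

Under this assumed formula, the more participants disagree about which cards belong in the merged category, the further the score falls below 1.0.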

AS: The OptimalSort dendrograms analyze participant responses and provide an interpretation of how cards were categorized together. Dendrogram data can help you create a
new IA because they show you how much agreement there was between participants about one potential information architecture.
You'll see dendrograms on both open and hybrid card sorts. You won't see a dendrogram on a closed card sort because closed sorts generally test an established IA, rather than
aim to create a new one.
The two dendrograms you'll see contain data generated by two different algorithms:
The Actual Agreement Method (AAM) depicts only factual relationships, and provides the most useful data if over 30 participants have completed your survey. The scores tell
you that 'X% of participants agree with this grouping.'
The Best Merge Method (BMM) makes assumptions about larger clusters based on individual pair relationships, and provides the most useful data if your survey has fewer than 30 participants. The scores tell you that 'X% of participants agree with parts of this grouping.' BMM's ability to compromise and extrapolate helps you squeeze the most out of small or incomplete responses.

AS: A similarity matrix shows how many participants agree with each pair combination of cards, and groups related clusters together. A pair is considered stronger the more
participants agree with it.
It clusters related pairs together by finding the strongest pair, grouping them with the next strongest pair that either of those cards have, then repeats the process for that new
pair. This way, clusters of cards that are strongly related to each other appear together on the matrix.
The similarity matrix is a simple representation of pair combinations, and you can get useful insights quickly. The algorithm attempts to cluster similar cards down the right
edge, so at a glance you can see which cards relate to each other. Clusters are presented in the same shade of blue.
You can also hover your mouse over any of the squares to find out the percentage of participants who grouped the cards in the same category. In the example, we've hovered over the ‘Never Never Land / Hades’ card pair, and we can see that the two cards were grouped together 80 times.
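The pair counts behind a similarity matrix are simple to compute by hand. The following Python sketch shows only that counting step, not OptimalSort's clustering or layout. The helper names are hypothetical, the data is invented, and the 'Travel insurance' card is made up for the example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(sorts):
    """Count how many participants placed each pair of cards in the same group.

    sorts: one entry per participant, each a list of groups (sets of card names).
    """
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # Sorting gives every pair a single canonical (a, b) key.
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts


def similarity_pct(sorts, card_a, card_b):
    """Percentage of participants who grouped the two cards together."""
    key = tuple(sorted((card_a, card_b)))
    return 100 * pair_counts(sorts)[key] / len(sorts)


# Hypothetical sorts from three participants.
sorts = [
    [{"Never Never Land", "Hades"}, {"Travel insurance"}],
    [{"Never Never Land", "Hades", "Travel insurance"}],
    [{"Never Never Land"}, {"Hades", "Travel insurance"}],
]
counts = pair_counts(sorts)
pct = similarity_pct(sorts, "Never Never Land", "Hades")
print(counts[("Hades", "Never Never Land")], round(pct, 1))
```

Ordering the cards so that strongly related pairs sit next to each other, as the matrix view does, would be a further step on top of these counts.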

AS: The Participant-Centric Analysis selects the most popular response submitted by a participant, and then selects the second and third most popular alternative responses
submitted by other participants.
The PCA essentially treats all the participants as Information Architects, and treats their responses as 'votes'. In the example, we can see that 51 out of 60 participants created
groups that were similar to the first one.
You can adjust which participant IAs you view based on how many groups (or categories) the participants sorted their cards into. The setting below shows that we're viewing the
three most popular IAs that created between 4 and 8 groups.

Delphi & Anthropologie
DC: Let’s quickly take a look at a couple of case studies.
The first is an in-person Delphi-method card sort I ran for Anthropologie.
Now, if you are familiar with the Anthropologie brand, you’ll know that they aspire to be unique and memorable. Online, they use lifestyle imagery and rich, descriptive copy to
represent the uniqueness of the Anthropologie brand. But this focus on being unique had become a cause for concern for the website’s navigation. In using creative, brand-
focused labels for navigation, they were concerned that their shoppers might be getting lost.
Another concern they had was the size of the taxonomy. They had recently placed a large section of home goods on the site and were afraid it was a bit too much for their
shoppers. They needed to know whether product categories needed to be dropped or collapsed.
We first conducted several activities to evaluate the taxonomy from an expert perspective, including subject matter expert interviews, Web analytics review, and competitive
analysis. We then drafted an updated taxonomy, which we then wanted to test with users. We needed to understand how the users of the site would interpret the category labels.
And we needed to see if the breadth and depth of the taxonomy caused wayfinding issues.

DC: We recruited 13 participants, all of whom were Anthropologie shoppers. 9 of the participants were avid Anthropologie shoppers, while the rest shopped Anthropologie
periodically. About half of the participants were familiar with the website while the other half were not online customers. We used this mix of participants because we wanted to
make sure that the proposed taxonomy worked for folks who knew the current website organization and those who would be new to it. This is an important point: knowing
what your participants know about the testing domain is important for Delphi-Method testing as each participant builds on the previous participants’ work.
We set up a table in a storage area in the Philadelphia flagship store. The Anthropologie staff put an Anthropologie spin on the testing by placing flowers in the room, offering
gourmet snacks, and placing honoraria in gift boxes. These little touches, which are part of the Anthropologie brand, got the participants in the frame of mind for discussing the
taxonomy in relation to the Anthropologie brand. So consider how you can frame the card sort in a way that prepares the participant mentally for the exercise.

DC: We placed the first and second levels of the new taxonomy on cards and placed them on the table with Sharpies and blank index cards. We also printed out the second and third levels of the home goods category on a sheet of paper. We wanted to get feedback on this important and growing category as it was by far the largest on the website at
the time, but we knew we did not have time for 2 card sorts. I placed my iPhone on a nearby shelf and aimed it down at the table to record the sessions.

DC: I took notes on a sheet that listed all the categories in the proposed taxonomy. I also took pictures of the table after each participant was through in order to document the
progress of the deck throughout the day.
In each session, we explained how the card sort worked and then let the participant start modifying the cards. We asked a lot of questions while they worked through the cards,
each with the intention of trying to understand the participants’ mental models—how they thought about the labels and categories in relation to the brand.
At the end of the day, we had a vetted taxonomy ready for implementation. We knew the breadth and depth of the taxonomy worked for folks. We knew which labels were
optimal for wayfinding. We knew where duplicate category placements made sense.
We also gained a valuable insight about labels and their relationship to the Anthropologie brand. We discovered that terms that were not simply descriptive of their contents
were more problematic at the top of the tree where the user began her wayfinding experience because context had yet to be established. At a lower level when the participants
were looking at a category that was unclear to them, they were able to use the previously clicked categories as well as the sibling categories to interpret the category label.
Something like ‘New & Now’ was confusing at the top of the navigation bar because it looked a little like new arrivals and a little like trends. Having seen this during testing, we
made the decision to use more straightforward language on higher-level categories, like “New Arrivals” not “New & Now”.

Usability Testing & OnlineShoes
DC: Our second case study is an example of vetting taxonomy work through a usability study.
OnlineShoes is a large e-commerce site dedicated to selling shoes, obviously. They were in the middle of a large project to improve the site’s IA in order to improve the
shopping experience. The project was to update the organization of the site, as well as expand faceted navigation and search. But they needed a new taxonomy to do this.
So, we started by doing a usability test on the existing site. After that, we executed a large taxonomy evaluation project in order to update the taxonomy.

DC: Now, once we had the new taxonomy, we could have tested it through card sorting exercises both in-person and online. But we had performed a lot of evaluation techniques
and the new taxonomy was an update of the current taxonomy, which was live and working pretty well for the most part. So we felt the taxonomy’s terms and categories were
sound.
What we were interested in was seeing how different UI treatments of the taxonomy worked for typical users.

DC: So in order to visualize the taxonomy, we drafted wireframes that showed the various states of the global navigation’s dropdown navigation, which is the beginning of the
primary shopping path by browsing. We also drafted wireframes for faceted navigation and search to test out how implementation of metadata could support finding products.
The wireframes were then exported into a clickable HTML prototype, something easy to do in OmniGraffle. We used this artifact with 8 participants who were asked to perform
simple findability tasks. While participants worked through the tasks, I asked them questions about wayfinding and using filters to find shoes.
What this testing of the preliminary new navigation gave us was peace of mind we were headed in the right direction with the UI.

DC: And one final point. Just because we were testing the implementation of a taxonomy, that doesn’t mean we couldn’t ask questions about classification or labeling.
We had each participant perform a labeling exercise with representative images of shoes that are difficult to classify. This gave us insight into how users think about cross-category products and informed our classification strategy for hybrid items, like these open-toed birdcage heels.
We also found out that when confronted with describing these types of products, folks often used labels based on occasion or use instead of product attribution.

What Really Works
• Know your users
• Multiple methods
• Card sorting when researching taxonomy terms
• Usability studies when researching use of information
• Focusing on problem areas instead of entire picture
DC: Before setting off to test your taxonomy, you need to keep these things in mind.
1. How many user groups do you have? Do you have a single, homogeneous group of users, or more of a general population made up of different kinds of users? Just like other
user research methods, recruiting the right participants is crucial to a good test.
Consider level of expertise or training. Physicians and patients use different terms to refer to the same concepts, such as
“physician” and “carcinoma” versus “doctor” and “cancer”.
2. Try to use multiple methods if you can. For example, if you perform a Delphi-method card sort to vet the taxonomy against users’ mental models, do an online Treejack
exercise to see how people use the terms in context with a larger sample to see patterns.
3. Card sorts work really well for understanding mental models and user acceptance of terms.
4. Usability studies work well in assessing the use of information in the context of the experience.
5. Focus your studies. Taxonomy is abstract work and can get difficult to execute if you are trying to do too much. Stay focused on key problem areas. In keeping studies small,
you'll be able to repeat studies more often.

What (Usually) Doesn’t Work
• Testing design and information structure at the same time
• Testing too late
• Designing tests that are too large
• Confusing taxonomy with navigation
• Not testing at all
AS: Now that Dave has shown you what really works, here are a few things that usually send you on a wild goose chase.
• Testing design and information structure at the same time — remember your middle school science class: test only one variable at a time to obtain accurate and sensible
results.
• Testing too late — testing your information structure early in the process allows you to design knowing that information is easily available and retrievable by most users.
And it allows your users to focus on interaction and visual elements during subsequent tests, without being hindered by a clunky navigational structure.
• Designing tests that are too large — for maximum accuracy, focus on small sections of the IA you’re testing. Tests that are too large lead to fatigue and loss of focus.
• Confusing taxonomy with navigation — we’ve said before, taxonomy isn’t navigation, but the categorization system that supports navigation. Testing your entire category
structure (unless you’re Craigslist) won’t give you a usable menu.
• Not testing at all — this is the biggest sin of them all. Much as we think we know our users, we can’t intuitively know how others find, retrieve and consume information.
Testing to ensure flexibility and findability is key.
While there would be much more to say on the topic, we hope this overview has been helpful. We’ve added a few links in the resources section and we’re always available to
answer any questions you might have on the topic. And speaking of that…

Now go test something!
(And THANK YOU!)
@albertatrebla | @saturdave
CC image: flickr.com/photos/dwallick/3390197350/
AS: Thank you!

Resources
The Accidental Taxonomist
Heather Hedden
Information Today, Inc., 2010
“A Delphi Approach to Card Sorting”
Celeste Lyn Paul (PDF)
http://www.asis.org/Conferences/IA07/Sunday/Celeste_Lynn_Paul-Introduction%20to%20Delphi%20Card%20Sorting.pdf
“A Modified Delphi Approach to a New Card Sorting Methodology”
Celeste Lyn Paul
http://uxpajournal.org/a-modified-delphi-approach-to-a-new-card-sorting-methodology/

Card Sorting, Designing Usable Categories
Donna Spencer
Rosenfeld Media, 2009
Information Architecture for the World Wide Web: Designing for the Web and Beyond
Louis Rosenfeld, Peter Morville and Jorge Arango
O’Reilly Media, 2015
Understanding Context: Environment, Language, and Information Architecture
Andrew Hinton
O’Reilly Media, 2014
Joe Lamantia’s Taxonomy Analysis Spreadsheet (with explanation)
http://boxesandarrows.com/analyzing-card-sort-results-with-a-spreadsheet-template/
