A presentation from Museums and the Web 2010.
A recent redesign of the collection search interface for an on-line art education tool (ArtsConnectEd.org) has provided an opportunity to compare usage patterns between the two versions. In this paper I first survey current search interface design patterns, then discuss the new interface, the log cleanup and analysis, and finally present evidence-based recommendations that may be applied to the general problem of presenting large collections on-line.
http://www.flickr.com/photos/st3f4n/4046427260/ Today we're going to talk about our collections online, and the interfaces we build to give access to them. Or, as is often the case, to essentially hide them.
Last year we completed a redesign of a collection search interface, and this paper explores the different usage patterns between the old and new version. By comparing the same site, same users, and same timeframe (across two years), we were able to see some clear patterns in how the interface both helps and hinders exploration. So first, the problems we identified. Second, the research and results. Then a few recommendations based on these numbers. And finally, what's next beyond our interfaces?
Let's start by defining the problem. To avoid pointing fingers, I'll use the Walker Art Center as an example -- with the disclaimer, for each of these problems, that "it's on our list". :-) This is a pretty typical page for a museum collection site: a few highlights, some news, and ...
... a search box. In order to find ANYTHING in our collection, you have to search for it. ... but that's ok, because everyone's familiar with our collection, right? Of course not. And with this interface, there's no way you could be. It's all hidden. In a plain site. Bam...
To rub it in, we use the word "Explore". Good luck.
This, however, is the site I'm going to talk about today: ArtsConnectEd. Specifically, ArtsConnectEd version 1, from 1997-2009. ACE is a joint project between the Minneapolis Institute of Arts and the Walker Art Center. It provides a suite of tools for teaching the arts, and it contains our combined collections online. As you can see, this interface is also pretty plain, and due to the single search box it basically continues to hide our collections from casual users.
In 2007 we decided to start tracking WHAT people were searching for, to help inform a redesign. Note the lack of an IP address or a session identifier. This is terrible news for our analysis. (I didn't build the logging mechanism.)
So the data isn't ideal, but it's enough to get us started looking at the top search terms. The top 10 is split: 5 look like words someone might put in to get a sense of what's in the collection, and most of the others look like they might be part of assignments. How can we tell what's what? What do these words mean?
I decided to break the words down into buckets of information, using the broad facets of information about the artwork: artist names, medium, and culture. There's actually a 4th bucket for "other", words that don't fit into one of these facet lists. The word lists I used came from the cleaned-up metadata in ACE2. So now that we've given some meaning to the words, what do the searches look like?
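The bucketing step can be sketched roughly like this. The word lists here are tiny stand-ins for illustration; the real lists came from the cleaned-up ACE2 metadata.

```python
# Minimal sketch of the facet-bucketing step. These word lists are
# invented samples; the real ones came from ACE2's cleaned-up metadata.
ARTIST_WORDS = {"warhol", "close", "oldenburg"}
MEDIUM_WORDS = {"photographs", "paintings", "sculpture", "ceramics"}
CULTURE_WORDS = {"american", "japanese", "yoruba"}

def facet_buckets(term):
    """Return the set of facet buckets matched by a search term."""
    words = set(term.lower().split())
    buckets = set()
    if words & ARTIST_WORDS:
        buckets.add("artist")
    if words & MEDIUM_WORDS:
        buckets.add("medium")
    if words & CULTURE_WORDS:
        buckets.add("culture")
    # Anything that matches no facet list falls into the 4th bucket.
    return buckets or {"other"}

print(facet_buckets("sculpture"))         # {'medium'}
print(facet_buckets("tropical beaches"))  # {'other'}
```

Note that a single term can land in more than one bucket, which is exactly the overlap the Venn diagram shows.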
In ACE1, almost exactly half of all search terms contained a &quot;facet word&quot; from one of those buckets. So the top half is the 3 facet buckets, the bottom half is every other kind of word. At this point we don't have the context to interpret it, so we'll just remember this graph: 50/50.
This is a snapshot of the facet terms over the entire log period. It does show us where the buckets overlap: the blue culture bucket is most likely to also contain another facet word. Again, pretty useful, but it doesn't really show us usage trends during a search session -- because we didn't log that information. But what if we can rebuild the session information and trace a search through its results?
It turns out we can. Using the unique information in the User-Agent string (browser version and build, OS version and build, etc.), we are able to make fairly confident assertions about session groupings. I also added code to watch for computer labs, where we'd see hits coming faster than someone could reasonably type, along with rapid switching between search terms. Spot checking shows very accurate session recovery.
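The core heuristic -- group rows by user-agent string and start a new session after a 3-minute gap -- can be sketched as follows (the lab-detection code is omitted for brevity):

```python
from datetime import datetime, timedelta

# Sketch of the session-recovery heuristic: group log rows by user-agent
# string, and start a new session whenever more than 3 minutes pass
# between consecutive searches from the same agent.
WINDOW = timedelta(minutes=3)

def recover_sessions(rows):
    """rows: (timestamp, term, user_agent) tuples sorted by timestamp.
    Returns one session id per row."""
    last_seen = {}  # user_agent -> (latest timestamp, session id)
    next_id = 1
    sessions = []
    for ts, term, ua in rows:
        prev = last_seen.get(ua)
        if prev and ts - prev[0] <= WINDOW:
            sid = prev[1]          # same agent, within the window
        else:
            sid = next_id          # new agent, or window expired
            next_id += 1
        last_seen[ua] = (ts, sid)
        sessions.append(sid)
    return sessions

# The simplified example rows (times invented for illustration):
day = datetime(2008, 4, 28)
rows = [
    (day.replace(hour=13, minute=10, second=1),  "horse",    "Firefox 3"),
    (day.replace(hour=13, minute=10, second=45), "pony",     "Firefox 3"),
    (day.replace(hour=13, minute=12, second=22), "flower",   "IE 7"),
    (day.replace(hour=13, minute=30, second=12), "gallop",   "Firefox 3"),
    (day.replace(hour=13, minute=31, second=42), "portrait", "Firefox 3.5"),
    (day.replace(hour=13, minute=33, second=8),  "portrait", "Firefox 3"),
]
print(recover_sessions(rows))  # [1, 1, 2, 3, 4, 3]
```

Note how the last "Firefox 3" row rejoins session 3: it is within 3 minutes of that agent's previous search, even though other agents searched in between.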
So now that we have session information, we know it's the same person, and we can tell if they paged within results. The solid black line is the percent of searches that contained a facet word, and it averages to 50%, just like we saw in that pie chart: 50/50. What the pie chart didn't tell us is how it grows over time: the further someone pages, the more likely they are to be searching with a facet word, and especially a Medium word. This is the big red flag in the data that screams to me: "people want to browse!" Interesting to note, compared to the Venn diagram: artist names occur most often in the first few pages of results, but then medium takes over. So this is how people used ACE1. What does the new site, ACE2, look like?
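The aggregation behind that black line is simple once sessions are recovered: for each page depth reached, count what fraction of searches contained a facet word. A sketch (the data and the facet test are stand-ins):

```python
from collections import defaultdict

# Sketch of the per-page-depth aggregation: for each page number a user
# reached, what fraction of searches contained a facet word?
def facet_rate_by_depth(rows, has_facet):
    """rows: (page_number, search_term) pairs from recovered sessions.
    has_facet: predicate testing a term against the facet word lists."""
    totals = defaultdict(int)
    facets = defaultdict(int)
    for page, term in rows:
        totals[page] += 1
        if has_facet(term):
            facets[page] += 1
    return {p: facets[p] / totals[p] for p in sorted(totals)}

# Invented sample data: facet usage climbing with page depth.
rows = [(1, "fire"), (1, "sculpture"), (2, "ceramics"), (2, "water"),
        (3, "ceramics")]
print(facet_rate_by_depth(rows, lambda t: t in {"sculpture", "ceramics"}))
# {1: 0.5, 2: 0.5, 3: 1.0}
```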
So if we think people want to browse, what should we be building? With the new interface we wanted to combine the power of a book's index with the enjoyability of flipping to a random page. We wanted a map that told you exactly where to find the piece you want, but also gave the serendipity of browsing a gallery...
Did we do it? Hard to say. Here's what we built. We start with the whole collection, and let users narrow it down with filters using those facets we talked about.
Piotr Adamczyk has a paper this year about collection dashboards, sort of an at-a-glance summary of the collection and what's in it. Using these pulldowns it's easy to see the breadth of the collections by the numbers after each facet.
If we pick Sculpture from the Medium pulldown, the Culture numbers automatically update to reflect the new number of results for each.
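Behind the scenes this is just a dependent count query: once a Medium filter is active, every other pulldown is recounted against only the matching objects. A sketch with an in-memory database (the table, columns, and objects are invented for illustration):

```python
import sqlite3

# Sketch of dependent facet counts: picking Medium = 'Sculpture'
# recounts the Culture pulldown against only the matching objects.
# Schema and sample rows are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (title TEXT, medium TEXT, culture TEXT)")
db.executemany("INSERT INTO objects VALUES (?, ?, ?)", [
    ("Spoonbridge and Cherry", "Sculpture", "American"),
    ("Standing Figure",        "Sculpture", "Yoruba"),
    ("Veiled Lady",            "Sculpture", "Italian"),
    ("Olive Trees",            "Painting",  "Dutch"),
])

def culture_counts(medium=None):
    """Counts for the Culture pulldown, narrowed by an optional Medium."""
    sql = "SELECT culture, COUNT(*) FROM objects"
    args = ()
    if medium:
        sql += " WHERE medium = ?"
        args = (medium,)
    sql += " GROUP BY culture ORDER BY culture"
    return db.execute(sql, args).fetchall()

print(culture_counts())             # all four cultures, one object each
print(culture_counts("Sculpture"))  # [('American', 1), ('Italian', 1), ('Yoruba', 1)]
```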
Finally, we use autocomplete lists to help users spell artist names. 'Cause they're hard, sometimes.
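The autocomplete itself is a plain prefix match against the sorted artist list. A minimal sketch (the names are sample stand-ins; a real version would also normalize case and diacritics):

```python
import bisect

# Sketch of artist-name autocomplete: binary-search into a sorted list
# for everything sharing the typed prefix. Names are invented samples.
ARTISTS = sorted(["Chuck Close", "Claes Oldenburg", "Claude Monet",
                  "Kara Walker", "Katsushika Hokusai"])

def autocomplete(prefix, limit=10):
    lo = bisect.bisect_left(ARTISTS, prefix)
    out = []
    for name in ARTISTS[lo:lo + limit]:
        if not name.startswith(prefix):
            break  # sorted list: past the last matching prefix
        out.append(name)
    return out

print(autocomplete("Cla"))  # ['Claes Oldenburg', 'Claude Monet']
```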
So how does it compare? Instead of a 50/50 split between facet words and other words, ACE2 sees only about 1/3 of the search words being facet terms. The other 2/3 are words to refine a search. So are they browsing using the filters? Yes: about 25% of all searches use filters corresponding to these three facet types (artist name, medium, culture).
Here's that same graph of usage over a session, this time for ACE2. Wildly different from ACE1: as a user pages deeper into a result set, in ACE2 they are much more likely to have an empty search term. They're browsing using the filters.
This is probably the best metric we have for actual success of the interface. We see a 20% bump in coverage in the collection, which means people found and looked at 20% more unique objects from the collection. Some of this number is due to objects now being in search engines, but it's largely internal traffic. The bump on the right, however, total views of all objects, did get more help from search engines as you'll see later.
Big, big caveats here. Most of this extra time was spent on the object detail pages, presumably because there's more to do there. (Or maybe because they're confusing.) I was expecting to see a big jump in time spent on the site as people enjoyed the new browse experience and spent hours exploring the collection. Not the case. Overall the time spent on the site went down a bit, but this seems to be because the number of unique visitors went way up. So the moral of this slide is: be careful if you decide one of your metrics for success is going to be "time on site".
This is the "do users get it?" question. Do they understand the interface enough to not type "Sculpture" into the search box, but instead use the pulldown? Yes, they get it: on average they use the browse pulldowns 4 times more frequently than typing in the equivalent words.
More of an observation masquerading as a recommendation: "Users want to browse."
Front door: "Highlights" is not browse.
Back door: "More like this" is not browse.
Side door: "Artist list" is not browse.
In cases where we don't show users what to expect, they are three times more likely to browse into an empty result set than when we warn them. So that ends up being about 1% of all our users who don't believe those (0)s, but the point still stands. These numbers help. There's a whole other debate as to whether we should remove the empty options or not - we have very small pulldown lists, so we keep everything.
We see too much of this: a user has a successful search including a browse facet (step 1), and then thinks "ah, I know what I want to look for next!" In step 2, they get no results because their browse facet is mutually exclusive with the new search. Of the users that get to step 2 (blocked results), 25% continue on to step 3 because they don't notice their browse facet is still on. We have breadcrumbs, but apparently they aren't doing the job. The "no results" page needs an explicit set of links explaining what's limiting their results, and what to do about it.
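One way to build that explicit set of links: when a query with active facets returns nothing, re-run it once per facet with that facet removed, and show the user what each removal would unlock. A sketch (`search` here is a stand-in for the real query function):

```python
# Sketch of a "no results" explainer: for each active browse facet,
# show how many results removing it would unlock. search() is a
# hypothetical stand-in for the real query function.
def explain_empty(term, filters, search):
    lines = [f'No results for "{term}" with your current filters.']
    for facet, value in filters.items():
        remaining = dict(filters)
        del remaining[facet]
        count = search(term, remaining)
        lines.append(f"  Remove {facet} = {value!r} to see {count} results.")
    return "\n".join(lines)

# Invented demo: the Medium facet is what's blocking the search.
def fake_search(term, filters):
    return 0 if "medium" in filters else 42

print(explain_empty("dance", {"medium": "Sculpture"}, fake_search))
# No results for "dance" with your current filters.
#   Remove medium = 'Sculpture' to see 42 results.
```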
A few things here: suggest spellings based on the dictionary of words in your database, not Webster's. You only want to suggest words that exist. Second, we see that it works: there are more respellings in ACE2, presumably because we're suggesting the correct words. Also, the graph on the right reinforces the uptake of the browse pulldowns and auto-complete: every facet has fewer spelling corrections, presumably because users are choosing from a list instead of trying to type it.
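The standard library makes the "suggest from your own dictionary" approach nearly free. A sketch, with an invented sample of collection words:

```python
import difflib

# Sketch of "suggest from your own dictionary": correct misspellings
# against words actually present in the collection metadata, not a
# general wordlist. These sample words are stand-ins.
COLLECTION_WORDS = ["sculpture", "ceramics", "photographs", "landscape",
                    "paintings", "portrait", "masks"]

def suggest(term):
    """Return the closest collection word, or None if nothing is close."""
    matches = difflib.get_close_matches(term.lower(), COLLECTION_WORDS,
                                        n=1, cutoff=0.75)
    return matches[0] if matches else None

print(suggest("sculpure"))  # 'sculpture'
print(suggest("zebra"))     # None
```

Because every suggestion comes from the collection's own metadata, a corrected search is guaranteed to return at least one result.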
Searches start on Google. So even if you have the best interface in the world, you have to think about how people find your content before they get to your interface.
This is the terrible truth. I did some hard looking at our incoming links, and tried searching for a few random pieces from our collection, and it's just not there. Search engines try their best, but if your whole site is behind a text input field, it's hidden. A sitemap can solve this without an interface change: it's just a list of all the URLs in your collection, so everyone can crawl them without guessing.
Enough said. Get yourself a sitemap, make your collection crawlable, and use some basic SEO techniques. Or else you essentially don't exist.
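The sitemap really is this simple: one `<url>` entry per object page, following the sitemaps.org protocol. A minimal generator sketch (the URL pattern and ids are invented for illustration):

```python
# Sketch of a minimal sitemap generator: one <url> entry per object
# page, so crawlers can reach the whole collection without guessing
# search terms. The base URL and object ids are invented examples.
def sitemap(object_ids, base="http://example.org/object/"):
    urls = "\n".join(
        f"  <url><loc>{base}{oid}</loc></url>" for oid in object_ids
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{urls}\n"
            "</urlset>")

print(sitemap([101, 102]))
```

Generate it from the same database that drives the search interface, drop it at the site root, and reference it from robots.txt.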
What about marking up our well-formed HTML with extra semantic tags? Three things we're shooting for here: 1. Better search result displays in major engines. SearchMonkey already lets you define templates to display custom RDFa information. 2. Better search results. If we can tell the search engines *about* our data, they can search it better. 3. All the cosmic rainbow goodness that comes from Linked Open Data. This is the next phase of findability - your web site still matters, but less and less.
MW2010: Nate Solas, Hiding our Collections in Plain Site: Interface Strategies for "Findability"
Hiding our Collections in Plain Site Interface Strategies for Findability
Nate Solas (@homebrewer) -- Webmaster at the Walker Art Center
Today: The problem | The numbers | Recommendations | Beyond the interface
What do the logs tell us about how this interface is used?

id     | timestamp               | term               | #   | user agent
187710 | 2008-04-28 12:03:21.043 | fire               | 53  | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
187711 | 2008-04-28 12:03:22.123 | tropical beaches   |     | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
187712 | 2008-04-28 12:03:26.233 | water              | 148 | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
187713 | 2008-04-28 12:03:27.170 | nature and animals | 25  | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Top search terms in ACE 1 -- trying to browse? part of an assignment?

Term        | Frequency
photographs | 734
paintings   | 413
sculpture   | 275
arts        | 263
ceramics    | 169
pattern     | 156
photograph  | 150
landscape   | 123
masks       | 121
children    | 112
Artist names are big, then medium, and finally culture
Recovering session information (simplified)
EFF showed the user agent usually contains enough unique information to identify a user out of 1,500 others. Combined with a 3-minute session window = good enough.

Timestamp | Term     | User Agent  | Session
1:10:01pm | horse    | Firefox 3   | 1
1:10:45pm | pony     | Firefox 3   | 1
1:12:22pm | flower   | IE 7        | 2
1:30:12pm | gallop   | Firefox 3   | 3
1:31:42pm | portrait | Firefox 3.5 | 4
1:33:08pm | portrait | Firefox 3   | 3