MW2010: Nate Solas, Hiding our Collections in Plain Site: Interface Strategies for "Findability"



A presentation from Museums and the Web 2010. A recent redesign of the collection search interface for an on-line art education tool (ArtsConnectEd.org) has provided an opportunity to compare usage patterns between the two versions. In this paper I first survey current search interface design patterns, then discuss the new interface, the log cleanup and analysis, and finally present evidence-based recommendations that may be applied to the general problem of presenting large collections on-line. see http://www.archimuse.com/mw2010/abstracts/prg_335002302.html



Nate Solas, Webmaster at the Walker Art Center
Today: The problem, The numbers, Recommendations, Beyond the interface

Similar to MW2010: Nate Solas, Hiding our Collections in Plain Site: Interface Strategies for "Findability"

Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa... (rschuppe)

44CON 2013 - Browser bug hunting - Memoirs of a last man standing - Atte Kett... (44CON)

Metrics-Driven Engineering at Etsy (Mike Brittain)

Webinar: Untethering Compute from Storage (Avere Systems)

44 con slides (geeksec80)

44 con slides (1) (geeksec80)

Beyond PHP - it's not (just) about the code (Wim Godden)

String Comparison Surprises: Did Postgres lose my data? (Jeremy Schneider)

Atlassian - Software For Every Team (Sven Peters)

A year in the life of Firebird .Net provider (Mind The Firebird)

What you wanted to know about MySQL, but could not find using inernal instrum... (Sveta Smirnova)

Beyond php - it's not (just) about the code (Wim Godden)

DAW: Duplicate-AWare Federated Query Processing over the Web of Data (Muhammad Saleem)

Beyond php - it's not (just) about the code (Wim Godden)

A curious case of broken dns responses - RIPE75 (Babak Farrokhi)

SIEM 101: Get a Clue About IT Security Analysis (AlienVault)

Scaling MySQL Strategies for Developers (Jonathan Levin)

Networking and Computer Troubleshooting (Rence Montanes)

Coding for multiple cores (Lee Hanxue)

Website development for FLO meeting (dejp3)

1. Hiding our Collections in Plain Site: Interface Strategies for "Findability"

6. ArtsConnectEd 1 (ACE1)

7. What do the logs tell us about how this interface is used?

id      timestamp                 term                 #     user agent
187710  2008-04-28 12:03:21.043   fire                 53    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
187711  2008-04-28 12:03:22.123   tropical beaches           Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
187712  2008-04-28 12:03:26.233   water                148   Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
187713  2008-04-28 12:03:27.170   nature and animals   25    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)

8. Top search terms in ACE 1 (trying to browse? part of an assignment?)

Term          Frequency
photographs   734
paintings     413
sculpture     275
arts          263
ceramics      169
pattern       156
photograph    150
landscape     123
masks         121
children      112

9. "Term Facets"

10. Facet words vs. other words

11. Artist names are big, then medium, and finally culture

12. Recovering session information (simplified)
EFF showed user agent usually contains enough unique information to identify a user out of 1,500 others. Combined with 3-minute session window = good enough.

Timestamp   Term       User Agent    Session
1:10:01pm   horse      Firefox 3     1
1:10:45pm   pony       Firefox 3     1
1:12:22pm   flower     IE 7          2
1:30:12pm   gallop     Firefox 3     3
1:31:42pm   portrait   Firefox 3.5   4
1:33:08pm   portrait   Firefox 3     3

14. Browsing the collection, circa 1990

19. Facet words vs. other words

21. Coverage of the collection: +20% unique objects viewed, +100% total object views

22. Time on site: easy to measure, hard to understand. Is your interface inviting deeper exploration, or is it just taking them forever to figure it out?

23. Still typing facet terms into the search box?

24. Recommendations

26. Prevent dead ends: 3x more dead ends!
With counts: You Should / Tell users (133) / How many (826) / Results (0) / To expect (26)
Without counts: Bad Design / Makes Them guess / What's behind / Each door

27. Explain the "no results" page

28. Suggest speling spelling

29. Beyond the interface

31. No search engine can crawl this site ... unless you provide a sitemap

32. SEO + Crawlable site + Sitemap =

34. The end. All Stormtrooper photos via CC license at: http://www.flickr.com/photos/st3f4n/ Pic. of baby with book: by me

Editor's Notes

http://www.flickr.com/photos/st3f4n/4046427260/ Today we're going to talk about our collections online, and the interfaces we build to give access to them. Or, as is often the case, to essentially hide them.
Last year we completed a redesign of a collection search interface, and this paper explores the different usage patterns between the old and new version. By comparing the same site, same users, and same timeframe (across two years), we were able to see some clear patterns in how the interface both helps and hinders exploration. So first, the problems we identified. Second, the research and results. Then a few recommendations based on these numbers. And finally, what's next beyond our interfaces?
Let's start by defining the problem. To avoid pointing fingers, I'll use the Walker Art Center as an example, with the disclaimer that these problems are "on our list". :-) This is a pretty typical page for a museum collection site: a few highlights, some news, and ...
... a search box. In order to find ANYTHING in our collection, you have to search for it. ... but that's ok, because everyone's familiar with our collection, right? Of course not. And with this interface, there's no way you could be. It's all hidden. In a plain site. Bam...
To rub it in, we use the word &quot;Explore&quot;. Good luck.
This, however, is the site I'm going to talk about today: ArtsConnectEd. Specifically, ArtsConnectEd version 1, from 1997-2009. ACE is a joint project between the Minneapolis Institute of Arts and the Walker Art Center. It provides a suite of tools for teaching the arts, and it contains our combined collections online. As you can see, this interface is also pretty plain, and due to the single search box it basically continues to hide our collections from casual users.
In 2007 we decided to start tracking WHAT people were searching for, to help inform a redesign. Note the lack of IP or a session identifier. This is terrible news for our analysis. (I didn't build the logging mechanism.)
So the data isn't ideal, but it's enough to get us started looking at the top search terms. The top 10 is split: 5 look like words someone might put in to get a sense of what's in the collection, and most of the others look like they might be part of assignments. How can we tell what's what? What do these words mean?
I decided to break the words down into buckets of information, using the broad facets of information about the artwork: artist names, medium, and culture. There's actually a 4th bucket for "other", words that don't fit into one of these facet lists. The word lists I used came from the cleaned-up metadata in ACE2. So now that we've given some meaning to the words, what do the searches look like?
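The bucketing step can be sketched as follows. This is a minimal illustration, not the code used for the paper. It assumes each facet is a plain set of lowercase words or phrases, and the tiny `FACETS` lists below are hypothetical stand-ins for the ACE2 metadata word lists. `classify_term` returns every bucket a term touches, so overlaps such as culture plus medium are kept. `facet_share` gives the facet-versus-other split.

```python
from typing import Dict, List, Set

# Hypothetical, tiny facet word lists; the real lists came from ACE2 metadata.
FACETS: Dict[str, Set[str]] = {
    "artist": {"warhol", "andy warhol", "picasso"},
    "medium": {"photograph", "photographs", "sculpture", "painting", "paintings", "ceramics"},
    "culture": {"african", "japanese", "mexican"},
}


def classify_term(term: str) -> List[str]:
    """Return every facet bucket a search term falls into, or ["other"]."""
    words = term.lower().split()
    # Build all contiguous word sequences so multi-word artist names can match.
    phrases = {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    }
    buckets = [name for name, vocab in FACETS.items() if phrases & vocab]
    return buckets or ["other"]


def facet_share(terms: List[str]) -> float:
    """Fraction of searches containing at least one facet word."""
    if not terms:
        return 0.0
    hits = sum(1 for t in terms if classify_term(t) != ["other"])
    return hits / len(terms)
```

For example, `classify_term("Japanese ceramics")` returns `['medium', 'culture']`, while `classify_term("fire")` returns `['other']`.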
In ACE1, almost exactly half of all search terms contained a "facet word" from one of those buckets. So the top half is the 3 facet buckets, the bottom half is every other kind of word. At this point we don't have the context to interpret it, so we'll just remember this graph: 50/50.
This is a snapshot of the facet terms over the entire log period. It does show us where the buckets overlap: the blue culture bucket is most likely to also contain another facet word. Again, pretty useful, but it doesn't really show us usage trends during a search session -- because we didn't log that information. But what if we can rebuild the session information and trace a search through its results?
It turns out we can. Using the unique information in the User Agent string (browser version and build, OS version and build, etc.), we are able to make fairly confident assertions about session groupings. I also added a lot of code to watch for computer labs, where we'd see hits coming faster than someone could reasonably type, and rapid switching between search terms. Spot checking shows very accurate session recovery.
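The session-recovery rule can be sketched as follows. The sketch assumes log rows that carry a timestamp, a term, and the raw user-agent string. Searches with an identical user agent stay in one session as long as each arrives within three minutes of the previous one, and a longer gap starts a new session. The names are hypothetical, and the computer-lab heuristics described above are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

SESSION_WINDOW = timedelta(minutes=3)  # inactivity gap that starts a new session


@dataclass
class Search:
    timestamp: datetime
    term: str
    user_agent: str


def assign_sessions(searches: List[Search]) -> List[int]:
    """Return a session id for each search, in the same order as the input.

    Searches with an identical user-agent string share a session while each
    one arrives within SESSION_WINDOW of the previous search from that agent.
    """
    order = sorted(range(len(searches)), key=lambda i: searches[i].timestamp)
    last_seen: Dict[str, Tuple[datetime, int]] = {}  # user agent -> (last time, session id)
    session_ids = [0] * len(searches)
    next_id = 1
    for i in order:
        s = searches[i]
        prev = last_seen.get(s.user_agent)
        if prev is not None and s.timestamp - prev[0] <= SESSION_WINDOW:
            sid = prev[1]  # continue the existing session
        else:
            sid = next_id  # unseen agent, or the gap was too long
            next_id += 1
        last_seen[s.user_agent] = (s.timestamp, sid)
        session_ids[i] = sid
    return session_ids
```

The sketch was run on the six rows from the simplified slide table, with an arbitrary date attached to the times. It assigns sessions 1, 1, 2, 3, 4, 3, which matches the Session column.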
So now that we have session information, we know it's the same person, and we can tell if they paged within results. You can see the solid black line is the percent of searches that contained a facet word, and it averages to 50%, just like we saw in that pie chart: 50/50. What that pie chart didn't tell us is how it grows over time: the further someone pages, the more likely they are to be searching with a facet word, and especially a Medium word. This is the big red flag in the data that screams to me: "people want to browse!" Interesting to note, as compared to the Venn diagram: artist names occur most often in the first few pages of results, but then medium takes over. So this is how people used ACE1. What does the new site, ACE2, look like?
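The per-page facet rate plotted here can be computed with a simple group-by. The sketch assumes each search row has already been tagged with two things:

- its results-page number, for example by treating a repeat of the same term within a recovered session as the next page (a hypothetical rule, not necessarily the one used here);
- a facet-word flag from a facet classifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def facet_rate_by_page(rows: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map results-page number -> fraction of searches that used a facet word.

    Each row is (page_number, used_facet_word).
    """
    totals: Dict[int, int] = defaultdict(int)
    facet_hits: Dict[int, int] = defaultdict(int)
    for page, used_facet in rows:
        totals[page] += 1
        if used_facet:
            facet_hits[page] += 1
    # defaultdict returns 0 for pages where no search used a facet word.
    return {page: facet_hits[page] / totals[page] for page in sorted(totals)}
```

A rising curve across page numbers is the "people want to browse" signal described above.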
So if we think people want to browse, what should we be building? With the new interface we wanted to combine the power of a book's index with the enjoyability of flipping to a random page. We wanted a map that told you exactly where to find the piece you want, but also gave the serendipity of browsing a gallery...
Did we do it? Hard to say. Here's what we built. We start with the whole collection, and let users narrow it down with filters using those facets we talked about.
Piotr Adamczki(sp?) has a paper this year about collection dashboards, sort of an at-a-glance summary of the collection and what's in it. Using these pulldowns it's easy to see the breadth of the collections by the numbers after each facet.
If we pick Sculpture from the Medium pulldown, the Culture numbers automatically update to reflect the new number of results for each.
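Here is one way to get counts that update like this, under the following assumptions. Each object is a dict of facet values. The counts for a given pulldown apply every active filter except that pulldown's own. That skip-own-facet rule is a common faceted-search convention, not something the talk specifies. `facet_counts` and `matches` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]  # facet name -> value, e.g. {"medium": "Sculpture", "culture": "Japanese"}


def matches(record: Record, filters: Dict[str, str], skip: str = "") -> bool:
    """True if the record satisfies every active filter except the one named `skip`."""
    return all(record.get(f) == v for f, v in filters.items() if f != skip)


def facet_counts(records: List[Record], filters: Dict[str, str], facet: str) -> Counter:
    """Count results per value of `facet`, given the filters set on the other facets."""
    return Counter(
        r[facet] for r in records if facet in r and matches(r, filters, skip=facet)
    )
```

With the Medium filter set to Sculpture, the Culture counts shrink to sculptures only. The Medium pulldown itself still shows every medium, so users can switch.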
Finally, we use autocomplete lists to help users spell artist names. 'Cause they're hard, sometimes.
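Here is a minimal prefix-autocomplete sketch over a sorted name list. It uses binary search (`bisect`), so each lookup stays fast even for a long artist index. The class name and the case-insensitive, start-of-name matching are assumptions. A production version might also match surnames or ignore diacritics.

```python
import bisect
from typing import List


class ArtistAutocomplete:
    """Prefix autocomplete over a fixed list of artist names (case-insensitive)."""

    def __init__(self, names: List[str]) -> None:
        # Keep (lowercased key, display name) pairs sorted by key for binary search.
        self._entries = sorted((name.lower(), name) for name in names)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` names whose lowercased form starts with `prefix`."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)  # first key >= prefix
        results: List[str] = []
        for key, name in self._entries[start:]:
            if not key.startswith(p) or len(results) >= limit:
                break
            results.append(name)
        return results
```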
So how does it compare? Instead of a 50/50 split between facet words and other words, in ACE2 only about 1/3 of the search words are facet terms. The other 2/3 are words used to refine a search. So are they browsing using the filters? Yes: about 25% of all searches use filters corresponding to these three types of facets (artist name, medium, culture).
Here's that same graph of usage over a session, this time for ACE2. Wildly different from ACE1: as a user pages deeper into a result set, in ACE2 they are much more likely to have an empty search term. They're browsing using the filters.
This is probably the best metric we have for actual success of the interface. We see a 20% bump in coverage of the collection, which means people found and looked at 20% more unique objects from the collection. Some of this number is due to objects now being in search engines, but it's largely internal traffic. The bump on the right, total views of all objects, got more help from search engines, as you'll see later.
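The distinction between the two bumps is easy to blur, so here it is as arithmetic. The function names are hypothetical. Coverage counts each object once no matter how often it is viewed, while total views counts every view. A "+20%" label reports the relative change between two periods.

```python
from typing import Iterable


def coverage(viewed_object_ids: Iterable[str], collection_size: int) -> float:
    """Share of the collection viewed at least once (repeat views count once)."""
    return len(set(viewed_object_ids)) / collection_size


def relative_change(before: float, after: float) -> float:
    """Relative change between two periods, e.g. 0.20 for a 20% bump."""
    return (after - before) / before
```

Because the collection size is the same in both periods, the relative change in coverage equals the relative change in the number of unique objects viewed.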
Big, big caveats here. Most of this extra time was spent on the object detail pages, presumably because there's more to do there. (or maybe because they're confusing) I was expecting to see a big jump in time spent on the site as people enjoyed the new browse experience and spent hours exploring the collection. Not the case. Overall the time spent on the site went down a bit, but this seems to be because the number of unique visitors went way up. So the moral of this slide is: be careful if you decide one of your metrics for success is going to be "time on site".
This is the "do users get it?" question. Do they understand the interface enough to not type "Sculpture" into the search box, but instead use the pulldown? Yes, they get it: on average they use the browse pulldowns 4 times more frequently than typing in the equivalent words.
http://www.flickr.com/photos/st3f4n/4193370268/in/set-72157616350171741/
More of an observation masquerading as a recommendation: "Users want to browse." Front door: "Highlights" is not browse. Back door: "More like this" is not browse. Side door: "Artist list" is not browse.
In cases where we don't show users what to expect, they are three times more likely to browse into an empty result set than when we warn them. So that ends up being about 1% of all our users who don't believe those (0)s, but the point still stands. These numbers help. There's a whole other debate as to whether we should remove the empty options or not - we have very small pulldown lists, so we keep everything.
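Here is a sketch of labelling pulldown options with their result counts. It assumes the counts have already been computed per facet value. ACE2 keeps every option in its short lists. Returning a `has_results` flag alongside each label lets a template grey out the zeros instead of hiding them. The function name and the flag are hypothetical.

```python
from typing import Dict, List, Tuple


def pulldown_options(counts: Dict[str, int], vocabulary: List[str]) -> List[Tuple[str, bool]]:
    """Build (label, has_results) pairs for a facet pulldown.

    Every value in `vocabulary` is kept, but each label shows its result
    count so users can see a dead end before choosing it.
    """
    options = []
    for value in vocabulary:
        n = counts.get(value, 0)  # values with no matches show "(0)"
        options.append((f"{value} ({n})", n > 0))
    return options
```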
We see too much of this: a user has a successful search including a browse facet (step 1), and then thinks "ah, I know what I want to look for next!" In step 2, they get no results because their browse facet is mutually exclusive with the new search. Of the users who get to step 2 (blocked results), 25% continue on to step 3 because they don't notice their browse facet is still on. We have breadcrumbs, but apparently they aren't doing the job. The "no results" page needs an explicit set of links explaining what's limiting their results, and what to do about it.
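Here is a rough sketch of a more explicit "no results" page. When a search comes back empty, it reports each active filter and how many results the user would get by removing it. Matching the term against a `title` substring is a stand-in assumption for the real search engine, and all names here are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def count_results(records: List[Record], term: str, filters: Dict[str, str]) -> int:
    """Count records whose title contains `term` and that match every filter."""
    t = term.lower()
    return sum(
        1
        for r in records
        if t in r.get("title", "").lower()
        and all(r.get(f) == v for f, v in filters.items())
    )


def explain_no_results(records: List[Record], term: str, filters: Dict[str, str]) -> List[str]:
    """For an empty result set, explain each active filter and the cost of keeping it."""
    if count_results(records, term, filters) > 0:
        return []  # the search worked; nothing to explain
    messages = []
    for facet, value in filters.items():
        remaining = {f: v for f, v in filters.items() if f != facet}
        n = count_results(records, term, remaining)
        messages.append(
            f'Your {facet} filter "{value}" is on. Remove it to see {n} result(s) for "{term}".'
        )
    return messages
```

Each message could link to the same search with that filter dropped. That turns the breadcrumb into an explicit way out.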
A few things here. First, suggest spelling based on the dictionary of words in your database, not Webster's: you only want to suggest words that exist in your collection. Second, we see that it works: there are more respellings in ACE2, presumably because we're suggesting the correct words. Also, the graph on the right reinforces the uptake on the browse pulldowns and auto-complete: every facet has fewer spelling corrections, presumably because users are choosing from a list instead of trying to type it.
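Here is a sketch of suggesting spellings only from the collection's own vocabulary. It uses Python's standard `difflib.get_close_matches`. The tokenisation (lowercase ASCII letters only) and the 0.75 similarity cutoff are assumptions chosen for illustration, not tuned values from ACE2.

```python
import difflib
import re
from typing import Iterable, List, Set


def build_vocabulary(texts: Iterable[str]) -> Set[str]:
    """Collect the lowercase ASCII words that actually occur in collection metadata."""
    vocab: Set[str] = set()
    for text in texts:
        vocab.update(re.findall(r"[a-z]+", text.lower()))
    return vocab


def suggest_spelling(word: str, vocabulary: Set[str], n: int = 3) -> List[str]:
    """Suggest close matches drawn only from the collection's own vocabulary."""
    w = word.lower()
    if w in vocabulary:
        return []  # already a known word; nothing to suggest
    return difflib.get_close_matches(w, sorted(vocabulary), n=n, cutoff=0.75)
```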
http://www.flickr.com/photos/st3f4n/3951143570/
Searches start on Google. So even if you have the best interface in the world, you have to think about how people find your content before they get to your interface.
This is the terrible truth. I did some hard looking at our incoming links, and tried searching for a few random pieces from our collection, and it's just not there. Search engines try their best, but if your whole site is behind a text input field, it's hidden. A sitemap can solve this without an interface change: it's just a list of all the URLs in your collection, so everyone can crawl them without guessing.
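Generating a sitemap is mostly bookkeeping. The sketch below uses the standard library and follows the sitemaps.org protocol: the `urlset`/`url`/`loc` elements, its namespace, and its 50,000-URL-per-file limit. The example.org object URLs are placeholders, since ArtsConnectEd's URL scheme isn't given here. Collections that need more than one file would also need a sitemap index pointing at each part, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemaps(urls: Iterable[str]) -> List[str]:
    """Return one or more sitemap XML documents that list every object URL."""
    url_list = list(urls)
    documents = []
    for start in range(0, len(url_list), MAX_URLS_PER_SITEMAP):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in url_list[start:start + MAX_URLS_PER_SITEMAP]:
            # ElementTree escapes characters such as "&" in the URL text.
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        documents.append(XML_DECLARATION + ET.tostring(urlset, encoding="unicode"))
    return documents
```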
Enough said. Get yourself a sitemap, make your collection crawlable, and use some basic SEO techniques. Or else you essentially don't exist.
What about marking up our well-formed HTML with extra semantic tags? Three things we're shooting for here: 1. Better search result displays in major engines. SearchMonkey already lets you define templates to display custom RDFa information. 2. Better search results. If we can tell the search engines *about* our data, they can search it better. 3. All the cosmic rainbow goodness that comes from Linked Open Data. This is the next phase of findability - your web site still matters, but less and less.
http://www.flickr.com/photos/st3f4n/3493855156/
