Visualization of Information (ProQuest)


Discusses information visualization, common data models, tools, and how information architecture fits into the picture. Includes many links to additional resources.

Part of the Technology and Content Professional Development Day.



  2. WHO AM I?
  3. AGENDA
  5. Why Visualize? understanding discussion
  6. Why? Understanding!
  7. Why? Understanding! Quote from Karl Fast during IA Summit 2014 “Design for Understanding” workshop
  8. Why? Understanding! “…the most difficult mental act of all is to re-arrange a familiar bundle of data, to look at it differently and escape from the prevailing doctrine.” -- Professor H. Butterfield
  9. Why? Discussion! “Allow the information to tell you how it wants to be displayed. As architecture is ‘frozen music’, information architecture is ‘frozen conversation’. Any good conversation is based on understanding.” -- Richard Saul Wurman
  10. DATA MODELS
  11. Data Models
  12. Data Models & Formats
  13. Data Models: Tabular (spreadsheet)
  14. Data Models: Relational (database)
  15. Data Models: Hierarchical (markup)
  16. Data Models: RDF (triples)
  17. Data Models: Tradeoffs
  18. TOOLS
  19. Tools or Toolbox?
  20. Tools for Understanding Adapted from Abby Covert’s “Make Sense: Information Architecture for Everybody”
  21. Tools: Notepad++
  22. Tools: Notepad++
  23. Tools: Open Refine (Google Refine)
  24. Tools: Open Refine (Google Refine)
  25. Tools: Open Refine (Help!)
  26. Tools: Using OpenRefine
  27. Tools: TiddlyWiki
  28. Tools: TiddlyWiki
  29. Tools: TiddlyWiki (Help!)
  30. Tools: TiddlyWiki Example (Thesaurus)
  31. Tools: TiddlyWiki Example (Thesaurus)
  32. Tools: TiddlyWiki Example (RTA Migration Tool)
  33. Tools: TiddlyWiki Example (RTA Migration Tool)
  34. Tools: TiddlyWiki Example (RTA Migration Tool)
  35. Tools: TiddlyWiki Example (RTA Migration Tool)
  36. Tools: TiddlyWiki Example (RTA Migration Tool) Under the hood:
  37. Tools: TiddlyWiki Example (RTA Migration Tool)
  38. Tools: TiddlyWiki Example (RTA Migration Tool)
  39. Tools: Gephi
  40. Tools: Gephi Example (RightNow Support Center)
  41. Tools: Gephi Example (RightNow Support Center)
  42. Tools: Gource
  43. Tools: Gource Example (RightNow Support Center)
  44. Tools: D3
  45. Tools: D3 Galleries
  46. Tools: D3 Example (Venn Diagram)
  47. Tools: RAW
  48. Tools: Datawrapper
  49. Tools: DataWrangler
  50. Tools: yEd
  51. Tools: yEd Example (Training Prep)
  52. Tools: Finding more…
  53. EXAMPLES
  55. Information Architecture?
  56. Information Architecture?
  57. RESOURCES
  58. Resources: Online Tutorials • Using charts in Excel – a-chart-RZ001105505.aspx • Data Visualization Fundamentals – Visualization-Fundamentals/153776- 2.html?srchtrk=index:1%0Alinktypeid:2%0Aq:gephi%0Apage:1% 0As:relevance%0Asa:true%0Aproducttypeid:2 • Using D3 – D3js/162449- 2.html?srchtrk=index:1%0Alinktypeid:2%0Aq:data%2Bvisualizati on%0Apage:1%0As:relevance%0Asa:true%0Aproducttypeid:2
  59. Resources: Tools • Notepad++ – • OpenRefine – – • TiddlyWiki – – – • Gephi – – • Gource –
  60. Resources: Tools • D3 – – • RAW – • Datawrapper – • DataWrangler – • yEd – • alternativeTo –
  61. Resources: Books • Linked Data for Libraries, Archives, and Museums – • Using OpenRefine – • How to Make Sense of Any Mess – – mess
  62. THANKS!

Editor's Notes

  • Thanks to Matt Sorg for inviting me to speak, and to ProQuest for having this Development Day!
  • I’ve done stuff; I continue to do stuff.
  • Why visualize? -> Data Models -> Tools -> Examples -> IA -> Resources
  • Visualization isn’t just for making pretty pictures or even dashboards.
    It helps when we have some data, but we want to understand it better, or discuss it with others.
  • Understanding: patterns, anomalies, scale, perspective
    Discussing: asking questions, describing things, telling stories
  • These quotes are from Richard Saul Wurman, who founded the TED conference and coined the term Information Architecture.

    Understanding and organizing information can be difficult. Organizing information in a meaningful way can be really difficult!

    Wurman quotes from Dan Klyn’s Information Architecture course:
  • The tools I’ll be talking about later are for figuring things out, not just for making things. They let us break data and information up into different kinds of chunks, discover relationships between them, arrange it in different ways, find meanings, and reframe things.
  • It’s not easy though. Since the way information is organized changes its meaning, we have to be careful how we organize it. This book from 1957 argues that science is more than just following correct processes and procedures, and discusses the creative thinking required to do good work. These ideas apply to information science too. It’s as much an art as a science.
  • With appropriate tools and models, maybe we can have better conversations with the information we’re working with.

  • Now we’re going to take a look at some common data models, and how they affect things.
    I guess it’s kind of meta -- thinking about how we think about and use data.
  • These four models come from this book, which was just published a few months ago. Unlike many books out there on data, this one isn’t arguing that a particular model is the best (RDF will replace everything, we should store everything in XML, etc.) Instead, it does a good job of analyzing and comparing current data formats and models. And the authors claim (rightfully so), that none of these models are going away, since they all have their uses.
  • The authors draw a distinction between the model and the format. The formats are more concrete, and are usually the things we talk about. But abstracting it out to the model can help us see some features of these formats. Ideally, the context of the data problem you want to solve will determine the data model, but we may get the data in some other format. Some tools I’ll talk about later can help translate data from one format to another, but care should be taken so that meaning isn’t lost in the translation.
  • Essentially, what we’re doing with tabular data is making a list. Each row is a record, with values at an intersection between a row and column defined by the header. Looking at items in a column gives a sense of different values for a particular element across records.
    PROS: easily human readable, everyone has Excel or a spreadsheet application, easy to make changes to structure (add/remove column)
    CONS: inconsistencies in values tend to happen, difficult to combine tables, issues with the file format (commas, quotes, linebreaks, character encoding, is there a header or not?)
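The file-format pitfalls listed above (embedded commas, quotes) are easy to demonstrate. A minimal Python sketch, using a made-up record, shows why a real CSV parser beats naive string splitting:

```python
import csv
import io

# A row whose "title" value contains a comma and embedded quotes --
# exactly the kind of CSV pitfall mentioned above. (Hypothetical data.)
raw = 'id,title\n1,"Data, ""Big"" and Small"\n'

# Naive splitting mangles the record: more fields than the header has.
naive = raw.splitlines()[1].split(",")
print(len(naive))  # 3 pieces instead of 2

# A real CSV parser recovers the intended value, quotes and all.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["title"])  # Data, "Big" and Small
```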
  • The relational model builds on the tabular one. Entities have attributes, and are connected to each other through relations. Every record gets a unique key. If used properly, we don’t have to deal with as many redundancies and inconsistencies.
    PROS: easier to maintain consistency (change item once), fast performance via indexing, complex (SQL) queries are possible
    CONS: difficult to set up and balance tradeoffs, difficult to change structure, difficult import/export to new systems
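The "change an item once" benefit noted above can be sketched with Python's built-in sqlite3 module. The publisher/book tables and their values are invented for illustration:

```python
import sqlite3

# In-memory database: every record gets a unique key, and the publisher
# name is stored once and referenced by id -- so fixing a typo in one
# place fixes it for every book. (Tables and data are hypothetical.)
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       publisher_id INTEGER REFERENCES publisher(id));
    INSERT INTO publisher VALUES (1, 'Pro-Quest');
    INSERT INTO book VALUES (1, 'Thesaurus Construction', 1),
                            (2, 'Linked Data Primer', 1);
""")
# One UPDATE corrects the name everywhere it is used.
con.execute("UPDATE publisher SET name = 'ProQuest' WHERE id = 1")
rows = con.execute("""
    SELECT book.title, publisher.name
    FROM book JOIN publisher ON book.publisher_id = publisher.id
""").fetchall()
print(rows)
```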
  • Markup is essentially annotations added to a document to identify structured elements. Entities can either be elements, or an attribute of an element. XML is the main example here, but JSON is a related one (also hierarchical).
    PROS: platform independent, identifies encoding (UTF8), very flexible to use (schemas, namespaces, XPath)
    CONS: difficult to set up and balance tradeoffs, slower search than databases, tricky to change structure (backwards compatibility)
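A short sketch of the element-versus-attribute distinction in a hierarchical record, using Python's standard xml.etree module. The catalog markup is invented for illustration (the titles are borrowed from the resources slide):

```python
import xml.etree.ElementTree as ET

# Entities as elements; one value carried as an attribute (the id)
# and one as element text (the title). Hypothetical markup.
doc = """
<catalog>
  <book id="b1"><title>Using OpenRefine</title></book>
  <book id="b2"><title>How to Make Sense of Any Mess</title></book>
</catalog>
"""
root = ET.fromstring(doc)
# XPath-style navigation over the hierarchy.
books = [(b.get("id"), b.findtext("title")) for b in root.findall("book")]
print(books)
```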
  • This last model is for Linked Data. But since we don’t really use it here (yet), I’ll skip over the details.
  • The entire chapter on modelling is available as a PDF sample online, so I recommend checking it out if you’re interested. It’s got a lot of examples too, along with a case study on linked data. And it discusses the tradeoffs between each data model in more detail than I did here.
  • This quote was taken from an article I mentioned in my presentation at the IA Summit this year, and although Pasquale was writing about tools for interaction design, I think it applies equally well here. If we’re going to work with data, we need more than just a spreadsheet application, a database, or some kind of content management system.
  • “Most of the word information contains the word inform, so I call things information only if they inform me, not if they are just collections of data, of stuff.” - RSW

    Abby’s original slide only had the understanding arrow, but I added the “tools” arrows. I’d like to propose that while sometimes we transition between these areas using just our wits, more often than not we need to rely on tools to help us get from data to information, and from information to knowledge.

    At the first overlap, we’re using tools to discover information in the data. We find patterns, we discover relationships, we look for clues given the context that we’re working in. I suspect that most of the time, we’re using tools here to figure stuff out on our own, or with our colleagues. The result may make sense to us, but only because we’re able to make the leap to knowledge in our own heads because we’ve internalized the stories around the information.

    At the second overlap, between information and knowledge, I think we’re using tools to present the information in a way that other people might understand. That leap we were making in our own heads needs to be shared through a conversation or story that others can relate to. We want to give them a chance to see what we see in the information, and hopefully be able to use that to accomplish their own goals.
  • A good editor is important.
  • Something I’ve used many times when poking around in data files is the find in files feature of Notepad++. If you really want to do some cool stuff you can even define a regular expression.
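The same find-in-files-with-a-regex idea can be sketched in a few lines of Python; the folder name, file glob, and ISBN-like pattern below are hypothetical:

```python
import re
from pathlib import Path

# Minimal "find in files" in the spirit of the Notepad++ feature:
# walk a folder tree, report every line matching a regular expression.
def find_in_files(folder, pattern, glob="*.txt"):
    rx = re.compile(pattern)
    for path in sorted(Path(folder).rglob(glob)):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if rx.search(line):
                yield path, lineno, line.strip()

# Hypothetical usage: find every ISBN-13-looking number in a data folder.
# for path, n, line in find_in_files("data", r"\b97[89]\d{10}\b"):
#     print(f"{path}:{n}: {line}")
```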
  • Like a spreadsheet Swiss Army knife.

    (Showing Facets)
  • (Showing Undo)

    There are all sorts of ways to explore and modify tabular data in Google Refine, and while many things can be done quickly, there’s always the danger of messing up your data. Thankfully, it also provides a really nice undo feature. It’s not just a step-by-step undo. As you work, it automatically maintains a complete list of all the changes that have been made, with descriptions, and you can easily roll back (or forward) to any step in the process. It also saves after every change is made.
  • It’s fairly difficult to understand though. While it’s not like learning how to use macros or VLOOKUP in Excel, just knowing where to look in the menus, or what different operations are called, can be confusing. Enough other people have found the tool to be equally useful and frustrating that there are several resources on the web, like this blog, that give examples of how to do various things.

    I’ve found I can spot problems in spreadsheets more easily in Refine, and things that would take some major effort to do in Excel can be done quickly in Refine if you know where to look. The extra bit of research up front might be worth the time you save.
  • A “web notebook” that I’ve found to provide an interesting and useful self-contained, extensible, hypertext data model. The original creator of Tiddlywiki intended it to be a sort of personal note taking and mind mapping tool. But conceptually it’s like a database, with uniquely named chunks of content that can link to each other like a wiki, and also a tagging system and a dynamically generated timeline showing changes made to the content inside. Implementation-wise, it’s a single HTML file that contains your data and all the features and functionality implemented in Javascript. Edits you make get saved back to the file. Tiddlywiki also supports add-ons and extensions that are written in Javascript – lots of interesting extensions are available for the classic version.
  • The newest version of Tiddlywiki is in beta, and has been rewritten from the ground up using HTML5 and JQuery. It can also be run via Node.js, has some touch input features, is responsive to different screen sizes and layouts, and performs better than the classic version. The one thing it’s currently missing is a wide range of add-ons, though they will probably appear in time.
  • There’s also some documentation appearing for the newest version, including sites like this one that show various tips and tricks. So while it can be used as a note-taking tool or personal wiki without much effort, I think the true value of this tool is as a platform for building textual analysis and text-driven exploration tools.
  • When I was in grad school back in 2007, earning my MLIS degree, I took a course on thesaurus construction. We had a course-long group project to construct a thesaurus. While collecting terms and doing some initial work in Excel was reasonable, it seemed like it would be really cumbersome and error prone to copy the terms around between Excel and index cards, and then type them all up again in our final project report and in the classified and alphabetical schedules.
  • There are a lot of housekeeping tasks to keep track of too, like making sure that if a broader term points to a narrower term, the narrower term points back. While there are large commercial tools for this sort of thing, we couldn’t find anything simple and free. So I adapted a Tiddlywiki to not only import our terms from Excel, but also to dynamically build the two schedules based on the relationships between the terms. I added some error checking tools to ensure the linkages between terms were correct, and if anything strange or missing was found, we knew what we needed to fix. It took some effort to build up front, but it ended up saving us more time in the long run, and it made it easier to tweak and update the terms even later in the project, because that didn’t generate any extra work for us.
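The reciprocal-link housekeeping described above can be sketched as a simple consistency check; the BT/NT mappings here are invented examples, not the actual project code:

```python
# Every broader-term (BT) relation should have a matching narrower-term
# (NT) relation pointing back, and vice versa.
def check_reciprocals(bt, nt):
    """bt and nt map term -> set of related terms."""
    problems = []
    for term, broaders in bt.items():
        for b in broaders:
            if term not in nt.get(b, set()):
                problems.append(f"{b} is missing NT link back to {term}")
    for term, narrowers in nt.items():
        for n in narrowers:
            if term not in bt.get(n, set()):
                problems.append(f"{n} is missing BT link back to {term}")
    return problems

print(check_reciprocals({"oak": {"tree"}}, {"tree": set()}))
# -> ['tree is missing NT link back to oak']
```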
  • Just last year, I used a Tiddlywiki at work to analyze lots of client configuration files, so we could make informed decisions about migrating those configurations from one system to another. In this case I didn’t import the data into the Tiddlywiki, but instead built the Tiddlywiki around the data. Since it’s just a single HTML file, I figured out where the data was stored and what format it was in, and built a new file by writing a Perl script to insert the configuration data into the right place in the empty file. The timeline of changes appeared for free, since I included metadata about when the configurations had changed. The graphs, charts, and other analysis tools were built using plugins and some custom code, but they are all driven by the configuration data in the file.
  • Tiddlywiki also provides a full text search feature. And it can be extended with some powerful plugins.
  • Since I had all the Git repository information, where the configurations were stored, I used the information about changes to show recent churn.
  • Sortable tables and graphs were already available to Tiddlywiki via plugins. As long as I put the data in the right format, this stuff just magically worked. And it helped me play with the configurations to better understand and group them.
  • The code to make it happen was pretty simple. This is how the pie chart was defined – most of the code there is the data itself.
  • Again, through a plugin (with some minor tweaks), I added visual diffs between the files, all inside the tiddlywiki.
  • The last thing I added was a color coding mechanism. As I learned more about the similarities between config files, I added some code to identify and group the configs based on certain properties. Then I could give each group a unique hash (ID) and assign a color. That made it easier to work through them in similar chunks.
  • Gephi is a network visualization and analysis tool for exploring graphs. But really, you can look at anything as long as you can get your data into the form of a list of nodes and a list of connections between those nodes. If your data fits that pattern, you can start to explore it visually with Gephi. It can show the connections between things we’re working with, and identify the patterns and groupings that naturally exist because of those connections.
  • A year or so ago, I worked on a project to analyze the online documentation we provided to customers at ProQuest. It wasn’t hosted in a content management system, but instead in a CRM system that could also host documentation. Because of that, the system was missing some features, like the ability to check for broken links. I extracted all the articles, wrote some code to identify and record all the links between articles, and then generated a node and edge list that could be fed into Gephi. I also included some metadata for each article, so we’d know what we were looking at. This screenshot shows a web-based exploration tool that can be generated with an extension in Gephi, once you’ve got your data loaded and analyzed. Being able to see these connections and play with them a bit led to all sorts of insights beyond just finding dead links. And this didn’t even touch the usage data, which we also had and looked at separately.
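The node-and-edge-list shape that Gephi imports can be sketched like this. The article ids and links are made up; the `Id`/`Label` and `Source`/`Target` headers follow Gephi's spreadsheet (CSV) import convention:

```python
import csv

# Hypothetical article-link data: (article, linked-to article) pairs.
links = [("A100", "A200"), ("A100", "A300"), ("A200", "A300")]

# Node list: one row per unique article.
nodes = sorted({a for pair in links for a in pair})
with open("nodes.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Id", "Label"])
    for n in nodes:
        w.writerow([n, n])

# Edge list: one row per link between articles.
with open("edges.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Source", "Target"])
    w.writerows(links)
```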
  • Sometimes Gephi produces something that is pretty, instead of useful. 
  • Sometimes I have the luxury of experimenting for the sake of experimenting, and this was one of those cases. Years ago I had seen a cool video that graphically showed changes over time in a source control system. I wondered if it might be used for other types of information.
  • The data format was pretty simple, with just a timecode, a name for the person who made the change, what type of change was made, and some other things like the color to use for the marker. Since I still had that documentation metadata lying around, I got it into the correct format and fed it into Gource. While it made a nice video, I didn’t realize it could be useful until I showed it to the manager who was responsible for that documentation. As he watched, he started noticing major events that had happened in the evolution of the system, and the conversation we had revealed a number of things I wouldn’t have known about the system. That came in handy when it came time to migrate the content to a completely different system this year. In this case the tool didn’t help me understand anything directly, but it prompted someone else to share useful information.
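The simple format described above maps onto Gource's pipe-delimited custom log: timestamp, author, change type (A for add, M for modify, D for delete), path, and an optional hex colour. A sketch with invented entries:

```python
# Hypothetical change records: (unix timestamp, author, type, path, colour).
changes = [
    (1388534400, "alice", "A", "docs/getting-started.html", "00FF00"),
    (1391212800, "bob",   "M", "docs/getting-started.html", "FFAA00"),
]

# One pipe-delimited line per change, in Gource's custom log shape.
with open("custom.log", "w") as f:
    for ts, author, kind, path, colour in changes:
        f.write(f"{ts}|{author}|{kind}|{path}|{colour}\n")
```

The resulting file can then be played back with something like `gource --log-format custom custom.log`.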
  • Though D3 is more of a framework, almost like a software development kit, it’s possible to use the extensive library of D3 examples without creating a new visualization from scratch or writing much, if any, Javascript.
  • There are lots of examples available, so if you can find a visual framing of your data that seems to make sense, you might be able to adapt a sample to look at your data instead. Usually some amount of editing is needed though.
  • “Who are our Summon clients?” We have all these different labels or buckets that we assign customers to in the Client Center, and some of our tools and products use them for various things. Some of the information is out of date, and needs to be cleaned up. There’s not even a consistent idea of what each category actually means, or how it is being used. But looking at sets of clients this way, and especially the overlap, we can get a better idea of what we actually have to deal with!
  • RAW is another visualization tool that requires no coding at all. You simply copy your data into the web based tool, and you can play with different types of visualizations using the same data set. It uses D3 to show its visualizations.

  • Similar to RAW, but with support for basic visualizations like line charts and bar charts. It also supports choropleth maps for geographic data.
  • Somewhat similar to OpenRefine. Unfortunately it’s no longer maintained as a free product, as they turned it into a commercial tool.
  • Sort of a cross between Visio and Gephi.
  • Peter asked his team to spend at least 30 minutes researching a client before they call them to set up training. Knowing who the client is, and as much about their situation as possible, allows his team to offer more personalized training. But how do they actually do this research? They have to dig around in Client Center, hunting for specific things in lots of different places. It takes time just to find the information, and also makes it easy to miss things.
  • As its name implies, this website called alternativeTo allows you to search for a particular application and then find alternatives to it. So if you know about a potentially useful tool for one platform, you might be able to find a similar tool on another platform by seeing what comes up. The relationships this site shows often reveal related tools rather than replacements. But that’s great too, because if you have a tool that almost does what you need, you might be able to find a slightly different one that does.

  • If time, show: Support Center (Gource, Tiddlywiki, Gephi, RTA Migration Tiddlywiki), Random D3 examples, yEd example (preparing for training call), OpenRefine
  • I’ve mentioned this a few times, so I thought I’d explain why…
  • “Academic” description.
  • Easier to understand description. (Abby is the current president of the Information Architecture Institute.)