Visualising data: Seeing is Believing - CS Forum 2012
 


When patterns and connections are revealed between numbers, content and people that might otherwise be too abstract or scattered to be grasped, we’re able to make better sense of where we are, what it might mean and what needs to be done.

Speaker notes

  • Humans observe and analyse the contents of the world around us to a unique and astonishing degree, but what really sets us apart is how we form and communicate verbal and visual concepts.
  • This has been a principal factor in our adaptive advantage in competition with other animal species.
  • In one respect we are superior to other animals: we've developed a greater variety of systems of communication. And one of them is art.
  • The earliest known examples of human expression demonstrate our ability to bring chaotic and complex environments under control through the magic of art, because to depict something is to transform it into whatever form or shape we want.
  • Though no one knows for sure what the real purpose of the cave paintings was, experts believe their subjects, primarily wild animals (cows, deer, horses, elk and bison), represent an attempt to control or 'tame' them. Interestingly, many of these animals were domesticated thousands of years later.
  • If art and other forms of creative expression transform and interpret, then science identifies and unifies. There are few better collisions of the two cultures than the diagram.
  • The world is full of famous and recognisable diagrams. The most potent combine the ability to express complex ideas simply with an intellectual and artistic beauty, and with the power to shift perspectives and change minds.
  • More than 500 years ago a little-known Polish cleric, Nicolaus Copernicus, came to revolutionise how we see our place in the universe.
  • For thousands of years, scholars and religious scriptures had been steadfast in the belief that we (the Earth) were the static centrepiece of the universe and that everything revolved around us. So, against the weight of established teachings and opinions, how could he back up his theory?
  • Scan the pages of his life's work and you'll find page after page of tabular data and calculations. As well as using his own astronomical observations to recalculate planetary positions, thousands of years' worth of data from past astronomers underpinned his diagram.
  • Despite the weight of those foundations, its beauty lies in its simplicity: you or I could draw it.
  • As someone who benefited from data drawn from different sources, Copernicus would have welcomed today's data revolution.
  • Barriers have been lowered between ourselves and rich datasets of information about our communities, politics and governments. If we have access to the numbers and statistics that show what's really going on, we can start to make things better at a local, national and international level.
  • One offshoot has been the emergence of new and powerful tools for interrogating and presenting data. They allow us to communicate our findings in a manner that audiences can understand and act upon.
  • At a basic level, these tools are free to access and use.
  • They range from graphics software for producing interactive visualisations...
  • ...to tools for visualising and manipulating data in graph form...
  • ...tools for mapping location data...
  • ...and tools for scraping, retrieving and converting data.
  • I've found visualising data to be an effective way of testing the buoyancy of theories, and of gaining a deeper understanding of the hidden processes that exist within organisations.
  • I've also found it an effective method for amplifying and simplifying the communication of my recommendations and arguments to the people who effect change in organisations.
  • One of the most effective methods for discovering and analysing the lifecycle of an organisation's content is to conduct a series of one-on-one interviews with key members of the authoring team. But locating the right people in a multi-departmental organisation isn't always easy.
  • Consulting hierarchical org charts is useful, but they don't always reveal the hidden relationships forged by content, nor do they always show where real influence lies within an organisation.
  • This is a sociogram: put simply, a visualised social network. It can be a powerful tool for discovering the deeper meanings behind the relationships and communities within a network.
  • Each node (bubble) is a person within the network, and each edge (line) is a connection between two people.
  • Sociograms can be used to reveal community clusters that might not be reflected in the org chart, and to calculate network science measures such as degree: the more connections an individual has, the higher their degree.
  • They can also be used to calculate betweenness centrality, or as I prefer to call it, influence: the more connections an individual has across different community clusters, the higher their betweenness.
  • Most CMSs and intranet servers generate and store sets of log files that record user activity. Opened up, one might look like this.
  • Let's focus on a single entry. The data we're interested in (shown in bold) is the date and time, the client IP address, the username and the content accessed.
  • Back at the log file, we're looking for the same content being accessed by different authors within a certain timeframe; here are two examples. zajac_s came along shortly after olson_b to access the same content, so there's one connection. We extract these usernames...
  • ...and add them to a two-column spreadsheet: the first username goes under 'Source' and the second under 'Target'. Finally, we save it as a .csv.
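  • If you have a lot of log data, this extraction step can be scripted. Below is a minimal Python sketch of the idea rather than the exact tooling used in the talk; the filename, the 60-minute window and the field positions (which follow the IIS-style sample shown above: date, time, client IP, username, then the requested URI) are all assumptions to adjust for your own logs.

    # Sketch: turn CMS access-log entries into Source/Target author pairs for Gephi.
    import csv
    from datetime import datetime, timedelta

    WINDOW = timedelta(minutes=60)      # the "certain timeframe" -- pick your own
    accesses = {}                       # uri -> list of (timestamp, username)

    with open("cms_access.log") as log:             # hypothetical filename
        for line in log:
            if line.startswith("#") or not line.strip():
                continue                            # skip the #Fields: header and blanks
            fields = line.split()
            when = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S")
            username, uri = fields[3], fields[7]
            accesses.setdefault(uri, []).append((when, username))

    edges = set()
    for uri, visits in accesses.items():
        visits.sort()
        for (t1, u1), (t2, u2) in zip(visits, visits[1:]):
            if u1 != u2 and t2 - t1 <= WINDOW:
                edges.add((u1, u2))     # earlier author = Source, later author = Target

    with open("author_edges.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Source", "Target"])
        writer.writerows(sorted(edges))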
  • We import our data into Gephi, a tool for visualising and manipulating data in graph form.
  • What we have is a big mess of randomly distributed nodes and edges, so we'll run a layout algorithm to position the nodes in an aesthetically pleasing way. I use ForceAtlas 2.
  • This looks better: the network has formed into defined community clusters. To help pick out the most important individuals in the network, we'll rank the nodes by degree.
  • We've ranked degree by colour: the 'greenest' nodes have the highest degree. Now let's find out who is most influential by ranking the network by betweenness centrality.
  • Although the colour and size of the nodes help us see who is most influential in this network, it is still too busy, so we'll filter out the weaker parts of the network.
  • These look like the most influential people, the ones we should be talking to. They might sit deep down an org chart, but in this context they are very important to us.
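  • For those who would rather script this step than work through Gephi's UI, a similar ranking can be produced with the networkx library. This is a sketch under the assumption that the edges were saved to author_edges.csv as above; it is an alternative to, not a transcription of, the Gephi workflow.

    # Sketch: rank authors by degree and betweenness centrality ("influence").
    import csv
    import networkx as nx

    G = nx.Graph()
    with open("author_edges.csv") as f:
        for row in csv.DictReader(f):
            G.add_edge(row["Source"], row["Target"])

    degree = dict(G.degree())                    # connections per person
    betweenness = nx.betweenness_centrality(G)   # influence across community clusters

    # The people worth interviewing first: highest betweenness, then highest degree.
    ranked = sorted(G.nodes, key=lambda n: (betweenness[n], degree[n]), reverse=True)
    for name in ranked[:10]:
        print(f"{name}: degree={degree[name]}, betweenness={betweenness[name]:.3f}")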
  • We were able to see the different community clusters these people belonged to, identified in our graph.
  • In effect we created an alternative org chart and revealed another power structure within the organisation.
  • Studying sociograms might enable large organisations to understand which individuals are connecting remote communities. It's also very useful for those of us who are air-dropped into organisations with limited time to learn the people and the culture.
  • Now let's look at how relationships between content can be visualised.
  • Though many of us prepare web content in isolated documents and locations, ready for assembly, it is rarely read that way. So to fully appreciate the story we're telling our audiences, we might take a look from above...
  • ...like this. This graph is a rendering of the most frequently trodden paths through a website. By studying it, it's possible to glean significant information about our content and the relationships between it. But before we make one, let me explain what we're looking at.
  • Here's a simplified example: each node (bubble) is a web page, and each edge (line) is the flow of users between pages, the traffic.
  • Larger nodes have a higher PageRank, essentially a higher importance in relation to the other pages in the set, and thicker edges indicate a greater frequency of flow between pages in that direction.
  • I'm indebted to Dorian Taylor for this technique. Read everything he's ever written, particularly his essay for Contents magazine.
  • In his article 'Visualizing Paths Through the Web', Dorian points out that web servers log information on each available referring resource (the previous page) and each new request we make (the next page).
  • Let's take a look at a sample web server log. It looks a lot like the log file sample we saw earlier, but with subtle differences.
  • Let's focus on two entries. We want to locate each referrer-referent connection (previous and next page) that falls under matching client IP addresses.
  • Cleaning the data is a case of stripping out all non-human visits from web crawlers such as Googlebot, Bingbot and others. Find and replace is your friend and ally here.
  • As before, we add them to a two-column spreadsheet: referring URLs go under 'Source' and referent URLs under 'Target'. Finally, we save it as a .csv.
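  • Again, this can be scripted. The sketch below is an assumption-laden example rather than the talk's own method: the filename is hypothetical, and the regular expression is written against the sample log shown on the slide (client IP, timestamp, request, status, bytes, then the quoted referrer and user agent), so adjust it to your server's actual format.

    # Sketch: pull referrer -> request edges out of a web server access log,
    # drop bot traffic, and write a Source/Target CSV ready for Gephi.
    import csv
    import re

    BOTS = ("bot", "spider", "crawler")   # catches Googlebot, Bingbot, etc.
    LINE = re.compile(
        r'^(?P<ip>\S+) \[.*?\] (?P<method>\S+) (?P<path>\S+) \S+ \d+ \S+ \S+ '
        r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
    )

    edges = []
    with open("access.log") as log:                    # hypothetical filename
        for line in log:
            m = LINE.match(line)
            if not m:
                continue
            if any(b in m.group("agent").lower() for b in BOTS):
                continue                               # strip non-human visits
            referrer = m.group("referrer")
            if referrer and referrer != "-":
                edges.append((referrer, m.group("path")))   # previous page -> next page

    with open("path_edges.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Source", "Target"])
        writer.writerows(edges)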
  • Before we continue, let me introduce the working example for the remainder of this talk. Cross Country Coaches is a fictitious UK bus and coach operator that has been very generous with its data! Its main services include airport runs, day trips to major towns and cities, attractions, and sporting and music events.
  • All roads should lead to the journey planner and to buying e-tickets.
  • Again, we import our data into Gephi, our tool for visualising and manipulating data in graph form.
  • This time we have an even bigger mess of randomly distributed nodes and edges, so we'll run the same layout algorithm as before to position the nodes in an aesthetically pleasing way. Again, I use ForceAtlas 2.
  • This looks better: the network has formed into defined clusters. To help pick out the most important pages and content in the network, we'll rank the nodes by PageRank.
  • Although the colour and size of the nodes help us see what the most influential content in this network is, it is still too busy, so, again, we'll filter out the weaker parts of the network.
  • All roads should be leading to the journey planner, but many users are choosing to leave the process of buying an e-ticket to contact Cross Country Coaches instead. That's interesting, and something to investigate.
  • Nevertheless, what we have is a sample of the most frequently trodden paths through this website, which is a very useful starting point. We'll export the node data as a .csv, as it will come in handy later.
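  • The PageRank ranking and the node export can also be reproduced outside Gephi. Here is a minimal networkx sketch, assuming the path_edges.csv file produced above; the output column names are my own choice, not the talk's.

    # Sketch: build the page graph, compute PageRank, and export a node table.
    import csv
    import networkx as nx

    G = nx.DiGraph()
    with open("path_edges.csv") as f:
        for row in csv.DictReader(f):
            if G.has_edge(row["Source"], row["Target"]):
                # repeated Source -> Target pairs accumulate as edge weight (traffic)
                G[row["Source"]][row["Target"]]["weight"] += 1
            else:
                G.add_edge(row["Source"], row["Target"], weight=1)

    pagerank = nx.pagerank(G, weight="weight")

    with open("node_table.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Page", "PageRank"])
        for page, score in sorted(pagerank.items(), key=lambda kv: kv[1], reverse=True):
            writer.writerow([page, round(score, 5)])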
  • Visualising the referrer/referent data from the web server logs provided a different perspective on the stories we're telling our audiences.
  • By filtering the data we removed visual complexity and revealed the key paths and stops users are making.
  • And exporting the node data generated a sample of the most frequently accessed site content.
  • We can add further value and depth to our visualisations by informing them with data from our own investigations.
  • Let's export the node table. In addition to page titles and descriptions, we can (manually) add a set of new columns extracted from a recent audit: departmental responsibility, individual responsibility, whether the content is sourced from external partners, when it was last updated, and our criteria for measuring content quality.
  • Or we could use Google Fusion Tables to help quickly merge data across two spreadsheets by pairing up one (or more) columns with the same values.
  • Now we can download the CSV and take it wherever we please.
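  • If you'd rather script the merge than use Fusion Tables, a simple join in pandas does the same pairing on a shared column. This is a sketch with assumed filenames and an assumed shared 'Page' column; substitute whatever your node table and content inventory actually use.

    # Sketch: merge the exported node table with an audit/content inventory.
    import pandas as pd

    nodes = pd.read_csv("node_table.csv")                # exported from the graph
    inventory = pd.read_csv("content_inventory.csv")     # audit data, keyed by Page

    extended = nodes.merge(inventory, on="Page", how="left")
    extended.to_csv("extended_node_table.csv", index=False)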
  • What could we do with the exported node table now that we have this additional data?
  • We could use Tableau Public to dive deeper into our data. It's a desktop data visualisation app for quickly creating charts and graphs, though it is Windows-only software.
  • Let's pose a question.
  • Here we have a set of bar charts telling us what each content source scored (out of 5) for the actionability, accuracy and usability of the content it is responsible for. If we single out the Tourism Partners, they score below average on every measure. So what content are they responsible for?
  • Let's return to our introduction to Cross Country Coaches: the Tourism Partners are responsible for the content for each town and city destination.
  • By filtering the data further we can quickly generate, in Tableau Public, a breakdown of the scores for each town and city destination. As these destinations scored lowest, should they be first in line for review?
  • Could external data help us prioritise our efforts? Let's bring some into play.
  • I downloaded domestic tourism statistics for 2009-11 from VisitBritain.
  • The only problem was that the data was only available in PDF format. This happens a lot!
  • But you can save hours of re-keying and checking by using tools like these free PDF-to-Excel converters.
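  • Scripting is another option when the PDF contains genuine text tables rather than scans. The tabula-py library (a Python wrapper around Tabula, which requires Java) can pull tables straight into DataFrames; the filename below is hypothetical, and the results always need checking by eye afterwards.

    # Sketch: extract tables from a PDF report into CSV files.
    import tabula

    # read_pdf with pages="all" typically returns a list of DataFrames, one per table
    tables = tabula.read_pdf("visitbritain_domestic_tourism.pdf", pages="all")
    for i, table in enumerate(tables):
        table.to_csv(f"tourism_table_{i}.csv", index=False)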
  • We can use Google Fusion Tables to merge our data for each town and city destination with the downloaded VisitBritain data, inheriting only the rows that correspond to our towns and cities. Now let's map the data.
  • Here's a map of the British Isles generated via Tableau Public, with bubbles marking each town and city from our data. Tableau Public automatically recognised this as location data.
  • Bubble sizes represent total domestic trips for 2009-11 (in millions), and colour intensity represents total spend for 2009-11 (in millions).
  • As expected, London is the largest and darkest bubble, so we'll remove it. Let's use the sliders to filter out the destinations that scored highest for actionability, accuracy and usability. Now we're left with Birmingham and Edinburgh as the destinations with the highest number of trips and spend. Our priorities, perhaps?
  • Filtering and partitioning the data with findings from our own investigations took us deeper into it.
  • Importing external data provided possible new angles and ideas.
  • Adding interactive elements helped develop a narrative around the data, and it invites our audiences to delve deeper still.
  • Play and experiment with your data. Have fun with it, ask questions of it, don't fear it. It will often yield its secrets and stories with surprising ease.
  • We often associate numbers with authority and certainty, but uncertainty can be a great way of raising new questions and sharing them with others.
  • By creating and sharing your work with others, you might find you get help and co-operation back.

Presentation Transcript

  • Visualising Data: Seeing is Believing. Richard Ingram (@richardjingram)
  • Observe
  • Observe Analyse
  • It’s worked to our adaptive advantage in competition with other animal species.
  • We’ve developed a greater variety of ways to communicate and express ourselves.
  • Paleolithic painting of a giant elk. Lascaux, southwestern France (c. 15,300 BC)
  • Upper Paleolithic painting of a bison. Altamira cave, northern Spain (c. 23,000 BC – c. 33,000 BC)
  • Da Vinci's Vitruvian Man. Pen and ink on paper (c. 1487)
  • Florence Nightingale's coxcomb diagram. Showed the causes of death in the British Army in the Crimea (1858)
  • Copernicus's heliocentric universe diagram. He literally moved heaven and earth to draw it (c. 1543)
  • Ptolemy's theory of a geocentric universe. Everything revolved around us, and that suited us just fine.
  • On the Revolutions of the Celestial Spheres. Contains thousands of years' worth of astronomical data (c. 1543)
  • The data deluge: power to the people
  • "The release of data is a cornerstone of how to strengthen the role of citizens and government, and recast the relationship between the two." ~ Sir Tim Berners-Lee, interview with The Guardian (2010)
  • "Having the data is not enough. You have to show it in ways that people both enjoy and understand." ~ Dr. Hans Rosling, The Joy of Stats (BBC Television, 2011)
  • Free tools for data extraction, exploration and visualisation: Gephi (gephi.org), Google Chart Tools (developers.google.com/chart/), Google Fusion Tables (google.com/fusiontables/), Google Refine (code.google.com/p/google-refine/), Lucid Chart (lucidchart.com), ManyEyes (www-958.ibm.com), OpenStreetMap (openstreetmap.org), ScraperWiki (scraperwiki.com), Tableau Public (tableausoftware.com/public/)
  • Tableau Public (tableausoftware.com/public/), Lucid Chart (lucidchart.com), Google Chart Tools (developers.google.com/chart/), ManyEyes (www-958.ibm.com)
  • Gephi (gephi.org)
  • OpenStreetMap (openstreetmap.org), Google Fusion Tables (google.com/fusiontables/)
  • ScraperWiki (scraperwiki.com)
  • Visualising data... ...allows us to quickly explore and analyse possible relationships between things and how they vary together.
  • Visualising data... ...can be an effective way to amplify and simplify the communication of our recommendations and arguments to other audiences.
  • Visualising data: The hidden people networks
  • Internal author interviews. Who to speak to? Org charts can help.
  • Internal author interviews. How do we reveal those hidden relationships?
  • Sociogram. Visualises the structure and patterns of group interactions. Each person is a node; each connection is an edge.
  • Sociogram. Can reveal community clusters and calculate network science parameters like degree: more connections = higher degree.
  • Sociogram. Can reveal community clusters and calculate network science parameters like betweenness centrality: higher network importance = higher betweenness.
  • Log file entries. Whoa, what in the name of thunder does this all mean?
    #Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
    2012-09-03 00:10:19 XXX.XXX.X.211 clarke_n XXX.XXX.X.103 80 GET /admin/pages/content.php?id=84 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:13:20 XXX.XXX.X.29 arecchi_f XXX.XXX.X.103 80 GET /admin/pages/content.php?id=168 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:14:38 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=73 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
  • Log file entries: a closer look. Important elements in bold: date/time stamp, client IP address, username, and content accessed:
    2012-09-03 00:09:53 XXX.XXX.X.104 russell_g XXX.XXX.X.103 80 GET /admin/pages/content.php?id=12 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
  • Log file entries. We want to locate authors accessing the same content within a certain timeframe. Narrowing the log down, four entries stand out:
    2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
    2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
  • Log file entries. We return to the relative safety of the spreadsheet.
  • Gephi (gephi.org)
  • Big mess. Let's run a layout algorithm.
  • That's better... But who are the most influential?
  • So, what did we learn?
  • We were able to easily spot the different community clusters to which people were connected.
  • Visualising the log file data helped us create an alternative org chart.
  • Using network science parameters we saw which individuals held the most influence over multiple groups.
  • Visualising data: Relationships between content
  • For many of us, published web content is increasingly formed of flexible modules scattered across documents and locations, but they are seldom read that way.
  • Frequently trodden paths. Mapping the movement of users between web pages.
  • Frequently trodden paths. Weightier nodes and edges indicate key paths and stops.
  • Frequently trodden paths. Weightier nodes and edges indicate key paths and stops. Each page is a node; each user flow between pages is an edge.
  • Dorian Taylor (doriantaylor.org)
  • Web server logs. Store information on each request we make: the referring resource (previous page) and the new request (next page). Source: doriantaylor.com/visualizing-paths-through-the-web
  • Web server logs. Haven't we been here before? Not quite.
    XX.XXX.XXX.86 [30/Aug/2012:11:09:27 +0100] GET /contact-us/ HTTP/1.1 200 14728 - "http://www.crosscountrycoaches.com/destinations/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET / HTTP/1.1 200 12007 - "http://www.crosscountrycoaches.com/destinations/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - "http://www.crosscountrycoaches.com/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    XX.XXX.XXX.86 [30/Aug/2012:11:09:37 +0100] GET /services/ HTTP/1.1 200 13428 - "http://www.crosscountrycoaches.com/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    XX.XXX.XXX.86 [30/Aug/2012:11:09:38 +0100] GET /login/ HTTP/1.1 200 17284 - "http://www.crosscountrycoaches.com/services/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ HTTP/1.1 200 27788 - "http://www.crosscountrycoaches.com/services/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /services/terms-and-conditions/ HTTP/1.1 200 11638 - "http://www.crosscountrycoaches.com/reprint-your-ticket/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
  • Web server logs: a closer look. Important elements in bold: client IP address, next page and previous page:
    XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - "http://www.crosscountrycoaches.com/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ HTTP/1.1 200 27788 - "http://www.crosscountrycoaches.com/services/" "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
  • Web server logs: cleaning up the data. Most non-human activity can be removed by searching for: Googlebot (google.com/bot.html), Bingbot (bing.com/bingbot.htm), and any mentions of 'bots', 'spiders' and 'crawlers'.
  • Web server logs. Extract the URL data and add it to a spreadsheet.
  • Cross Country Coaches: the entirely fictitious UK-wide bus and coach operator. Airports (direct to terminal), attractions (holiday camps and amusement parks), day trips (towns and cities), events (sports and music).
  • Cross Country Coaches: the entirely fictitious UK-wide bus and coach operator. Airports, attractions, day trips and events, all leading to the Journey Planner for purchasing e-tickets.
  • Gephi (gephi.org)
  • An even bigger mess. Let's run a layout algorithm.
  • That's better... Let's pick out the influential pages.
  • Export node data. This will come in handy later.
  • So, what did we learn?
  • Visualising the web server log data provided us with a different perspective of the stories we're telling our audiences.
  • Filtering the data took away the visual complexity and revealed to us the key paths and stops.
  • Exporting the node data generated a page-level inventory of the most frequently accessed content.
  • Visualising data: Playing with numbers
  • Utilising internal data. Bringing data from your own investigations into play.
  • Exported node table. Manually feed in data from your own investigations.
  • Exported node table. Automatically feed in data from your own investigations: exported node table + content inventory = extended node table.
  • Google Fusion Tables (google.com/fusiontables/)
  • What can we do with this additional data?
  • Tableau Public (tableausoftware.com/public/)
  • Q. Is there a perceived difference in the quality of content maintained in-house and through external partners?
  • Cross Country Coaches: the entirely fictitious UK-wide bus and coach operator. Airports (direct to terminal), attractions (holiday camps and amusement parks), day trips (towns and cities), events (sports and music).
  • Utilising external data. Bringing public data into play.
  • VisitBritain.org: domestic tourism statistics
  • Regional breakdown of tourism value. Only available in handy PDF format [!].
  • PDF to Excel Online (pdftoexcelonline.com), Zamzar (zamzar.com)
  • Google Fusion Tables (google.com/fusiontables/)
  • Priorities?
  • So, what did we learn?
  • Furnishing our exported node table with data from our own investigations allowed for deeper dives into our sample.
  • Importing external data provided us with possible new angles and ideas.
  • Adding interactive elements helped us develop a basic narrative around our data.
  • Visualising data: You can do it!
  • Don’t be afraid of data. Treat it as something to play with and explore.
  • Don't worry about not having all the answers. Uncertainty can be a great way of raising new questions.
  • Create something that could be of use to others and you might find you get help and cooperation back.
  • Thank you. Richard Ingram (@richardjingram)
  • Recommended reading: Semiology of Graphics: Diagrams, Networks, Maps by Jacques Bertin; The Data Journalism Handbook (free), edited by Jonathan Gray, Liliana Bounegru and Lucy Chambers; Designing Data Visualizations by Noah Iliinsky and Julie Steele; Information is Beautiful by David McCandless; Envisioning Information by Edward R. Tufte
  • Image sources and credits: Mount Teide (palmstorys.com/bilder/Englishversions/Tenerife.html); Stratovolcano cross-section by Woudloper (http://commons.wikimedia.org/wiki/File:Stratovolcano_cross-section.svg); The Expression of the Emotions in Man and Animals, various (http://commons.wikimedia.org/wiki/The_Expression_of_the_Emotions_in_Man_and_Animals); Paleolithic painting of giant elk (http://commons.wikimedia.org/wiki/File:Lascaus,_Megaloceros.jpg); Upper Paleolithic painting of bison by Ramessos (http://commons.wikimedia.org/wiki/File:AltamiraBison.jpg); Vitruvian Man (http://commons.wikimedia.org/wiki/File:Da_Vinci_Vitruve_Luc_Viatour.jpg); Diagram of the causes of mortality in the army in the East (http://commons.wikimedia.org/wiki/File:Nightingale-mortality.jpg)
  • Image sources and credits cont...: On the Revolutions of the Celestial Spheres (http://ads.harvard.edu/books/1543droc.book/); Geocentric model of the solar system (http://commons.wikimedia.org/wiki/File:Ptolemy_Sky.jpg); Tim Berners-Lee by Silvio Tanaka (http://commons.wikimedia.org/wiki/File:Tim_Berners-Lee_CP.jpg); Hans Rosling (http://novartisfoundation.org/platform/content/element/3967/2336.jpg); Nicolaus Copernicus (http://commons.wikimedia.org/wiki/File:Nikolaus_Kopernikus.jpg); Isaac Newton (http://commons.wikimedia.org/wiki/File:Sir_Isaac_Newton_by_Sir_Godfrey_Kneller,_Bt.jpg)