When patterns and connections are revealed between numbers, content and people that might otherwise be too abstract or scattered to be grasped, we’re able to make better sense of where we are, what it might mean and what needs to be done.
24. “The release of data is a cornerstone of how to strengthen the role of citizens and government, and recast the relationship between the two.”
~ Sir Tim Berners-Lee, interview with The Guardian (2010)
26. “Having the data is not enough. You have to show it in ways that people both enjoy and understand.”
~ Dr. Hans Rosling, The Joy of Stats (BBC Television, 2011)
27. Free tools for data extraction, exploration
and visualisation
Gephi (gephi.org)
Google Chart Tools (developers.google.com/chart/)
Google Fusion Tables (google.com/fusiontables/)
Google Refine (code.google.com/p/google-refine/)
Lucid Chart (lucidchart.com)
ManyEyes (www-958.ibm.com)
OpenStreetMap (openstreetmap.org)
ScraperWiki (scraperwiki.com)
Tableau Public (tableausoftware.com/public/)
28. Tableau Public (tableausoftware.com/public/)
Lucid Chart (lucidchart.com)
Google Chart Tools (developers.google.com/chart/)
ManyEyes (www-958.ibm.com)
69. Log file entries
Whoa, what in the name of thunder does this all mean?
#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2012-09-03 00:10:19 XXX.XXX.X.211 clarke_n XXX.XXX.X.103 80 GET /admin/pages/content.php?id=84 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:20 XXX.XXX.X.29 arecchi_f XXX.XXX.X.103 80 GET /admin/pages/content.php?id=168 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:14:38 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=73 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
70. Log file entries: a closer look
Important elements in bold: date/time stamp, client ip address, username, and content accessed:
2012-09-03 00:09:53 XXX.XXX.X.104 russell_g XXX.XXX.X.103 80 GET /admin/pages/content.php?id=12 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
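A minimal sketch of pulling those fields out of an entry programmatically (Python; assumes the space-delimited layout and field order given by the #Fields header above):

# Sketch only: split a W3C-style log entry into the fields highlighted above.
FIELDS = ["date", "time", "c-ip", "cs-username", "s-ip", "s-port",
          "cs-method", "cs-uri-stem", "cs-uri-query", "sc-status", "cs(User-Agent)"]

def parse_entry(line):
    entry = dict(zip(FIELDS, line.split()))
    return {
        "timestamp": entry["date"] + " " + entry["time"],
        "client_ip": entry["c-ip"],
        "username": entry["cs-username"],
        "content": entry["cs-uri-stem"],
    }

print(parse_entry(
    "2012-09-03 00:09:53 XXX.XXX.X.104 russell_g XXX.XXX.X.103 80 GET "
    "/admin/pages/content.php?id=12 Cmd=contents 200 "
    "Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)"
))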
75. Log file entries
We want to locate authors accessing the same content within a certain timeframe.
#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2012-09-03 00:10:19 XXX.XXX.X.211 clarke_n XXX.XXX.X.103 80 GET /admin/pages/content.php?id=84 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:20 XXX.XXX.X.29 arecchi_f XXX.XXX.X.103 80 GET /admin/pages/content.php?id=168 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:14:38 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=73 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
76. Log file entries
We want to locate authors accessing the same content within a certain timeframe.
2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
77. Log file entries
We want to locate authors accessing the same content within a certain timeframe.
2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
106. For many of us, published web content is increasingly formed of flexible modules scattered across documents and locations, but it is seldom read that way.
123. Web server logs
Store information on each request we make.
Referring resource (previous page) → New request (next page)
Source: doriantaylor.com/visualizing-paths-through-the-web
124. Web server logs
Haven’t we been here before? Not quite.
XX.XXX.XXX.86 [30/Aug/2012:11:09:27 +0100] GET /contact-us/ HTTP/1.1 200 14728 - http://www.crosscountrycoaches.com/destinations/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET / HTTP/1.1 200 12007 - http://www.crosscountrycoaches.com/destinations/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:37 +0100] GET /services/ HTTP/1.1 200 13428 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:38 +0100] GET /login/ HTTP/1.1 200 17284 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ HTTP/1.1 200 27788 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /services/terms-and-conditions/ HTTP/1.1 200 11638 - http://www.crosscountrycoaches.com/reprint-your-ticket/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
125. Web server logs: a closer look
Important elements in bold: client ip address, next page and previous page:
XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ HTTP/1.1 200 27788 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
129. Web server logs
Cleaning up the data.
Most non-human activity can be removed by searching for:
Googlebot (google.com/bot.html)
Bingbot (bing.com/bingbot.htm)
Any mentions of ‘bots’, ‘spiders’, and ‘crawlers’
133. Cross Country Coaches
The entirely fictitious UK-wide bus and coach operator.
Airports: direct to terminal
Attractions: holiday camps and amusement parks
Day trips: towns and cities
Events: sports and music
134. Cross Country Coaches
The entirely fictitious UK-wide bus and coach operator.
Airports: direct to terminal
Attractions: holiday camps and amusement parks
Day trips: towns and cities
Events: sports and music
Journey Planner: purchasing e-tickets
174. Q. Is there a perceived difference in the
quality of content maintained in-house and
through external partners?
181. Cross Country Coaches
The entirely fictitious UK-wide bus and coach operator.
Airports: direct to terminal
Attractions: holiday camps and amusement parks
Day trips: towns and cities
Events: sports and music
219. Recommended reading
Semiology of Graphics: Diagrams, Networks, Maps
by Jacques Bertin
The Data Journalism Handbook FREE!
edited by Jonathan Gray, Liliana Bounegru, and Lucy Chambers
Designing Data Visualizations
by Noah Iliinsky and Julie Steele
Information is Beautiful
by David McCandless
Envisioning Information
by Edward R. Tufte
220. Image sources and credits
Mount Teide (palmstorys.com/bilder/Englishversions/Tenerife.html)
Stratovolcano cross-section by Woudloper (http://commons.wikimedia.org/wiki/File:Stratovolcano_cross-section.svg)
The Expression of the Emotions in Man and Animals (various) (http://commons.wikimedia.org/wiki/The_Expression_of_the_Emotions_in_Man_and_Animals)
Paleolithic painting of giant elk (http://commons.wikimedia.org/wiki/File:Lascaus,_Megaloceros.jpg)
Upper Paleolithic painting of bison by Ramessos (http://commons.wikimedia.org/wiki/File:AltamiraBison.jpg)
Vitruvian Man (http://commons.wikimedia.org/wiki/File:Da_Vinci_Vitruve_Luc_Viatour.jpg)
Diagram of the causes of mortality in the army in the East (http://commons.wikimedia.org/wiki/File:Nightingale-mortality.jpg)
221. Image sources and credits cont...
On the revolutions of the celestial spheres (http://ads.harvard.edu/books/1543droc.book/)
Geocentric model of the solar system (http://commons.wikimedia.org/wiki/File:Ptolemy_Sky.jpg)
Tim Berners-Lee by Silvio Tanaka (http://commons.wikimedia.org/wiki/File:Tim_Berners-Lee_CP.jpg)
Hans Rosling (http://novartisfoundation.org/platform/content/element/3967/2336.jpg)
Nicolaus Copernicus (http://commons.wikimedia.org/wiki/File:Nikolaus_Kopernikus.jpg)
Isaac Newton (http://commons.wikimedia.org/wiki/File:Sir_Isaac_Newton_by_Sir_Godfrey_Kneller,_Bt.jpg)
Editor's Notes
humans -> observe + analyse -> contents -> world around us -> unique + astonishing *BUT* form + communicate -> verbal + visual concepts
one respect -> superior -> other animals -> we've developed -> greater variety -> systems of communication *AND* one = art
earliest known examples -> human expression -> demonstrate -> our ability -> chaotic + complex environments -> under control -> magic of art *BECAUSE* depict something = transform it -> whatever form/shape -> want
*THOUGH* no one knows for sure -> real purpose? = cave paintings = experts believe -> primarily wild animals (cows, deer, horses, elk + bison) -> attempt to control or ‘tame’ them *INTERESTING* domesticated -> 1000s years
if art + other forms -> creative expression = transform + interpret -> science = identifier + unifier; few better collisions -> 2 cultures = diagram
world = full of famous + recognisable diagrams; most potent -> ability -> express complex ideas simply -> intellectual + artistic beauty -> power -> shift perspectives -> change minds
500+ years ago -> (little known) Polish cleric -> Nicolaus Copernicus -> came to revolutionise -> how we see -> our place -> universe
1000s years -> scholars + religious scriptures -> steadfast -> belief -> we (earth) = static centrepiece of universe = everything revolved around us *SO* against weight -> established teachings + opinions -> *HOW* back up theory?
scan pages -> life's work = contains pages of tabular data and calculations *AS WELL AS* using = own astronomical observations = recalculate planetary positions -> 1000s of years' worth of data from past astronomers = underpinned diagram
despite weight (foundations) = beauty lies -> simplicity -> you or I could draw it
for someone -> benefited -> data from diff sources -> would've welcomed -> data revolution
barriers (lowered) -> ourselves -> rich datasets -> information -> communities + politics + governments *IF* access to -> numbers + statistics = what's really going on = we can start make things better -> local + national + Inter'l level
offshoots -> emergence -> new + powerful tools = interrogating + presenting data; allow us = communicate findings -> in a manner -> audiences = can understand + act upon
These tools (basic level) = free to access and use
From -> graphics software = produce interactive visualisations
…tools = visualising + manipulating -> data -> graph form
…tools = mapping location data
…tools for scraping + retrieving + converting data
I've found -> visualising data -> effective way -> test buoyancy theories *ALSO* gain deeper understanding -> hidden processes -> exist (organisations)
*ALSO* I've found -> effective method -> amplifying + simplifying -> communication -> my recommendations + arguments -> people effect change in organisations
(one of) effective methods employed -> discover + analyse -> lifecycle -> organisation's content -> conduct series of 1-on-1 interviews -> key members -> authoring team *BUT* locating -> right people -> multi-departmental organisation = isn't always easy
consulting -> hierarchical org charts = useful *BUT* don't (always) reveal -> hidden relationships -> forged by content *NOR* always show -> where real influence lies -> organisation
*THIS* a sociogram -> put simply = visualised social network *IT CAN* powerful tool = discovering deeper meanings behind -> relationships + communities -> within a network
*SO* each node (bubble) = person within network *AND* each edge (line) = connection between two people
used = reveal community clusters = might not always be reflected -> org chart *AND* used = calculate network science parameters = degrees of separation *SO* the more connections -> individual has = higher their degree
*ALSO* used = calculate betweenness centrality *OR* I prefer to call it = influence *SO* more connections -> individual has -> different community clusters = higher their betweenness
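A minimal scripted sketch of the same two measures outside Gephi, using networkx and assuming the two-column Source/Target CSV of author connections built in the next steps:

# Sketch only: degree and betweenness centrality from the author-connection CSV.
import csv
import networkx as nx

G = nx.Graph()
with open("author_connections.csv", newline="") as f:   # assumed filename
    for row in csv.DictReader(f):                        # expects Source,Target headers
        G.add_edge(row["Source"], row["Target"])

degree = dict(G.degree())                    # number of connections per person
betweenness = nx.betweenness_centrality(G)   # 'influence' across community clusters

for person in sorted(betweenness, key=betweenness.get, reverse=True)[:10]:
    print(person, degree[person], round(betweenness[person], 3))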
most CMSs + intranet servers -> generate + store -> sets of log files = record activity (users) *WHEN* opened = might look like this
focus on -> single entry *SO* data we're interested in (bold) = date + time, client ip, username, content accessed
*BACK TO LOG FILE* we're looking for = same content accessed by different authors -> within (certain timeframe) = here are 2; zajac_s came along about half a minute after olson_b to access the same content = there’s one connection *SO* we extract these usernames…
…add them -> 2 columned spreadsheet; first usernames -> under ‘Source’ -> second -> under ‘Target’ *FINALLY* save as .csv
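A rough sketch of that extraction step in Python (the log filename, the 30-minute window and the output filename are assumptions):

# Sketch only: pair authors who accessed the same content within a time window,
# then write the pairs out as a Source/Target CSV for Gephi.
import csv
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)
by_content = {}   # uri -> list of (timestamp, username)

with open("cms_access.log") as f:
    for line in f:
        if line.startswith("#") or not line.strip():
            continue
        parts = line.split()
        when = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
        by_content.setdefault(parts[7], []).append((when, parts[3]))

with open("author_connections.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])
    for visits in by_content.values():
        visits.sort()
        for (t1, a), (t2, b) in zip(visits, visits[1:]):
            if a != b and t2 - t1 <= WINDOW:
                writer.writerow([a, b])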
import our data in Gephi = tool = visualising + manipulating -> data -> graph form
we have a big mess = randomly distributed nodes and edges; we’ll run a layout algorithm, which positions our nodes in an aesthetically pleasing way; I use Force Atlas 2
this looks better! = formed of defined community clusters; to help pick out the most important individuals in the network, we need to rank this by ‘Degrees of separation’
we’ve ranked degrees by colour. The ‘greenest’ nodes have the highest degree; now let’s find out who are the most influential by ranking the network by ‘Betweenness Centrality’
Despite the colour and size of nodes helping us to see who are the most influential in this network, it is still too busy, so we’ll filter out the weaker points of the network
These look like the most influential people we should be talking to; they might have been deep on an org chart but in this context they are very important to us.
we (able) see = different community clusters -> these people belonged to -> identified -> our graph
in effect we created = alternative org chart *AND* revealed = another power structure within an organisation
studying sociograms might = enable large organisations -> understand which individuals -> connecting remote communities *AND* very useful = those of us -> air-dropped into organisations -> limited time to learn -> people + culture
Let's look = how content relationships -> visualised
though many of us = prepare web content -> isolated docs + locations -> ready for assembly -> they are = rarely read that way *SO* fully appreciate -> story we're telling (audiences) = we might -> take a look from above...
...like this: this graph = rendering = most frequently trodden paths -> through a website; by studying it = possible to glean significant information -> our content + relationships between it *BUT* before we make one = let me explain what we're looking at
...here’s a simplified example: *SO* each node (bubble) = web page *AND* each edge (line) = flow of users between pages (traffic)
*SO* larger nodes = higher PageRank = (essentially) higher importance -> relation other pages (set) *AND* thicker edges = frequency of flow -> between pages -> this direction
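A minimal sketch of the same PageRank calculation outside Gephi, using networkx and assuming the Source/Target CSV of page paths built in the following steps:

# Sketch only: PageRank over the directed referrer -> request graph.
import csv
import networkx as nx

G = nx.DiGraph()
with open("page_paths.csv", newline="") as f:    # assumed filename
    for row in csv.DictReader(f):
        G.add_edge(row["Source"], row["Target"])

ranks = nx.pagerank(G)   # higher = more important relative to the rest of the set
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(round(score, 4), page)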
Indebted = Dorian Taylor -> this technique; read everything he's ever written, particularly his essay for Contents magazine
In his article = "visualising paths through the web" -> Dorian points out -> web servers = log information -> each available referring resource (prev. page) + each new request we make (next page)
let's take a look -> sample web server log = looks a lot like = log file sample we saw earlier *BUT* = subtle differences
focus on -> two entries *SO* we want -> locate each referent-referrer connection (next + previous page) = come under matching client ip addresses
cleaning data = case of stripping it -> all non-human visits -> web crawlers = Googlebot, Bingbot and others; find and replace = your friend and ally
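A small sketch of that clean-up as a script rather than find and replace (the marker list and filenames are assumptions; extend the list as crawlers turn up in your own logs):

# Sketch only: drop log lines that look like non-human traffic.
BOT_MARKERS = ("googlebot", "bingbot", "bot", "spider", "crawler")

with open("webserver.log") as src, open("webserver_humans.log", "w") as dst:
    for line in src:
        if not any(marker in line.lower() for marker in BOT_MARKERS):
            dst.write(line)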
…add them -> 2 columned spreadsheet -> like before; referring URLs -> under ‘Source’ -> referent URLs -> under ‘Target’ *FINALLY* save as .csv
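A rough sketch of building that Source/Target CSV from entries like the samples above (the field positions assume that exact space-separated layout; real server configurations differ, so adjust them to your own log format):

# Sketch only: referring URL -> requested page pairs, written out for Gephi.
import csv

rows = []
with open("webserver_humans.log") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 10 or parts[3] != "GET":
            continue
        target = parts[4]   # the new request (next page)
        source = parts[9]   # the referring resource (previous page)
        if source != "-":
            rows.append((source, target))

with open("page_paths.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])
    writer.writerows(rows)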
before continue -> introduce -> working example -> remainder (this talk); cross country coaches = fictitious UK bus + coach operator = generous with their data!; main services include: airport runs -> day trips to major towns + cities -> attractions -> sporting + music events
all roads (should) -> journey planner = buying e-tickets
*AGAIN* import our data in Gephi = tool = visualising + manipulating -> data -> graph form
we have a BIGGER mess = randomly distributed nodes and edges; we’ll run the same layout algorithm as before to position our nodes in an aesthetically pleasing way; I use Force Atlas 2
this looks better! = formed of defined clusters; to help pick out the most important pages/content in the network, we need to rank this by ‘PageRank’
Despite the colour and size of nodes helping us to see what is the most influential content in this network, it is still too busy; so, again, we’ll filter out the weaker points of the network
All roads should be leading to the Journey Planner; but many are choosing to leave this process of buying an e-ticket to ‘Contact’ Cross Country Coaches - that’s interesting = something to investigate
Nevertheless, what we have is a sample of the most frequently trodden paths through this website = a very useful starting point; so we’ll export the node data as a .csv as this will come in handy later
visualising the referring/referent data -> web server logs = provided different perspective -> stories -> we're telling our audiences
by filtering data = removed visual complexity = to reveal key paths + stops users making
*AND* exporting node data = generated sample most frequently accessed site content
we can add -> further value + depth -> visualisations = informing them -> data from our own investigations
Let’s export -> node table -> in addition to page title/descriptions = (manually) add set new columns -> extracted from recent audit; dept responsibility -> individual responsibility -> sourced from (external partners?) -> last updated -> criteria for measuring content quality
*OR* could use = Google Fusion Tables = help quickly merge data across 2 spreadsheets = pairing up 1 (or more) columns -> same values
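A minimal offline alternative to the Fusion Tables merge, assuming both files share a 'Label' column and these filenames (pandas):

# Sketch only: merge the exported Gephi node table with the audit spreadsheet.
import pandas as pd

nodes = pd.read_csv("node_table.csv")        # exported from Gephi
audit = pd.read_csv("content_audit.csv")     # department, source, quality scores...

merged = nodes.merge(audit, on="Label", how="left")   # pair rows on the matching column
merged.to_csv("node_table_enriched.csv", index=False)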
now we = download CSV -> take it where we please
What could we do -> exported node table -> now we have = additional data?
we could use = Tableau Public -> dive deeper into our data; desktop data visualisation app = quickly create charts + graphs *BUT* PC software only
let's pose a question
here we have = set of bar charts -> tell us -> what each content source scored (out of 5) = Actionability, Accuracy, Usability -> the content -> responsible for *IF* single out Tourism Partners = score below average for each measurement *SO* what content = they responsible for?
*SO* let’s return to our intro = Cross Country Coaches; Tourism Partners responsible for = content for each town/city destination
by filtering data (further) -> quickly generate (Tableau Public) = breakdown of the scores -> each town + city destination *AS* these destinations = scored lowest = first in line for review?
could external data = help us prioritise our efforts?; let's bring some into play
*SO* I downloaded data = domestic tourism stats for 09-11 = VisitBritain
only problem was = data = only available PDF format; this happens a lot!
*BUT* you can save hours -> re-keying + checking = using tools like these; these free tools = convert PDF-to-Excel
we can use = Google Fusion Tables = merge our data -> each town + city destination -> with downloaded VisitBritain data; only inherit the data -> corresponds -> our towns + cities *SO* let's map the data
here's a map of the British Isles -> generated via Tableau Public; included are = bubbles marking -> each town + city -> from our data; Tableau Public automatically recognised this as location data
*SO* bubble sizes represent = total domestic trips 09-11 (millions) *AND* colour intensity represents = total spend 09-11 (millions)
as expected -> London = largest + darkest bubble = remove; let's use sliders -> filter out destinations = scored highest for Actionability, Accuracy, Usability; now we're left = Birmingham + Edinburgh = highest amount of trips + spend = priorities?
filtering + partitioning -> data -> own investigations = took us deeper into -> data
importing -> external data = provided possible new angles + ideas
adding interactive elements = helped develop narrative around -> data *AND* invites our audiences = delve deeper still
play + experiment -> your data; have fun with it -> ask questions of it -> don't fear it *THEN* = often yield its secrets + stories = surprising ease
we (often) associate numbers = authority and certainty; but uncertainty = great way -> raising new questions -> sharing them with others
by creating + sharing -> your work -> with others = you might find -> get help + co-operation back