Visualising Data
Seeing is Believing
CS Forum 2012



Richard Ingram
@richardjingram
Observe. Analyse.
It’s worked to our adaptive advantage in
 competition with other animal species.
We’ve developed a greater variety of ways to
   communicate and express ourselves.
Paleolithic painting of giant elk
Lascaux, southwestern France (c. 15,300 BC)
Upper Paleolithic painting of bison
Altamira cave, northern Spain (c. 23,000 BC – c. 33,000 BC)
Da Vinci’s Vitruvian Man
Pen and ink on paper (c. 1487)
Florence Nightingale’s coxcomb diagram
Showed the causes of death in the British Army in the Crimea (1858)
Copernicus’s heliocentric universe diagram
He literally moved heaven and earth to draw it (c. 1543)
Ptolemy’s theory of a geocentric universe
Everything revolved around us, and that suited us just fine.
On the Revolutions of the Celestial Spheres
Contains thousands of years’ worth of astronomical data (c. 1543)
The data deluge
Power to the people
“The release of data is a
cornerstone of how to
strengthen the role of citizens
and government, and recast the
relationship between the two.”
               ~ Sir Tim Berners-Lee
                   Interview with The Guardian (2010)
“Having the data is not
enough. You have to show it
in ways that people both
enjoy and understand.”
              ~ Dr. Hans Rosling
              The Joy Of Stats (BBC Television, 2011)
Free tools for data extraction, exploration
             and visualisation
   Gephi (gephi.org)
   Google Chart Tools (developers.google.com/chart/)
   Google Fusion Tables (google.com/fusiontables/)
   Google Refine (code.google.com/p/google-refine/)
   Lucid Chart (lucidchart.com)
   ManyEyes (www-958.ibm.com)
   OpenStreetMap (openstreetmap.org)
   ScraperWiki (scraperwiki.com)
   Tableau Public (tableausoftware.com/public/)
Visualising data...
...allows us to quickly explore and analyse possible relationships between things and how they vary together.
Visualising data...
...can be an effective way to amplify and simplify the communication of our recommendations and arguments to other audiences.
Visualising data
The hidden people networks
Internal author interviews
Who to speak to? Org charts can help.
Internal author interviews
How do we reveal those hidden relationships?
Sociogram
Visualises the structure and patterns of group interactions.
Person (Node)
Connection (Edge)
Sociogram
Can reveal community clusters and calculate network science parameters like degree.
More connections = Higher degree
Sociogram
Can reveal community clusters and calculate network science parameters like betweenness centrality.
Higher network importance = Higher betweenness
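The two parameters named on these slides can be worked out by hand on a small network. The sketch below is only an illustration: the seven people and their connections are hypothetical, and betweenness is computed with Brandes' algorithm for an unweighted, undirected graph, which may differ in scaling from what a tool such as Gephi reports.

```python
from collections import deque

# Hypothetical sociogram: two tight clusters joined through "dan".
EDGES = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("cat", "dan"), ("dan", "eve"),
    ("eve", "fay"), ("eve", "gus"), ("fay", "gus"),
]


def build_graph(edges):
    """Turn a list of undirected connections into an adjacency dict."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Degree: the number of connections each person has."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def betweenness(graph):
    """Unnormalised betweenness: shortest paths passing through each person."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths from the source.
        order = []
        preds = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {node: 0.0 for node in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}


GRAPH = build_graph(EDGES)
DEGREE = degree(GRAPH)
BETWEENNESS = betweenness(GRAPH)
```

In this toy network "dan" has only two connections, yet every path between the two clusters runs through him, so he has the highest betweenness.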
Log file entries
    Whoa, what in the name of thunder does this all mean?

#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2012-09-03 00:10:19 XXX.XXX.X.211 clarke_n XXX.XXX.X.103 80 GET /admin/pages/content.php?id=84 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:20 XXX.XXX.X.29 arecchi_f XXX.XXX.X.103 80 GET /admin/pages/content.php?id=168 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:14:38 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=73 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
Log file entries: a closer look
Important elements: date/time stamp (2012-09-03 00:09:53), client IP address (XXX.XXX.X.104), username (russell_g), and content accessed (/admin/pages/content.php?id=12):

2012-09-03 00:09:53 XXX.XXX.X.104 russell_g XXX.XXX.X.103 80 GET /admin/pages/content.php?id=12 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
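Picking out those elements can be sketched in a few lines of Python. This assumes the entry is space-separated and lines up one-to-one with the names on the #Fields line, as it does in the sample above; the function names are illustrative, not part of any logging tool.

```python
from datetime import datetime

FIELDS_LINE = (
    "#Fields: date time c-ip cs-username s-ip s-port cs-method "
    "cs-uri-stem cs-uri-query sc-status cs(User-Agent)"
)
ENTRY = (
    "2012-09-03 00:09:53 XXX.XXX.X.104 russell_g XXX.XXX.X.103 80 GET "
    "/admin/pages/content.php?id=12 Cmd=contents 200 "
    "Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)"
)


def parse_fields(fields_line):
    """Return the column names listed after the '#Fields:' directive."""
    return fields_line[len("#Fields:"):].split()


def parse_entry(entry, fields):
    """Map one log entry onto the field names and add a parsed timestamp."""
    values = entry.split()
    if len(values) != len(fields):
        raise ValueError("entry does not match the #Fields line")
    record = dict(zip(fields, values))
    record["timestamp"] = datetime.strptime(
        record["date"] + " " + record["time"], "%Y-%m-%d %H:%M:%S"
    )
    return record


RECORD = parse_entry(ENTRY, parse_fields(FIELDS_LINE))
# The elements the slide highlights:
HIGHLIGHTS = (
    RECORD["timestamp"],
    RECORD["c-ip"],
    RECORD["cs-username"],
    RECORD["cs-uri-stem"],
)
```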
Log file entries
We want to locate authors accessing the same content within a certain timeframe.

2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)

2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
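One way to find those pairs automatically is sketched below. The accesses are the seven sample entries reduced to time, username and content; the 60-second window is an assumption chosen for illustration, since the slides do not say which timeframe was used.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# (date/time stamp, username, content accessed) from the sample log entries.
ACCESSES = [
    ("2012-09-03 00:10:19", "clarke_n", "/admin/pages/content.php?id=84"),
    ("2012-09-03 00:10:39", "olson_b", "/admin/pages/content.php?id=37"),
    ("2012-09-03 00:11:12", "zajac_s", "/admin/pages/content.php?id=37"),
    ("2012-09-03 00:13:20", "arecchi_f", "/admin/pages/content.php?id=168"),
    ("2012-09-03 00:13:50", "chalmers_s", "/admin/pages/content.php?id=174"),
    ("2012-09-03 00:13:52", "harding_a", "/admin/pages/content.php?id=174"),
    ("2012-09-03 00:14:38", "chalmers_s", "/admin/pages/content.php?id=73"),
]


def co_access_pairs(accesses, window):
    """Count pairs of authors who opened the same content within `window`."""
    by_content = defaultdict(list)
    for stamp, user, content in accesses:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_content[content].append((when, user))

    pairs = defaultdict(int)
    for visits in by_content.values():
        visits.sort()  # earliest access first
        for (first_time, first_user), (second_time, second_user) in combinations(visits, 2):
            if first_user != second_user and second_time - first_time <= window:
                # Sort the names so (a, b) and (b, a) count as the same pair.
                pairs[tuple(sorted((first_user, second_user)))] += 1
    return dict(pairs)


# Assumed timeframe: 60 seconds.
PAIRS = co_access_pairs(ACCESSES, timedelta(seconds=60))
```

Each pair becomes a connection (edge) between two people (nodes), and the count can serve as the edge weight.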
Log file entries
We return to the relative safety of the spreadsheet.
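If the pairs are produced by a script rather than by hand, they can still be written out as a spreadsheet. The sketch below assumes an edge table with Source, Target and Weight columns, which is the layout Gephi's spreadsheet import is generally understood to accept; check the column names against your Gephi version. The two pairs are the co-accessing authors from the sample log.

```python
import csv
import io

# Author pairs and how often they opened the same content together.
PAIRS = {
    ("olson_b", "zajac_s"): 1,
    ("chalmers_s", "harding_a"): 1,
}


def edges_to_csv(pairs):
    """Render the pairs as CSV text with one edge per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(pairs.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


EDGE_CSV = edges_to_csv(PAIRS)
```

Writing `EDGE_CSV` to a file such as `edges.csv` (a hypothetical name) gives something that opens in any spreadsheet program.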
Gephi
gephi.org
Big mess
Let’s run a layout algorithm.
That’s better...
But who are the most influential?
So, what did we learn?
We were able to easily spot the different
community clusters to which people were
              connected.
Visualising the log file data helped us create an
              alternative org chart.
Using network science parameters we saw
which individuals held the most influence over
               multiple groups.
Visualising data
Relationships between content
For many of us, published web content is increasingly
formed of flexible modules scattered across documents
and locations, but they are seldom read that way.
Frequently trodden paths
Mapping the movement of users between web pages.
Frequently trodden paths
Weightier nodes and edges indicate key paths and stops.
Page (Node)
User flow (Edge)
Dorian Taylor
doriantaylor.com
Web server logs
Store information on each request we make.
Referring Resource (Previous page) -> New request (Next page)
Source: doriantaylor.com/visualizing-paths-through-the-web
Web server logs
    Haven’t we been here before? Not quite.

XX.XXX.XXX.86 [30/Aug/2012:11:09:27 +0100] GET /contact-us/ HTTP/1.1 200 14728 - http://www.crosscountrycoaches.com/destinations/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET / HTTP/1.1 200 12007 - http://www.crosscountrycoaches.com/destinations/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:37 +0100] GET /services/ HTTP/1.1 200 13428 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:38 +0100] GET /login/ HTTP/1.1 200 17284 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ HTTP/1.1 200 27788 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /services/terms-and-conditions/ HTTP/1.1 200 11638 - http://www.crosscountrycoaches.com/reprint-your-ticket/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
Web server logs: a closer look
Important elements: client IP address (XX.XXX.XXX.86), next page (the path after GET), and previous page (the referring URL before the user agent):

XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1

XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ HTTP/1.1 200 27788 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
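Pulling the previous page and next page out of an entry can be sketched with a regular expression. The pattern below is written against the layout of these sample entries only (IP, bracketed time, request, status, size, a dash, referrer, user agent); real server logs vary, so treat it as an assumption to adapt rather than a general parser.

```python
import re
from urllib.parse import urlsplit

ENTRIES = [
    "XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ "
    "HTTP/1.1 200 14084 - http://www.crosscountrycoaches.com/ "
    "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1",
    "XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ "
    "HTTP/1.1 200 27788 - http://www.crosscountrycoaches.com/services/ "
    "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1",
]

# One named group per element of the sample layout.
LINE = re.compile(
    r"(?P<ip>\S+) \[(?P<time>[^\]]+)\] (?P<method>\S+) (?P<path>\S+) \S+ "
    r"(?P<status>\d{3}) (?P<size>\d+) \S+ (?P<referrer>\S+) (?P<agent>.*)"
)


def page_transition(entry):
    """Return (client IP, previous page, next page), or None if unparseable."""
    match = LINE.match(entry)
    if match is None:
        return None
    referrer = match["referrer"]
    # A bare dash means the request arrived without a referring page.
    previous = None if referrer == "-" else (urlsplit(referrer).path or "/")
    return match["ip"], previous, match["path"]


TRANSITIONS = [page_transition(entry) for entry in ENTRIES]
```

Each transition is one user flow (edge) from the previous page to the next page.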
Web server logs
Cleaning up the data.

Most non-human activity can be removed by searching for:
     Bingbot (bing.com/bingbot.htm)
     Googlebot (google.com/bot.html)
     Any mentions of ‘bots’, ‘spiders’, and ‘crawlers’
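The same search can be done in code. This is a rough sketch: it simply looks for the words "bot", "spider" or "crawler" in the user agent, so it will miss crawlers that do not identify themselves and could misfire on an unusual browser name.

```python
BOT_MARKERS = ("bot", "spider", "crawler")

USER_AGENTS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1",
]


def is_probably_bot(user_agent):
    """True if the user agent mentions any of the marker words."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


# Keep only the requests that look like they came from people.
HUMAN_AGENTS = [agent for agent in USER_AGENTS if not is_probably_bot(agent)]
```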
Web server logs
Extract the URL data and add to a spreadsheet.
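Once the URL pairs are extracted, counting repeats gives the weights for the graph. The sketch below uses the seven previous-page and next-page pairs from the sample entries, plus one hypothetical repeat visit added so that a weight above 1 is visible.

```python
from collections import Counter

# (previous page, next page) pairs read off the sample log entries.
TRANSITIONS = [
    ("/destinations/", "/contact-us/"),
    ("/destinations/", "/"),
    ("/", "/contact-us/view-your-ticket/"),
    ("/", "/services/"),
    ("/services/", "/login/"),
    ("/services/", "/reprint-your-ticket/"),
    ("/reprint-your-ticket/", "/services/terms-and-conditions/"),
    # Hypothetical extra visit, added only so that one edge repeats.
    ("/", "/services/"),
]


def edge_rows(transitions):
    """One (source, target, weight) row per distinct path, heaviest first."""
    weights = Counter(transitions)
    ordered = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [(source, target, weight) for (source, target), weight in ordered]


def node_weights(transitions):
    """How many times each page was requested (arrived at)."""
    return Counter(target for _, target in transitions)


ROWS = edge_rows(TRANSITIONS)
PAGE_HITS = node_weights(TRANSITIONS)
```

The rows can be pasted into a spreadsheet as they are; heavier rows are the frequently trodden paths.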
Cross Country Coaches
The entirely fictitious UK-wide bus and coach operator.
Airports: Direct to terminal
Attractions: Holiday camps and amusement parks
Day trips: Towns and cities
Events: Sports and music
Journey Planner: Purchasing e-tickets
Gephi
gephi.org
An even bigger mess
Let’s run a layout algorithm.
That’s better...
Let’s pick out the influential pages.
Export node data
This will come in handy later
So, what did we learn?
Visualising the web server log data provided
us with a different perspective of the stories
         we’re telling our audiences.
Filtering the data took away the visual
complexity and revealed to us the key paths
                 and stops.
Exporting the node data generated a page-
  level inventory of the most frequently
             accessed content.
Visualising data
Playing with numbers
Utilising internal data
Bringing data from your own investigations into play.
Exported node table
Manually feed in data from your own investigations.
Exported node table
Automatically feed in data from your own investigations.
Exported node table + Content inventory = Extended node table
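The sum above amounts to a join on the page address. A minimal sketch follows; the column names (Id, Degree, Owner, Quality) and all the values are hypothetical stand-ins for whatever your exported node table and content inventory actually contain.

```python
# Hypothetical rows standing in for a node table exported from Gephi.
NODE_TABLE = [
    {"Id": "/services/", "Degree": 3},
    {"Id": "/login/", "Degree": 1},
    {"Id": "/contact-us/", "Degree": 1},
]

# Hypothetical content inventory keyed by the same page address.
CONTENT_INVENTORY = {
    "/services/": {"Owner": "In-house", "Quality": 4},
    "/login/": {"Owner": "External partner", "Quality": 2},
}


def extend_node_table(nodes, inventory):
    """Add inventory columns to each node row; unmatched pages stay as they are."""
    extended = []
    for node in nodes:
        row = dict(node)  # copy, so the exported table is left untouched
        row.update(inventory.get(node["Id"], {}))
        extended.append(row)
    return extended


EXTENDED = extend_node_table(NODE_TABLE, CONTENT_INVENTORY)
```

Pages missing from the inventory keep only their graph columns, which makes gaps in the inventory easy to spot.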
Google Fusion Tables
google.com/fusiontables/
What can we do with this additional data?
Tableau Public
tableausoftware.com/public/
Q. Is there a perceived difference in the
quality of content maintained in-house and
         through external partners?
Utilising external data
Bringing public data into play.
VisitBritain.org
Domestic tourism statistics

Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Recently uploaded (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Visualising data: Seeing is Believing - CS Forum 2012

  • 1. Visualising Data Seeing is Believing Richard Ingram @richardjingram
  • 3. Observe Analyse
  • 4. It’s worked to our adaptive advantage in competition with other animal species.
  • 5. We’ve developed a greater variety of ways to communicate and express ourselves.
  • 7. Paleolithic painting of giant elk Lascaux, southwestern France (c. 15,300 BC)
  • 9. Upper Paleolithic painting of bison Altamira cave, northern Spain (c. 23,000 BC – c. 33,000 BC)
  • 11. Da Vinci’s Vitruvian Man Pen and ink on paper (c. 1487)
  • 13. Florence Nightingale’s coxcomb diagram Showed the causes of death in the British Army in the Crimea (1858)
  • 15. Copernicus’s heliocentric universe diagram He literally moved heaven and earth to draw it (c. 1543)
  • 17. Ptolemy’s theory of a geocentric universe Everything revolved around us, and that suited us just fine.
  • 19. On the Revolutions of the Celestial Spheres Contains thousands of years’ worth of astronomical data (c. 1543)
  • 22. The data deluge Power to the people
  • 24. “The release of data is a cornerstone of how to strengthen the role of citizens and government, and recast the relationship between the two.” ~ Sir Tim Berners-Lee, interview with The Guardian (2010)
  • 26. “Having the data is not enough. You have to show it in ways that people both enjoy and understand.” ~ Dr. Hans Rosling, The Joy Of Stats (BBC Television, 2011)
  • 27. Free tools for data extraction, exploration and visualisation: Gephi (gephi.org); Google Chart Tools (developers.google.com/chart/); Google Fusion Tables (google.com/fusiontables/); Google Refine (code.google.com/p/google-refine/); Lucid Chart (lucidchart.com); ManyEyes (www-958.ibm.com); OpenStreetMap (openstreetmap.org); ScraperWiki (scraperwiki.com); Tableau Public (tableausoftware.com/public/)
  • 28. Tableau Public (tableausoftware.com/public/); Lucid Chart (lucidchart.com); Google Chart Tools (developers.google.com/chart/); ManyEyes (www-958.ibm.com)
  • 32. OpenStreetMap (openstreetmap.org); Google Fusion Tables (google.com/fusiontables/)
  • 35. Visualising data... ...allows us to quickly explore and analyse possible relationships between things and how they vary together.
  • 36. Visualising data... ...can be an effective way to amplify and simplify the communication of our recommendations and arguments to other audiences.
  • 38. Visualising data The hidden people networks
  • 39. Internal author interviews Who to speak to? Org charts can help.
  • 40. Internal author interviews How do we reveal those hidden relationships?
  • 41. Sociogram Visualises the structure and patterns of group interactions. Diagram labels: Connection (Edge); Person.
  • 47. Sociogram Can reveal community clusters and calculate network science parameters like degrees of separation. Diagram label: More connections = Higher
  • 57. Sociogram Can reveal community clusters and calculate network science parameters like betweenness centrality. Diagram label fragments: Higher / importance / =
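The sociogram measures named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how the figures were computed, the function names are invented here, and the people in the usage note are hypothetical. It counts connections per person, finds degrees of separation with a breadth-first search, and computes unnormalised betweenness centrality with Brandes' algorithm for an unweighted, undirected network.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (person, person) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connection_counts(graph):
    """Number of direct connections (degree) for each person."""
    return {person: len(neighbours) for person, neighbours in graph.items()}


def degrees_of_separation(graph, start, goal):
    """Fewest connections linking two people, or None if they are unconnected."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            return distance[person]
        for neighbour in graph[person]:
            if neighbour not in distance:
                distance[neighbour] = distance[person] + 1
                queue.append(neighbour)
    return None


def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted graph)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # people in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        paths = {v: 0 for v in graph}       # number of shortest paths from source
        dist = {v: -1 for v in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end, so halve the totals.
    return {v: s / 2 for v, s in score.items()}
```

For example, with two hypothetical clusters (ann, bob, cat) and (dan, eve, fay) joined only by a cat–dan connection, `betweenness` ranks cat and dan highest, because every shortest path between the clusters passes through them.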
  • 69. Log file entries Whoa, what in the name of thunder does this all mean?
      #Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
      2012-09-03 00:10:19 XXX.XXX.X.211 clarke_n XXX.XXX.X.103 80 GET /admin/pages/content.php?id=84 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:13:20 XXX.XXX.X.29 arecchi_f XXX.XXX.X.103 80 GET /admin/pages/content.php?id=168 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:14:38 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=73 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
  • 70. Log file entries: a closer look Important elements in bold: date/time stamp, client IP address, username, and content accessed:
      2012-09-03 00:09:53 XXX.XXX.X.104 russell_g XXX.XXX.X.103 80 GET /admin/pages/content.php?id=12 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
  • 75. Log file entries We want to locate authors accessing the same content within a certain timeframe.
  • 76. Log file entries We want to locate authors accessing the same content within a certain timeframe.
      2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
      2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
  • 80. Log file entries We return to the relative safety of the spreadsheet.
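For readers who would rather script this step than work in a spreadsheet, here is a minimal sketch of finding authors who opened the same content within a set timeframe. It assumes the field order shown in the `#Fields` header, takes the `id=` value in the request as the content identifier, and uses a one-minute window; the window size and all function names are assumptions, since the slides only say "a certain timeframe".

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Log lines copied from the slides (addresses already masked in the original).
SAMPLE_LOG = """\
#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2012-09-03 00:10:19 XXX.XXX.X.211 clarke_n XXX.XXX.X.103 80 GET /admin/pages/content.php?id=84 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:10:39 XXX.XXX.X.17 olson_b XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:11:12 XXX.XXX.X.40 zajac_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=37 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:20 XXX.XXX.X.29 arecchi_f XXX.XXX.X.103 80 GET /admin/pages/content.php?id=168 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:50 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:13:52 XXX.XXX.X.178 harding_a XXX.XXX.X.103 80 GET /admin/pages/content.php?id=174 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
2012-09-03 00:14:38 XXX.XXX.X.107 chalmers_s XXX.XXX.X.103 80 GET /admin/pages/content.php?id=73 Cmd=contents 200 Mozilla/4.76+[en]+(X11;+U;+Linux+2.4.9-ac7+i686;+Nav)
"""


def parse_line(line):
    """Return (timestamp, username, content id), or None for headers and odd lines."""
    parts = line.split()
    if line.startswith("#") or len(parts) < 8:
        return None
    match = re.search(r"[?&]id=(\d+)", parts[7])
    if not match:
        return None
    when = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
    return when, parts[3], match.group(1)


def co_access_edges(lines, window=timedelta(minutes=1)):
    """Count, per pair of authors, how often both opened the same content within `window`."""
    visits_by_content = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record:
            when, user, content_id = record
            visits_by_content[content_id].append((when, user))

    weights = defaultdict(int)
    for visits in visits_by_content.values():
        for (t1, u1), (t2, u2) in combinations(sorted(visits), 2):
            if u1 != u2 and abs(t2 - t1) <= window:
                weights[tuple(sorted((u1, u2)))] += 1
    return dict(weights)
```

Calling `co_access_edges(SAMPLE_LOG.splitlines())` on the sample pairs olson_b with zajac_s (content 37) and chalmers_s with harding_a (content 174), the same four entries the slides single out. Each pair is one edge of the sociogram, and the count can serve as its weight.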
  • 86. Big mess Let’s run a layout algorithm.
  • 90. That’s better... But who are the most influential?
  • 100. So, what did we learn?
  • 101. We were able to easily spot the different community clusters to which people were connected.
  • 102. Visualising the log file data helped us create an alternative org chart.
  • 103. Using network science parameters we saw which individuals held the most influence over multiple groups.
  • 106. For many of us, published web content is increasingly formed of flexible modules scattered across documents and locations, but they are seldom read that way.
  • 108. Frequently trodden paths Mapping the movement of users between web pages.
  • 109. Frequently trodden paths Weightier nodes and edges indicate key paths and stops.
  • 112. Frequently trodden paths Weightier nodes and edges indicate key paths and stops. Diagram labels: User flow (Edge); Page.
  • 123. Web server logs Store information on each request we make. Diagram labels: Referring resource (Previous page); New request (Next page). Source: doriantaylor.com/visualizing-paths-through-the-web
  • 124. Web server logs Haven’t we been here before? Not quite.
      XX.XXX.XXX.86 [30/Aug/2012:11:09:27 +0100] GET /contact-us/ HTTP/1.1 200 14728 - http://www.crosscountrycoaches.com/destinations/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
      XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET / HTTP/1.1 200 12007 - http://www.crosscountrycoaches.com/destinations/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
      XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
      XX.XXX.XXX.86 [30/Aug/2012:11:09:37 +0100] GET /services/ HTTP/1.1 200 13428 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
      XX.XXX.XXX.86 [30/Aug/2012:11:09:38 +0100] GET /login/ HTTP/1.1 200 17284 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
      XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ HTTP/1.1 200 27788 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
      XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /services/terms-and-conditions/ HTTP/1.1 200 11638 - http://www.crosscountrycoaches.com/reprint-your-ticket/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
  • 125. Web server logs: a closer look Important elements in bold: client IP address, next page and previous page:
      XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
      XX.XXX.XXX.86 [30/Aug/2012:11:09:42 +0100] GET /reprint-your-ticket/ HTTP/1.1 200 27788 - http://www.crosscountrycoaches.com/services/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1
  • 129. Web server logs Cleaning up the data. Most non-human activity can be removed by searching for: Googlebot (google.com/bot.html); Bingbot (bing.com/bingbot.htm); any mentions of ‘bots’, ‘spiders’, and ‘crawlers’
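The clean-up step can be sketched as a simple text filter. This is a deliberately crude illustration, not the method used in the talk: it drops any log line containing "bot", "spider", or "crawler" in any letter case, which catches Googlebot and Bingbot but would also discard a genuine request for a path such as /robots.txt. The function names are assumptions.

```python
# Substrings that suggest automated traffic; "bot" also covers Googlebot and bingbot.
BOT_MARKERS = ("bot", "spider", "crawler")


def is_probable_bot(log_line):
    """True if the line mentions any bot marker, ignoring letter case."""
    lowered = log_line.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def human_lines(lines):
    """Keep only the log lines that show no sign of automated traffic."""
    return [line for line in lines if not is_probable_bot(line)]
```

Matching on the whole line rather than on the user-agent field alone keeps the sketch independent of the log layout, at the cost of the occasional false positive.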
  • 132. Web server logs Extract the URL data and add to a spreadsheet.
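The extraction can also be scripted. The sketch below assumes the log layout shown on the earlier slides (client address, bracketed timestamp, method, path, protocol, status, size, a dash, referrer, user agent) and turns each request into a "previous page to next page" edge, counting repeats so that well-trodden paths carry more weight. The function names and the Source/Target/Weight column headings are assumptions, modelled on the edge-list layout that graph tools such as Gephi commonly accept.

```python
import csv
import io
import re
from collections import Counter
from urllib.parse import urlparse

# Three entries copied from the slides.
SAMPLE_LINES = [
    "XX.XXX.XXX.86 [30/Aug/2012:11:09:27 +0100] GET /contact-us/ HTTP/1.1 200 14728 - http://www.crosscountrycoaches.com/destinations/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1",
    "XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET / HTTP/1.1 200 12007 - http://www.crosscountrycoaches.com/destinations/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1",
    "XX.XXX.XXX.86 [30/Aug/2012:11:09:29 +0100] GET /contact-us/view-your-ticket/ HTTP/1.1 200 14084 - http://www.crosscountrycoaches.com/ Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1",
]

LINE_PATTERN = re.compile(
    r"^(?P<ip>\S+) \[(?P<time>[^\]]+)\] (?P<method>\S+) (?P<path>\S+) "
    r"(?P<protocol>\S+) (?P<status>\d{3}) (?P<size>\d+) \S+ (?P<referrer>\S+) (?P<agent>.*)$"
)


def page_edges(lines, site_host):
    """Count (previous page, next page) pairs for requests referred from within the site."""
    edges = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        referrer = urlparse(match.group("referrer"))
        if referrer.netloc != site_host:
            continue  # external or missing referrer: not an on-site path
        edges[(referrer.path or "/", match.group("path"))] += 1
    return edges


def edges_to_csv(edges):
    """Render the weighted edges as CSV text, ready to paste into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()
```

Usage: `edges_to_csv(page_edges(SAMPLE_LINES, "www.crosscountrycoaches.com"))` yields one row per distinct page-to-page move, with the weight recording how many times that move was made.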
  • 133. Cross Country Coaches The entirely fictitious UK-wide bus and coach operator. Airports (Direct to terminal); Attractions (Holiday camps and amusement parks); Day trips (Towns and cities); Events (Sports and music)
  • 134. Cross Country Coaches The entirely fictitious UK-wide bus and coach operator. Airports (Direct to terminal); Attractions (Holiday camps and amusement parks); Journey Planner (Purchasing e-tickets); Day trips (Towns and cities); Events (Sports and music)
  • 140. An even bigger mess Let’s run a layout algorithm.
  • 144. That’s better... Let’s pick out the influential pages.
  • 154. Export node data This will come in handy later
  • 155. So, what did we learn?
  • 156. Visualising the web server log data provided us with a different perspective of the stories we’re telling our audiences.
  • 157. Filtering the data took away the visual complexity and revealed to us the key paths and stops.
  • 158. Exporting the node data generated a page-level inventory of the most frequently accessed content.
  • 161. Utilising internal data Bringing data from your own investigations into play.
  • 162. Exported node table Manually feed in data from your own investigations.
  • 168. Exported node table Automatically feed in data from your own investigations. Exported node table + Content inventory = Extended node table
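The "exported node table + content inventory = extended node table" step amounts to a join on a shared key. Here is a minimal sketch, assuming both tables are CSV text and share an `Id` column holding the page path; the column names and the sample rows in the usage note are hypothetical, not taken from the talk.

```python
import csv
import io


def extend_node_table(node_csv, inventory_csv, key="Id"):
    """Add the inventory's columns to each node row, matching rows on `key`.

    Nodes with no inventory entry keep their row and get empty strings
    for the extra columns, so no page silently disappears.
    """
    reader = csv.DictReader(io.StringIO(inventory_csv))
    extra_columns = [name for name in reader.fieldnames if name != key]
    inventory = {row[key]: row for row in reader}

    extended = []
    for node in csv.DictReader(io.StringIO(node_csv)):
        row = dict(node)
        match = inventory.get(row[key])
        for column in extra_columns:
            row[column] = match[column] if match else ""
        extended.append(row)
    return extended
```

For example, joining a node table with rows for `/services/` and `/login/` to an inventory that records `/services/` as maintained in-house gives an extended table in which `/services/` carries that value and `/login/` is left blank, flagging a gap in the inventory.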
  • 171. What can we do with this additional data?
  • 174. Q. Is there a perceived difference in the quality of content maintained in-house and through external partners?
  • 181. Cross Country Coaches The entirely fictitious UK-wide bus and coach operator. Airports (Direct to terminal); Attractions (Holiday camps and amusement parks); Day trips (Towns and cities); Events (Sports and music)
  • 189. Utilising external data Bringing public data into play.
  • 193. Regional breakdown of tourism value Only available in handy PDF format [!].
  • 195. PDF to Excel Online (pdftoexcelonline.com); Zamzar (zamzar.com)
  • 209. So, what did we learn?
  • 210. Furnishing our exported node table with data from our own investigations allowed for deeper dives into our sample.
  • 211. Importing external data provided us with possible new angles and ideas.
  • 212. Adding interactive elements helped us develop a basic narrative around our data.
  • 214. Visualising data You can do it!
  • 215. Don’t be afraid of data. Treat it as something to play with and explore.
  • 216. Don't worry about not having all the answers. Uncertainty can be a great way of raising new questions.
  • 217. Create something that could be of use to others and you might find you get help and cooperation back.
  • 219. Recommended reading
      Semiology of Graphics: Diagrams, Networks, Maps by Jacques Bertin
      The Data Journalism Handbook (FREE!) edited by Jonathan Gray, Liliana Bounegru, and Lucy Chambers
      Designing Data Visualizations by Noah Iliinsky and Julie Steele
      Information is Beautiful by David McCandless
      Envisioning Information by Edward R. Tufte
  • 220. Image sources and credits
      Mount Teide (palmstorys.com/bilder/Englishversions/Tenerife.html)
      Stratovolcano cross-section by Woudloper (http://commons.wikimedia.org/wiki/File:Stratovolcano_cross-section.svg)
      The Expression of the Emotions in Man and Animals (various) (http://commons.wikimedia.org/wiki/The_Expression_of_the_Emotions_in_Man_and_Animals)
      Paleolithic painting of giant elk (http://commons.wikimedia.org/wiki/File:Lascaus,_Megaloceros.jpg)
      Upper Paleolithic painting of bison by Ramessos (http://commons.wikimedia.org/wiki/File:AltamiraBison.jpg)
      Vitruvian Man (http://commons.wikimedia.org/wiki/File:Da_Vinci_Vitruve_Luc_Viatour.jpg)
      Diagram of the causes of mortality in the army in the East (http://commons.wikimedia.org/wiki/File:Nightingale-mortality.jpg)
  • 221. Image sources and credits cont...
      On the revolutions of the celestial spheres (http://ads.harvard.edu/books/1543droc.book/)
      Geocentric model of the solar system (http://commons.wikimedia.org/wiki/File:Ptolemy_Sky.jpg)
      Tim Berners-Lee by Silvio Tanaka (http://commons.wikimedia.org/wiki/File:Tim_Berners-Lee_CP.jpg)
      Hans Rosling (http://novartisfoundation.org/platform/content/element/3967/2336.jpg)
      Nicolaus Copernicus (http://commons.wikimedia.org/wiki/File:Nikolaus_Kopernikus.jpg)
      Isaac Newton (http://commons.wikimedia.org/wiki/File:Sir_Isaac_Newton_by_Sir_Godfrey_Kneller,_Bt.jpg)

Editor's Notes

  2. humans -> observe + analyse -> contents -> world around us -> unique + astonishing
     *BUT* form + communicate -> verbal + visual concepts
  3. principal factor -> adaptive advantage -> competition -> other animal species
  4. one respect -> superior -> other animals -> we've developed -> greater variety -> systems of communication
     *AND* one = art
  5. earliest known examples -> human expression -> demonstrate -> our ability -> chaotic + complex environments -> under control -> magic of art
     *BECAUSE* depict something = transform it -> whatever form/shape -> want
  6. *THOUGH* no one knows for sure -> real purpose? = cave paintings = experts believe -> primarily wild animals (cows, deer, horses, elk + bison) -> attempt to control or ‘tame’ them
     *INTERESTING* domesticated -> 1000s years
  7. if art + other forms -> creative expression = transform + interpret -> science = identifier + unifier
     few better collisions -> 2 cultures = diagram
  8. world = full of famous + recognisable diagrams
     most potent -> ability -> express complex ideas simply -> intellectual + artistic beauty -> power -> shift perspectives -> change minds
  9. 500+ years ago -> (little known) Polish cleric -> Nicolaus Copernicus -> came to revolutionise -> we see -> our place -> universe
  10. 1000s years -> scholars + religious scriptures -> steadfast -> belief -> we (earth) = static centrepiece of universe = everything revolved around us
     *SO* against weight -> established teachings + opinions -> *HOW* back up theory?
  11. scan pages -> life's work = contains pages of tabular data and calculations
     *AS WELL AS* using = own astronomical observations = recalculate planetary positions -> 1000s of years' worth of data from past astronomers = underpinned diagram
  12. despite weight (foundations) = beauty lies -> simplicity -> you or I could draw it
  13. for someone -> benefited -> data from diff sources -> would've welcomed -> data revolution\n
  14. for someone -> benefited -> data from diff sources -> would've welcomed -> data revolution\n
  15. for someone -> benefited -> data from diff sources -> would've welcomed -> data revolution\n
  16. for someone -> benefited -> data from diff sources -> would've welcomed -> data revolution\n
  17. barriers (lowered) -> ourselves -> rich datasets -> information -> communities + politics + governments
      *IF* access to -> numbers + statistics = what's really going on = we can start to make things better -> local + national + int'l level
  21. offshoots -> emergence -> new + powerful tools = interrogating + presenting data
      allow us = communicate findings -> in a manner -> audiences = can understand + act upon
  25. These tools (basic level) = free to access and use
  26. From -> graphics software = produce interactive visualisations
  27. …tools = visualising + manipulating -> data -> graph form
  28. …tools = mapping location data
  29. …tools for scraping + retrieving + converting data
  30. I've found -> visualising data -> effective way -> test buoyancy of theories
      *ALSO* gain deeper understanding -> hidden processes -> exist (organisations)
  31. *ALSO* I've found -> effective method -> amplifying + simplifying -> communication -> my recommendations + arguments -> people who effect change in organisations
  36. (one of) effective methods employed -> discover + analyse -> lifecycle -> organisation's content -> conduct series 1-on-1 interviews -> key members -> authoring team
      *BUT* locating -> right people -> multi-departmental organisation = isn't always easy
  37. consulting -> hierarchical org charts = useful
      *BUT* don't (always) reveal -> hidden relationships -> forged by content
      *NOR* always show -> where real influence lies -> organisation
  38. *THIS* a sociogram -> put simply = visualised social network
      *IT CAN* powerful tool = discovering deeper meanings behind -> relationships + communities -> within a network
  39. *SO* each node (bubble) = person within network
      *AND* each edge (line) = connection between two people
  40. used = reveal community clusters = might not always be reflected -> org chart
      *AND* used = calculate network science parameters = degree (number of connections)
      *SO* the more connections -> individual has = higher their degree
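
A minimal sketch of the connection count described in the note above, assuming the network is held as a plain list of username pairs. `b_olson` and `s_zajac` are the usernames from the log example in this talk; `user_c` and `user_d` are hypothetical.

```python
from collections import Counter

# Each pair is one connection (edge) between two people (nodes).
# user_c and user_d are made-up names for illustration.
edges = [
    ("b_olson", "s_zajac"),
    ("b_olson", "user_c"),
    ("s_zajac", "user_d"),
    ("b_olson", "user_d"),
]

def degrees(edge_list):
    """Count how many connections touch each person."""
    counts = Counter()
    for source, target in edge_list:
        counts[source] += 1
        counts[target] += 1
    return counts

degree_by_person = degrees(edges)
most_connected = degree_by_person.most_common(1)[0][0]
```

The person with the most edges comes out on top, which is the same ordering a graph tool produces when ranking nodes by their number of connections.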
  41. *ALSO* used = calculate betweenness centrality
      *OR* I prefer to call it = influence
      *SO* more shortest paths -> run through individual -> between different community clusters = higher their betweenness
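
Betweenness centrality counts how many shortest paths between other people run through a given person. The sketch below uses Brandes' algorithm on an undirected, unweighted edge list. It is an illustration of the measure, not Gephi's own code, and the node names are hypothetical.

```python
from collections import defaultdict, deque

def betweenness(edge_list):
    """Unnormalised betweenness for an undirected, unweighted graph (Brandes)."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)
    score = dict.fromkeys(neighbours, 0.0)
    for start in neighbours:
        # Breadth-first search, counting shortest paths from `start`.
        order = []
        preds = defaultdict(list)
        paths = dict.fromkeys(neighbours, 0)
        paths[start] = 1
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbours[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares.
        delta = dict.fromkeys(neighbours, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != start:
                score[w] += delta[w]
    # Every undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}

# Two tight clusters (a, b, c) and (d, e, f) joined only by the c-d link.
scores = betweenness([
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
])
```

In this toy network `c` and `d` are the only bridge between the clusters, so they score highest even though they have barely more connections than anyone else.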
  42. most CMSs + intranet servers -> generate + store -> sets log files = record activity (users)
      *WHEN* opened = might look like this
  43. focus on -> single entry
      *SO* data we're interested in (bold) = date + time, client IP, username, content accessed
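
A sketch of pulling those four fields out of one log entry. The slide's actual log layout is not reproduced in these notes, so the field order below (date, time, client IP, username, method, path) is an assumption, and the sample line is invented.

```python
from datetime import datetime
from typing import NamedTuple

class Entry(NamedTuple):
    when: datetime
    client_ip: str
    username: str
    content: str

def parse_entry(line):
    """Split one log line into the four fields of interest.

    Assumed (hypothetical) layout: date time client-ip username method path
    """
    date, time, client_ip, username, _method, path = line.split()[:6]
    when = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return Entry(when, client_ip, username, path)

# Invented sample line in the assumed layout.
sample = "2012-05-01 09:30:00 192.0.2.10 b_olson GET /news/draft-42.html"
entry = parse_entry(sample)
```

A real CMS or intranet log will order and delimit its fields differently, so the unpacking line is the part to adapt.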
  50. *BACK TO LOG FILE* we're looking for = same content accessed by different authors -> within (certain timeframe) = here are 2
      s_zajac came along 1/2 hour after b_olson to access the same content = there’s one connection
      *SO* we extract these usernames…
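
The matching step can be sketched as follows, assuming the log entries are already parsed into (timestamp, username, content) tuples. The one-hour window is an assumption standing in for the "certain timeframe" in the notes, and the records are invented.

```python
from datetime import datetime, timedelta

# Invented, already-parsed accesses: (timestamp, username, content path).
accesses = [
    (datetime(2012, 5, 1, 9, 30), "b_olson", "/news/draft-42.html"),
    (datetime(2012, 5, 1, 10, 0), "s_zajac", "/news/draft-42.html"),
    (datetime(2012, 5, 1, 10, 5), "s_zajac", "/about/team.html"),
]

def connections(records, window=timedelta(hours=1)):
    """Pair up different users who touched the same content within `window`."""
    pairs = []
    ordered = sorted(records)
    for i, (t1, user1, content1) in enumerate(ordered):
        for t2, user2, content2 in ordered[i + 1:]:
            if t2 - t1 > window:
                break  # later records are even further away in time
            if content1 == content2 and user1 != user2:
                pairs.append((user1, user2))
    return pairs

edges = connections(accesses)
```

Each pair puts the earlier visitor first, which maps directly onto the Source and Target columns used in the next step.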
  57. …add them -> 2-column spreadsheet
      first usernames -> under ‘Source’ -> second -> under ‘Target’
      *FINALLY* save as .csv
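
For anyone who would rather script the spreadsheet step, here is a small sketch using Python's standard `csv` module. The function name `edges_to_csv` is hypothetical.

```python
import csv
import io

def edges_to_csv(edge_list, handle):
    """Write pairs as a two-column edge list with 'Source' and 'Target' headers."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target"])
    writer.writerows(edge_list)

# Written to an in-memory buffer here; a file opened with
# open("edges.csv", "w", newline="") works the same way.
buffer = io.StringIO()
edges_to_csv([("b_olson", "s_zajac")], buffer)
csv_text = buffer.getvalue()
```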
  60. import our data in Gephi = tool = visualising + manipulating -> data -> graph form
  61. we have a big mess = randomly distributed nodes and edges
      we’ll run a layout algorithm, which positions our nodes in an aesthetically pleasing way
      I use Force Atlas 2
  64. this looks better! = formed of defined community clusters
      To help pick out the most important individuals in the network, we need to rank this by ‘Degree’
  67. we’ve ranked degrees by colour. The ‘greenest’ nodes have the highest degree
      Now let’s find out who are the most influential by ranking the network by ‘Betweenness Centrality’
  70. Despite the colour and size of nodes helping us to see who are the most influential in this network, it is still too busy
      so we’ll filter out the weaker points of the network
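
One simple reading of "filter out the weaker points" is to drop every node below a minimum number of connections. The sketch below does that in a single pass over an edge list. The threshold and node names are hypothetical, and a graph tool's own filters may work differently.

```python
from collections import Counter

def filter_by_degree(edge_list, minimum):
    """Keep only edges whose two endpoints both have at least `minimum` connections."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1
        degree[b] += 1
    return [(a, b) for a, b in edge_list
            if degree[a] >= minimum and degree[b] >= minimum]

# Hypothetical network: "c" hangs off the hub by a single edge.
network = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
core = filter_by_degree(network, minimum=2)
```

Degrees are counted once, before filtering, so nodes are not re-checked after their neighbours disappear.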
  73. These look like the most influential people we should be talking to
      They might have been deep on an org chart but in this context they are very important to us.
  75. we (able) see = different community clusters -> these people belonged to -> identified -> our graph
  76. in effect we created = alternative org chart
      *AND* revealed = another power structure within an organisation
  77. studying sociograms might = enable large organisations -> understand which individuals -> connecting remote communities
      *AND* very useful = those of us -> air-dropped into organisations -> limited time to learn -> people + culture
  78. Let's look = how content relationships -> visualised
  82. though many of us = prepare web content -> isolated docs + locations -> ready for assembly -> they are = rarely read that way
      *SO* fully appreciate -> story we're telling (audiences) = we might -> take a look from above...
  83. ...like this graph = rendering = most frequently trodden paths -> through a website
      by studying it = possible to glean significant information -> our content + relationships between it
      *BUT* before we make one = let me explain what we're looking at
  86. ...here’s a simplified example:
      *SO* each node (bubble) = web page
      *AND* each edge (line) = flow of users between pages (traffic)
  88. *SO* larger nodes = higher PageRank = (essentially) higher importance -> relation other pages (set)
      *AND* thicker edges = frequency of flow -> between pages -> this direction
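
To make "higher importance in relation to the other pages in the set" concrete, here is a plain power-iteration PageRank over a directed edge list. It is a textbook sketch, not the implementation any particular tool uses. The damping factor of 0.85 is the conventional default, and the page paths are hypothetical.

```python
def pagerank(edge_list, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edge_list for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edge_list:
        out_links[source].append(target)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            # Pages with no outgoing links share their rank with every page.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical paths: three pages all send visitors to /journey-planner.
ranks = pagerank([
    ("/home", "/journey-planner"),
    ("/day-trips", "/journey-planner"),
    ("/airports", "/journey-planner"),
    ("/home", "/day-trips"),
])
top_page = max(ranks, key=ranks.get)
```

Repeating an edge in the list acts as a weight, so heavily used paths pass on more rank.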
  90. Indebted = Dorian Taylor -> this technique
      read everything he's ever written, particularly his essay for Contents magazine
  91. In his article = "visualising paths through the web" -> Dorian points out -> web servers = log information -> each available referring resource (prev. page) + each new request we make (next page)
  92. let's take a look -> sample web server log = looks a lot like = log file sample we saw earlier *BUT* = subtle differences
  93. focus on -> two entries
      *SO* we want -> locate each referent-referrer connection (next + previous page) = come under matching client IP addresses
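
A sketch of extracting one previous-page to next-page connection from a web server log line. It assumes the widely used "combined" access-log layout, which may not match the sample on the slide, and the sample line and URLs are invented.

```python
import re
from urllib.parse import urlsplit

# Assumes the "combined" access-log layout; adjust the pattern for other formats.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referral_edge(line):
    """Return (previous page, next page) for one log line, or None if there is no referrer."""
    match = LINE.match(line)
    if not match or match["referrer"] in ("", "-"):
        return None
    return urlsplit(match["referrer"]).path or "/", match["path"]

# Invented sample line.
sample = ('192.0.2.10 - - [01/May/2012:09:30:00 +0000] "GET /journey-planner HTTP/1.1" '
          '200 5120 "http://www.example.com/day-trips" "Mozilla/5.0"')
edge = referral_edge(sample)
```

The pattern also captures the client IP, so consecutive entries can be checked against the same visitor if needed.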
  98. cleaning data = case of stripping it -> all non-human visits -> web crawlers = Googlebot, Bingbot and others
      find and replace = your friend and ally
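
As an alternative to find and replace, crawler traffic can be dropped with a few lines of script. The marker list below is illustrative rather than exhaustive, and both log lines are invented.

```python
# Illustrative crawler markers; real logs will need a longer list.
BOT_MARKERS = ("googlebot", "bingbot", "crawler", "spider")

def is_human(log_line):
    """Treat a line as human traffic unless a known crawler marker appears in it."""
    lowered = log_line.lower()
    return not any(marker in lowered for marker in BOT_MARKERS)

lines = [
    '192.0.2.10 - - [01/May/2012:09:30:00 +0000] "GET /day-trips HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.99 - - [01/May/2012:09:31:00 +0000] "GET /day-trips HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
human_lines = [line for line in lines if is_human(line)]
```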
  99. …add them -> 2-column spreadsheet -> like before
      referring URLs -> under ‘Source’ -> referent URLs -> under ‘Target’
      *FINALLY* save as .csv
  102. before continue -> introduce -> working example -> remainder (this talk)
       Cross Country Coaches = fictitious UK bus + coach operator = generous with their data!
       main services include: airport runs -> day trips to major towns + cities -> attractions -> sporting + music events
  103. all roads (should) -> journey planner = buying e-tickets
  104. *AGAIN* import our data in Gephi = tool = visualising + manipulating -> data -> graph form
  105. we have a BIGGER mess = randomly distributed nodes and edges
       we’ll run the same layout algorithm as before to position our nodes in an aesthetically pleasing way
       I use Force Atlas 2
  108. this looks better! = formed of defined clusters
       To help pick out the most important pages/content in the network, we need to rank this by ‘PageRank’
  111. Despite the colour and size of nodes helping us to see what is the most influential content in this network, it is still too busy
       so, again, we’ll filter out the weaker points of the network
  112. All roads should be leading to Journey Finder
       But many are choosing to leave this process of buying an e-ticket to ‘Contact’ Cross Country Coaches - that’s interesting = something to investigate
  117. Nevertheless, what we have ourselves is a sample of the most frequently trodden paths through this website = a very useful starting point
       So we’ll export the node data as a .csv as this will come in handy later
  119. visualising the referring/referent data -> web server logs = provided different perspective -> stories -> we're telling our audiences
  120. by filtering data = removed visual complexity = to reveal key paths + stops users making
  121. *AND* exporting node data = generated sample most frequently accessed site content
  126. we can add -> further value + depth -> visualisations = informing them -> data from our own investigations
  127. Let’s export -> node table -> in addition to page title/descriptions = (manually) add set new columns -> extracted from recent audit
       dept responsibility -> individual responsibility -> sourced from (external partners?) -> last updated -> criteria for measuring content quality
  137. *OR* could use = Google Fusion Tables = help quickly merge data across 2 spreadsheets = pairing up 1 (or more) columns -> same values
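
The same merge (pairing two tables on a column that holds the same values) can be sketched with the standard `csv` module. The column names and every value below are hypothetical stand-ins for an exported node table and an audit spreadsheet.

```python
import csv
import io

def merge_on(left_rows, right_rows, key):
    """Inner-join two lists of dict rows wherever the `key` column has the same value."""
    lookup = {row[key]: row for row in right_rows}
    return [{**row, **lookup[row[key]]} for row in left_rows if row[key] in lookup]

# Hypothetical extracts: exported node table on the left, audit columns on the right.
nodes_csv = "Id,PageRank\n/day-trips/bath,0.12\n/contact,0.08\n"
audit_csv = "Id,Department,Last updated\n/day-trips/bath,Marketing,2012-03-01\n"

nodes = list(csv.DictReader(io.StringIO(nodes_csv)))
audit = list(csv.DictReader(io.StringIO(audit_csv)))
merged = merge_on(nodes, audit, "Id")
```

Only rows present in both tables survive, which matches the later note about inheriting only the data that corresponds to our own towns and cities.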
  138. now we = download CSV -> take it where we please
  141. What could we do -> exported node table -> now we have = additional data?
  142. we could use = Tableau Public -> dive deeper into our data
       desktop data visualisation app = quickly create charts + graphs
       *BUT* PC software only
  155. let's pose a question
  156. here we have = set of bar charts -> tell us -> what each content source scored (out of 5) = Actionability, Accuracy, Usability -> the content -> responsible for
       *IF* single out Tourism Partners = score below average for each measurement
       *SO* what content = they responsible for?
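
The comparison behind those bar charts can be sketched as a per-source average checked against the overall average. All scores below are invented for illustration, and "Marketing" is a hypothetical second source alongside Tourism Partners.

```python
from collections import defaultdict
from statistics import mean

# Invented audit rows: (content source, actionability, accuracy, usability), each out of 5.
rows = [
    ("Marketing", 4, 5, 4),
    ("Marketing", 5, 4, 4),
    ("Tourism Partners", 2, 3, 2),
    ("Tourism Partners", 3, 2, 2),
]

def below_average_sources(audit_rows):
    """Return sources whose mean score is below the overall mean on every measure."""
    by_source = defaultdict(list)
    for source, *scores in audit_rows:
        by_source[source].append(scores)
    overall = [mean(column) for column in zip(*(scores for _, *scores in audit_rows))]
    flagged = []
    for source, score_rows in by_source.items():
        averages = [mean(column) for column in zip(*score_rows)]
        if all(avg < total for avg, total in zip(averages, overall)):
            flagged.append(source)
    return flagged

flagged = below_average_sources(rows)
```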
  163. *SO* let’s return to our intro = Cross Country Coaches
       Tourism Partners responsible for = content for each town/city destination
  164. by filtering data (further) -> quickly generate (Tableau Public) = breakdown of the scores -> each town + city destination
       *AS* these destinations = scored lowest = first in line for review?
  169. could external data = help us prioritise our efforts?
       let's bring some into play
  170. *SO* I downloaded data = domestic tourism stats for 09-11 = VisitBritain
  171. only problem was = data = only available PDF format
       this happens a lot!
  172. *BUT* you can save hours -> re-keying + checking = using tools like these
       these free tools = convert PDF-to-Excel
  173. we can use = Google Fusion Tables = merge our data -> each town + city destination -> with downloaded VisitBritain data
       only inherit the data -> corresponds -> our towns + cities
       *SO* let's map the data
  175. here's a map of British Isles -> generated via Tableau Public
       included are = bubbles marking -> each town + city -> from our data
       Tableau Public automatically recognised this as location data
  176. *SO* bubble sizes represent = total domestic trips 09-11 (millions)
       *AND* colour intensity represents = total spend 09-11 (millions)
  180. as expected -> London = largest + darkest bubble = remove
       let's use sliders -> filter out destinations = scored highest for Actionability, Accuracy, Usability
       now we're left = Birmingham + Edinburgh = highest number of trips + spend = priorities?
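
The slider filtering can be sketched as a filter followed by a sort. Every figure below is a placeholder rather than VisitBritain data, and the destination names, the `quality` field and the 3.0 cut-off are all hypothetical.

```python
# Placeholder rows: the figures are invented, not VisitBritain data.
destinations = [
    {"name": "Destination A", "quality": 4.5, "trips_m": 3.0, "spend_m": 500.0},
    {"name": "Destination B", "quality": 2.0, "trips_m": 8.0, "spend_m": 1200.0},
    {"name": "Destination C", "quality": 2.5, "trips_m": 1.5, "spend_m": 200.0},
]

def review_priorities(rows, max_quality=3.0, exclude=()):
    """Drop well-scoring or excluded destinations, then sort by trips and spend."""
    weak = [r for r in rows
            if r["quality"] <= max_quality and r["name"] not in exclude]
    return sorted(weak, key=lambda r: (r["trips_m"], r["spend_m"]), reverse=True)

priorities = [r["name"] for r in review_priorities(destinations)]
```

The `exclude` argument plays the role of removing an outlier such as London before ranking what is left.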
  189. filtering + partitioning -> data -> own investigations = took us deeper into -> data
  190. importing -> external data = provided possible new angles + ideas
  191. adding interactive elements = helped develop narrative around -> data
       *AND* invites our audiences = delve deeper still
  196. play + experiment -> your data
       have fun with it -> ask questions of it -> don't fear it
       *THEN* = it'll often yield its secrets + stories = surprising ease
  197. we (often) associate numbers = authority and certainty
       but uncertainty = great way -> raising new questions -> sharing them with others
  198. by creating + sharing -> your work -> with others = you might find -> get help + co-operation back