Visualising Activity Data
Tony Hirst
Dept of Communication and Systems, The Open University
Scattered puzzle pieces next to solved fragment by Horia Varlan
Today’s link shortener is bit.ly
Read: [ jlKwGq ]
as: http://bit.ly/jlKwGq
Visual Analysis vs. Presentation Graphics
This is NOT a presentation about:
 data discovery
 data preparation
 data cleansing
BUT…
ScraperWiki [ aGhJtK ]
Search and replace…
…add regular expressions and you have search and replace “on steroids”
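As a rough illustration of what regular expressions add to plain search and replace, here is a minimal Python sketch using the standard `re` module. The sample values and the `normalise` helper are invented for illustration and do not come from the slides.

```python
import re

# Hypothetical messy values, invented for illustration.
rows = ["Open  University", "open university ", "The Open Univ."]

def normalise(value: str) -> str:
    """Collapse whitespace runs and expand a trailing 'Univ.' abbreviation."""
    value = re.sub(r"\s+", " ", value).strip()         # squeeze runs of whitespace
    value = re.sub(r"\bUniv\.$", "University", value)  # pattern anchored at end of string
    return value

cleaned = [normalise(r) for r in rows]

# Backreferences (\1, \2) reuse captured groups, here to reorder a name.
swapped = re.sub(r"^(\w+),\s*(\w+)$", r"\2 \1", "Hirst, Tony")

print(cleaned)
print(swapped)
```

A plain search and replace could fix one exact string at a time; the patterns here match whole families of variants in a single pass.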
Google Refine [ aq1jUE ]
Example: walkthrough (@jenit) [ awGQPT ]
Example: merging two tables by column [ pWK3C0 ]
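Merging two tables by a shared column can also be sketched outside Google Refine. The following is a minimal Python version, not the Refine workflow itself. The column names, the course codes and the `merge_on` helper are all hypothetical.

```python
import csv
import io

# Hypothetical tables sharing a "course" column; all values are invented.
enrolments_csv = "course,students\nA100,120\nB200,300\nC300,45\n"
titles_csv = "course,title\nA100,Intro\nB200,Methods\n"

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def merge_on(left, right, key):
    """Left join: copy columns from `right` onto each `left` row with a matching key."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        extra = lookup.get(row[key], {})  # unmatched rows keep only their own columns
        merged.append({**row, **extra})
    return merged

merged = merge_on(read_rows(enrolments_csv), read_rows(titles_csv), "course")
for row in merged:
    print(row)
```

The lookup dictionary assumes the key is unique in the right-hand table; if it is not, later rows silently overwrite earlier ones.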
DataWrangler [ gmE3yz ]
Data has shape and structure
Hierarchical Data
Many Eyes [ qY5786 ]
Treemaps
plot srcfile using ($1):(column(focusCar) - $2) with lines title "VET", \
     srcfile using ($1):(column(focusCar) - $3) with lines title "WEB", \
     srcfile using ($1):(column(focusCar) - $4) with lines title "HAM", \
     srcfile using ($1):(column(focusCar) - $5) with lines title "BUT", \
     srcfile using ($1):(column(focusCar) - $6) with lines title "ALO", \
     srcfile using ($1):(column(focusCar) - $7) with lines title "MAS", \
     srcfile using ($1):(column(focusCar) - $8) with lines title "SCH", \
     srcfile using ($1):(column(focusCar) - $9) with lines title "ROS", …
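The gnuplot command above plots every series relative to one chosen "focus" column. The same re-basing step might be sketched in Python as follows. The `rebase` helper and the sample numbers are invented, and `focus` is assumed to be a 1-based column number, as in gnuplot.

```python
def rebase(rows, focus):
    """For each row (x, v2, v3, ...), return (x, [focus_value - v for each v]).

    `focus` is a 1-based column number, mirroring gnuplot's column(focusCar).
    """
    rebased = []
    for row in rows:
        x = row[0]
        ref = row[focus - 1]  # value in the focus column
        rebased.append((x, [ref - v for v in row[1:]]))
    return rebased

# Hypothetical data: column 1 is the x value, columns 2 to 4 are three series.
rows = [
    (1, 10.0, 12.5, 11.0),
    (2, 20.0, 23.0, 21.5),
]

for x, diffs in rebase(rows, focus=3):
    print(x, diffs)
```

The focus series itself becomes a flat line at zero, which makes the other series easy to read as gaps ahead of or behind it.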
Or heatmaps in R: [ qXmPgs ]
Text processing with Unix tools [ m5tz63 ] [ lOVySX ]
Count number of lines in a file:
    wc -l L2sample.csv
View first few lines in a file:
    head L2sample.csv
    head -n 4 L2sample.csv
View last few lines in a file:
    tail L2sample.csv
    tail -n 15 L2sample.csv
Sample contiguous rows from start or end of file:
    head -n 1 L2sample.csv > headers.csv
    tail -n 20 L2sample.csv > subSample.csv
    cat headers.csv subSample.csv > subSampleWithHeaders.csv
Sample contiguous rows from middle of file:
    head -n 15 L2sample.csv | tail -n 6 > middleSample.csv
Split large file into smaller files:
    split -l 15 L2sample.csv subSamples
Search for lines containing a term:
    grep mendeley L2sample.csv
    grep EBSCO L2sample.csv > rowsContainingEBSCO.csv
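Where a shell is not to hand, the row-sampling recipes above can be approximated in Python. This is a minimal sketch using only the standard library. The helper names and the in-memory stand-in for L2sample.csv are assumptions made for illustration.

```python
import io
from collections import deque
from itertools import islice

def head(lines, n):
    """First n lines, like `head -n n`."""
    return list(islice(lines, n))

def tail(lines, n):
    """Last n lines, like `tail -n n`."""
    return list(deque(lines, maxlen=n))

def middle(lines, end, count):
    """Equivalent of `head -n end | tail -n count`."""
    return tail(head(lines, end), count)

# Stand-in for a file such as L2sample.csv: 30 invented rows.
sample = io.StringIO("".join(f"row{i}\n" for i in range(1, 31)))
rows = sample.readlines()

header_plus_tail = head(rows, 1) + tail(rows, 20)  # header row plus last 20 rows
mid = middle(rows, 15, 6)                          # rows 10 to 15

print(len(header_plus_tail), mid[0].strip(), mid[-1].strip())
```

The helpers accept any iterable of lines, so an open file object can be passed in place of the list.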
More text processing tricks
Extract columns:
    cut -f 3 L2sample.csv
    cut -f 1,2,14,17 L2sample.csv > columnSample.csv
Sort data in a column:
    cut -f 40 L2sample.csv | sort
Identify distinct entries in a column:
    cut -f 40 L2sample.csv | sort | uniq
Count how many times each distinct term appears in a column:
    cut -f 40 L2sample.csv | sort | uniq -c
Sort can also sort by column (-k), reverse order (-r):
    cut -f 40 L2_2011-04.csv | sort | uniq -c | sort -k 1 -r > uniqueSID.csv
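The `cut | sort | uniq -c | sort -r` pipeline counts how often each value occurs in one column. A rough Python equivalent is sketched below. It assumes tab-separated input, since `cut -f` splits on tabs by default. The sample data and the `column_counts` helper are invented.

```python
import csv
import io
from collections import Counter

# Tab-separated stand-in for the log file; values are invented.
data = "id\tsid\n1\tEBSCO\n2\tmendeley\n3\tEBSCO\n4\tEBSCO\n"

def column_counts(text, column_index, delimiter="\t"):
    """Count distinct values in one column (0-based index), skipping short rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return Counter(row[column_index] for row in reader if len(row) > column_index)

counts = column_counts(data, 1)
ranked = counts.most_common()  # most frequent value first

for value, n in ranked:
    print(n, value)
```

As in the shell version, the header row is counted like any other value; skip the first row if that is unwanted.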
[ dAdIo3 ]
Time series data
aka “seasonal subseries” [ j3HODr ]
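A seasonal subseries view regroups a time series so that all values for the same season sit together, in year order. A minimal Python sketch of that regrouping follows. The observations and the `seasonal_subseries` helper are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, month, value) observations.
observations = [
    (2010, 1, 10), (2010, 2, 14), (2010, 3, 9),
    (2011, 1, 12), (2011, 2, 18), (2011, 3, 11),
]

def seasonal_subseries(obs):
    """Map each season (here, month) to its values in year order."""
    groups = defaultdict(list)
    for year, month, value in sorted(obs):  # sorting by year keeps each subseries ordered
        groups[month].append(value)
    return dict(groups)

subseries = seasonal_subseries(observations)
season_means = {month: mean(values) for month, values in subseries.items()}

print(subseries)
print(season_means)
```

Each list can then be plotted as its own small line, with the seasonal mean drawn as a reference level.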
matplotlib
Trends [ qSIcrV ]
    import numpy as np
    # time series data in d
    # first difference
    fd = np.diff(d)
Autocorrelation
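To make the two ideas on this slide concrete, here is a dependency-free Python sketch of a first difference (the same result `np.diff(d)` gives for a one-dimensional series) and of a simple lag-k autocorrelation estimate. The function names and sample series are assumptions, not taken from the slides.

```python
def first_difference(d):
    """Differences between consecutive values: d[1]-d[0], d[2]-d[1], ..."""
    return [b - a for a, b in zip(d, d[1:])]

def autocorrelation(d, lag):
    """Sample autocorrelation at a given lag (0 <= lag < len(d)).

    Uses the common estimator: lagged covariance divided by total variance.
    Assumes the series is not constant (variance > 0).
    """
    n = len(d)
    m = sum(d) / n
    var = sum((x - m) ** 2 for x in d)
    cov = sum((d[i] - m) * (d[i + lag] - m) for i in range(n - lag))
    return cov / var

d = [1, 3, 6, 10, 15]           # invented series with an upward trend
fd = first_difference(d)        # differencing removes a linear trend
alternating = [1, -1, 1, -1]    # invented series that flips sign each step

print(fd)
print(autocorrelation(alternating, 1))
```

A strongly negative value at lag 1, as for the alternating series, indicates that high values tend to be followed by low ones.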
Graphs and Networks
Graphviz
digraph test {
  CSV [shape=box]
  KML [shape=box]
  JSON [shape=box]
  XML [shape=box]
  RDF [shape=box]
  HTML [shape=box]
  GoogleSpreadsheet [shape=Msquare]
  RDFTripleStore [shape=Msquare]
  "[SPARQL]" [shape=diamond]
  "[YQL]" [shape=diamond]
  "[GoogleVizDataAPI]" [shape=diamond]
  "<GoogleGadgets>" [shape=doubleoctagon]
  "<GoogleVizDataCharts>" [shape=doubleoctagon]
  "<GoogleMaps>" [shape=doubleoctagon]
  "<GoogleEarth>" [shape=doubleoctagon]
  "<JQueryCharts_etc>" [shape=doubleoctagon]
  "[SPARQL]"->RDF;
  "[SPARQL]"->XML;
  "[SPARQL]"->CSV;
  "[SPARQL]"->JSON;
  JSON->"<JQueryCharts_etc>";
  CSV->"{GoogleRefine}"
  CSV->ScraperWiki
  JSON->ScraperWiki
  "[YQL]"->ScraperWiki
  ScraperWiki->CSV
  HTML->ScraperWiki
  HTML->"[YQL]"
  "[SPARQL]"->"[YQL]"
  "{GoogleRefine}"->CSV [style=dashed]
  CSV->"<Gephi>" [style=dashed]
  "<Gephi>"->CSV [style=dashed]
  RDF->"[YQL]"
}
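A graph description like the one above can be generated from data instead of typed by hand. The following Python sketch writes DOT text from a dictionary of node shapes and a list of edges. The `to_dot` helper is hypothetical, and only a few of the slide's nodes are reused as sample input.

```python
def to_dot(name, shapes, edges):
    """Build DOT source for a directed graph.

    Node names are wrapped in double quotes, so they must not themselves
    contain double quotes (no escaping is attempted in this sketch).
    """
    lines = [f"digraph {name} {{"]
    for node, shape in shapes.items():
        lines.append(f'  "{node}" [shape={shape}];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

shapes = {"CSV": "box", "JSON": "box", "[SPARQL]": "diamond"}
edges = [("[SPARQL]", "CSV"), ("[SPARQL]", "JSON")]

dot = to_dot("test", shapes, edges)
print(dot)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command.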
Gephi
[ nKoB4b ]
Statistical Graphs
R
Graphics Libraries
Protovis
Processing

Editor's Notes

  • #26 Change the basis… e.g. at the OU, we might consider different presentations (“years”) of the same course (“month”).