Visualization Lifecycle

Published in: Education, Technology

  1. Visualization Lifecycle
     datainsight San Francisco 2011
     Raffael Marty
  2. “Transform a dataset into a captive story.”
     ‣ Assess
     ‣ Parse
     ‣ Clean
     ‣ Visualize
     [Slide labels: You're on your own; Art; Visualization Tools and Libraries]
     pixlcloud | collect. visualize. understand. Copyright (c) 2011
  3. Audience
     [Figure: audience chart; labels: Expert, Beginner, Fun, Boring, Technical Overview]
  4. Visualization Process
     [Figure: process diagram; labels: Data Sources (files, database), parsing, Structured Data (Data Store), Contextual Data, filtering, aggregation, cleansing, feature selection, visualization, Visual Representation, iterations]
  5. Data Sources
     ‣ File: XML, JSON, CSV, TSV
     ‣ Database
         mysql -u root -p mydatabase < dump.sql
     ‣ API
         curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
       ‣ Factual
       ‣ Freebase
       ‣ Infochimps
       ‣ OpenStreetMap
  6. Explore Data
     ‣ What is the data about?
     ‣ What are the data features/columns?
     ‣ Is there a common structure in the data?
     ‣ What are the data types?
     Example log lines:
         Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0
         May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0
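The exploration questions above (which columns exist, what types they hold) can be approached with a few lines of Python. This is a minimal sketch, assuming the iptables-style `KEY=value` layout of the first sample line; the helper names `key_value_fields` and `guess_type` are hypothetical and the type guesses are deliberately rough.

```python
import re

# Sample line copied from the slide (iptables-style firewall log).
LINE = ("Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= "
        "MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 "
        "DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF "
        "PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0")


def key_value_fields(line):
    """Return the KEY=value pairs found in a log line as a dict."""
    return dict(re.findall(r"\b([A-Z]+)=(\S*)", line))


def guess_type(value):
    """Rough type guess for one field value (an assumption, not a standard)."""
    if value == "":
        return "empty"
    if re.fullmatch(r"\d+", value):
        return "int"
    if re.fullmatch(r"0x[0-9a-fA-F]+", value):
        return "hex"
    if re.fullmatch(r"\d+(\.\d+){3}", value):
        return "ip"
    return "string"


fields = key_value_fields(LINE)
for name, value in fields.items():
    print(f"{name:8} {guess_type(value):8} {value}")
```

Running this over a sample of lines, rather than a single one, gives a quick picture of which fields are always present and which vary in type.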
  7. Parsing and Normalization
     ‣ Parsing
       ‣ extraction of entities / features
       ‣ imposing structure
       ‣ often use regexes
     ‣ Normalize
       ‣ field normalization
       ‣ term normalization: block, deny, dropped
     ‣ Generate a common output format for vis-tools (e.g., CSV)
     Example log lines:
         Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
         Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"
         Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
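Term normalization can be illustrated with a small lookup table. The sketch below maps the three vendor-specific terms named on the slide (block, deny, dropped) to one canonical value; picking "block" as that value and the helper name `normalize_action` are assumptions made for illustration, and the sample lines are shortened copies of the slide's log lines.

```python
import re

# Vendor-specific term -> canonical term. The slide lists block, deny, and
# dropped; using "block" as the canonical value is an assumption.
CANONICAL_ACTION = {"block": "block", "deny": "block", "dropped": "block"}


def normalize_action(line):
    """Return the canonical action for the first known term in a line, or None."""
    for word in re.findall(r"[a-z]+", line.lower()):
        if word in CANONICAL_ACTION:
            return CANONICAL_ACTION[word]
    return None


# Shortened versions of the three firewall log lines shown on the slide.
SAMPLES = [
    "Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859",
    "Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859",
    "Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= SRC=212.251.89.126 DST=212.254.110.98",
]

actions = [normalize_action(line) for line in SAMPLES]
print(actions)
```

Once every source reports the same action term, the records from different firewalls can be filtered and aggregated together.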
  8. Parser
     Raw:
         Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
         Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
         Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)
     Regex / Parser:
         (.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)
     Normalized (CSV):
         Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
         Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
         Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
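The raw-to-CSV step on this slide can be sketched in Python. The pattern below is the slide's regex with backslashes and the escaped parentheses around the match field written out, which is an assumption about what the original slide showed; the names `PF_LINE` and `to_csv` are hypothetical.

```python
import re

# Slide regex with escapes written out (assumed reconstruction).
PF_LINE = re.compile(
    r"(.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)"
)


def to_csv(line):
    """Return the captured fields joined by commas, or None if no match."""
    match = PF_LINE.match(line)
    return ",".join(match.groups()) if match else None


# First raw line from the slide.
RAW = ("Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: "
       "195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)")

row = to_csv(RAW)
print(row)
```

A plain join is enough here because none of the captured fields contains a comma; for messier input, Python's `csv` module handles quoting.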
  9. UNIX Tools
     ‣ grep
         cat file | grep -v "foo"
     ‣ awk
         awk -F, '{printf("%s,%s\n",$2,$1);}'
         awk -F, -v OFS=, '{print $2,$1}'
     ‣ sed
         sed -e 's/fubar/foobar/g' filename
  10. Regular Expression Resources
      ‣ http://regexlib.com
      ‣ http://www.regular-expressions.info
      ‣ http://gskinner.com/RegExr
  11. Data Cleansing
      ‣ Filter
      ‣ Normalize (see earlier)
      ‣ Aggregation
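Filtering and aggregation on normalized CSV can be sketched with the standard library. The input rows below are hypothetical (made up from addresses that appear in the sample logs), and the choice to count blocked connections per source and destination port is only one possible aggregation.

```python
import csv
import io
from collections import Counter

# Hypothetical normalized input: action, source IP, destination port.
CSV_TEXT = """action,src,dport
block,212.251.89.126,135
pass,195.141.69.45,53
block,212.251.89.126,135
block,10.1.222.31,9111
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Filter: keep only blocked traffic.
blocked = [r for r in rows if r["action"] == "block"]

# Aggregate: number of blocked events per (source, destination port).
counts = Counter((r["src"], r["dport"]) for r in blocked)

for (src, dport), n in counts.most_common():
    print(f"{src},{dport},{n}")
```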
  12. Load CSV into Database
      Sometimes you just load your data into a tool, and you can omit this step.
          # mysql -u <user> -p
          mysql> create database data;
          mysql> use data;
          mysql> create table set1 (id int, address varchar(20), ...);
          mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1
                 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
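For quick experiments without a MySQL server, the same load step can be sketched with Python's built-in `sqlite3` module. This swaps in SQLite for MySQL, so it is an analogue and not the slide's method; the two-column table mirrors only the `id` and `address` columns named on the slide, and the CSV rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content with two columns: id, address.
CSV_TEXT = "1,212.251.89.126\n2,195.141.69.45\n"

# In-memory SQLite database stands in for the MySQL database on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE set1 (id INTEGER, address VARCHAR(20))")

# Parse the CSV and insert all rows with a parameterized statement.
rows = [(int(row_id), address) for row_id, address in csv.reader(io.StringIO(CSV_TEXT))]
conn.executemany("INSERT INTO set1 (id, address) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM set1").fetchone()[0]
print(count)
```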
  13. Contextual Data
      ‣ Either dump into DB or use via API calls to augment
      ‣ IP -> Geo mapping
      ‣ Information about countries
      ‣ Port number -> service name
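Augmenting records with contextual data amounts to a lookup per record. The sketch below adds a service name for each destination port using a small hand-written table; the table, the `add_service` helper, and the record layout are assumptions for illustration. On systems with a services database, the standard library's `socket.getservbyport` offers a similar lookup.

```python
# Small hand-written port -> service table (illustrative subset).
SERVICE_NAMES = {53: "domain", 80: "http", 135: "epmap", 443: "https"}


def add_service(record):
    """Return a copy of the record with a 'service' field added."""
    enriched = dict(record)
    enriched["service"] = SERVICE_NAMES.get(record["dport"], "unknown")
    return enriched


# Hypothetical parsed records.
records = [
    {"src": "195.141.69.45", "dport": 53},
    {"src": "212.251.89.126", "dport": 135},
    {"src": "10.1.222.31", "dport": 9111},
]

enriched = [add_service(r) for r in records]
for r in enriched:
    print(r)
```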
  14. Feature Selection
      ‣ What are the fields you are interested in?
      ‣ Compute new fields
        ‣ start time, end time -> duration
        ‣ IP subnets [ 10.2.4.2 -> 10.0.0.0/8 or 192.168.1.2 -> 192.168.1.0/24 ]
        ‣ Entropy: H(X) = E(I(X))
      ‣ Dimensionality reduction
        ‣ See Bryan’s talk!
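The three computed fields above can each be derived in a few lines. This is a minimal sketch using only the standard library; the function names are hypothetical, the timestamp format is assumed to match the pf log samples, and entropy is computed as Shannon entropy in bits, H(X) = -Σ p(x) log2 p(x), which is the expected information content E(I(X)) with I(x) = -log2 p(x).

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime


def duration_seconds(start, end, fmt="%H:%M:%S.%f"):
    """Duration between two timestamps given as strings in the same format."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def subnet(ip, prefix_len):
    """Map an IP address to its enclosing subnet, e.g. 10.2.4.2 -> 10.0.0.0/8."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(duration_seconds("20:00:38.018152", "20:00:38.157238"))
print(subnet("10.2.4.2", 8), subnet("192.168.1.2", 24))
print(entropy([80, 80, 443, 443]))
```

A field with entropy near zero carries almost the same value in every record, which makes it a weak candidate for a visual dimension.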
  15. Choose Your Poison
  16. Ode to the Pie
  17. A Good Visual
      ‣ Choose the right graph
      ‣ Simultaneous views
      ‣ Reduce non-data ink
      ‣ Interactivity
  18. Visual Transformations
      ‣ keep iterating on visual transformations, change
        ‣ color
        ‣ shape
        ‣ features display
      ‣ add new fields?
      ‣ add more context?
      ‣ is the output expressive?
      ‣ capture output and prettify it for presentation
  19. Data Visualization Tools and Libraries
  20. Tools and Libraries
      ‣ http://datainsightsf.com/resources/
        ‣ Choose what’s appropriate!
      ‣ Data Analysis and Visualization Linux (DAVIX)
        ‣ davix.secviz.org
      ‣ GraphViz
        ‣ graphviz.org
      ‣ AfterGlow (CSV -> DOT)
        ‣ afterglow.sf.net
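The CSV -> DOT idea can be shown with a tiny converter. This is a minimal sketch of the concept only, not AfterGlow's implementation or its options: it assumes a two-column CSV of source and target, and the `csv_to_dot` name is hypothetical. The resulting text can be rendered with GraphViz.

```python
import csv
import io


def csv_to_dot(csv_text):
    """Turn two-column CSV (source,target) into a DOT digraph string."""
    edges = sorted(
        {(row[0], row[1]) for row in csv.reader(io.StringIO(csv_text)) if len(row) >= 2}
    )
    lines = ["digraph G {"]
    lines += [f'  "{source}" -> "{target}";' for source, target in edges]
    lines.append("}")
    return "\n".join(lines)


# Source and destination addresses taken from the parser slide's sample logs.
CSV_TEXT = (
    "195.141.69.45,62.2.32.250\n"
    "195.141.69.45,192.134.0.49\n"
    "195.141.69.45,62.2.32.250\n"
)

dot = csv_to_dot(CSV_TEXT)
print(dot)
```

Duplicate rows collapse into a single edge here; keeping a count per edge instead would allow edge weights or colors to reflect frequency.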
  21. Libraries
      ‣ Reporting Libraries
        ‣ HighCharts
        ‣ Flot
        ‣ Google Chart API
        ‣ Open Flash Chart
        ‣ JQuery Sparklines
      ‣ Visualization Libraries
        ‣ TheJIT
        ‣ Graphael
        ‣ Protovis
        ‣ ProcessingJS
        ‣ Flare
        ‣ Polymaps
        ‣ D3
  22. HighCharts
      www.highcharts.com
      ‣ Click-Through
      ‣ On load
        ‣ near real-time updates
      ‣ Zoom
  23. Google Visualization API
      http://code.google.com/apis/visualization/interactive_charts.html
      ‣ JavaScript
      ‣ Based on DataTables()
      ‣ Many graphs
      ‣ Playground
        ‣ http://code.google.com/apis/ajax/playground
  24. ProtoVis
      http://vis.stanford.edu/protovis/
      ‣ JavaScript based visualization library
      ‣ Charting
      ‣ Treemaps
      ‣ BoxPlots
      ‣ Parallel Coordinates
      ‣ etc.
  25. TheJIT
      http://thejit.org/
      ‣ JavaScript InfoVis Toolkit
      ‣ Interactive
      ‣ Link Graphs
  26. Processing
      http://processing.org/
      ‣ Visualization library
      ‣ Java based
      ‣ Interactive (event handling)
      ‣ Number of libraries to
        ‣ draw in OpenGL
        ‣ read XML files
      ‣ Processing JS (http://processingjs.org/)
        ‣ JavaScript
        ‣ HTML 5 Canvas
        ‣ WebGL
        ‣ Web IDE
  27. Visualization Tools
      ‣ Gephi
      ‣ R
      ‣ Matlab
      ‣ Mondrian
      ‣ PicViz
      ‣ Treemap 4.1
      ‣ Google Earth
  28. Gephi
      http://gephi.org
      ‣ reads: CSV, DOT, etc.
      ‣ graph analysis algorithms
      ‣ highly interactive
  29. PicViz
      http://www.wallinfire.net/picviz/
  30. Treemap 4.1
      http://www.cs.umd.edu/hcil/treemap/
  31. Google Earth
      ‣ KML data format for encoding data
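To make the KML point concrete, here is a minimal sketch that writes one placemark with Python's standard `xml.etree.ElementTree`. The `placemark_kml` helper and the example location are hypothetical; the namespace is the KML 2.2 namespace, and KML coordinates are written as longitude,latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize without a namespace prefix


def placemark_kml(name, lat, lon):
    """Return a KML document string containing one named point."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML expects longitude first, then latitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical example location.
kml_text = placemark_kml("San Francisco", 37.7749, -122.4194)
print(kml_text)
```

Combined with an IP -> Geo lookup, one placemark per source address is enough to plot log data on the globe.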
  32. pixlcloud | buy now | collect. visualize. understand. | @raffaelmarty
