Visualization Lifecycle
Published in: Education, Technology
Transcript

  • 1. Visualization Lifecycle. datainsight San Francisco 2011. Raffael Marty
  • 2. “Transform a dataset into a captive story.”
      ‣ Assess
      ‣ Parse
      ‣ Clean
      ‣ Visualize
      (diagram labels: You're on your own; Art; Visualization Tools and Libraries)
      pixlcloud | collect. visualize. understand. Copyright (c) 2011
  • 3. Audience (diagram labels: Expert, Beginner, Fun, Boring, Technical Overview)
  • 4. Visualization Process (diagram labels: Contextual Data; iterations; Data Sources; Data Store; Structured Data; Visual Representation; visualization; parsing; feature selection; files; database; filtering; aggregation; cleansing)
  • 5. Data Sources
      ‣ File: XML, JSON, CSV, TSV
      ‣ Database: mysql -u root -p mydatabase < dump.sql
      ‣ API: curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
          ‣ Factual
          ‣ Freebase
          ‣ Infochimps
          ‣ OpenStreetMap
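The file formats named on this slide can all be read with a standard library. As a minimal sketch in Python (the helper names and the sample records are assumptions for illustration, not part of the talk), delimited text and JSON can be turned into plain dictionaries like this:

```python
import csv
import io
import json


def read_delimited(text, delimiter=","):
    """Parse CSV text (or TSV, by passing a tab as delimiter) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_json(text):
    """Parse a JSON document into Python lists and dicts."""
    return json.loads(text)


# Hypothetical sample data, loosely modelled on the firewall fields shown later.
csv_text = "src,dst,dport\n10.1.222.31,10.1.222.202,9111\n"
tsv_text = "src\tdst\tdport\n213.175.90.24\t192.168.0.15\t56772\n"
json_text = '[{"src": "10.1.222.31", "dport": 9111}]'

rows_csv = read_delimited(csv_text)
rows_tsv = read_delimited(tsv_text, delimiter="\t")
rows_json = read_json(json_text)
```

Note that the CSV and TSV readers return every value as a string, while JSON keeps numbers as numbers, which matters for the "what are the data types?" question on the next slide.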
  • 6. Explore Data
      ‣ What is the data about?
      ‣ What are the data features/columns?
      ‣ Is there a common structure in the data?
      ‣ What are the data types?
      Sample log entries:
        Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0
        May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0
  • 7. Parsing and Normalization
      ‣ Parsing
          ‣ extraction of entities / features
          ‣ imposing structure
          ‣ often use regexes
      ‣ Normalize
          ‣ field normalization
          ‣ term normalization: block, deny, dropped
      ‣ Generate a common output format for vis-tools (e.g., CSV)
      Sample firewall log entries (three formats):
        Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
        Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"
        Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
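Field and term normalization can be sketched for the kernel-style entry on this slide. The following Python is a minimal illustration, assuming that KEY=VALUE pairs carry the fields and that "block" is chosen as the common term for block, deny, and dropped; the function names and the output field names are hypothetical.

```python
import re

# Term normalization: the slide lists block, deny, dropped as synonyms.
# Mapping them all to "block" is an assumption made for this sketch.
ACTION_TERMS = {"block": "block", "deny": "block", "dropped": "block"}

KEY_VALUE = re.compile(r"(\w+)=(\S*)")
ACTION = re.compile(r"kernel: (\w+)")


def normalize_action(term):
    """Map a vendor-specific action word to one common term."""
    lowered = term.lower()
    return ACTION_TERMS.get(lowered, lowered)


def parse_kernel_line(line):
    """Extract a few fields from an iptables-style log line into common names."""
    fields = dict(KEY_VALUE.findall(line))
    action = ACTION.search(line)
    return {
        "action": normalize_action(action.group(1)) if action else None,
        "src": fields.get("SRC"),
        "dst": fields.get("DST"),
        "sport": fields.get("SPT"),
        "dport": fields.get("DPT"),
        "proto": fields.get("PROTO"),
    }


line = (
    "Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= "
    "MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 "
    "DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 "
    "PROTO=TCP SPT=3859 DPT=135 LEN=556"
)
record = parse_kernel_line(line)
```

The other two log formats would need their own extraction patterns, but they could feed the same output dictionary, which is the point of normalizing.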
  • 8. Parser
      Raw:
        Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
        Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
        Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)
      Regex / Parser:
        (.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)
      Normalized (CSV):
        Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
        Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
        Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
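The raw-to-CSV step on this slide can be sketched in Python. The pattern below is a reconstruction of the slide's regex with its backslashes restored, so treat it as an assumption rather than the author's exact expression; the function names are hypothetical.

```python
import re

# Reconstructed from the slide; escapes were lost in the transcript.
PF_LINE = re.compile(
    r"^(.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)$"
)


def parse_pf_line(line):
    """Return the captured fields of one log line, or None if it does not match."""
    match = PF_LINE.match(line.strip())
    return list(match.groups()) if match else None


def to_csv_line(line):
    """Join the captured fields with commas, mirroring the slide's normalized output."""
    fields = parse_pf_line(line)
    return ",".join(fields) if fields is not None else None


raw = (
    "Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)"
)
normalized = to_csv_line(raw)
```

A plain join is enough here because none of the captured fields contain a comma; for messier input, Python's `csv` module would handle quoting.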
  • 9. UNIX Tools
      ‣ grep
          ‣ cat file | grep -v "foo"
      ‣ awk
          ‣ awk -F, '{printf("%s,%s\n",$2,$1);}'
          ‣ awk -F, -v OFS=, '{print $2,$1}'
      ‣ sed
          ‣ sed -e 's/fubar/foobar/g' filename
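For readers who prefer a script to one-liners, the three commands on this slide have close Python equivalents. This is a rough sketch with hypothetical function names; it mirrors the behaviour of the commands as shown and is not a general replacement for grep, awk, or sed.

```python
import re


def drop_matching(lines, pattern):
    """Like `grep -v pattern`: keep only the lines that do not match."""
    return [line for line in lines if not re.search(pattern, line)]


def swap_first_two(line, sep=","):
    """Like `awk -F, -v OFS=, '{print $2,$1}'`: emit field 2, then field 1."""
    fields = line.split(sep)
    return sep.join([fields[1], fields[0]])


def replace_all(line, old, new):
    """Like `sed -e 's/old/new/g'` for a literal (non-regex) string."""
    return line.replace(old, new)


lines = ["10.0.0.1,80", "foo,1", "10.0.0.2,443"]
kept = drop_matching(lines, "foo")
swapped = [swap_first_two(line) for line in kept]
fixed = replace_all("fubar and fubar", "fubar", "foobar")
```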
  • 10. Regular Expression Resources
      ‣ http://regexlib.com
      ‣ http://www.regular-expressions.info
      ‣ http://gskinner.com/RegExr
  • 11. Data Cleansing
      ‣ Filter
      ‣ Normalize (see earlier)
      ‣ Aggregation
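Filtering and aggregation are simple to express once records share a common structure. A minimal Python sketch follows; the records and field names are hypothetical and only loosely based on the log samples in the earlier slides.

```python
from collections import Counter

# Hypothetical normalized records.
events = [
    {"action": "pass", "src": "195.141.69.45", "dport": "53"},
    {"action": "pass", "src": "195.141.69.45", "dport": "53"},
    {"action": "block", "src": "212.251.89.126", "dport": "135"},
]


def filter_events(records, action):
    """Filter: keep only records with the given action."""
    return [r for r in records if r["action"] == action]


def aggregate(records, keys):
    """Aggregation: count how many records share the same values for `keys`."""
    return Counter(tuple(r[k] for k in keys) for r in records)


counts = aggregate(filter_events(events, "pass"), ("src", "dport"))
```

Aggregating before visualizing keeps a graph readable: two identical DNS requests become one edge with a weight of 2 instead of two overlapping edges.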
  • 12. Load CSV into Database
      Sometimes you just load your data into a tool, and you can omit this step.
        # mysql -u <user> -p
        mysql> create database data;
        mysql> use data;
        mysql> create table set1 (id int, address varchar(20), ...);
        mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
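The same load step can be tried without a MySQL server. The sketch below swaps in SQLite through Python's standard library, which is a substitution for illustration and not what the slide uses; the two-column table and the sample rows are assumptions standing in for the slide's elided schema.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content with two columns: id, address.
csv_text = "1,10.1.222.31\n2,213.175.90.24\n"

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute("CREATE TABLE set1 (id INTEGER, address TEXT)")

# Each CSV row becomes one parameterized INSERT.
rows = csv.reader(io.StringIO(csv_text))
conn.executemany("INSERT INTO set1 (id, address) VALUES (?, ?)", rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM set1").fetchone()[0]
first_row = conn.execute("SELECT id, address FROM set1 ORDER BY id").fetchone()
```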
  • 13. Contextual Data
      ‣ Either dump into DB or use via API calls to augment
      ‣ IP -> Geo mapping
      ‣ Information about countries
      ‣ Port number -> service name
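Augmenting records with contextual data amounts to a lookup against a second data set. As a small Python sketch, a port-to-service mapping can be applied like this; the lookup table is a tiny hand-written excerpt for illustration, and a real setup would load a full table or query a service instead.

```python
# Illustrative excerpt only; a complete mapping would come from a database or API.
PORT_SERVICES = {53: "domain", 80: "http", 135: "epmap"}


def add_service_name(record, services=PORT_SERVICES):
    """Return a copy of the record with a 'service' field derived from 'dport'."""
    augmented = dict(record)
    augmented["service"] = services.get(int(record["dport"]), "unknown")
    return augmented


event = {"src": "212.251.89.126", "dst": "212.254.110.98", "dport": "135"}
augmented_event = add_service_name(event)
```

An IP-to-geo lookup would follow the same shape, with the dictionary replaced by a geolocation database or API call.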
  • 14. Feature Selection
      ‣ What are the fields you are interested in?
      ‣ Compute new fields
          ‣ start time, end time -> duration
          ‣ IP subnets [ 10.2.4.2 -> 10.0.0.0/8 or 192.168.1.2 -> 192.168.1.0/24 ]
      ‣ Entropy: H(X) = E(I(X))
      ‣ Dimensionality reduction
          ‣ See Bryan’s talk!
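The computed fields on this slide can be sketched with Python's standard library. The function names and the timestamp format are assumptions; the entropy function uses the usual Shannon form, the expected information content in bits, estimated from observed frequencies.

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime


def duration_seconds(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """start time, end time -> duration in seconds (timestamp format is assumed)."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def subnet(ip, prefix_len):
    """Map an IP address to its enclosing network, e.g. 10.2.4.2 with /8."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def entropy(values):
    """Shannon entropy H(X) = E(I(X)) in bits, from observed value frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


net_a = subnet("10.2.4.2", 8)
net_b = subnet("192.168.1.2", 24)
port_entropy = entropy(["53", "53", "135", "135"])
elapsed = duration_seconds("2011-01-01 10:00:00", "2011-01-01 10:01:30")
```

A field whose entropy is near zero takes almost the same value in every record, so it is usually a poor candidate for a visual dimension.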
  • 15. Choose Your Poison
  • 16. Ode to the Pie
  • 17. A Good Visual
      ‣ Choose the right graph
      ‣ Simultaneous views
      ‣ Reduce non-data ink
      ‣ Interactivity
  • 18. Visual Transformations
      ‣ keep iterating on visual transformations, change
          ‣ color
          ‣ shape
          ‣ features display
      ‣ add new fields?
      ‣ add more context?
      ‣ is the output expressive?
      ‣ capture output and prettify it for presentation
  • 19. Data Visualization Tools and Libraries
  • 20. Tools and Libraries
      ‣ http://datainsightsf.com/resources/
          ‣ Choose what’s appropriate!
      ‣ Data Analysis and Visualization LInuX
          ‣ davix.secviz.org
      ‣ GraphViz
          ‣ graphviz.org
      ‣ AfterGlow (CSV -> DOT)
          ‣ afterglow.sf.net
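To show what a CSV -> DOT conversion involves, here is a minimal Python sketch. It is not AfterGlow's implementation and supports none of its options; it only turns the first two columns of each CSV row into a directed edge in GraphViz DOT syntax, and the function name is hypothetical.

```python
import csv
import io


def csv_to_dot(csv_text):
    """Turn two-column CSV rows (source,target) into a DOT digraph."""
    edges = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            edges.add((row[0], row[1]))  # duplicates collapse into one edge
    lines = ["digraph G {"]
    for source, target in sorted(edges):
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


csv_text = "195.141.69.45,62.2.32.250\n195.141.69.45,62.2.32.250\n"
dot = csv_to_dot(csv_text)
```

The resulting text can be saved to a file and rendered with GraphViz, for example with `dot -Tpng`. Node names containing double quotes would need escaping, which this sketch leaves out.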
  • 21. Libraries
      ‣ Reporting Libraries
          ‣ HighCharts
          ‣ Flot
          ‣ Google Chart API
          ‣ Open Flash Chart
          ‣ JQuery Sparklines
      ‣ Visualization Libraries
          ‣ TheJIT
          ‣ gRaphaël
          ‣ Protovis
          ‣ ProcessingJS
          ‣ Flare
          ‣ Polymaps
          ‣ D3
  • 22. HighCharts (www.highcharts.com)
      ‣ Click-Through
      ‣ On load
          ‣ near real-time updates
      ‣ Zoom
  • 23. Google Visualization API (http://code.google.com/apis/visualization/interactive_charts.html)
      ‣ JavaScript
      ‣ Based on DataTables()
      ‣ Many graphs
      ‣ Playground
          ‣ http://code.google.com/apis/ajax/playground
  • 24. ProtoVis (http://vis.stanford.edu/protovis/)
      ‣ JavaScript based visualization library
      ‣ Charting
      ‣ Treemaps
      ‣ BoxPlots
      ‣ Parallel Coordinates
      ‣ etc.
  • 25. TheJIT (http://thejit.org/)
      ‣ JavaScript InfoVis Toolkit
      ‣ Interactive
      ‣ Link Graphs
  • 26. Processing (http://processing.org/)
      ‣ Visualization library
      ‣ Java based
      ‣ Interactive (event handling)
      ‣ Number of libraries to
          ‣ draw in OpenGL
          ‣ read XML files
      ‣ Processing JS (http://processingjs.org/)
          ‣ JavaScript
          ‣ HTML 5 Canvas
          ‣ WebGL
          ‣ Web IDE
  • 27. Visualization Tools
      ‣ Gephi
      ‣ R
      ‣ Matlab
      ‣ Mondrian
      ‣ PicViz
      ‣ Treemap 4.1
      ‣ Google Earth
  • 28. Gephi (http://gephi.org)
      ‣ reads: CSV, DOT, etc.
      ‣ graph analysis algorithms
      ‣ highly interactive
  • 29. PicViz (http://www.wallinfire.net/picviz/)
  • 30. Treemap 4.1 (http://www.cs.umd.edu/hcil/treemap/)
  • 31. Google Earth
      ‣ KML data format for encoding data
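As a sketch of what encoding data in KML looks like, the following Python builds a minimal KML document with one Placemark per point, using only the standard library. The function name and the sample point are assumptions for illustration. KML lists coordinates as longitude, latitude, altitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def points_to_kml(points):
    """Build a KML document from (name, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML order is longitude,latitude,altitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical sample point (roughly San Francisco).
kml_text = points_to_kml([("sample source", 37.7749, -122.4194)])
```

Combined with an IP -> Geo lookup, each log source could become one Placemark that Google Earth displays on the globe.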
  • 32. pixlcloud (buy now). collect. visualize. understand. @raffaelmarty
