Visualization
  Lifecycle

datainsight
 San Francisco 2011
     Raffael Marty
“Transform a dataset into a captivating story.”



              ‣ Assess                        Youʼre on your own              Art
              ‣ Parse

              ‣ Clean

              ‣ Visualize



                                          Visualization Tools and Libraries

pixlcloud | collect. visualize. understand.                                         Copyright (c) 2011
Audience
      [Diagram: audience axes: Beginner to Expert, Technical to Overview, Boring to Fun]

Visualization Process
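      [Diagram: Data Sources -> (Data Store: files, database) -> Structured Data -> Visual Representation]
      ‣ Data Sources to Structured Data: parsing
      ‣ On Structured Data: filtering, aggregation, cleansing
      ‣ Structured Data to Visual Representation: feature selection, visualization
      ‣ Contextual Data is an additional input; the process runs in iterations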

Data Sources
      ‣ File                                             XML, JSON, CSV, TSV

      ‣Database                                 mysql -u root -p mydatabase < dump.sql

      ‣ API                                     curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
         ‣Factual

         ‣Freebase

         ‣Infochimps

         ‣OpenStreetMap
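
As a minimal sketch of the file-based sources listed above, the following reads CSV, TSV, and JSON with the Python standard library. The inline sample strings and the helper name `read_delimited` are assumptions made for illustration; real data would come from files on disk.

```python
import csv
import io
import json

# Hypothetical inline samples standing in for real files.
CSV_TEXT = "src,dst,port\n10.1.222.31,10.1.222.202,9111\n"
TSV_TEXT = "src\tdst\tport\n213.175.90.24\t192.168.0.15\t56772\n"
JSON_TEXT = '[{"src": "10.1.222.31", "dst": "10.1.222.202", "port": 9111}]'


def read_delimited(text, delimiter=","):
    """Return the rows of a CSV/TSV document as dicts keyed by the header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_delimited(CSV_TEXT)
tsv_rows = read_delimited(TSV_TEXT, delimiter="\t")
json_rows = json.loads(JSON_TEXT)
```

Note that the CSV and TSV readers return every value as a string, while JSON keeps numbers as numbers; that difference matters later when data types are assessed.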




Explore Data
      ‣ What is the data about?
      ‣ What are the data features/columns?
      ‣ Is there a common structure in the data?
      ‣ What are the data types?
                Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:
                29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00
                TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0

                May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT=
                MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15
                LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772
                WINDOW=65535 RES=0x00 ACK URGP=0
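
One way to answer these questions for the firewall samples above is to pull out the KEY=value pairs and guess a type for each. This is only a sketch: the regex, the `guess_type` helper, and its type labels are assumptions for illustration, and bare flags such as DF or SYN are not captured.

```python
import re

# First sample line from the slide, joined onto one logical line.
LINE = ("Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= "
        "MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 "
        "DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF "
        "PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0")

# Upper-case key, "=", then a possibly empty run of non-space characters.
KEY_VALUE = re.compile(r"\b([A-Z]+)=(\S*)")


def guess_type(value):
    """Assign a rough type label to a field value (illustrative labels)."""
    if value == "":
        return "empty"
    if value.isdigit():
        return "int"
    if re.fullmatch(r"0x[0-9a-fA-F]+", value):
        return "hex"
    if re.fullmatch(r"\d+\.\d+\.\d+\.\d+", value):
        return "ipv4"
    return "string"


fields = dict(KEY_VALUE.findall(LINE))
types = {key: guess_type(value) for key, value in fields.items()}
```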



Parsing and Normalization
     ‣ Parsing
        ‣ extraction of entities / features
        ‣ imposing structure
        ‣ often use regexes
     ‣ Normalize
        ‣ field normalization
        ‣ term normalization: block, deny, dropped
     ‣ Generate a common output format for vis-tools (e.g., CSV)

     Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)

     Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"

     Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
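
Term normalization and the common output format can be sketched as follows. The three records are hand-extracted from the sample log lines on this slide (source IP and source port appear in all three); the `CANONICAL_ACTION` mapping, the choice of "block" as the shared term, and the function names are assumptions for illustration.

```python
import csv
import io

# Assumed mapping: every device-specific word collapses to one shared term.
CANONICAL_ACTION = {"block": "block", "deny": "block", "dropped": "block"}


def normalize_action(term):
    """Map a device-specific action word onto one shared vocabulary."""
    return CANONICAL_ACTION.get(term.strip().lower(), "unknown")


def to_csv(records):
    """Write (device, action, src_ip, src_port) records as normalized CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for device, action, src_ip, src_port in records:
        writer.writerow([device, normalize_action(action), src_ip, src_port])
    return out.getvalue()


records = [
    ("pf", "block", "212.251.89.126", "3859"),
    ("pix", "Deny", "212.251.89.126", "3859"),
    ("iptables", "DROPPED", "212.251.89.126", "3859"),
]
normalized = to_csv(records)
```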

Parser
Raw
                        Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
                        Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
                        Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)

Regex / Parser
                        (.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)

Normalized (CSV)
                        Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
                        Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
                        Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
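
The raw-to-CSV step can be sketched in Python. The pattern follows the regex on this slide, on the assumption that its backslashes (`\d`, `\w`, escaped dots and parentheses) were lost in extraction; it also tolerates extra whitespace after the final colon. The function name `pf_to_csv` is hypothetical.

```python
import re

PF_LINE = re.compile(
    r"(.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*):\s*(.*)"
)


def pf_to_csv(line):
    """Turn one pf log line into a comma-separated record, or None."""
    match = PF_LINE.match(line)
    if match is None:
        return None  # Unparsed lines are worth logging, not dropping silently.
    return ",".join(match.groups())


raw = ("Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: "
       "195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)")
record = pf_to_csv(raw)
```

A plain `join` is enough here because none of the captured fields contain commas; for arbitrary input, the `csv` module would handle quoting.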




UNIX Tools
     ‣ grep
        ‣cat file | grep -v "foo"

     ‣ awk
        ‣awk -F, '{printf("%s,%s\n",$2,$1);}'

        ‣awk -F, -v OFS=, '{print $2,$1}'

     ‣ sed
        ‣sed -e 's/fubar/foobar/g' filename




Regular Expression Resources
     ‣   http://regexlib.com
     ‣   http://www.regular-expressions.info
     ‣   http://gskinner.com/RegExr




Data Cleansing
     ‣ Filter




     ‣ Normalize                  (see earlier)



     ‣ Aggregation
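
Filtering and aggregation can be sketched on already-parsed rows as below. The rows, field names, and helper names are hypothetical and only illustrate the two operations.

```python
from collections import Counter

# Hypothetical parsed records; field names are assumptions.
rows = [
    {"action": "block", "src": "212.251.89.126", "dpt": "135"},
    {"action": "block", "src": "212.251.89.126", "dpt": "135"},
    {"action": "block", "src": "10.1.222.31", "dpt": "9111"},
    {"action": "pass", "src": "195.141.69.45", "dpt": "53"},
]


def filter_rows(rows, **criteria):
    """Keep only rows whose fields equal every given criterion."""
    return [row for row in rows
            if all(row.get(key) == value for key, value in criteria.items())]


def aggregate(rows, key):
    """Count how often each value of `key` occurs."""
    return Counter(row[key] for row in rows)


blocked = filter_rows(rows, action="block")
blocked_by_port = aggregate(blocked, "dpt")
```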



Load CSV into Database
    Sometimes you just load your data into a tool, and you can omit this step.

    # mysql -u <user> -p
    mysql> create database data;
    mysql> use data;
    mysql> create table set1 (id int, address varchar(20), ...);
    mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1
           FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
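
For a quick experiment without a MySQL server, the same load step can be sketched with Python's built-in `sqlite3` module. This swaps SQLite in for MySQL, so it is not the slide's method; the table layout mirrors the slide's `set1`, and the sample CSV text is hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical two-column CSV matching the (id, address) table on the slide.
CSV_TEXT = "1,10.1.222.31\n2,213.175.90.24\n"


def load_csv(conn, text):
    """Create table set1 and bulk-insert the CSV rows into it."""
    conn.execute("CREATE TABLE set1 (id INTEGER, address VARCHAR(20))")
    rows = csv.reader(io.StringIO(text))
    conn.executemany("INSERT INTO set1 (id, address) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_csv(conn, CSV_TEXT)
addresses = [row[0] for row in
             conn.execute("SELECT address FROM set1 ORDER BY id")]
row_count = conn.execute("SELECT COUNT(*) FROM set1").fetchone()[0]
```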



Contextual Data
     ‣ Either dump into DB or use via API calls to augment

     ‣ IP -> Geo mapping
     ‣ Information about countries
     ‣ Port number -> service name

Feature Selection
     ‣ What are the fields you are interested in?
     ‣ Compute new fields
        ‣start time, end time -> duration

        ‣IP subnets [ 10.2.4.2 -> 10.0.0.0/8 or 192.168.1.2 -> 192.168.1.0/24 ]
        ‣ Entropy: H ( X ) = E ( I ( X ) )

     ‣ Dimensionality reduction
        ‣See Bryan’s talk!
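
The three computed fields above can be sketched with the standard library. The timestamp format is an assumption (the log samples in this deck carry no year), and entropy is computed as Shannon entropy in bits over the observed values of a field.

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime


def duration_seconds(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Duration between two timestamps given in an assumed format."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def subnet(ip, prefix_len):
    """Map an address to its enclosing network, e.g. 10.2.4.2 -> 10.0.0.0/8."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def entropy(values):
    """Shannon entropy H(X) = E[I(X)] in bits over the observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, `subnet("10.2.4.2", 8)` gives `10.0.0.0/8`, and a field that is always the same value has entropy 0, which makes it a weak candidate for visualization.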




Choose Your Poison




Ode to the Pie




A Good Visual
     ‣ Choose the right graph
     ‣ Simultaneous views
     ‣ Reduce non-data ink
     ‣ Interactivity




Visual Transformations
     ‣ keep iterating on visual transformations, change
        ‣color

        ‣shape

        ‣features display

     ‣ add new fields?
     ‣ add more context?
     ‣ is the output expressive?
     ‣ capture output and prettify it for presentation
Data Visualization Tools and Libraries
Tools and Libraries
      ‣ http://datainsightsf.com/resources/
         ‣Choose what’s appropriate!

      ‣ Data Analysis and Visualization LInuX
         ‣davix.secviz.org

      ‣ GraphViz
         ‣graphviz.org

      ‣ AfterGlow (CSV -> DOT)
         ‣afterglow.sf.net
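
To make the CSV -> DOT idea concrete, here is a minimal sketch that turns two-column CSV into a GraphViz DOT digraph. It is not AfterGlow's implementation; the function name, the graph name, and the edge de-duplication are assumptions for illustration, and node names containing double quotes are not escaped.

```python
import csv
import io


def csv_to_dot(text, graph_name="G"):
    """Convert 'source,target' CSV rows into a DOT digraph string."""
    edges = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # Skip blank or incomplete rows.
        edge = (row[0].strip(), row[1].strip())
        if edge not in edges:  # Repeated events become a single edge.
            edges.append(edge)
    body = "".join(f'  "{src}" -> "{dst}";\n' for src, dst in edges)
    return f"digraph {graph_name} {{\n{body}}}\n"


dot = csv_to_dot("195.141.69.45,62.2.32.250\n195.141.69.45,62.2.32.250\n"
                 "195.141.69.45,192.134.0.49\n")
```

Saved to a file, the result can be rendered with GraphViz, for example `dot -Tpng graph.dot -o graph.png`.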


Libraries
     ‣ Reporting Libraries
        ‣HighCharts
        ‣Flot
        ‣Google Chart API
        ‣Open Flash Chart
        ‣JQuery Sparklines
        ‣Polymaps
     ‣ Visualization Libraries
        ‣TheJIT
        ‣gRaphaël
        ‣Protovis
        ‣ProcessingJS
        ‣Flare
        ‣D3

HighCharts



 ‣ Click-Through

 ‣ On load
    ‣near real-time updates

 ‣ Zoom

 www.highcharts.com

Google Visualization API


     http://code.google.com/apis/visualization/interactive_charts.html

      ‣ JavaScript
      ‣ Based on DataTables()
      ‣ Many graphs
      ‣ Playground
         ‣ http://code.google.com/apis/ajax/playground

ProtoVis
     ‣ JavaScript based visualization library
     ‣ Charting
     ‣ Treemaps
     ‣ BoxPlots
     ‣ Parallel Coordinates
     ‣ etc.


                                                   http://vis.stanford.edu/protovis/
TheJIT   http://thejit.org/

     ‣ JavaScript InfoVis Toolkit
     ‣ Interactive
     ‣ Link Graphs




Processing
     ‣ Visualization library
     ‣ Java based
     ‣ Interactive (event handling)
     ‣ Number of libraries to
         ‣ draw in OpenGL
         ‣ read XML files
     ‣ Processing JS
         ‣ JavaScript
         ‣ HTML 5 Canvas
         ‣ WebGL
         ‣ Web IDE

     http://processingjs.org/
     http://processing.org/

Visualization Tools
     ‣ Gephi
     ‣ R
     ‣ Matlab
     ‣ Mondrian
     ‣ PicViz
     ‣ Treemap 4.1
     ‣ Google Earth
Gephi   http://gephi.org


     ‣ reads: CSV, DOT, etc.
     ‣ graph analysis algorithms
     ‣ highly interactive




PicViz




                                                   http://www.wallinfire.net/picviz/

Treemap 4.1




                                                    http://www.cs.umd.edu/hcil/treemap/
Google Earth
 • KML data format for encoding data
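
As a rough sketch of encoding data as KML, the following writes one Placemark per located record. The function names and the sample point are assumptions for illustration; note that KML coordinates are written as longitude,latitude,altitude.

```python
from xml.sax.saxutils import escape


def placemark(name, lat, lon):
    """Build one KML Placemark; coordinates are lon,lat,altitude."""
    return (f"  <Placemark><name>{escape(name)}</name>"
            f"<Point><coordinates>{lon},{lat},0</coordinates></Point>"
            f"</Placemark>\n")


def to_kml(points):
    """Wrap (name, lat, lon) tuples in a minimal KML document."""
    body = "".join(placemark(name, lat, lon) for name, lat, lon in points)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
            f"{body}</Document>\n</kml>\n")


# Hypothetical point: an event geolocated to San Francisco.
kml = to_kml([("Blocked connections & scans", 37.7749, -122.4194)])
```

Combined with an IP-to-geo lookup, each source address in a firewall log could become one such Placemark.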




pixlcloud                       buy now



collect. visualize. understand.



                 @raffaelmarty

More Related Content

What's hot

Power bi-dashboard-in-a-day-diad-mumbai-2019
Power bi-dashboard-in-a-day-diad-mumbai-2019Power bi-dashboard-in-a-day-diad-mumbai-2019
Power bi-dashboard-in-a-day-diad-mumbai-2019Priyanka Khanadali
 
Business Intelligence - Intro
Business Intelligence - IntroBusiness Intelligence - Intro
Business Intelligence - IntroDavid Hubbard
 
Introduction to Microsoft Power BI
Introduction to Microsoft Power BIIntroduction to Microsoft Power BI
Introduction to Microsoft Power BIExilesoft
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI OverviewJames Serra
 
PowerBI - Porto.Data - 20150219
PowerBI - Porto.Data - 20150219PowerBI - Porto.Data - 20150219
PowerBI - Porto.Data - 20150219Rui Romano
 
Sisense Introduction PPT
Sisense Introduction PPTSisense Introduction PPT
Sisense Introduction PPTKhirod Sahu
 
Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr...
Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr...Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr...
Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr...Data Con LA
 
Data Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesData Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesSlideTeam
 
Business intelligence ppt
Business intelligence pptBusiness intelligence ppt
Business intelligence pptsujithkylm007
 
Microsoft Dynamics 365 - Intelligent Business Applications
Microsoft Dynamics 365 - Intelligent Business ApplicationsMicrosoft Dynamics 365 - Intelligent Business Applications
Microsoft Dynamics 365 - Intelligent Business ApplicationsDavid J Rosenthal
 
Ideas & Inspiration: Getting Started & Driving Success With Power Platform At...
Ideas & Inspiration: Getting Started & Driving Success With Power Platform At...Ideas & Inspiration: Getting Started & Driving Success With Power Platform At...
Ideas & Inspiration: Getting Started & Driving Success With Power Platform At...Richard Harbridge
 
Introduction to Power BI and Data Visualization
Introduction to Power BI and Data VisualizationIntroduction to Power BI and Data Visualization
Introduction to Power BI and Data VisualizationSwapnil Jadhav
 

What's hot (20)

Power bi
Power biPower bi
Power bi
 
Power bi-dashboard-in-a-day-diad-mumbai-2019
Power bi-dashboard-in-a-day-diad-mumbai-2019Power bi-dashboard-in-a-day-diad-mumbai-2019
Power bi-dashboard-in-a-day-diad-mumbai-2019
 
Business Intelligence - Intro
Business Intelligence - IntroBusiness Intelligence - Intro
Business Intelligence - Intro
 
Dax & sql in power bi
Dax & sql in power biDax & sql in power bi
Dax & sql in power bi
 
Power BI
Power BIPower BI
Power BI
 
Introduction to Microsoft Power BI
Introduction to Microsoft Power BIIntroduction to Microsoft Power BI
Introduction to Microsoft Power BI
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
PowerBI - Porto.Data - 20150219
PowerBI - Porto.Data - 20150219PowerBI - Porto.Data - 20150219
PowerBI - Porto.Data - 20150219
 
Power bi overview
Power bi overview Power bi overview
Power bi overview
 
Sisense Introduction PPT
Sisense Introduction PPTSisense Introduction PPT
Sisense Introduction PPT
 
Big Data and Advanced Analytics
Big Data and Advanced AnalyticsBig Data and Advanced Analytics
Big Data and Advanced Analytics
 
Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr...
Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr...Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr...
Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr...
 
Power BI for Developers
Power BI for DevelopersPower BI for Developers
Power BI for Developers
 
Power BI Overview
Power BI Overview Power BI Overview
Power BI Overview
 
Data Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesData Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation Slides
 
Business intelligence ppt
Business intelligence pptBusiness intelligence ppt
Business intelligence ppt
 
Microsoft Dynamics 365 - Intelligent Business Applications
Microsoft Dynamics 365 - Intelligent Business ApplicationsMicrosoft Dynamics 365 - Intelligent Business Applications
Microsoft Dynamics 365 - Intelligent Business Applications
 
Power bi
Power biPower bi
Power bi
 
Ideas & Inspiration: Getting Started & Driving Success With Power Platform At...
Ideas & Inspiration: Getting Started & Driving Success With Power Platform At...Ideas & Inspiration: Getting Started & Driving Success With Power Platform At...
Ideas & Inspiration: Getting Started & Driving Success With Power Platform At...
 
Introduction to Power BI and Data Visualization
Introduction to Power BI and Data VisualizationIntroduction to Power BI and Data Visualization
Introduction to Power BI and Data Visualization
 

Viewers also liked

Cyber Security – How Visual Analytics Unlock Insight
Cyber Security – How Visual Analytics Unlock InsightCyber Security – How Visual Analytics Unlock Insight
Cyber Security – How Visual Analytics Unlock InsightRaffael Marty
 
Security Insights at Scale
Security Insights at ScaleSecurity Insights at Scale
Security Insights at ScaleRaffael Marty
 
Workshop: Big Data Visualization for Security
Workshop: Big Data Visualization for SecurityWorkshop: Big Data Visualization for Security
Workshop: Big Data Visualization for SecurityRaffael Marty
 

Viewers also liked (6)

Analytic Journeys from Predictive Analytics World
Analytic Journeys from Predictive Analytics WorldAnalytic Journeys from Predictive Analytics World
Analytic Journeys from Predictive Analytics World
 
Cyber Security – How Visual Analytics Unlock Insight
Cyber Security – How Visual Analytics Unlock InsightCyber Security – How Visual Analytics Unlock Insight
Cyber Security – How Visual Analytics Unlock Insight
 
AfterGlow
AfterGlowAfterGlow
AfterGlow
 
Security Insights at Scale
Security Insights at ScaleSecurity Insights at Scale
Security Insights at Scale
 
Workshop: Big Data Visualization for Security
Workshop: Big Data Visualization for SecurityWorkshop: Big Data Visualization for Security
Workshop: Big Data Visualization for Security
 
Gephi Quick Start
Gephi Quick StartGephi Quick Start
Gephi Quick Start
 

Similar to Visualization Lifecycle Data Insight

Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learnedtcurdt
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceDataWorks Summit
 
breed_python_tx_redacted
breed_python_tx_redactedbreed_python_tx_redacted
breed_python_tx_redactedRyan Breed
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustSpark Summit
 
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0Gareth Chapman
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache SparkIndicThreads
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305mjfrankli
 
Examining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesExamining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesBobby Curtis
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and SharkYahooTechConference
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012Amazon Web Services
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...DataWorks Summit
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Stefan Urbanek
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Databricks
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataJetlore
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingDatabricks
 
Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17Michele Orselli
 
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...InfluxData
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoopfvanvollenhoven
 

Similar to Visualization Lifecycle Data Insight (20)

Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
breed_python_tx_redacted
breed_python_tx_redactedbreed_python_tx_redacted
breed_python_tx_redacted
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
 
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
 
Examining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesExamining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail Files
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and Streaming
 
Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17
 
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoop
 

More from Raffael Marty

Exploring the Defender's Advantage
Exploring the Defender's AdvantageExploring the Defender's Advantage
Exploring the Defender's AdvantageRaffael Marty
 
Extended Detection and Response (XDR) An Overhyped Product Category With Ulti...
Extended Detection and Response (XDR)An Overhyped Product Category With Ulti...Extended Detection and Response (XDR)An Overhyped Product Category With Ulti...
Extended Detection and Response (XDR) An Overhyped Product Category With Ulti...Raffael Marty
 
How To Drive Value with Security Data
How To Drive Value with Security DataHow To Drive Value with Security Data
How To Drive Value with Security DataRaffael Marty
 
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?Raffael Marty
 
Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?Raffael Marty
 
Understanding the "Intelligence" in AI
Understanding the "Intelligence" in AIUnderstanding the "Intelligence" in AI
Understanding the "Intelligence" in AIRaffael Marty
 
AI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are DangerousAI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are DangerousRaffael Marty
 
AI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are DangerousAI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are DangerousRaffael Marty
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationRaffael Marty
 
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't ChangedAI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't ChangedRaffael Marty
 
Creating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationCreating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationRaffael Marty
 
Creating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationCreating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationRaffael Marty
 
Visualization in the Age of Big Data
Visualization in the Age of Big DataVisualization in the Age of Big Data
Visualization in the Age of Big DataRaffael Marty
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data VisualizationRaffael Marty
 
The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?Raffael Marty
 
Visualization for Security
Visualization for SecurityVisualization for Security
Visualization for SecurityRaffael Marty
 
The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?Raffael Marty
 
DAVIX - Data Analysis and Visualization Linux
DAVIX - Data Analysis and Visualization LinuxDAVIX - Data Analysis and Visualization Linux
DAVIX - Data Analysis and Visualization LinuxRaffael Marty
 
Cloud - Security - Big Data
Cloud - Security - Big DataCloud - Security - Big Data
Cloud - Security - Big DataRaffael Marty
 

Visualization Lifecycle Data Insight

  • 1. Visualization Lifecycle datainsight San Francisco 2011 Raffael Marty
  • 2. “Transform a dataset into a captive story.” ‣ Assess Youʼre on your own Art ‣ Parse ‣ Clean ‣ Visualize Visualization Tools and Libraries pixlcloud | collect. visualize. understand. Copyright (c) 2011
  • 3. Audience [Figure: chart with two axes, Beginner to Expert and Technical to Overview, with regions labeled Fun and Boring]
  • 4. Visualization Process [Diagram: Data Sources -> (Data Store: files, database) -> parsing -> Structured Data (filtering, aggregation, cleansing) -> feature selection and visualization -> Visual Representation; Contextual Data feeds into the process, which is repeated in iterations]
  • 5. Data Sources
      ‣ File: XML, JSON, CSV, TSV
      ‣ Database: mysql -u root -p mydatabase < dump.sql
      ‣ API: curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
        ‣ Factual
        ‣ Freebase
        ‣ Infochimps
        ‣ OpenStreetMap
  • 6. Explore Data
      ‣ What is the data about?
      ‣ What are the data features/columns?
      ‣ Is there a common structure in the data?
      ‣ What are the data types?
      Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0
      May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0
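One way to start answering these questions for key=value style firewall logs like the samples on this slide is to pull out the field names and guess a type for each value. The following is a minimal Python sketch; the function names `explore_fields` and `guess_type` and the type labels are assumptions made for illustration and are not part of the original slides.

```python
import re

# First sample log entry from the slide.
LINE = (
    "Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= "
    "MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 "
    "DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF "
    "PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0"
)


def explore_fields(line):
    """Return a dict of the KEY=value tokens in a log line.

    Values may be empty (for example OUT=). Bare flags such as DF or SYN
    carry no '=' and are not captured by this sketch.
    """
    return dict(re.findall(r"\b([A-Z]+)=(\S*)", line))


def guess_type(value):
    """Very rough type guess for a field value (hypothetical labels)."""
    if value == "":
        return "empty"
    if value.isdigit():
        return "int"
    if re.fullmatch(r"0x[0-9a-fA-F]+", value):
        return "hex"
    if re.fullmatch(r"\d+(\.\d+){3}", value):
        return "ipv4"
    return "string"


if __name__ == "__main__":
    for name, value in explore_fields(LINE).items():
        print(f"{name:8} {guess_type(value):8} {value}")
```

Running this over a handful of lines from each source quickly shows which fields are shared across log formats and which need their own handling.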
  • 7. Parsing and Normalization
      ‣ Parsing
        ‣ extraction of entities / features
        ‣ imposing structure
        ‣ often use regexes
      ‣ Normalize
        ‣ field normalization
        ‣ term normalization: block, deny, dropped
      ‣ Generate a common output format for vis-tools (e.g., CSV)
      Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
      Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"
      Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
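Term and field normalization can be sketched as two small lookup tables. This is a minimal Python illustration, not the author's implementation: the canonical term `block`, the target field names (`src_ip`, `dst_ip`, `src_port`, `dst_port`), and the function names are all assumptions chosen for the example.

```python
# Map the vendor-specific action words from the three log samples
# (block, Deny, DROPPED) to one canonical term. "block" as the
# canonical term is an assumption.
CANONICAL_ACTION = {
    "block": "block",
    "deny": "block",
    "dropped": "block",
    "pass": "pass",
}

# Map source-specific field names to common ones (hypothetical names).
CANONICAL_FIELD = {
    "SRC": "src_ip",
    "DST": "dst_ip",
    "SPT": "src_port",
    "DPT": "dst_port",
}


def normalize_action(term):
    """Return the canonical action for a raw term, or 'unknown'."""
    return CANONICAL_ACTION.get(term.strip().lower(), "unknown")


def normalize_fields(record):
    """Rename known fields; leave unrecognized field names untouched."""
    return {CANONICAL_FIELD.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    print([normalize_action(t) for t in ("block", "Deny", "DROPPED")])
    print(normalize_fields({"SRC": "212.251.89.126", "DPT": "135", "TTL": "255"}))
```

Once every source uses the same terms and field names, the records can be written out in one common format such as CSV.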
  • 8. Parser
      Raw:
      Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
      Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
      Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)
      Regex / Parser:
      (.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)
      Normalized (CSV):
      Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
      Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
      Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
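A minimal Python sketch of this parsing step, applying the slide's pattern with the standard `re` module. The exact escaping of the pattern (in particular the escaped parentheses around `match`) is an assumption chosen so that the output reproduces the normalized CSV shown on the slide; `PF_PATTERN` and `parse_pf_line` are hypothetical names.

```python
import re

# Pattern for the pf-style log lines on the slide. Escaping is an
# assumption (see the note above).
PF_PATTERN = re.compile(
    r"(.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)"
)


def parse_pf_line(line):
    """Return the captured fields as a list, or None if the line does not match."""
    match = PF_PATTERN.match(line)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    raw = (
        "Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: "
        "195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)"
    )
    # Joining with commas is enough here because no field contains a comma;
    # the csv module would be the safer choice for arbitrary input.
    print(",".join(parse_pf_line(raw)))
```

Lines that do not match return `None`, which makes it easy to collect them separately and refine the pattern on the next iteration.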
  • 9. UNIX Tools
      ‣ grep
        ‣ cat file | grep -v "foo"
      ‣ awk
        ‣ awk -F, '{printf("%s,%s\n",$2,$1);}'
        ‣ awk -F, -v OFS=, '{print $2,$1}'
      ‣ sed
        ‣ sed -e 's/fubar/foobar/g' filename
  • 10. Regular Expression Resources
      ‣ http://regexlib.com
      ‣ http://www.regular-expressions.info
      ‣ http://gskinner.com/RegExr
  • 11. Data Cleansing
      ‣ Filter
      ‣ Normalize (see earlier)
      ‣ Aggregation
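Filtering and aggregation over normalized CSV can be sketched in a few lines of Python. The rows below are the normalized output from the parser slide; the column positions (`ACTION`, `SRC_IP`, `DST_PORT`) and the helper names are assumptions made for this illustration.

```python
import csv
import io
from collections import Counter

# Normalized rows copied from the parser slide.
NORMALIZED = """\
Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
"""

# Column indexes in the normalized CSV (assumed from the slide's field order).
ACTION, SRC_IP, DST_PORT = 3, 6, 9


def read_rows(text):
    """Parse CSV text into a list of rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def filter_rows(rows, column, value):
    """Filter: keep only rows whose given column equals value."""
    return [row for row in rows if row[column] == value]


def aggregate(rows, column):
    """Aggregate: count how many rows share each value of a column."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    rows = read_rows(NORMALIZED)
    passed = filter_rows(rows, ACTION, "pass")
    print(aggregate(passed, SRC_IP))
```

The aggregated counts, rather than the individual log lines, are usually what ends up being plotted.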
  • 12. Load CSV into Database
      Sometimes you just load your data into a tool, and you can omit this step.
      # mysql -u <user> -p
      mysql> create database data;
      mysql> use data;
      mysql> create table set1 (id int, address varchar(20), ...);
      mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
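The same load step can be sketched in Python. Note that this example swaps MySQL for the standard library's `sqlite3` module so that it is self-contained and runs without a database server; it is not the slide's method. The two-column layout of `set1` follows the slide's `create table` statement, while the sample rows and the function name `load_csv` are assumptions.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_text):
    """Create table set1 if needed and insert (id, address) rows from CSV text.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS set1 (id INTEGER, address VARCHAR(20))"
    )
    rows = [
        (int(row[0]), row[1])
        for row in csv.reader(io.StringIO(csv_text))
        if row
    ]
    conn.executemany("INSERT INTO set1 (id, address) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # In-memory database and sample rows are for illustration only.
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "1,212.251.89.126\n2,212.254.110.98\n")
    print(conn.execute("SELECT id, address FROM set1 ORDER BY id").fetchall())
```

With the data in a table, filtering and aggregation can be pushed into SQL queries instead of being done in scripts.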
  • 13. Contextual Data
      ‣ Either dump into DB or use via API calls to augment
      ‣ IP -> Geo mapping
      ‣ Information about countries
      ‣ Port number -> service name
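As an illustration of the port number -> service name case, the sketch below augments a record from a small in-memory lookup table. The table is a hand-written excerpt of well-known port assignments, and the record layout and the names `SERVICE_NAMES` and `add_service` are assumptions; on many systems `socket.getservbyport` could serve as the lookup instead, depending on the local services database.

```python
# Small excerpt of well-known port assignments used as contextual data.
SERVICE_NAMES = {
    53: "domain",
    80: "http",
    135: "epmap",
    443: "https",
}


def add_service(record):
    """Return a copy of the record with a 'service' field added.

    The record is expected to carry the destination port under the
    (hypothetical) key 'dst_port'. Unknown ports map to 'unknown'.
    """
    enriched = dict(record)
    enriched["service"] = SERVICE_NAMES.get(int(record["dst_port"]), "unknown")
    return enriched


if __name__ == "__main__":
    print(add_service({"dst_ip": "212.254.110.98", "dst_port": "135"}))
```

IP -> Geo mapping follows the same pattern, with the lookup table replaced by a database query or an API call.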
  • 14. Feature Selection
      ‣ What are the fields you are interested in?
      ‣ Compute new fields
        ‣ start time, end time -> duration
        ‣ IP subnets [ 10.2.4.2 -> 10.0.0.0/8 or 192.168.1.2 -> 192.168.1.0/24 ]
      ‣ Entropy: H(X) = E(I(X))
      ‣ Dimensionality reduction
        ‣ See Bryan’s talk!
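The computed fields on this slide can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the timestamp format, the function names, and the sample values are hypothetical, and the entropy function implements Shannon entropy in bits, H(X) = -Σ p(x) log2 p(x), which is the usual expansion of E(I(X)).

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime


def duration_seconds(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """start time, end time -> duration in seconds (timestamp format assumed)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def subnet(ip, prefix_len):
    """Map an IP address to its enclosing subnet, e.g. 10.2.4.2 -> 10.0.0.0/8."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(duration_seconds("2011-10-13 20:00:38", "2011-10-13 20:00:43"))
    print(subnet("10.2.4.2", 8), subnet("192.168.1.2", 24))
    print(entropy([53, 53, 80, 80]))
```

Low entropy in a field such as destination port means the traffic is concentrated on a few values, which can itself be a useful feature to plot.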
  • 15. Choose Your Poison
  • 16. Ode to the Pie
  • 17. A Good Visual
      ‣ Choose the right graph
      ‣ Simultaneous views
      ‣ Reduce non-data ink
      ‣ Interactivity
  • 18. Visual Transformations
      ‣ keep iterating on visual transformations, change
        ‣ color
        ‣ shape
        ‣ features display
      ‣ add new fields?
      ‣ add more context?
      ‣ is the output expressive?
      ‣ capture output and prettify it for presentation
  • 20. Tools and Libraries
      ‣ http://datainsightsf.com/resources/
        ‣ Choose what’s appropriate!
      ‣ Data Analysis and Visualization LInuX
        ‣ davix.secviz.org
      ‣ GraphViz
        ‣ graphviz.org
      ‣ AfterGlow (CSV -> DOT)
        ‣ afterglow.sf.net
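To make the CSV -> DOT idea concrete, here is a minimal Python sketch that turns two CSV columns into a GraphViz digraph. It only illustrates the kind of conversion involved and is not AfterGlow's implementation; the function name `csv_to_dot`, the graph name `G`, and the choice of columns are assumptions.

```python
import csv
import io


def csv_to_dot(csv_text, source_col=0, target_col=1):
    """Convert CSV rows into a DOT digraph with one edge per unique pair."""
    edges = []
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= max(source_col, target_col):
            continue  # skip empty or short rows
        edge = (row[source_col], row[target_col])
        if edge not in seen:
            seen.add(edge)
            edges.append(edge)

    def quote(node_id):
        # DOT string IDs are double-quoted; escape embedded quotes.
        return '"' + node_id.replace('"', '\\"') + '"'

    lines = ["digraph G {"]
    lines += [f"  {quote(src)} -> {quote(dst)};" for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Source and destination addresses taken from the parser slide.
    data = (
        "195.141.69.45,62.2.32.250\n"
        "195.141.69.45,192.134.0.49\n"
        "195.141.69.45,62.2.32.250\n"
    )
    print(csv_to_dot(data))
```

The resulting text can be saved to a file and rendered with a GraphViz layout program such as `dot`.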
  • 21. Libraries
      ‣ Reporting Libraries
        ‣ HighCharts
        ‣ Flot
        ‣ Google Chart API
        ‣ Open Flash Chart
        ‣ JQuery Sparklines
      ‣ Visualization Libraries
        ‣ TheJIT
        ‣ Graphael
        ‣ Protovis
        ‣ ProcessingJS
        ‣ Flare
        ‣ Polymaps
        ‣ D3
  • 22. HighCharts (www.highcharts.com)
      ‣ Click-Through
      ‣ On load
        ‣ near real-time updates
      ‣ Zoom
  • 23. Google Visualization API (http://code.google.com/apis/visualization/interactive_charts.html)
      ‣ JavaScript
      ‣ Based on DataTables()
      ‣ Many graphs
      ‣ Playground
        ‣ http://code.google.com/apis/ajax/playground
  • 24. ProtoVis (http://vis.stanford.edu/protovis/)
      ‣ JavaScript based visualization library
      ‣ Charting
      ‣ Treemaps
      ‣ BoxPlots
      ‣ Parallel Coordinates
      ‣ etc.
  • 25. TheJIT (http://thejit.org/)
      ‣ JavaScript InfoVis Toolkit
      ‣ Interactive
      ‣ Link Graphs
  • 26. Processing (http://processing.org/)
      ‣ Visualization library
      ‣ Java based
      ‣ Interactive (event handling)
      ‣ Number of libraries to
        ‣ draw in OpenGL
        ‣ read XML files
      ‣ Processing JS (http://processingjs.org/)
        ‣ JavaScript
        ‣ HTML 5 Canvas
        ‣ WebGL
        ‣ Web IDE
  • 27. Visualization Tools
      ‣ Gephi
      ‣ R
      ‣ Matlab
      ‣ Mondrian
      ‣ PicViz
      ‣ Treemap 4.1
      ‣ Google Earth
  • 28. Gephi (http://gephi.org)
      ‣ reads: CSV, DOT, etc.
      ‣ graph analysis algorithms
      ‣ highly interactive
  • 29. PicViz (http://www.wallinfire.net/picviz/)
  • 30. Treemap 4.1 (http://www.cs.umd.edu/hcil/treemap/)
  • 31. Google Earth
      ‣ KML data format for encoding data
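A minimal Python sketch of encoding point data as KML, using only the standard library. It emits one `Placemark` with a `Point` per record in the KML 2.2 namespace; the function name `placemarks_to_kml` and the sample coordinates are assumptions for illustration, and real data would typically come from an IP -> Geo lookup.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemarks_to_kml(points):
    """Build a KML document from (name, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are written as longitude,latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample point (approximate location of San Francisco).
    print(placemarks_to_kml([("San Francisco", 37.7749, -122.4194)]))
```

Saving the output with a `.kml` extension should let Google Earth open it as a set of placemarks.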
  • 32. pixlcloud buy now collect. visualize. understand. @raffaelmarty