Data Visualization for Big Data
Experiences from the Front Line
Rosa Romero Gómez, Ph.D
rosaromerogomez.com
@87rromero
[Georgia Tech campus, Klaus Advanced Computing building, May 27th 2016]
Data Visualization
Why? What? How?
Why?
The greatest value of a picture is
when it forces us to notice what we
never expected to see
[John W. Tukey. (1977) Exploratory Data Analysis]
Let me give you a
simple example…
[Sample data sets recreated from Francis J. Anscombe (1973). Graphs in statistical analysis.
Source: Andy Kirk. (2012) Data visualization: A successful design process]
[Source: http://commons.wikimedia.org/wiki/File:Anscombe%27s_quartet_3.svg]
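The point of this example can be checked numerically. The sketch below is a minimal illustration, not part of the original slides: it computes summary statistics for the four data sets of Anscombe's quartet, using the values as published in Anscombe (1973). The helper names `pearson` and `summarize` are assumptions made for this sketch.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, values as published in Anscombe (1973).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),   # sample variance
        "var_y": round(variance(ys), 2),
        "corr": round(pearson(xs, ys), 2),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, stats)
```

The four summaries come out nearly identical, yet the four scatter plots look completely different. That is the argument for plotting the data rather than relying on summary statistics alone.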
Data visualization addresses…
…Information Scalability
…Visual Scalability
…Human Scalability
Data visualization addresses…
…Human Scalability
• It enhances the recognition of patterns
• It increases our efficiency in exploring large datasets
• It supports decisions
• It expands our working memory to solve problems
What?
Data visualization is the use of interactive
visual representations of data to amplify
cognition
[Stuart Card. (2008) Information visualization]
Data visualization
is not…
[The Starry Night. (1889) Vincent Van Gogh. Source: https://en.wikipedia.org/wiki/The_Starry_Night#/media/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg]
[Source: http://elpais.com/elpais/2016/10/28/media/1477669343_348572.html]
[Source: http://viz.wtf/]
How?
Why are we doing this
visualization project?
Even more important…
Case Study:
Visualization of the IPv4 address space
for network threat investigation
Get to know the context…
[Diagram labels: Network Threat Analyst; Computer Network; Data Collection Point; User; CMD Tools; Websites; Logs; Physical & Task Context; Technical Context]
Let me tell you a story…
Step 1: Identify relevant visualization tasks
• Find suspicious IP blocks
• Find domain names associated with specific IPs
• Examine the presence of domain names on blacklists
• Examine the relation of domain names with malware
• Identify the geographical location of IPs
• Identify the ownership of domain names
• Find suspicious Autonomous Systems
The more accessible your visualization,
the greater your audience and your impact
[Scott Murray. (2013) Interactive Data Visualization for the Web]
Step 2: Choose a library
• Functionality: Does it support the visualizations I need?
• License: open source or commercial?
• Active support and development
• Browser compatibility
• Dependencies (e.g. React.js)
Building a visualization with charting libraries such as Chart.js, Tableau…
Building a visualization with D3.js
• D3 is not really a “visualization library”; it does not draw visualizations
• D3 = “Data-Driven Documents”; it associates data with DOM elements and manages the results
• D3.js provides tools such as layouts, scales, and shapes that you can use to build visualizations
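The idea of associating data with elements can be illustrated without D3 itself. The sketch below is a conceptual Python stand-in, not D3's API (in D3 the equivalent calls are JavaScript: `selection.data()` with a key function, then `enter()` and `exit()`). It splits a keyed join into the three groups D3 manages: new data without an element, data matched to an existing element, and elements whose data disappeared. The function name `data_join` and the string placeholders for DOM nodes are assumptions made for this sketch.

```python
def data_join(elements, data, key):
    """Split a keyed join into enter, update, and exit groups.

    elements: dict mapping join key -> existing element (stand-in for a DOM node)
    data:     list of new data items
    key:      function returning the join key of a data item
    """
    enter, update = [], []
    seen = set()
    for datum in data:
        k = key(datum)
        seen.add(k)
        if k in elements:
            # Element already exists: rebind it to the new datum.
            update.append((elements[k], datum))
        else:
            # No element yet: one has to be created for this datum.
            enter.append(datum)
    # Elements whose key no longer appears in the data should be removed.
    exit_ = [element for k, element in elements.items() if k not in seen]
    return enter, update, exit_


# Hypothetical elements and data, for illustration only.
elements = {"66.0.0.0/8": "<rect#a>", "10.0.0.0/8": "<rect#b>"}
data = [{"name": "66.0.0.0/8", "size": 3}, {"name": "213.0.0.0/8", "size": 5}]
enter, update, exit_ = data_join(elements, data, key=lambda d: d["name"])
```

Here `enter` holds the 213.0.0.0/8 item, `update` pairs `<rect#a>` with the 66.0.0.0/8 item, and `exit_` holds `<rect#b>`.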
Step 3: Data transformations
{"date":"20160408","qname":"*.3rdandmonster.com.","qtype":1,"rdata":{"string":"66.96.161.142"},"ttl":null,"authority_ips":"216.239.36.109","count":1,"hours":1048576,"source":"gt","sensor":"active-dns"}

{"date":"20160408","qname":"*.aavxxnbm.org.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":10,"hours":5543209,"source":"gt","sensor":"active-dns"}

{"date":"20160408","qname":"*.aenhfat.info.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":4,"hours":8397064,"source":"gt","sensor":"active-dns"}

{"date":"20160408","qname":"*.agzksjhrmf.info.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":5,"hours":4329736,"source":"gt","sensor":"active-dns"}
[Fragment of Active DNS resolution queries in deserialized Avro format - JSON format, https://www.activednsproject.org]
Pre-processed data: Domain Name, IP address
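The pre-processing step can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it assumes one JSON record per line, keeps only A records (`qtype` 1), and strips the trailing root dot from `qname`. The function name `preprocess` is hypothetical; the two sample lines are copied from the fragment above.

```python
import json

# Two records copied from the Active DNS fragment shown in the slides.
SAMPLE_LINES = [
    '{"date":"20160408","qname":"*.3rdandmonster.com.","qtype":1,"rdata":{"string":"66.96.161.142"},"ttl":null,"authority_ips":"216.239.36.109","count":1,"hours":1048576,"source":"gt","sensor":"active-dns"}',
    '{"date":"20160408","qname":"*.aavxxnbm.org.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":10,"hours":5543209,"source":"gt","sensor":"active-dns"}',
]


def preprocess(lines):
    """Reduce each record to a (domain name, IP address) pair."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        record = json.loads(line)
        rdata = record.get("rdata") or {}
        ip = rdata.get("string")
        # Assumption: only A records (qtype 1) carry an IPv4 address in rdata.
        if record.get("qtype") != 1 or ip is None:
            continue
        # Assumption: the trailing root dot of the query name is not needed.
        pairs.append((record["qname"].rstrip("."), ip))
    return pairs


PAIRS = preprocess(SAMPLE_LINES)
```

Fields such as `ttl`, `count`, and `hours` are dropped here because the slide lists only the domain name and the IP address as pre-processed output.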
Guided by the Visual Information-Seeking Mantra:
“Overview first,
Zoom and Filter, and then
Details-on-Demand”
[Shneiderman. (1996) The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations]
{
  "date": "dateValue",
  "children": [{
    "name": "/8Name",
    "size": "numberOfIPs/8",
    "color": "numberOfBlacklistedDomainNames/8",
    "children": [
      {
        "name": "/16Name",
        "size": "numberOfIPs/16",
        "color": "numberOfBlacklistedDomainNames/16",
        "children": [
          ….
        ]
      }
      ….
    ]
  }]
}
Nested JSON format template (JSON file per day)
Nested IPs in the following format: /8 >> /16 >> /24 >> /32
Visual variables: "size" and "color"
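One way to produce this template from (domain name, IP address) pairs is sketched below. It is an illustration under stated assumptions, not the project's code: node names use CIDR notation such as `66.0.0.0/8`, "size" counts distinct IPs under a prefix, and "color" counts distinct blacklisted domain names under it. The function names, the `.example` domains, and the blacklist are hypothetical.

```python
from collections import defaultdict


def prefix(ip, octets):
    """CIDR-style name built from the first `octets` octets of an IPv4 address."""
    parts = ip.split(".")
    kept = parts[:octets] + ["0"] * (4 - octets)
    return ".".join(kept) + "/" + str(octets * 8)


def sort_key(name):
    """Order prefix names numerically by octet instead of alphabetically."""
    return tuple(int(part) for part in name.split("/")[0].split("."))


def build_ip_hierarchy(date, pairs, blacklist):
    """Nest IPs as /8 >> /16 >> /24 >> /32 with size and color per node."""
    domains_by_ip = defaultdict(set)
    for domain, ip in pairs:
        domains_by_ip[ip].add(domain)

    def group(ips, octets):
        groups = defaultdict(list)
        for ip in ips:
            groups[prefix(ip, octets)].append(ip)
        return [node(name, groups[name], octets)
                for name in sorted(groups, key=sort_key)]

    def node(name, ips, octets):
        blacklisted = {d for ip in ips for d in domains_by_ip[ip] if d in blacklist}
        result = {"name": name, "size": len(ips), "color": len(blacklisted)}
        if octets < 4:  # /32 nodes are leaves
            result["children"] = group(ips, octets + 1)
        return result

    return {"date": date, "children": group(list(domains_by_ip), 1)}


# Hypothetical input: placeholder domains and a made-up blacklist.
PAIRS = [
    ("a.example", "66.96.161.142"),
    ("b.example", "213.184.126.162"),
    ("c.example", "213.184.126.162"),
    ("d.example", "213.184.5.1"),
]
BLACKLIST = {"b.example", "c.example"}
TREE = build_ip_hierarchy("20160408", PAIRS, BLACKLIST)
```

Because every node carries its own aggregated size and color, a treemap or similar layout can show the /8 overview first and reveal deeper levels only when the analyst zooms in.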
{
  "date": "dateValue",
  "children": [{
    "name": "Continent",
    "size": "numberOfIPsContinent",
    "color": "numberOfBlacklistedDomainNamesperContinent",
    "children": [
      {
        "name": "Country",
        "size": "numberOfIPsCountry",
        "color": "numberOfBlacklistedDomainNamesperCountry",
        "children": [
          ….
        ]
      }
      ….
    ]
  }]
}
Alternative nesting options:
Continent >> Country >> State >> City
Nested JSON format template (JSON file per day)
Triple hierarchy!!!
> JSON files of 70 MB
Split the nested JSON template (JSON file per day) into:
• IPhierarchy.json
• GeographicalHierarchy.json
• AS.json
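The split could look like the sketch below. The slides do not show how the three hierarchies were stored in the combined daily file, so the layout used here (one document with the keys "ip", "geo", and "as") is purely an assumption, as is the function name `split_daily_file`. Only the three output file names come from the slides.

```python
import json
from pathlib import Path

# Output file names from the slides; the dictionary keys are assumptions.
FILE_NAMES = {
    "ip": "IPhierarchy.json",
    "geo": "GeographicalHierarchy.json",
    "as": "AS.json",
}


def split_daily_file(combined, out_dir):
    """Write each hierarchy of one day's combined document to its own file.

    Returns a dict mapping file name -> size in bytes, so the effect of the
    split on file size can be checked.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for key, file_name in FILE_NAMES.items():
        doc = {"date": combined["date"], "children": combined[key]}
        path = out_dir / file_name
        # Compact separators avoid spending bytes on whitespace.
        path.write_text(json.dumps(doc, separators=(",", ":")), encoding="utf-8")
        written[file_name] = path.stat().st_size
    return written
```

With separate files, a view only has to load the hierarchy it actually displays instead of the whole daily file.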
Step 4: Data binding
Step 5: User Experience
• Breadcrumbs
• User-adjustable visual settings
[Astrolavos Team during S&P 2017 deadline, November 11th 2016.
Source: https://twitter.com/mAntonakakis?lang=es]

Data Visualization for Big Data: Experience from the Front Line

  • 1. Data Visualization for Big Data: Experiences from the Front Line. Rosa Romero Gómez, Ph.D, rosaromerogomez.com, @87rromero
  • 2. [Georgia Tech campus, Klaus Advanced Computing building, May 27th 2016]
  • 5. The greatest value of a picture is when it forces us to notice what we never expected to see [John W. Tukey. (1981) Exploratory Data Analysis]
  • 7. Let me give you a simple example…
  • 8. [Sample data sets recreated from Francis J. Anscombe (1973). Graphs in statistical analysis. Source: Andy Kirk. (2012) Data visualization: A successful design process]
  • 10. Data visualization addresses… Information Scalability, Visual Scalability, Human Scalability
  • 11. Data visualization addresses… Human Scalability: • It enhances the recognition of patterns • It increases our efficiency in exploring large datasets • It supports decisions • It expands our working memory to solve problems
  • 12. What?
  • 13. Data visualization is the use of interactive visual representations of data to amplify cognition [Stuart Card. (2008) Information visualization]
  • 19. [The Starry Night. (1889) Vincent Van Gogh. Source: https://en.wikipedia.org/wiki/The_Starry_Night#/media/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg]
  • 22. How?
  • 23. Even more important… Why are we doing this visualization project?
  • 24. Case Study: Visualization of the IPv4 address space for network threat investigation
  • 25. Get to know the context… [Diagram labels: User (Network Threat Analyst); Computer Network; Data Collection Point; CMD Tools, Websites, Logs; Physical & Task Context; Technical Context]
  • 26. Let me tell you a story…
  • 27. Step 1: Identify relevant visualization tasks • Find suspicious IP blocks • Find domain names associated with specific IPs • Examine the presence of domain names on blacklists • Examine the relation of domain names with malware • Identify the geographical location of IPs • Identify the ownership of domain names • Find suspicious Autonomous Systems
  • 29. Step 2: Choose a library. The more accessible your visualization, the greater your audience and your impact [Scott Murray. (2013) Interactive Data Visualization for the Web]
  • 30. Step 2: Choose a library • Functionality: Does it support the visualizations I need? • License: open source or commercial? • Active support and development • Browser compatibility • Dependencies (e.g. React.js)
  • 31. Step 2: Choose a library Building a visualization with charting libraries such as Chart.js, Tableau…
  • 32. Step 2: Choose a library Building a visualization with D3.js
  • 33. Step 2: Choose a library • D3 is not really a “visualization library”; it does not draw visualizations • D3 = “Data-Driven Documents”; it associates data with DOM elements and manages the results • D3.js provides tools such as layouts, scales, and shapes that you can use to build visualizations
  • 35. Step 3: Data transformations. Pre-processed data: fragment of Active DNS resolution queries in deserialized Avro format (JSON format), https://www.activednsproject.org. [Slide annotations: Domain Name, IP address]
      {"date":"20160408","qname":"*.3rdandmonster.com.","qtype":1,"rdata":{"string":"66.96.161.142"},"ttl":null,"authority_ips":"216.239.36.109","count":1,"hours":1048576,"source":"gt","sensor":"active-dns"}
      {"date":"20160408","qname":"*.aavxxnbm.org.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":10,"hours":5543209,"source":"gt","sensor":"active-dns"}
      {"date":"20160408","qname":"*.aenhfat.info.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":4,"hours":8397064,"source":"gt","sensor":"active-dns"}
      {"date":"20160408","qname":"*.agzksjhrmf.info.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":5,"hours":4329736,"source":"gt","sensor":"active-dns"}
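
The slide highlights two fields of each record: the domain name and the IP address. A minimal sketch of that extraction step follows, assuming one JSON record per line and assuming the domain lives in `qname` and the address in `rdata.string`, as the fragment suggests. The function names `toDomainIpPair` and `parseActiveDnsLines` are hypothetical and not from the talk.

```javascript
// Hypothetical helper: reduce one Active DNS record to the two fields
// the slide highlights (domain name and IP address).
function toDomainIpPair(record) {
  // Assumption: the address is stored under rdata.string, as in the fragment.
  const ip =
    record.rdata && typeof record.rdata.string === "string"
      ? record.rdata.string
      : null;
  // Drop the trailing dot of the fully qualified domain name.
  const domain = record.qname.replace(/\.$/, "");
  return { domain, ip };
}

// Assumption: the input holds one JSON record per line.
function parseActiveDnsLines(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => toDomainIpPair(JSON.parse(line)))
    .filter((pair) => pair.ip !== null); // skip records without a string address
}

// Two records copied from the fragment on the slide.
const sample = [
  '{"date":"20160408","qname":"*.3rdandmonster.com.","qtype":1,"rdata":{"string":"66.96.161.142"},"ttl":null,"authority_ips":"216.239.36.109","count":1,"hours":1048576,"source":"gt","sensor":"active-dns"}',
  '{"date":"20160408","qname":"*.aavxxnbm.org.","qtype":1,"rdata":{"string":"213.184.126.162"},"ttl":{"int":604800},"authority_ips":"213.184.126.162","count":10,"hours":5543209,"source":"gt","sensor":"active-dns"}',
].join("\n");

const pairs = parseActiveDnsLines(sample);
console.log(pairs);
```

Records whose `rdata` is not a plain string are dropped here for simplicity; a real pipeline may need to handle them differently.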
  • 36. Step 3: Data transformations Guided by the Visual Information-Seeking Mantra: “Overview first, Zoom and Filter, and then Details-on-Demand” [Shneiderman. (1996) The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations]
  • 37. Step 3: Data transformations. Nested JSON format template (JSON file per day). Nested IPs in the following format: /8 >> /16 >> /24 >> /32. [Slide annotation: Visual variables] { "date": "dateValue", "children": [{ "name": "/8Name", "size": "numberOfIPs/8", "color": "numberOfBlacklistedDomainNames/8", "children": [ { "name": "/16Name", "size": "numberOfIPs/16", "color": "numberOfBlacklistedDomainNamesper/16", "children": [ …. ] } …. ] }
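
One way to produce a tree in the /8 >> /16 >> /24 >> /32 shape is sketched below. It is an illustration under assumptions, not the speaker's pipeline: the input is assumed to hold one entry per distinct IPv4 address with a precomputed `blacklistedDomains` count (a hypothetical field), and the sample counts are made up. `size` and `color` follow the template's placeholders (number of IPs and number of blacklisted domain names per block).

```javascript
// Name a CIDR block from the first `bits` of an address, e.g. 213.184.0.0/16.
function prefixName(octets, bits) {
  const kept = bits / 8;
  const padded = octets.slice(0, kept).concat(Array(4 - kept).fill("0"));
  return padded.join(".") + "/" + bits;
}

// Return the child with this name, creating an empty one if needed.
// A linear scan keeps the sketch short; a Map per node would scale better.
function findOrCreateChild(parent, name) {
  let child = parent.children.find((c) => c.name === name);
  if (!child) {
    child = { name, size: 0, color: 0, children: [] };
    parent.children.push(child);
  }
  return child;
}

// Assumption: `records` holds one entry per distinct IP address.
function buildIpHierarchy(date, records) {
  const root = { date, children: [] };
  for (const { ip, blacklistedDomains } of records) {
    const octets = ip.split(".");
    let node = root;
    for (const bits of [8, 16, 24, 32]) {
      node = findOrCreateChild(node, prefixName(octets, bits));
      node.size += 1; // number of IPs under this block
      node.color += blacklistedDomains; // blacklisted domain names under this block
    }
  }
  return root;
}

// Illustrative input: the blacklist counts are invented for this example.
const tree = buildIpHierarchy("20160408", [
  { ip: "213.184.126.162", blacklistedDomains: 3 },
  { ip: "213.184.126.7", blacklistedDomains: 0 },
  { ip: "66.96.161.142", blacklistedDomains: 1 },
]);
console.log(JSON.stringify(tree, null, 2));
```

Aggregating `size` and `color` at every level while inserting means each block already carries the totals needed for an overview, so the visualization does not have to re-sum the leaves when the user zooms.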
  • 38. Step 3: Data transformations. Nested JSON format template (JSON file per day). Alternative nesting options: Continent >> Country >> State >> City. { "date": "dateValue", "children": [{ "name": "Continent", "size": "numberOfIPsContinent", "color": "numberOfBlacklistedDomainNamesperContinent", "children": [ { "name": "Country", "size": "numberOfIPsCountry", "color": "numberOfBlacklistedDomainNamesperCountry", "children": [ …. ] } …. ] }
  • 39. Step 3: Data transformations. Nested JSON format template (JSON file per day). Triple hierarchy!!! > JSON files of 70 MB
  • 40. Step 3: Data transformations. Nested JSON format template (JSON file per day), split into IPhierarchy.json, GeographicalHierarchy.json, and AS.json
  • 42. Step 4: Data binding
  • 43. Step 4: Data binding
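
The data-binding slides carry no text in this transcript, so the following is only a conceptual sketch of what a keyed data join has to work out: which data items need new elements (enter), which already have one (update), and which elements no longer have data (exit). It is plain JavaScript, not D3's implementation; in D3 the corresponding calls are `selection.data(data, key)` followed by `enter()` and `exit()`. The function name `keyedJoin` and the sample values are hypothetical.

```javascript
// Conceptual sketch of a keyed data join (not D3's actual code).
// existingKeys: keys of the elements currently on screen.
// data: new data items; keyOf: function returning each item's key.
function keyedJoin(existingKeys, data, keyOf) {
  const existing = new Set(existingKeys);
  const incoming = new Map(data.map((d) => [keyOf(d), d]));

  const enter = []; // data with no element yet
  const update = []; // data whose element already exists
  for (const [key, datum] of incoming) {
    (existing.has(key) ? update : enter).push(datum);
  }
  // Elements whose key no longer appears in the data.
  const exit = existingKeys.filter((key) => !incoming.has(key));

  return { enter, update, exit };
}

// Illustrative values: two /8 blocks on screen, then a new day's data arrives.
const joined = keyedJoin(
  ["66.0.0.0/8", "213.0.0.0/8"],
  [
    { name: "213.0.0.0/8", size: 2 },
    { name: "10.0.0.0/8", size: 5 },
  ],
  (d) => d.name
);
console.log(joined);
```

Keying by block name rather than by array position lets an element keep representing the same block when the data for another day is loaded.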
  • 45. Step 5: User Experience. Breadcrumbs; user-adjustable visual settings
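
For a nested hierarchy like the one in Step 3, a breadcrumb trail is the list of node names from the root down to the block the user has zoomed into. The sketch below shows one way to compute it, assuming nodes with `name` and `children` fields as in the template. The function name `breadcrumbTrail` and the "root" label are hypothetical.

```javascript
// Depth-first search for the node named `targetName`; returns the list of
// labels from the root to that node, or null if it is not in the tree.
function breadcrumbTrail(node, targetName, trail = []) {
  // The template's root has a date but no name, so label it "root" here.
  const path = trail.concat(node.name || "root");
  if (node.name === targetName) return path;
  for (const child of node.children || []) {
    const found = breadcrumbTrail(child, targetName, path);
    if (found) return found;
  }
  return null;
}

// Small illustrative hierarchy in the /8 >> /16 shape.
const hierarchy = {
  date: "20160408",
  children: [
    {
      name: "213.0.0.0/8",
      children: [{ name: "213.184.0.0/16", children: [] }],
    },
    { name: "66.0.0.0/8", children: [] },
  ],
};

const crumbs = breadcrumbTrail(hierarchy, "213.184.0.0/16");
console.log(crumbs.join(" > "));
```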
  • 48. [Astrolavos Team during S&P 2017 deadline, November 11th 2016. Source: https://twitter.com/mAntonakakis?lang=es]