CHOOSING A DATA
VISUALIZATION TOOL
FOR DATA SCIENTISTS
HEATHER GILLEY
DECEMBER 2015
1
INTRODUCTION
• Newly established business intelligence (BI) office needs a software stack to support
their data scientists
• The software stack needs a platform for data storage, data transformation, and data
visualization
• Current role supports the data visualization effort
Defining the Model
• Identified the strategic objective and goals for the software
stack project
• Worked with the decision maker to define specific objectives for the
data visualization tool in the stack
• Conducted an affinity diagram exercise to identify the functional
objectives and map the measures to those objectives
• Collaborated with teammates to apply the swing weight
method
• Built the model and applied it to the 6 alternatives
Key Product Features
[Figure: product types placed along two spectrums, Exploratory to Explanatory and Static to Interactive. Product types shown: Infographic, Report, Dashboard, Interactive Chart]
Data Scientist Profiles
Domain Data Scientist
Features:
1. Knowledgeable about the subject matter and able to add context to
the analysis for insightful findings
2. Performs general analysis (regression, correlation, frequency distributions)
3. Uses built-in tools for analysis
Mathematical & Statistician Data Scientist
Features:
1. Knowledgeable about complex statistical modeling and analysis
(e.g., customer opinion modeling, classification, text analysis, natural
language processing)
2. Builds, tests, and analyzes models using statistical programming
languages such as Python and R
3. Uses built-in tools and statistical programming language libraries
to build visualizations
Developer Data Scientist
Features:
1. Knowledgeable in programming, computer science, and databases
2. Creates connections between the data and the tools
3. Transforms data to enable profiles 1 and 2 to perform analysis and
communicate results
4. Creates highly customized interactive solutions
Alternatives
D3.js: A JavaScript library that enables developers to create complex,
custom data visualizations on the web
RShiny: An R library and server that enables R data visualizations to be
interactive and available via an HTML framework
Bokeh: A data visualization library for Python that creates D3-style
visuals from Python data
Plot.ly: A web application that automatically creates visualizations from
a variety of file types and programming languages
Tableau: A data visualization tool that offers an easy-to-use user interface
to create complex graphics and charts
Kibana: An open source data visualization and dashboarding tool that
connects to the NoSQL database Elasticsearch
Functional Objectives and Measures
Measures:
1. Analytical Capability
2. Charting Capability
3. Programming Capability
4. Design Capability
5. Number of Supported Programming Languages
6. GUI
7. Interactive Product Capability
8. Number of Supported File Types
9. Data Connectors
10. Access Control
11. Cost
12. Data Size
Functional Objectives:
1. Be flexible enough to accommodate
different product types
2. Enable statistical analysis and discovery
3. Enable highly customized solutions
4. High usability
5. Scale with big data projects
Mapping Objectives and Measures to Data Scientist Profiles
Data Scientist Profiles
Domain Data Scientist
Features:
1. Knowledgeable about the subject matter and able to
add context to the analysis for insightful findings
2. Performs general analysis (regression, correlation, frequency
distributions)
3. Uses built-in tools for analysis
Mathematical & Statistician Data Scientist
Features:
1. Knowledgeable about complex statistical modeling
and analysis (e.g., customer opinion modeling,
classification, text analysis, natural language
processing)
2. Builds, tests, and analyzes models using statistical
programming languages such as Python and R
3. Uses built-in tools and statistical programming
language libraries to build visualizations
Developer Data Scientist
Features:
1. Knowledgeable in programming, computer science,
and databases
2. Creates connections between the data and the tools
3. Transforms data to enable profiles 1 and 2 to perform
analysis and communicate results
4. Creates highly customized interactive solutions
Functional Objectives
Functional Objective: Be flexible enough to accommodate different product types
Functional Objective: Enable statistical analysis and discovery
Functional Objective: Enables highly customized solutions
Functional Objective: High Usability
Functional Objective: Scales with big data projects
Measures
Measure: Analytical Capability
Measure: Charting Capability
Measure: Number of supported programming languages
Measure: Graphical User Interface (GUI)
Measure: Design Capability
Measure: Programming Capability
Measure: Interactive Product Capability
Measure: Number of supported file types
Measure: Access Control
Measure: Data Connectors
Decision Model Structure
Strategic Objective: Choose a data visualization tool or tools that best enables data scientists to
manipulate, analyze, and interpret data
• Functional Objective: Enable statistical analysis and discovery
–Measure: Analytical Capability
–Measure: Charting Capability
• Functional Objective: High Usability
–Measure: Graphical User Interface (GUI)
–Measure: Number of supported file types
• Functional Objective: Enables highly customized solutions
–Measure: Design Capability
–Measure: Programming Capability
–Measure: Number of supported programming languages
• Functional Objective: Be flexible enough to accommodate different product types
–Measure: Interactive Product Capability
• Functional Objective: Scales with big data projects
–Measure: Access Control
–Measure: Data Connectors
–Measure: Annual Total Cost per User
–Measure: Data Size
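The objectives hierarchy above can be written down as a small data structure. The sketch below is only an illustration: the grouping of measures follows the order in which they appear on the slide, and the equal weights are a placeholder assumption, not the weights used in the study.

```python
# Objectives hierarchy as listed on the Decision Model Structure slide:
# functional objective -> measures.
HIERARCHY = {
    "Enable statistical analysis and discovery": [
        "Analytical Capability",
        "Charting Capability",
    ],
    "High Usability": [
        "Graphical User Interface (GUI)",
        "Number of supported file types",
    ],
    "Enables highly customized solutions": [
        "Design Capability",
        "Programming Capability",
        "Number of supported programming languages",
    ],
    "Be flexible enough to accommodate different product types": [
        "Interactive Product Capability",
    ],
    "Scales with big data projects": [
        "Access Control",
        "Data Connectors",
        "Annual Total Cost per User",
        "Data Size",
    ],
}


def all_measures(hierarchy):
    """Flatten the hierarchy into a single list of measures."""
    return [m for measures in hierarchy.values() for m in measures]


def objective_weights(hierarchy, measure_weights):
    """Roll measure weights up to their parent functional objective."""
    return {
        objective: sum(measure_weights[m] for m in measures)
        for objective, measures in hierarchy.items()
    }


measures = all_measures(HIERARCHY)

# Placeholder assumption: equal weights, NOT the weights from the study.
equal_weights = {m: 1 / len(measures) for m in measures}
rolled_up = objective_weights(HIERARCHY, equal_weights)
```

Rolling weights up this way makes it easy to see how much of the total weight each functional objective carries once real weights are plugged in.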
Building the Model
• Information on the alternatives gathered through testing and
research
• Determining Weights
–Applied the Swing Weight Method
–Identified the worst and best alternatives
–Asked the project team to rank the measures
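As a rough illustration of the weighting step, the sketch below shows the usual arithmetic behind swing weighting: each measure receives points for the value of swinging it from its worst to its best level (100 for the top-ranked measure), and the points are normalized to sum to 1. The three measures, their points, and the alternative's values are hypothetical and are not taken from the study.

```python
def swing_weights(swing_points):
    """Normalize swing points (top-ranked measure = 100) into weights summing to 1."""
    total = sum(swing_points.values())
    if total <= 0:
        raise ValueError("swing points must sum to a positive number")
    return {measure: points / total for measure, points in swing_points.items()}


def additive_score(weights, values):
    """Weighted sum of single-measure values, each scaled to the range 0..1."""
    return sum(weights[m] * values[m] for m in weights)


# Hypothetical swing points from a team ranking (illustration only).
example_points = {
    "Interactive Product Capability": 100,
    "Analytical Capability": 80,
    "Data Size": 20,
}
example_weights = swing_weights(example_points)

# Hypothetical 0..1 values for one alternative (illustration only).
example_values = {
    "Interactive Product Capability": 0.9,
    "Analytical Capability": 0.5,
    "Data Size": 0.2,
}
example_score = additive_score(example_weights, example_values)
```

With these made-up inputs the weights come out to 0.5, 0.4, and 0.1, and the alternative scores about 0.67 on the 0 to 1 scale.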
ANALYTICAL RESULTS
• Two pairs of alternatives
scored similarly
• Tradeoffs associated with
each tool
• Model could be refined to
discern those differences
PLOT.LY VS. TABLEAU
• Both capable of creating
interactive plots
• Tableau scales to Big Data
• Plot.ly supports multiple
programming languages
in a collaborative
environment
RSHINY VS. BOKEH
• RShiny is supported by over
500 R statistical
programming libraries
• Bokeh is supported by
approximately 80 Python
statistical programming
libraries
• Bokeh offers more control
over the design elements
• RShiny requires a CSS file to
alter the design elements
DOMAIN DATA SCIENTIST
• Tableau and Plot.ly scored the
highest across all data scientists
• Both offer intuitive UIs with the
ability to quickly create highly
interactive data visualization
products
• Plot.ly chosen as the best
option for Domain Data
Scientists based on the tool’s
collaborative ability
MATHEMATICAL & STATISTICIAN
DATA SCIENTIST
• Data size capacity is a key
feature for Mathematical
Data Scientists
• Kibana scored highly
because of its ability to
handle Big Data sets
• Plot.ly’s inability to scale to
large datasets prevents it
from being the number one
choice
DEVELOPER DATA SCIENTIST
• Tableau offers the ability to
connect to over 40
streaming data sources
• Scalability is an important
functional objective for
Developer Data Scientists
SENSITIVITY ANALYSIS
• Interactive Product Capability is
the most influential variable
• Data size is the least influential
variable in the model
[Figure: sensitivity analysis chart. Horizontal axis: Score (0 to 0.7). Variables shown: Data Size (DS), Number of Supported File Types (FT), Design Capability (DE), GUI (G), Cost (C), Access Control (AC), Data Connectors (DC), Number of Supported Programming Languages (PL), Programming Capability (PC), Charting Capability (CH), Analytical Capability (AN), Interactive Product Capability (IP)]
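One common way to run a one-way sensitivity analysis on an additive model is to move a single measure's weight between 0 and 1, rescale the remaining weights proportionally, and record how far an alternative's score moves. The sketch below illustrates that approach only; it is not necessarily how the chart on this slide was produced, and every weight and value in it is hypothetical.

```python
def additive_score(weights, values):
    """Weighted sum of single-measure values, each scaled to the range 0..1."""
    return sum(weights[m] * values[m] for m in weights)


def reweight(weights, measure, new_weight):
    """Set one weight and rescale the others so the total stays at 1."""
    remaining = 1.0 - weights[measure]
    if remaining <= 0:
        raise ValueError("cannot rescale: the other weights sum to zero")
    scale = (1.0 - new_weight) / remaining
    return {
        m: (new_weight if m == measure else w * scale)
        for m, w in weights.items()
    }


def score_range(weights, values, measure, low=0.0, high=1.0):
    """Lowest and highest score as one weight moves from low to high."""
    at_low = additive_score(reweight(weights, measure, low), values)
    at_high = additive_score(reweight(weights, measure, high), values)
    return min(at_low, at_high), max(at_low, at_high)


# Hypothetical weights and values (illustration only, not study data).
demo_weights = {"IP": 0.5, "AN": 0.4, "DS": 0.1}
demo_values = {"IP": 0.9, "AN": 0.5, "DS": 0.2}

# Width of the score range per measure; a wider range means more influence.
demo_ranges = {m: score_range(demo_weights, demo_values, m) for m in demo_weights}
ip_low, ip_high = demo_ranges["IP"]
```

Sorting the measures by the width of their score range gives the ordering usually drawn as a tornado chart.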
SUPPORT TO DECISION MAKING
• Tableau was chosen for the Overall Objective, the
Mathematical Data Scientist, and the Developer Data
Scientist
• Plot.ly was chosen for the Domain Data Scientist
• Additional resources have been identified to alter and
refine the model


Editor's Notes

  1. The BI office is still trying to find its identity. The stack includes software like Amazon S3 storage, Elastic Search and Analytics, Adobe Data Workbench, and Tableau. My current role is to support the data visualization effort by creating different data visualizations with meaningful metrics.
  2. Developed user profiles to identify the tool features that best support the different data scientist skill sets. Identified key product features to find tools that support the different data visualization requests. The alternatives are the 6 data visualization tools that the BI office is currently testing.
  3. Data visualization products vary depending on the customer, function, and requirements. All fall along these 2 spectrums. Exploratory to Explanatory: Does the product describe the results of an analysis? Explanatory. Does the product provide the audience with a mechanism to discover new information? Exploratory. Static to Interactive: Is the product meant to stand alone (static), or does it allow the user to change their view (interactive)?
  4. Developed based on market research, current employees, and organizational requirements. Purpose: identify the tool features that will best enable the data scientists’ skill sets. 3 profiles: Domain, Mathematical, and Developer.
  5. The 6 alternatives are being tested by the BI office with temporary licenses. The client is willing to choose more than one tool for the software stack. The alternatives range from proprietary software to open source programming instances.
  6. High-level objectives. The client wants to review the results to determine whether a more granular level of analysis is needed. The objectives touch on the most important aspects of the data scientists’ skill sets. Assumes that if a product can effectively create interactive, exploratory products, then it can create static, explanatory products too.
  7. Card sort activity to assign measures to both data scientist profiles and functional objectives. Refined measures and objectives so all elements aligned (small changes like wording).
  8. How the model is structured in Logical Decisions for Windows.
  9. Information on the alternatives came from independent research and feedback from the BI data scientists testing the alternatives. The worst and best alternatives weren’t actual options in the model, but were created to calculate the individual weights. The decision maker was absent for a long period of time due to a family emergency. The DM has only recently returned to the project part-time and is currently reviewing the model.
  10. Tableau: ingests over 4 billion rows of data. Plot.ly: limits similar to MS Excel. Plot.ly supports R data, Python data, and spreadsheets (e.g., Google Sheets, Excel). Tableau supports its own spreadsheet language.
  11. The major tradeoff is that the RShiny package provides a server that allows the BI team to share their data visualization products with a large audience, whereas Bokeh requires the BI office to acquire additional resources to share products.
  12. The domain data scientist is focused on creating different customized product types with a usable tool. While Plot.ly supports multiple programming languages, which increases collaboration across the team, this may not be one of the important features for the domain data scientist. This is one of the changes that can be implemented in the second phase of the decision model.
  13. The mathematical & statistician data scientist is concerned with being able to conduct more complex statistical analysis on large datasets. Plot.ly intends to expand its ability to ingest and process large datasets.
  14. The latest version of Tableau enables the Developer Data Scientists to create custom data connections for various servers and the web. Data size capacity and data connectors are the key features for the Developer Data Scientists.
  15. The interactive product capability allows the user to meet the minimum requirements for creating the various required products. Without the ability to communicate the results, the analysis done by the data scientists cannot reach an audience to inform key strategic decisions. Despite data size being very important to the Mathematical and Developer Data Scientists, this variable was the least influential in the model.
  16. Tableau is a more scalable option with the ability to handle over 4 billion rows of data. The products that can be created are highly customizable and can be embedded within websites and content management systems such as SharePoint. Plot.ly allows data scientists to use spreadsheets, R files, and Python files to create interactive charts and dashboards that can also be shared via web pages or the Plot.ly web application, or exported as an image file. In-Q-Tel has conducted a commercial market study of over 50 data visualization applications with a plethora of attributes and measures that can be incorporated into the model. Also, the BI office recently developed its criteria for a tool’s capability for creating dashboards. Since this is an important feature identified by the BI office, it will be incorporated into the model as well.