Choosing Data Visualization Tools for Data Scientists
By: Heather R. Gilley
Introduction
Part of becoming an operational business intelligence (BI) office is being able to communicate the key insights derived
from data acquisition and analysis. To communicate these insights effectively, the right data scientist, product, and tool
need to be paired for the task. Currently, the BI office is staffed with data scientists who regularly receive
requests for data visualizations such as reports, dashboards, and analytical updates. The challenge they now face is
choosing the right tool or tools. Using the approach outlined below, a decision analysis was conducted to determine how
each data visualization tool alternative scored against the objectives.
Method
1. Identify strategic objectives for choosing a data visualization tool by eliciting input from the decision maker and
referencing key documents
2. Develop data scientist profiles to identify the tool features needed to support the various skill sets associated
with data scientists
3. Identify the product features that the data visualization tool needs to support
4. Align product features and data scientist skill sets to the functional objectives
5. Construct performance measures that accurately gauge the functional objectives
6. Determine the importance of each objective and measure
7. Analyze results for the alternatives and conduct sensitivity analysis of the measures
To choose the right data visualization tool, data scientist skills and product features need to be identified and
incorporated into the model. Data scientists have varying skill sets, and different products share key features that the
data visualization tool must be able to manage. These skills and features are built into the decision analysis
model as part of the requirements. To develop the functional objectives and measures, key strategic documents, job
position requirements, market research, and discussions with the decision maker were used to identify data
visualization tool features that align to the data scientist profiles and product features.[i]
Key Product Features
The business intelligence office is often tasked with creating data visualizations to communicate analytical results. These
data visualization products vary depending on the customer, function, and requirements, but each product falls
somewhere on the spectrum from interactive to static and from explanatory to exploratory. An interactive
product lets the audience view individual pieces of data on the chart and filter or sort their view to find
more insights, while a static product conveys a single message in one image. An explanatory product
provides the audience with a story that leads the user to the final results, while an exploratory data visualization
provides the audience with a product that is meant to be analyzed for multiple storylines. The following graph illustrates
where some of the most commonly requested products fall on the spectrum.
[Figure: two-axis chart. Axes: Explanatory to Exploratory and Interactive to Static. Products plotted: Infographic, Report, Dashboard, Interactive Chart.]
Figure 1: Product Scope
Data Scientist Profiles
While the products are one part of the decision model, the decision centers on choosing the right tool for the data
scientist. The purpose of the profiles is to identify the tool features that will best enable the data scientists' skill sets.
Many articles recognize that there are a variety of data scientists and skill sets [1]. The following data scientist
profiles were developed based on market research, current employees, and organizational requirements.
Domain Data Scientist
Features:
1. Knowledgeable of the subject matter and able to add context to the analysis for insightful findings
2. General analysis (regression, correlation, frequency distributions)
3. Uses built-in tools for analysis

Mathematical & Statistician Data Scientist
Features:
1. Knowledgeable about complex statistical modeling and analysis (e.g., customer opinion modeling, classification,
text analysis, natural language processing)
2. Builds, tests, and analyzes models using statistical programming languages such as Python and R
3. Uses built-in tools and statistical programming language libraries to build visualizations

Developer Data Scientist
Features:
1. Knowledgeable in programming, computer science, and databases
2. Creates connections between the data and the tools
3. Transforms data to enable profiles 1 and 2 to perform analysis and communicate results
4. Creates highly customized interactive solutions

Figure 2: Data Scientist Profiles

[1] Top skill sets for data scientists and Analyzing the Analyzers

Alternatives
There are many options for data visualization tools, and each one seems to serve a separate purpose. The business
intelligence office has identified six alternatives. Currently, the team has temporary licenses for all of the
alternatives in order to test the tools' capabilities against its data sets. The client is not opposed to
choosing more than one alternative, depending on the analytical results of the decision model. For detailed information
on each alternative, refer to the Alternatives tab of the Data Visualization Tool Decision Model Excel workbook.
Data Visualization Tool Alternatives:
D3.js: A JavaScript library that enables developers to create complex, custom data visualizations on the web
RShiny: An R library and server that enables R data visualizations to be interactive and available via an HTML framework
Bokeh: A data visualization library for Python that creates D3-style interactive charts from Python data
Plot.ly: A web application that automatically creates visualizations from a variety of file types and programming languages
Tableau: A data visualization tool that offers an easy-to-use user interface to create complex graphics and charts
Kibana: An open source data visualization and dashboarding tool that connects to the NoSQL database Elasticsearch
Strategic Objective
The strategic objective was formed using the documentation for the BI program, VANDL. The goal of VANDL is to
develop data science people, skills, and tools for the intelligence community. Part of this goal includes the development
of tools for its data storage, analytics, and visualization suite. These tools have their own strategic objective to be
considered: to design for usability, extensibility, scalability, and affordability. The current focus is the
visualization tool suite. Using the organization's documentation and conversations with the decision maker, the following
strategic objective was identified for choosing data visualization tools.
Choose a tool or tools that enable data scientists to manipulate, analyze, interpret, and visualize data
Functional Objectives
Functional objectives are specific and measurable parts of the strategic objective. Since the products and data scientists
are 'the what' and 'the who' that determine which tool is chosen, those components are incorporated into the
functional objectives. The following table outlines and defines the functional objectives.
Table 1: Functional Objective Definitions
| Functional Objective | Description |
| --- | --- |
| Be flexible enough to accommodate different product types | The data visualizations created fall into one of four categories: dashboards, reports, charts, or infographics. Each product has different requirements that will be captured in the measures. |
| Enable statistical analysis and discovery | It is easier to recognize patterns and identify important insights when data scientists are able to visually analyze the data. In addition, being able to visually represent analysis plays a key role in identifying and communicating analytical insight. |
| Enable highly customized solutions | Some solutions need more advanced data visualizations. With a tool that goes beyond basic bar charts, line charts, and pie charts, the data scientists can create a visualization that meets those needs. |
| High usability | Not everyone has the skill set to code solutions. Tools with advanced, intuitive GUIs enable data scientists to quickly create data visualizations. |
| Scale with big data projects | The customer experience business intelligence office has a large data set that is rapidly growing; the selected tool must be able to scale with the incoming data. |
Measures
Measures were created to gauge how well an alternative scores against a functional objective and, ultimately, the
strategic objective. These measures were created by reviewing existing documentation and building an affinity diagram
to visually map objectives and measures. The scale defines how the measure is gauged, and the range determines the
scope for the scores. Measures gauged on a Likert scale are qualitative; their scores were determined by
interviewing the data scientists who have tested the alternative data visualization tools and by consulting the decision
maker whenever possible. These measures were determined to be independent of each other. The following table defines the
measures, their units, and their scale.
Table 2: Measure Definition and Scale
| Measure | Description | Scale | Range |
| --- | --- | --- | --- |
| Analytical Capability | Level of analysis built into the user interface | Levels defined by the Likert scale | |
| Charting Capability | Charting capability allows the user to create complex charts | Levels defined by the Likert scale | |
| Programming Capability | Programming capability allows the user to customize the product's appearance and functionality | Levels defined by the Likert scale | |
| Design Capability | Capability to change the appearance of the product | Levels defined by the Likert scale | |
| Number of Supported Programming Languages | The number of programming languages the tool is able to process | Count of the programming languages the tool is able to support | |
| GUI | Tools with user interfaces vs. tools with an interactive development environment | Levels defined by the Likert scale | |
| Interactive Product Capability | How well the tool enables products to be interactive | Levels defined by the Likert scale | |
| Number of Supported File Types | The number of file types that the tool allows to be imported and exported | Levels defined by the Likert scale | |
| Data Connectors | The number of data sources the tool can use | Count of features that allow the tool to connect to different data sources | |
| Access Control | The layers of user access control that can be applied to the products and the data behind the products | Levels defined by the Likert scale | |
| Cost | The yearly total cost per user to keep a tool | Total cost per user per year | |
| Data Size | The quantity of data the tool is able to ingest and chart. The exact amount varies across data sets; however, different tools are able to scale to different levels | Levels defined by the Likert scale | 1 to 5 |
Analytical Approach
In order to evaluate the alternatives, measures were applied to the functional objectives. These measures were
identified as indicators of the functional objectives because they overlap with the features necessary to accommodate
the different data scientist skill sets and the different data visualization requirements. A card sort activity was conducted
to ensure the data scientist profiles and product requirements aligned with the functional objectives and measures.
Mapping Objectives and Measures to Data Scientist Profiles
Functional objectives and measures were created using the features of the data scientist profiles. Some measures, such
as number of supported programming languages, were identified as cross-profile requirements for being flexible enough to
accommodate different product types. That objective takes into consideration that data scientists have different
skills to support the same products. The following diagram
indicates how the profiles aligned to the functional objectives and measures.
Figure 3: How Data Scientist Profiles Align to Functional Objectives
Decision Model Structure
After identifying the strategic objectives, the functional objectives, and the measures, the decision model for choosing a data visualization tool or tools can be
depicted in the following diagram:
Figure 4: Decision Model Hierarchy
Scoring the Alternatives
Once the model has been defined, the alternatives are evaluated and scored against the independent measures.
Information on the alternatives came from independent research and feedback from the BI data scientists
testing the alternatives. Some of the measures were identified as being more subjective; to create
consistency between scores, these measures were scored on a Likert scale, with 1 being 'does not have capability'
and 5 being 'capability highly exceeds expectations'. The remaining measures could be quantified by either count or dollar amount.
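As a rough illustration of how Likert, count, and dollar scores can be put on a common footing, the sketch below rescales a raw score to a 0-to-1 value between a measure's worst and best levels. The linear rescaling and the function name `to_value` are assumptions made for illustration; the report does not state which value functions were used. The worst and best levels are taken from Table 3, and the raw scores are made-up inputs.

```python
def to_value(score, worst, best):
    """Linearly rescale a raw score so that worst maps to 0.0 and best to 1.0.

    Works for measures where lower is better (such as cost) because the
    worst level may be larger than the best level.
    """
    if worst == best:
        raise ValueError("worst and best levels must differ")
    value = (score - worst) / (best - worst)
    # Clamp in case a raw score falls outside the worst/best range.
    return min(1.0, max(0.0, value))


# Made-up raw scores, rescaled with the worst/best levels from Table 3.
likert_value = to_value(4, worst=1, best=5)      # Likert measure
count_value = to_value(21, worst=2, best=40)     # Data Connectors count
cost_value = to_value(500, worst=1999, best=0)   # yearly cost, lower is better

print(likert_value, count_value, round(cost_value, 3))
```

Putting every measure on the same 0-to-1 scale is what allows the weights determined later to be compared across measures with different units.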
Late in the development of the decision model, it was identified that more in-depth information on the alternatives was
available through commercial research conducted by In-Q-Tel. This company identifies, adapts, and delivers innovative
technological solutions to the intelligence community and is currently conducting research on data visualization tools for
data scientists. After this discovery, the decision maker determined that this information will be incorporated into the
second phase of the decision model, along with any other identified improvements.
Determining Weights
To determine the weights for the measures, the swing weight method was applied. The first step in determining
swing weights was to identify the best and worst alternatives that could exist. The next step was to elicit from the
decision maker how the measures should be ranked. During this time the decision maker was unavailable, so additional team
members were consulted to determine how to rank each measure. Finally, the weights were calculated using the
identified ranks. The following table shows the worst and best levels for each measure and the corresponding weights.
Table 3: Swing Weights
| Measure | Worst | Best | Rank | Weight |
| --- | --- | --- | --- | --- |
| Interactive Product Capability (IP) | 1 | 5 | 1.00 | 0.132 |
| Analytical Capability (AN) | 0 | 5 | 0.95 | 0.126 |
| Charting Capability (CH) | 1 | 5 | 0.85 | 0.113 |
| Data Size (DS) | 1 | 5 | 0.80 | 0.106 |
| Number of Supported Programming Languages (PL) | 0 | 4 | 0.75 | 0.099 |
| Data Connectors (DC) | 2 | 40 | 0.70 | 0.093 |
| Access Control (AC) | 0 | 5 | 0.60 | 0.079 |
| Programming Capability (PC) | 1 | 5 | 0.55 | 0.073 |
| GUI (G) | 1 | 5 | 0.45 | 0.060 |
| Design Capability (DE) | 1 | 5 | 0.40 | 0.053 |
| Number of Supported File Types (FT) | 1 | 5 | 0.30 | 0.040 |
| Cost (C) | 1999 | 0 | 0.20 | 0.026 |
Total rank weight: 7.55
Weight_IP = 1.00 / 7.55 = 0.1325
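The weight calculation in Table 3 can be reproduced with a few lines of Python. This is a minimal sketch, assuming each weight is simply the measure's rank divided by the total of all ranks, which matches the Weight_IP figure shown with the table. The rank values are copied from Table 3; the abbreviation DE for Design Capability follows the sensitivity chart, and the function name `swing_weights` is hypothetical.

```python
# Rank values copied from Table 3 (1.00 marks the most important swing).
RANKS = {
    "IP": 1.00,  # Interactive Product Capability
    "AN": 0.95,  # Analytical Capability
    "CH": 0.85,  # Charting Capability
    "DS": 0.80,  # Data Size
    "PL": 0.75,  # Number of Supported Programming Languages
    "DC": 0.70,  # Data Connectors
    "AC": 0.60,  # Access Control
    "PC": 0.55,  # Programming Capability
    "G": 0.45,   # GUI
    "DE": 0.40,  # Design Capability
    "FT": 0.30,  # Number of Supported File Types
    "C": 0.20,   # Cost
}


def swing_weights(ranks):
    """Normalize swing ranks so that the resulting weights sum to 1."""
    total = sum(ranks.values())
    return {measure: rank / total for measure, rank in ranks.items()}


weights = swing_weights(RANKS)
for measure, weight in weights.items():
    print(f"{measure}: {weight:.3f}")
```

Normalizing by the total keeps the weights comparable: a measure ranked 0.50 would always carry half the weight of the top-ranked measure, regardless of how many measures are in the model.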
Analysis and Computation
Once the building blocks of the decision model were established, the model was built in Excel and Logical Decisions
for Windows. Logical Decisions for Windows was used to build the model for calculating the overall goal of choosing
a data visualization tool that enables data scientists. Excel was used to calculate the results for each data scientist type.
The following chart shows the ranked results for each alternative and how they score against the functional objectives.
Figure 5: Alternatives Ranked by Goal: Choose Data Visualization Tool
From the results we can see that there are alternatives with very similar scores: Tableau and Plot.ly, and Bokeh and RShiny.
The following sections highlight the differences between them and the tradeoffs of choosing one tool over another.
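The ranking described above can be sketched as a weighted additive score. This is a minimal illustration, assuming each measure has already been rescaled to a 0-to-1 value and that the overall score is the weighted sum of those values; the report does not document the exact calculation performed in Logical Decisions for Windows. The weights are the Weight column of Table 3, DE stands for Design Capability as in the sensitivity chart, and "Tool A" and "Tool B" are hypothetical placeholders with made-up values, not results for any real alternative.

```python
# Weight column of Table 3, keyed by measure abbreviation.
WEIGHTS = {
    "IP": 0.132, "AN": 0.126, "CH": 0.113, "DS": 0.106,
    "PL": 0.099, "DC": 0.093, "AC": 0.079, "PC": 0.073,
    "G": 0.060, "DE": 0.053, "FT": 0.040, "C": 0.026,
}


def overall_score(values, weights=WEIGHTS):
    """Weighted additive score; `values` maps each measure to a 0-to-1 value."""
    missing = set(weights) - set(values)
    if missing:
        raise KeyError(f"missing measures: {sorted(missing)}")
    return sum(weights[m] * values[m] for m in weights)


# Made-up values for two placeholder tools (illustration only).
tool_a = {m: 0.5 for m in WEIGHTS}
tool_b = dict(tool_a, IP=1.0, C=0.0)  # more interactive, but most expensive

scores = {"Tool A": overall_score(tool_a), "Tool B": overall_score(tool_b)}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because Interactive Product Capability carries far more weight than Cost, the placeholder tool that trades cost for interactivity ranks first in this made-up example.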
Comparing Alternatives
Plot.ly vs. Tableau
Figure 6: Plot.ly vs. Tableau Tornado Diagram
Plot.ly and Tableau scored very similarly. Both tools are capable of creating products within the Business Intelligence
Office's scope and provide a platform for data scientists to explore various data sets, but with different tradeoffs. Plot.ly
allows data scientists to use multiple data manipulation tools such as Python, R, and Excel to create advanced
visualizations and conduct advanced analytics in a collaborative setting. Tableau requires each data scientist to learn
its spreadsheet-style language as opposed to using the skill sets they already possess. This is an advantage for Plot.ly, as it
allows data scientists with disparate skill sets to use the same tool collaboratively. However, Tableau is able to connect
to a larger number of data sources and is able to process data sets that qualify as "big data". Since the government is one
of the largest producers of data, this is an important requirement to consider.
RShiny vs. Bokeh
Figure 7: RShiny vs. Bokeh Tornado Diagram
Bokeh and RShiny had the same score, but the diagram above shows the tradeoffs of choosing one tool over
the other. As an R library, RShiny is supported by a multitude of statistical programming libraries. The RShiny
package also includes RShiny Server, which is able to connect to many different data sources. However, RShiny requires the
user to implement a CSS file to change the styles. Bokeh allows the data scientist to use design options to enhance
products and improve communications, and it is also supported by multiple statistical programming libraries, though not as many
as R.
Alternative Results for Data Scientist Profiles
Each of the data scientist profiles has corresponding functional objectives, as outlined in the Data Scientist Profiles
section, which are used to choose the best tool for each data scientist skill set. The following sections outline the
results for each alternative as they relate to the data scientist profiles:
Domain Data Scientist
Figure 8: Alternatives Ranked by Domain Data Scientist Profile
The domain data scientist is focused on creating different customized product types with a usable tool. Tableau and
Plot.ly both scored highly for the domain data scientist. These alternatives offer intuitive user interfaces that allow a
data scientist to quickly create highly interactive charts that can be used for communications or analysis. While Tableau
offers more design capabilities, Plot.ly's ability to support multiple programming languages enables domain data
scientists to collaborate with other data scientists more easily.
Mathematical & Statistician Data Scientist
Figure 9: Alternatives Ranked by Mathematical & Statistician Data Scientist Profile
The mathematical & statistician data scientist is concerned with being able to conduct more complex statistical analysis
on large data sets and being able to communicate those results. Plot.ly is a fairly new technology that is still developing
its capabilities and is currently unable to handle data sets that qualify as "big data". Plot.ly intends to expand its
ability to ingest and process large data sets; however, Tableau currently has that capability built into its software.
Developer Data Scientist
Figure 10: Alternatives Ranked by Developer Data Scientist Profile
The developer data scientist is responsible for acquiring and transforming the data into a data set that is usable by other
data scientists; therefore, this profile is more concerned with scalability and customizability. As noted in the mathematical &
statistician data scientist profile, Tableau is the best tool available for scaling with the data quantity.
Sensitivity Analysis
The results of the decision model are more sensitive to some measures than others. The following chart shows the
results of the sensitivity analysis for the different measures in the decision model:
[Figure: sensitivity analysis chart for the goal "Choose a Data Visualization Tool". Horizontal axis: Score (0 to 0.7). Measures listed: Interactive Product Capability (IP), Analytical Capability (AN), Charting Capability (CH), Programming Capability (PC), Number of Supported Programming Languages (PL), Data Connectors (DC), Access Control (AC), Cost (C), GUI (G), Design Capability (DE), Number of Supported File Types (FT), Data Size (DS).]
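A one-way sensitivity analysis of this kind can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in Logical Decisions for Windows: it assumes a weighted additive score, uses the Weight column of Table 3, and moves one measure at a time from its worst level (value 0) to its best level (value 1) while the others stay at a baseline. The baseline values are made up, and the function names are hypothetical.

```python
# Weight column of Table 3, keyed by measure abbreviation.
WEIGHTS = {
    "IP": 0.132, "AN": 0.126, "CH": 0.113, "DS": 0.106,
    "PL": 0.099, "DC": 0.093, "AC": 0.079, "PC": 0.073,
    "G": 0.060, "DE": 0.053, "FT": 0.040, "C": 0.026,
}


def score(values):
    """Weighted additive score over 0-to-1 measure values."""
    return sum(WEIGHTS[m] * values[m] for m in WEIGHTS)


def one_way_sensitivity(baseline):
    """Return (measure, low score, high score) rows, widest swing first."""
    rows = []
    for measure in WEIGHTS:
        low = score({**baseline, measure: 0.0})   # measure at its worst level
        high = score({**baseline, measure: 1.0})  # measure at its best level
        rows.append((measure, low, high))
    return sorted(rows, key=lambda row: row[2] - row[1], reverse=True)


# Made-up baseline: every measure sits halfway between worst and best.
baseline = {m: 0.5 for m in WEIGHTS}
rows = one_way_sensitivity(baseline)
for measure, low, high in rows:
    print(f"{measure:>2}: {low:.3f} to {high:.3f}")
```

Under an additive model, the width of each bar equals the measure's weight, so the most heavily weighted measures are the ones to which the overall score is most sensitive.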
Conclusion
Consistently, Tableau and Plot.ly emerge as highly ranked alternatives. Tableau was chosen as the best option for the
overall objective, the mathematical & statistician data scientist profile, and the developer data scientist profile, while
Plot.ly was chosen as the best option for the domain data scientist profile. These options have different tradeoffs
depending on the data scientists' needs and the product requirements. The data scientist profiles lean towards
Tableau; to gain more granular insight into the best tool for given product requirements, the model needs to
be refined further. This model is still fairly high-level and is currently under review by the decision maker to gain
that level of granularity.
This decision model was formed by eliciting input from the project team members and referring to the project's key strategic
documents. Ideally, the decision maker would have been consulted consistently throughout the process; however, he was
absent due to a family emergency for the majority of the project duration. Recently, the decision maker returned to the
project, and he is currently reviewing the results of the analysis. Any changes arising from that review will be
incorporated into the future model, along with any other changes identified by the decision maker.
During the review process, additional resources were identified for refining the decision model. In-Q-Tel conducted an
in-depth study of over 50 data visualization tools with numerous attributes identified. The decision maker provided a
document with quantifiable measures for dashboard product requirements. The study and the measures will be
reviewed to determine whether they need to be incorporated into the advanced decision model or whether the results of this
current study are enough to drive a decision.
[i] Note: during the analysis process, the decision maker suddenly needed to be absent for an extended period of time due to a family
emergency. The decision maker returned towards the end of the initiative and has identified areas for further analysis.

 
Unlocking Insights_ The Power of Data Analytics in the Modern World.pptx
Unlocking Insights_ The Power of Data Analytics in the Modern World.pptxUnlocking Insights_ The Power of Data Analytics in the Modern World.pptx
Unlocking Insights_ The Power of Data Analytics in the Modern World.pptx
 
Exploring the impact and evolution of Advanced Analytics Tools.pdf
Exploring the impact and evolution of Advanced Analytics Tools.pdfExploring the impact and evolution of Advanced Analytics Tools.pdf
Exploring the impact and evolution of Advanced Analytics Tools.pdf
 
Exploring the impact and evolution of Advanced Analytics Tools.pdf
Exploring the impact and evolution of Advanced Analytics Tools.pdfExploring the impact and evolution of Advanced Analytics Tools.pdf
Exploring the impact and evolution of Advanced Analytics Tools.pdf
 
Deeper Insights with Alteryx
Deeper Insights with AlteryxDeeper Insights with Alteryx
Deeper Insights with Alteryx
 

Choosing a Data Visualization Tool for Data Scientists Report

  • 1. Choosing Data Visualization Tools for Data Scientists
By: Heather R. Gilley
Introduction
Part of becoming an operational business intelligence (BI) office is being able to communicate the key insights derived from data acquisition and analysis. To effectively communicate these insights, the right data scientist, product, and tool need to be paired together for the task. Currently, the BI office is staffed with data scientists who regularly receive requests for data visualizations such as reports, dashboards, and analytical updates. The challenge they face now is to choose the right tool or tools. Using the approach outlined below, a decision analysis was conducted to determine how each data visualization tool alternative scored against the objectives.
Method
1. Identify strategic objectives for choosing a data visualization tool by eliciting input from the decision maker and referencing key documents
2. Develop data scientist profiles to identify necessary tool features that support the various skill sets associated with data scientists
3. Identify the product features that need to be supported by the data visualization tool
4. Align product features and data scientist skill sets to the functional objectives
5. Construct performance measures that accurately gauge the functional objectives
6. Determine the importance of each objective and measure
7. Analyze results for alternatives and conduct sensitivity analysis of the measures
To choose the right data visualization tool, data scientist skills and product features need to be identified and incorporated into the model. Data scientists have varying skill sets, and different products share key features that the data visualization tool must be able to manage. These skills and features are built into the decision analytical model as part of the requirements. To develop the functional objectives and measures, key strategic documents, job position requirements, market research, and discussions with the decision maker were used to identify data visualization tool features that align to the data scientist profiles and product features. [i]
  • 2. Key Product Features
The business intelligence office is often tasked with creating data visualizations to communicate analytical results. These data visualization products fluctuate depending on the customer, function, and requirements, but each of these products falls somewhere within the spectrum of interactive to static and explanatory to exploratory. An interactive product offers the audience the chance to view individual pieces of data on the chart and filter or sort their view to find more insights, while a static product has a single message conveyed in one image. An explanatory product provides the audience with a story that leads the user to the final results, while an exploratory data visualization provides the audience with a product that is meant to be analyzed for multiple storylines. The following graph illustrates where some of the most commonly requested products fall on the spectrum.
Figure 1: Product Scope (axes: explanatory to exploratory and interactive to static; products shown: infographic, report, dashboard, interactive chart)
  • 3. Data Scientist Profiles
While the products are one part of the decision model, the decision centers on choosing the right tool for the data scientist. The purpose of the profiles is to identify the tool features that will best enable the data scientists' skill sets. Many articles recognize that there are a variety of data scientists and skill sets [1] available. The following data scientist profiles were developed based on market research, current employees, and organizational requirements.
Alternatives
There are many options for data visualization tools, and each one seems to serve a separate purpose. The business intelligence office has identified six alternatives for data visualization tools. Currently, the team has temporary licenses for all of the alternatives in order to test the tools' capabilities against their data sets. The client is not opposed to choosing more than one alternative, depending on the analytical results of the decision model. For detailed information on each alternative, refer to the Alternatives tab of the Data Visualization Tool Decision Model Excel workbook.
Data Visualization Tool Alternatives:
D3.js: A JavaScript library that enables developers to create complex, custom data visualizations on the web
RShiny: An R library and server that enables R data visualizations to be interactive and available via an HTML framework
Bokeh: A data visualization library for Python that creates charts from D3 visuals and Python data
Plot.ly: A web application that automatically creates visualizations from a variety of file types and programming languages
Tableau: A data visualization tool that offers an easy-to-use user interface to create complex graphics and charts
Kibana: An open source data visualization and dashboarding tool that connects to the NoSQL database Elasticsearch
Strategic Objective
The strategic objective was formed using the documentation for the BI program, VANDL.
The goal of VANDL is to develop data science people, skills, and tools for the intelligence community. Part of this goal includes the development of tools for their data storage, analytics, and visualization suite. These tools have their own strategic objective to be considered: to design for usability, extensibility, scalability, and affordability. The current focus is the visualization tool suite.
Figure 2: Data Scientist Profiles
Domain Data Scientist Features:
1. Knowledgeable of the subject matter and able to add context to the analysis for insightful findings
2. General analysis (regression, correlation, frequency distributions)
3. Uses built-in tools for analysis
Mathematical & Statistician Data Scientist Features:
1. Knowledgeable about complex statistical modeling and analysis (e.g., customer opinion modeling, classification, text analysis, natural language processing)
2. Builds, tests, and analyzes models using statistical programming languages such as Python and R
3. Uses built-in tools and statistical programming language libraries to build visualizations
Developer Data Scientist Features:
1. Knowledgeable in programming, computer science, and databases
2. Creates connections between the data and the tools
3. Transforms data to enable profiles 1 and 2 to perform analysis and communicate results
4. Creates highly customized interactive solutions
[1] Top skill sets for data scientists and Analyzing the Analyzers
  • 4. Using the organization's documentation and conversations with the decision maker, the following strategic objective was identified for choosing data visualization tools.
Choose a tool or tools that enable data scientists to manipulate, analyze, interpret, and visualize data
Functional Objectives
Functional objectives are specific and measurable parts of the strategic objective. Since the products and data scientists are 'the who' and 'the what' that determine which tool is chosen, those components are incorporated into the functional objectives. The following table outlines and defines the functional objectives.
Table 1: Functional Objective Definitions
Functional Objective: Description
Be flexible enough to accommodate different product types: The data visualizations created fall into one of four categories: dashboards, reports, charts, or infographics. Each product has different requirements that will be captured in the measures.
Enables statistical analysis and discovery: It is easier to recognize patterns and identify important insights when data scientists are able to visually analyze the data. In addition, being able to visually represent analysis plays a key role in identifying and communicating analytical insight.
Enables highly customized solutions: Some solutions need more advanced data visualizations. With a tool that goes beyond basic bar charts, line charts, and pie charts, the data scientists can create a visualization that meets those needs.
High usability: Not everyone has the skill set to code solutions. Tools with advanced, intuitive GUIs enable data scientists to quickly create data visualizations.
Scales with big data projects: The customer experience business intelligence office has a large data set that is rapidly growing; the selected tool must be able to scale with the incoming data.
Measures
Measures were created to gauge how well an alternative scores against a functional objective and ultimately the strategic objective.
These measures were created by reviewing existing documentation and creating an affinity diagram to visually map objectives and measures. The scale defines how the measure is gauged, and the range determines the scope for the scores. Measures that are gauged using a Likert scale are qualitative; their scores were determined by interviewing the data scientists who have tested the alternative data visualization tools and by eliciting input from the decision maker whenever possible. These measures were determined to be independent of each other. The following table defines the measures, their units, and their scale.
Table 2: Measure Definition and Scale
Measure: Description; Scale; Range
Analytical Capability: Level of analysis built into the user interface; Levels defined by the Likert scale; Analytical Capability
Charting Capability: Charting capability allows the user to create complex charts; Levels defined by the Likert scale; Charting Capability
Programming Capability: Programming capability allows the user to customize the product's appearance and functionality; Levels defined by the Likert scale; Programming Capability
Design Capability: Capability to change the appearance of the product; Levels defined by the Likert scale; Design Capability
  • 5. Table 2 (continued)
Number of Supported Programming Languages: The number of programming languages the tool is able to process; Count of the programming languages the tool is able to support; Number of Supported Programming Languages
GUI: Tools with user interfaces vs. tools with an interactive development environment; Levels defined by the Likert scale; GUI
Interactive Product Capability: How well the tool enables products to be interactive; Levels defined by the Likert scale; Interactive Product Capability
Number of Supported File Types: The number of file types that the tool allows to be imported and exported; Levels defined by the Likert scale; Number of Supported File Types
Data Connectors: The number of data sources the tool can use; Count of features that allow the tool to connect to different data sources; Data Connectors
Access Control: The layers of user access control that can be applied to the products and the data behind the products; Levels defined by the Likert scale; Access Control
Cost: The yearly total cost per user to keep a tool; Total cost per user per year; Cost
Data Size: The quantity of data the tool is able to ingest and chart. The exact amount varies across data sets; however, different tools are able to scale to different levels; Levels defined by the Likert scale; 1 to 5
Analytical Approach
In order to evaluate the alternatives, measures were applied to the functional objectives. These measures were identified as indicators of the functional objectives because they overlap with the features necessary to accommodate the different data scientist skill sets and the different data visualization requirements. A card sort activity was conducted to ensure the data scientist profiles and product requirements aligned with the functional objectives and measures.
Mapping Objectives and Measures to Data Scientist Profiles
Functional objectives and measures were created using the features of the data scientist profiles. Some measures, such as number of supported programming languages, were identified as cross-profile requirements to be flexible enough to accommodate different product types. The ability to be flexible enough to accommodate different product types takes into consideration that data scientists have different skills to support the same products. The following diagram indicates how the profiles aligned to the functional objectives and measures.
  • 6. Figure 3: How Data Scientist Profiles Align to Functional Objectives
  • 7. Decision Model Structure
After identifying the strategic objectives, the functional objectives, and the measures, the decision model for choosing a data visualization tool or tools can be depicted in the following diagram:
Figure 4: Decision Model Hierarchy
  • 8. Scoring the Alternatives
Once the model has been defined, the alternatives are evaluated and scored against the independent measures. The information on the alternatives was gathered through independent research and feedback from the BI data scientists testing the alternatives. Some of the measures were identified as being more subjective; these measures were scored on a Likert scale, with 1 being 'does not have capability' and 5 being 'capability highly exceeds expectations', to create consistency between scores. The remaining measures could be quantified by either count or dollar amount. Late in the development of the decision model, it was identified that more in-depth information on alternatives was available through commercial research conducted by In-Q-Tel. This company identifies, adapts, and delivers innovative technological solutions to the intelligence community and is currently conducting research on data visualization tools for data scientists. After this discovery, the decision maker determined that this information will be implemented in the second phase of the decision model, along with any other identified improvements.
Determining Weights
To determine the weights for the measures, the swing weight method was applied. The first step in determining swing weights was to identify the best and worst alternatives that could exist. The next step was to elicit from the decision maker how the measures should be ranked. During this time the decision maker was unavailable, so additional team members were consulted to determine how to rank each measure. Finally, the weights were calculated using the identified ranks. The following table shows the worst and best levels for each measure and their corresponding weights.
Table 3: Swing Weights
Measure; Worst; Best; Rank Weight; Weight
Interactive Product Capability (IP); 1; 5; 1.00; 0.132
Analytical Capability (AN); 0; 5; 0.95; 0.126
Charting Capability (CH); 1; 5; 0.85; 0.113
Data Size (DS); 1; 5; 0.80; 0.106
Number of Supported Programming Languages (PL); 0; 4; 0.75; 0.099
Data Connectors (DC); 2; 40; 0.70; 0.093
Access Control (AC); 0; 5; 0.60; 0.079
Programming Capability (PC); 1; 5; 0.55; 0.073
GUI (G); 1; 5; 0.45; 0.060
Design Capability (DE); 1; 5; 0.40; 0.053
Number of Supported File Types (FT); 1; 5; 0.30; 0.040
Cost (C); 1999; 0; 0.20; 0.026
Total Rank Weight: 7.55
Weight IP = 1.00 / 7.55 = 0.1325
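The weight calculation behind Table 3 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the workbook the author used: it assumes each weight is simply the measure's rank weight divided by the total rank weight, which matches the "Weight IP" figure in the table. The dictionary and function names are hypothetical.

```python
# Rank weights copied from Table 3 (measure abbreviation -> rank weight).
rank_weights = {
    "IP": 1.00, "AN": 0.95, "CH": 0.85, "DS": 0.80,
    "PL": 0.75, "DC": 0.70, "AC": 0.60, "PC": 0.55,
    "G": 0.45, "DE": 0.40, "FT": 0.30, "C": 0.20,
}


def normalize(ranks):
    """Divide each rank weight by the total so the weights sum to 1 (assumed method)."""
    total = sum(ranks.values())
    return {name: value / total for name, value in ranks.items()}


total_rank_weight = sum(rank_weights.values())
weights = normalize(rank_weights)

print(f"Total rank weight: {total_rank_weight:.2f}")
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Rounded to three decimals, the values produced this way agree with the Weight column of Table 3.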
  • 9. Analysis and Computation
When the building blocks of the decision model were established, the model was built in Excel and Logical Decisions for Windows. Logical Decisions for Windows was used to build the model for calculating the subjective goal of choosing a data visualization tool that enables data scientists. Excel was used to calculate the results for each data scientist type. The following chart shows the ranked results for each alternative and how they score against the functional objectives.
Figure 5: Alternatives Ranked by Goal: Choose Data Visualization Tool
From the results we can see that there are alternatives with very similar scores: Tableau and Plot.ly, and Bokeh and RShiny. The following sections highlight their differences and the tradeoffs of choosing one tool over another.
Comparing Alternatives
Plot.ly vs. Tableau
Figure 6: Plot.ly vs. Tableau Tornado Diagram
Plot.ly and Tableau scored very similarly. Both tools are capable of creating products within the Business Intelligence Office's scope and provide a platform for data scientists to explore various data sets, but with different tradeoffs. Plot.ly allows data scientists to use multiple data manipulation tools such as Python, R, and Excel to create advanced
  • 10. visualizations and conduct advanced analytics in a collaborative setting. Tableau requires each data scientist to learn its spreadsheet language as opposed to using the skill sets they already possess. This is an advantage for Plot.ly, as it allows data scientists with disparate skill sets to collaboratively use the same tool. However, Tableau is able to connect to a larger number of data sources and is able to process data sets that qualify as "big data". Since the government is one of the largest producers of data, this is an important requirement to consider.
RShiny vs. Bokeh
Figure 7: RShiny vs. Bokeh Tornado Diagram
Bokeh and RShiny had the same score, but the diagram above shows the tradeoffs of choosing one tool over another. As part of the R ecosystem, RShiny is supported by a multitude of statistical programming libraries. Also, the RShiny package includes RShiny Server, which is able to connect to many different data sources. However, RShiny requires the user to implement a CSS file to change the styles. Bokeh allows the data scientist to use design options to enhance products and improve communications, and it is also supported by multiple statistical programming libraries, though not as many as R.
  • 11. Alternative Results for Data Scientist Profiles
Each of the data scientist profiles has corresponding functional objectives, as outlined in the Data Scientist Profiles section, which are used to choose the best tool for each data scientist skill set. The following sections outline the results for each alternative as it relates to the data scientist profiles:
Domain Data Scientist
Figure 8: Alternatives Ranked by Domain Data Scientist Profile
The domain data scientist is focused on creating different customized product types with a usable tool. Tableau and Plot.ly both scored highly with the domain data scientist. These alternatives offer intuitive user interfaces that allow a data scientist to quickly create highly interactive charts that can be used for communications or analysis. While Tableau offers more design capabilities, Plot.ly's ability to support multiple programming languages enables domain data scientists to collaborate with other data scientists more easily.
  • 12. Mathematical & Statistician Data Scientist
Figure 9: Alternatives Ranked by Mathematical & Statistician Data Scientist Profile
The mathematical & statistician data scientist is concerned with being able to conduct more complex statistical analysis on large data sets and being able to communicate those results. Plot.ly is a fairly new technology that is still developing its capabilities and is currently unable to handle data sets that qualify as "big data". Plot.ly intends to expand its ability to ingest and process large data sets; however, Tableau currently has that capability built into its software.
Developer Data Scientist
Figure 10: Alternatives Ranked by Developer Data Scientist Profile
The developer data scientist is responsible for acquiring and transforming the data into a data set that is usable for other data scientists; therefore, they are more concerned with scalability and customizability. As noted in the mathematical & statistician data scientist profile, Tableau is the best tool available for scaling with the data quantity.
  • 13. Sensitivity Analysis
The results of the decision model are more sensitive to some measures than to others. The following chart shows the results of the sensitivity analysis for the different measures in the decision model:
Chart: Choose a Data Visualization Tool (horizontal axis: Score, 0 to 0.7; one bar per measure showing the score at the measure's worst and best levels: Data Size (DS), Number of Supported File Types (FT), Design Capability (DE), GUI (G), Cost (C), Access Control (AC), Data Connectors (DC), Number of Supported Programming Languages (PL), Programming Capability (PC), Charting Capability (CH), Analytical Capability (AN), Interactive Product Capability (IP))
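A one-way sensitivity analysis of this kind can be sketched as follows. The sketch assumes an additive model with linear value functions (0 at a measure's worst level, 1 at its best), which the report does not state explicitly. Only three measures from Table 3 are included for brevity, and the tool's scores are hypothetical, not results from the study.

```python
# (worst, best, weight) per measure, taken from Table 3; subset for brevity.
MEASURES = {
    "IP": (1, 5, 0.132),     # Interactive Product Capability
    "DC": (2, 40, 0.093),    # Data Connectors
    "C": (1999, 0, 0.026),   # Cost: lower is better, so worst > best
}


def value(level, worst, best):
    """Linear value function (assumed shape): 0 at worst, 1 at best."""
    return (level - worst) / (best - worst)


def score(levels):
    """Additive weighted score over the measures in MEASURES."""
    return sum(
        weight * value(levels[name], worst, best)
        for name, (worst, best, weight) in MEASURES.items()
    )


def one_way_sensitivity(levels):
    """Swing each measure from worst to best while holding the others fixed."""
    rows = []
    for name, (worst, best, _) in MEASURES.items():
        low = score({**levels, name: worst})
        high = score({**levels, name: best})
        rows.append((name, low, high))
    # Widest bar first, as in a tornado diagram.
    return sorted(rows, key=lambda row: row[2] - row[1], reverse=True)


# Hypothetical scores for one tool; not taken from the report.
hypothetical_tool = {"IP": 4, "DC": 10, "C": 500}
tornado = one_way_sensitivity(hypothetical_tool)
for name, low, high in tornado:
    print(f"{name}: {low:.3f} to {high:.3f}")
```

Under these assumptions the width of each bar equals the measure's weight, so the measures with the largest swing weights dominate the chart.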
  • 14. Conclusion
Tableau and Plot.ly consistently emerge as highly ranked alternatives. Tableau was chosen as the best option for the overall objective, the mathematical & statistician data scientist profile, and the developer data scientist profile, while Plot.ly was chosen as the best option for the domain data scientist profile. These options have different tradeoffs depending on the data scientist's needs and the product requirements. The data scientist profiles lean towards Tableau; to gain more granular insight into the best tool for given product requirements, the model needs to be refined further. This model is still fairly high-level and is currently under review by the decision maker to gain that level of granularity.
This decision model was formed by eliciting input from the project team members and referring to the project's key strategic documents. Ideally, the decision maker would have been consulted consistently throughout the process; however, he was absent due to a family emergency for the majority of the project duration. Recently, the decision maker returned to the project, and he is currently reviewing the results of the analysis. Changes arising from that review will be incorporated into the future model, along with any other changes identified by the decision maker.
During the review process, additional resources were identified for refining the decision model. In-Q-Tel conducted an in-depth study of over 50 data visualization tools with numerous attributes identified. The decision maker provided a document with quantifiable measures for dashboard product requirements. The study and the measures will be reviewed to determine whether they need to be incorporated into the advanced decision model or whether the results of this current study are enough to drive a decision.
[i] Note: during the analysis process, the decision maker suddenly needed to be absent for an extended period of time due to a family emergency. The decision maker returned towards the end of the initiative and has identified areas for further analysis.