SlideShare a Scribd company logo
1 of 10
Analysis of Data
Science Software
2020
Analysis of Data Science Software, 2020
Goal of the Analysis
The goal was to identify the closest competitors, functionally speaking,
in the data science software industry.
Our hypothesis, was that text analysis and novel nearest neighbor
algorithms could distill text based reports into a useful summary
visualization of the products in the space.
Step 1: Text analysis, Data Transformation:
Two related reports covering the range of product capabilities across
four use cases were used for source data.
Source report content was converted to numeric representations of the
text. A matrix was populated with quantitative values ranging from 1 to
5.
Scope Adjustment
The source reports did not provide full breakdown of sub dimensions
within the four use cases. As a result, many fields in the matrix had
missing values.
The estimated completion time for a full analysis on all four use cases
exceeded cost benefit metrics. The goal was narrowed to focus on the
first use case, which had four sub dimensions: access, preparation,
exploration, and automation.
Step 2: Imputing Values and Adjusting Imputed Values
A value of [3] was set as the estimate for all missing values.
Scores for each sub dimension were averaged into a total for the use
case. Result totals in the cqtinf model score table came close to the
source report result totals for the use case.
Minor adjustments to sub dimension values brought 12 of the 16
product scores into very close alignment with the source scores,
without concern of overfitting.
Alteryx [AYX] was chosen as the fixed variable for further analysis.
Step 3: Analysis Model Outputs
The cqtinf model provided two outputs:
1] a short list of AYX closest competitors, based on the number of times
a competitor is within range, where frequency represents closeness:
2] an input for a complex topological / nearest neighbor data analysis,
based on actual distance measures of competitors.
Dataiku Datawatch TIBCO SAS VDMML KNIME
4 3.3 3.3 3 3
Step 4: Nearest Neighbor Conversion
To perform this nearest neighbor analysis, the matrix score values had
to be transformed into [x, y] grid coordinates which could be plotted on
a graph. cqtinf heuristics provided the conversion.
Once the modeling was completed, the full set of DSML software
products could be positioned on a grid, for summary visualization.
Step 5: Selection of Graphic Style
Four dimensions were required, and a layout that would support a
simple representation where product nodes could straddle two
dimensions without any crisscrossing of relationships, was designed
from scratch.
Comparing two Model Outputs: The resulting TDA map varied slightly
from the simpler frequency table.
Dataiku, #1 in the frequency table, fell just outside the map inclusion
criteria. Expanding the cqtinf model’s ‘top 5’ constraint from 5 to 6
would result in Dataiku being on the map.
According to the map, Rapidminer appears to be within shortlist
distance of AYX, which is inconsistent with other arguments.
The cqtinf node positioning heuristic delivers maps quickly, and in
theory these visualizations are explicatory. Spending more time on
additional calculations may repair this ‘problem,’ but since the model is
transparent, analysts can explain strengths or weaknesses in the
underlying data, and the positioning algorithm, and we can accept that
if some outputs of the model are not perfect, they are still useful.

More Related Content

What's hot

Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
HostedbyConfluent
 

What's hot (20)

Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Time series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalTime series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_final
 
Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
 
The Future of Data Pipelines
The Future of Data PipelinesThe Future of Data Pipelines
The Future of Data Pipelines
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
 
How to leverage Kafka data streams with Neo4j
How to leverage Kafka data streams with Neo4jHow to leverage Kafka data streams with Neo4j
How to leverage Kafka data streams with Neo4j
 
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...
 
Building a Codeless Log Pipeline w/ Confluent Sink Connector | Pollyanna Vale...
Building a Codeless Log Pipeline w/ Confluent Sink Connector | Pollyanna Vale...Building a Codeless Log Pipeline w/ Confluent Sink Connector | Pollyanna Vale...
Building a Codeless Log Pipeline w/ Confluent Sink Connector | Pollyanna Vale...
 
Intelligent Auto-scaling of Kafka Consumers with Workload Prediction | Ming S...
Intelligent Auto-scaling of Kafka Consumers with Workload Prediction | Ming S...Intelligent Auto-scaling of Kafka Consumers with Workload Prediction | Ming S...
Intelligent Auto-scaling of Kafka Consumers with Workload Prediction | Ming S...
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
What's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkWhat's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talk
 
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
 

Similar to Analysis of data science software 2020

Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_Sharmila
Nithin Kakkireni
 

Similar to Analysis of data science software 2020 (20)

Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive Indexing
 
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_Sharmila
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...
 
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...
 
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
 
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSA METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
 
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSA METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
 
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSA METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
 
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSA METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 
Application-oriented ping-pong benchmarking: how to assess the real communica...
Application-oriented ping-pong benchmarking: how to assess the real communica...Application-oriented ping-pong benchmarking: how to assess the real communica...
Application-oriented ping-pong benchmarking: how to assess the real communica...
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayes
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayes
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLT
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
 
Masters Thesis
Masters ThesisMasters Thesis
Masters Thesis
 
A Novel Approach to Mathematical Concepts in Data Mining
A Novel Approach to Mathematical Concepts in Data MiningA Novel Approach to Mathematical Concepts in Data Mining
A Novel Approach to Mathematical Concepts in Data Mining
 

Recently uploaded

Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
pwgnohujw
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
saurabvyas476
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh +966572737505 get cytotec
 
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
varanasisatyanvesh
 

Recently uploaded (20)

Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
 
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
 

Analysis of data science software 2020

  • 1. Analysis of Data Science Software 2020
  • 2. Analysis of Data Science Software, 2020
  • 3. Goal of the Analysis The goal was to identify the closest competitors, functionally speaking, in the data science software industry. Our hypothesis, was that text analysis and novel nearest neighbor algorithms could distill text based reports into a useful summary visualization of the products in the space.
  • 4. Step 1: Text analysis, Data Transformation: Two related reports covering the range of product capabilities across four use cases were used for source data. Source report content was converted to numeric representations of the text. A matrix was populated with quantitative values ranging from 1 to 5.
  • 5. Scope Adjustment The source reports did not provide full breakdown of sub dimensions within the four use cases. As a result, many fields in the matrix had missing values. The estimated completion time for a full analysis on all four use cases exceeded cost benefit metrics. The goal was narrowed to focus on the first use case, which had four sub dimensions: access, preparation, exploration, and automation.
  • 6. Step 2: Imputing Values and Adjusting Imputed Values A value of [3] was set as the estimate for all missing values. Scores for each sub dimension were averaged into a total for the use case. Result totals in the cqtinf model score table came close to the source report result totals for the use case. Minor adjustments to sub dimension values brought 12 of the 16 product scores into very close alignment with the source scores, without concern of overfitting. Alteryx [AYX] was chosen as the fixed variable for further analysis.
  • 7. Step 3: Analysis Model Outputs The cqtinf model provided two outputs: 1] a short list of AYX closest competitors, based on the number of times a competitor is within range, where frequency represents closeness: 2] an input for a complex topological / nearest neighbor data analysis, based on actual distance measures of competitors. Dataiku Datawatch TIBCO SAS VDMML KNIME 4 3.3 3.3 3 3
  • 8. Step 4: Nearest Neighbor Conversion To perform this nearest neighbor analysis, the matrix score values had to be transformed into [x, y] grid coordinates which could be plotted on a graph. cqtinf heuristics provided the conversion. Once the modeling was completed, the full set of DSML software products could be positioned on a grid, for summary visualization.
  • 9. Step 5: Selection of Graphic Style Four dimensions were required, and a layout that would support a simple representation where product nodes could straddle two dimensions without any crisscrossing of relationships, was designed from scratch.
  • 10. Comparing two Model Outputs: The resulting TDA map varied slightly from the simpler frequency table. Dataiku, #1 in the frequency table, fell just outside the map inclusion criteria. Expanding the cqtinf model’s ‘top 5’ constraint from 5 to 6 would result in Dataiku being on the map. According to the map, Rapidminer appears to be within shortlist distance of AYX, which is inconsistent with other arguments. The cqtinf node positioning heuristic delivers maps quickly, and in theory these visualizations are explicatory. Spending more time on additional calculations may repair this ‘problem,’ but since the model is transparent, analysts can explain strengths or weaknesses in the underlying data, and the positioning algorithm, and we can accept that if some outputs of the model are not perfect, they are still useful.