Architecture, Products
and Total Cost of
Ownership of the
Leading Machine
Learning Stacks
Presented by: William McKnight
“#1 Global Influencer in Big Data” Thinkers360
President, McKnight Consulting Group
A 2-time Inc. 5000 Company
linkedin.com/in/wmcknight/
www.mcknightcg.com
(214) 514-1444
Second Thursday of Every Month, at 2:00 ET
With William McKnight
McKnight Consulting Group Partial Client List
TELECOMMUNICATIONS
PHARMACEUTICAL
EDUCATION
CONSUMER PRODUCTS/RETAIL
FINANCIAL
INSURANCE/HEALTHCARE
GOVERNMENT AND UTILITIES
PUBLISHING
OTHER
Performance Features
• Micro-partitions
• Clustering Keys
• Clustering Depth
• Multi-Clusters
• Transparent Materialized Views
• Search Optimization Service
• Query Acceleration Service
Individual Query Performance Feature Comparison
Improves Clustering Materialized Views Search Opt. Service
Equality searches X X X
Range searches X X X
Sort operations X X
Substring and Regex X
VARIANT searches X
Geospatial X
Extra Costs
Compute X X X
Storage X X
Usability Features
• External Tables
• Dynamic Data Masking
• Time Travel and Fail Safe
• Semi-Structured Data
• Snowpipe
• Snowsight Dashboards
• Snowpark API
Warehouses
• 10 sizes
• Available in Standard and Snowpark-optimized
• New Snowpark-optimized warehouses with 16x the memory of Standard (open preview)
Size
XS
S
M
L
XL
2XL
3XL
4XL
5XL
6XL
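To make the size ladder concrete, here is a minimal sketch of how hourly cost grows across the ten sizes. It assumes the Standard warehouse credit schedule (1 credit per hour at XS, doubling with each size); the price per credit and the function names are hypothetical, and Snowpark-optimized warehouses are billed at different rates that this sketch does not cover.

```python
# Warehouse sizes in the order listed on the slide.
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL", "5XL", "6XL"]


def credits_per_hour(size: str) -> int:
    """Credits per hour for one Standard cluster: XS = 1, doubling per size."""
    return 2 ** SIZES.index(size)


def hourly_cost(size: str, price_per_credit: float, clusters: int = 1) -> float:
    """Hourly cost for a warehouse; price_per_credit is an assumed input."""
    return credits_per_hour(size) * clusters * price_per_credit


if __name__ == "__main__":
    # Hypothetical $3.00 per credit, for illustration only.
    for size in SIZES:
        print(size, credits_per_hour(size), hourly_cost(size, 3.00))
```

Multi-cluster warehouses multiply the bill by the number of running clusters, which is why "Effective Warehouses" appears on the pricing watch list.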
Pricing
• Watch For:
  – Concurrency and price-per-performance
  – Effective Warehouses (Multi-clusters)
  – Add-on compute:
    • Automatic Clustering
    • Materialized View Refreshes
    • Search Optimization
    • Query Acceleration
  – Time travel storage
• Discounts
(A) Snowflake ML Stack
Category | Product
Dedicated Compute | Snowflake
Storage | Snowflake
Data Integration | AWS Glue
Streaming | Kafka Confluent Cloud
Spark Analytics | Amazon EMR + Kinesis Spark
Data Lake | Snowflake External Tables
Business Intelligence | Tableau
Machine Learning | Amazon SageMaker
Identity Management | Amazon IAM
Data Catalog | AWS Glue Data Catalog
(A) Snowflake Machine Learning Stack
[Architecture diagram. Labeled components: Azure Kubernetes Services (AKS) hosting the e-commerce website front end and back end (cart, profile, products, stock, deployed recommender); transactional database (Cloud Firestore); data loading (Cloud Data Fusion); data processing (Snowflake); data transformation, data lake + historical data, and data marts (Cloud Storage data lake); ML model training & deployment and automatic model deployment (Databricks); MDM database (Talend).]
Data Governance:
• Partner Solutions
• Marketplace solutions
Performance Features
• Redshift Advisor
• Workload Management
• Concurrency Scaling
• Transparent Materialized Views
• Short Query Acceleration
Usability Features
• Redshift Spectrum (External Tables)
• Automated Materialized Views (AutoMV)
• Dynamic Data Masking
• Federated Queries
• Semi-Structured and SUPER Type
• Streaming Ingest with Kinesis
• Python UDF
• Redshift ML
Provisioned Clusters vs. Serverless
Feature | Provisioned | Serverless
Managed | Self managed | Fully managed
Compute | Choose node type and cluster size | Workgroup
Storage | Provisioned disk capacity | Namespace
WLM | User configured | Not applicable
Concurrency scaling | User enabled | Not applicable
Scale out/up/down | User-initiated cluster resize | Not applicable
Pause/resume | Manual | Automatic
Compute billing | Per second when not paused, $/hour rate | Per second when workloads run, RPU-hour rate
Storage billing | $ per managed storage amount | $ per GB-month used
More detailed comparison: https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-console-comparison.html
Cluster Sizes
AWS Type | CPU/RAM | Node Range | Price Per Node
dc2.large | 2 / 15 GB | 1 – 32 | $0.25
dc2.8xlarge | 32 / 244 GB | 2 – 128 | $4.80
ra3.xlplus | 4 / 32 GB | 1 – 32 | $1.09
ra3.4xlarge | 12 / 96 GB | 2 – 32 | $3.26
ra3.16xlarge | 48 / 384 GB | 2 – 128 | $13.04
Serverless (Base & Max RPUs) | ? | 32 – 512 RPUs* | $0.36
*Redshift Processing Units are available in units of 8 (32, 40, 48, and so on, up to 512)
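The table above can be turned into a quick cost check. The sketch below is illustrative only: it assumes the listed prices are per node per hour and that the $0.36 serverless figure is per RPU-hour, and the function names are hypothetical. It also enforces the node ranges and the rule that RPUs come in units of 8 between 32 and 512.

```python
# Prices and node ranges copied from the Cluster Sizes table.
NODE_PRICE = {
    "dc2.large": 0.25,
    "dc2.8xlarge": 4.80,
    "ra3.xlplus": 1.09,
    "ra3.4xlarge": 3.26,
    "ra3.16xlarge": 13.04,
}
NODE_RANGE = {
    "dc2.large": (1, 32),
    "dc2.8xlarge": (2, 128),
    "ra3.xlplus": (1, 32),
    "ra3.4xlarge": (2, 32),
    "ra3.16xlarge": (2, 128),
}
RPU_PRICE = 0.36  # assumption: dollars per RPU-hour


def provisioned_hourly_cost(node_type: str, nodes: int) -> float:
    """Hourly cost of a provisioned cluster, validating the node range."""
    low, high = NODE_RANGE[node_type]
    if not low <= nodes <= high:
        raise ValueError(f"{node_type} supports {low} to {high} nodes")
    return round(NODE_PRICE[node_type] * nodes, 2)


def is_valid_rpu(rpus: int) -> bool:
    """RPUs are offered in units of 8, from 32 up to 512."""
    return 32 <= rpus <= 512 and rpus % 8 == 0


def serverless_cost(rpus: int, busy_hours: float) -> float:
    """Serverless compute cost for the hours in which workloads actually run."""
    if not is_valid_rpu(rpus):
        raise ValueError("RPUs must be a multiple of 8 between 32 and 512")
    return round(rpus * busy_hours * RPU_PRICE, 2)


if __name__ == "__main__":
    print(provisioned_hourly_cost("ra3.4xlarge", 4))
    print(serverless_cost(32, 2))
```

Because serverless bills only while workloads run, the comparison with a provisioned cluster depends mostly on how many hours per day the warehouse is busy.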
Pricing
• Price-per-performance
• Watch For:
  – Concurrency Scaling
  – Serverless RPU Usage
  – SageMaker costs for Redshift ML
• Discounts
Redshift ML Stack
Category | Product
Dedicated Compute | Amazon Redshift RA3
Storage | Amazon Redshift Managed Storage
Data Integration | AWS Glue
Streaming | Amazon Kinesis Data Analytics
Spark Analytics | Amazon EMR + Kinesis Spark
Data Lake | Amazon Redshift Spectrum
Business Intelligence | Amazon QuickSight
Machine Learning | Amazon SageMaker
Identity Management | Amazon IAM
Data Catalog | AWS Glue Data Catalog
AWS Machine Learning Stack
[Architecture diagram. Labeled components: Amazon Elastic Kubernetes Service (Amazon EKS) hosting the e-commerce website front end and back end (cart, profile, products, stock, deployed recommender); SageMaker model endpoint; ML model training & deployment and automatic model deployment (Amazon SageMaker); transactional database (Amazon DynamoDB); data loading (AWS Glue); data processing (Amazon Redshift); data lake + historical data (S3 data lake); MDM database (Talend).]
Data Governance:
• AWS Partner Solutions
• AWS Marketplace solutions
Performance Features
• Workload Management
• Estimated query plan (coming soon)
• Transparent materialized views
• Adaptive caching (recently used data on NVMe)
• Azure Advisor
Usability Features
• Dynamic Data Masking
• External Data Sources
• Synapse Link
• SynapseML
Data Warehouse Units (DWU)
• Official: “a collection of analytic
resources…defined as a combination of
CPU, memory, and IO…[which] represents
an abstract, normalized measure of
compute resources and performance.”
• Increasing DWUs linearly improves
performance
DWUs: 100, 200, 300, 400, 500, 1000, 1500, 2000, 2500, 3000, 5000, 6000, 7500, 10000, 15000, 30000
Pricing
DWUs Price/hr
100 $1.20
200 $2.40
300 $3.60
400 $4.80
500 $6
1000 $12
1500 $18
2000 $24
2500 $30
3000 $36
5000 $60
6000 $72
7500 $90
10000 $120
15000 $180
30000 $360
Component Price
Serverless $5/TB processed
Dedicated $/hour >>>
1-year Reserved 37% discount
3-year Reserved 65% discount
Storage $23/TB-month
• Additional charges (per vCore-hour) for Synapse
Link, Data Explorer, and Spark Pools
• Pipelines priced by DIU-hour, runtime-hour, and
per activity run
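The DWU price list is linear at $1.20 per hour per 100 DWUs, so the whole table, plus the reserved discounts, fits in a few lines. This is a rough sketch using only the figures on this slide; the function and plan names are hypothetical, and the extra per-vCore and pipeline charges listed above are not modeled.

```python
PRICE_PER_100_DWU_HOUR = 1.20   # from the DWU price table
RESERVED_DISCOUNT = {"payg": 0.0, "1yr": 0.37, "3yr": 0.65}
SERVERLESS_PER_TB = 5.00        # $ per TB processed
STORAGE_PER_TB_MONTH = 23.00    # $ per TB-month


def dedicated_hourly(dwus: int, plan: str = "payg") -> float:
    """Hourly price of a dedicated pool at a given DWU level and plan."""
    base = dwus / 100 * PRICE_PER_100_DWU_HOUR
    return round(base * (1 - RESERVED_DISCOUNT[plan]), 2)


def serverless_cost(tb_processed: float) -> float:
    """Serverless query cost for a volume of data processed."""
    return round(tb_processed * SERVERLESS_PER_TB, 2)


def storage_monthly(tb_stored: float) -> float:
    """Monthly storage cost."""
    return round(tb_stored * STORAGE_PER_TB_MONTH, 2)


if __name__ == "__main__":
    for plan in ("payg", "1yr", "3yr"):
        print(plan, dedicated_hourly(1000, plan))
```

At 1,000 DWUs the reserved plans bring the hourly price down from $12 to $7.56 (1-year) or $4.20 (3-year), which shows how much the commitment term matters to TCO.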
Microsoft Synapse ML Stack
Category | Product
Dedicated Compute | Azure Synapse Analytics Workspace
Storage | Azure Synapse Analytics SQL Pool
Data Integration | Azure Data Factory (ADF)
Streaming | Azure Stream Analytics (for Analytics) and Azure Event Hubs
Spark Analytics | Big Data Analytics with Apache Spark
Data Lake | Amazon Redshift Spectrum
Business Intelligence | Amazon QuickSight
Machine Learning | Azure Machine Learning
Identity Management | Amazon IAM
Data Catalog | Microsoft Purview
Azure Machine Learning Stack
[Architecture diagram. Labeled components: Azure Kubernetes Services (AKS) hosting the e-commerce website front end and back end (cart, profile, products, stock, deployed recommender); ML model runtime (Azure ML managed online endpoint, Azure Machine Learning); transactional database (Azure Cosmos DB Core API); analytical store, HTAP (Azure Cosmos DB Analytical Store, Parquet); Synapse Link (enables automatic sync to the analytical store, no ETL); Cognitive Services (sentiment analysis on product reviews to enhance the recommender model); data processing (Azure Synapse Analytics); data lake + historical data (ADL Gen2 Data Lake: HTAP data, sentiment data, historical order data); data transformation & ML model training (Azure Databricks, Delta Live Tables, SparkML); automatic model deployment (MLOps); MDM database (Talend).]
Data Management & Governance: Microsoft Purview
• Discover, classify, track lineage, and protect sensitive data (customer profiles, etc.)
Performance Features
• BQ Architecture and Slots
• Clustering and Partitioning
• Transparent Materialized Views
• BI Engine
Usability Features
• BigQuery Omni – External Tables
• Time Travel
• Migration Service – SQL Translation
• Looker Studio
• Colab Notebooks
• BigQuery ML
Pricing

Compute | BigQuery | Omni
On-demand | $5 per TB | $5 per TB
Flex | $4.00/hr per 100 slots | $5.00/hr per 100 slots
Monthly Commit* | $2.74/hr per 100 slots | $3.42/hr per 100 slots
Annual Commit* | $2.33/hr per 100 slots | $2.91/hr per 100 slots
BI Engine | $0.0416/hr per GB | N/A

Storage (1) | Logical (2) | Physical (3)
Active | $0.02/GB-month | $0.04/GB-month
Long-term (4) | $0.01/GB-month | $0.02/GB-month

Batch loading | FREE
Streaming inserts | $0.01 per 200MB
Storage API | $0.025 per 1GB

(1) You get to choose logical or physical billing
(2) Logical = Uncompressed size (Time travel free)
(3) Physical = Compressed size + Time travel
(4) Table not modified in 90 days
* Comes with some free BI Engine
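One way to read the compute prices is as a break-even question: how many TB must be scanned per month before a slot commitment is cheaper than on-demand? The sketch below uses only the BigQuery (non-Omni) rates from this slide; the 730-hour month and the function names are assumptions made for illustration.

```python
ON_DEMAND_PER_TB = 5.00
# Dollars per hour per 100 slots, from the BigQuery column above.
SLOT_RATE = {"flex": 4.00, "monthly": 2.74, "annual": 2.33}
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def slot_monthly_cost(slots: int, plan: str, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of holding a slot reservation for the given number of hours."""
    return round(slots / 100 * SLOT_RATE[plan] * hours, 2)


def breakeven_tb(slots: int, plan: str, hours: float = HOURS_PER_MONTH) -> float:
    """TB scanned per month at which on-demand costs the same as the reservation."""
    return slot_monthly_cost(slots, plan, hours) / ON_DEMAND_PER_TB


if __name__ == "__main__":
    for plan in SLOT_RATE:
        cost = slot_monthly_cost(100, plan)
        print(plan, cost, round(breakeven_tb(100, plan), 1))
```

Under these assumptions, 100 slots on an annual commitment cost about $1,700 per month, so the commitment pays off only above roughly 340 TB scanned per month, provided 100 slots can actually process that volume.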
Google BigQuery ML Stack
Category | Product
Dedicated Compute | Google BigQuery
Storage | Google BigQuery Storage
Data Integration | Google Dataflow (Batch)
Streaming | Google Dataflow (Streaming)
Spark Analytics | Google Dataproc
Data Lake | Google BigQuery On-Demand Infrastructure
Business Intelligence | Google BigQuery BI Engine
Machine Learning | Google BigQuery ML
Identity Management | Google Cloud IAM
Data Catalog | Google Data Catalog
Google Machine Learning Stack
[Architecture diagram. Labeled components: Azure Kubernetes Services (AKS) hosting the e-commerce website front end and back end (cart, profile, products, stock, deployed recommender); Vertex AI Prediction; ML model training & deployment and automatic model deployment (Vertex AI); transactional database (Cloud Firestore); data loading (Cloud Data Fusion); data processing (BigQuery); data transformation (Cloud Dataprep, Cloud Dataflow); data lake + historical data (Cloud Storage data lake); MDM database (Talend).]
Data Governance:
• Google Dataplex
Technology Stack Costs
Sample Stack Cost Breakout
Line Item Pricing (AWS)
Lookup CostCenter Category Platform Product Size UnitNode
Amazon Redshift ra3.4xlarge-Infrastructure Infrastructure 01-Dedicated Compute AWS Amazon Redshift ra3.4xlarge 1-Medium ra3.4xlarge
Amazon Redshift ra3.16xlarge-Infrastructure Infrastructure 01-Dedicated Compute AWS Amazon Redshift ra3.16xlarge 2-Large ra3.16xlarge
Amazon Redshift Managed Storage-Storage Storage 02-Storage AWS Amazon Redshift Managed Storage 1-Medium GB-month
Amazon Redshift Managed Storage-Storage Storage 02-Storage AWS Amazon Redshift Managed Storage 2-Large GB-month
AWS Glue-Software Software 03-Data Integration AWS AWS Glue 1-Medium DPU-Hour
AWS Glue-Software Software 03-Data Integration AWS AWS Glue 2-Large DPU-Hour
Amazon Kinesis Data Analytics-Infrastructure Infrastructure 04-Streaming AWS Amazon Kinesis Data Analytics 1-Medium KPU-Hour
Amazon Kinesis Data Analytics-Infrastructure Infrastructure 04-Streaming AWS Amazon Kinesis Data Analytics 2-Large KPU-Hour
Amazon Kinesis Data Analytics-Storage Storage 04-Streaming AWS Amazon Kinesis Data Analytics 1-Medium GB-month
Amazon Kinesis Data Analytics-Storage Storage 04-Streaming AWS Amazon Kinesis Data Analytics 2-Large GB-month
Amazon EMR-Infrastructure Infrastructure 05-Spark Analytics AWS Amazon EMR 1-Medium r5.4xlarge
Amazon EMR-Software Software 05-Spark Analytics AWS Amazon EMR 1-Medium EMR on r5.4xlarge
Amazon EMR-Infrastructure Infrastructure 05-Spark Analytics AWS Amazon EMR 2-Large r5.4xlarge
Amazon EMR-Software Software 05-Spark Analytics AWS Amazon EMR 2-Large EMR on r5.4xlarge
Amazon Kinesis-Shards Shards 05-Spark Analytics AWS Amazon Kinesis 1-Medium Shard-hour
Amazon Kinesis-Shards Shards 05-Spark Analytics AWS Amazon Kinesis 2-Large Shard-hour
Amazon Redshift Spectrum-Software Software 06-Data Exploration AWS Amazon Redshift Spectrum 1-Medium TB-month
Amazon Redshift Spectrum-Software Software 06-Data Exploration AWS Amazon Redshift Spectrum 2-Large TB-month
Amazon Redshift ra3.4xlarge-Infrastructure Infrastructure 06-Data Exploration AWS Amazon Redshift ra3.4xlarge 1-Medium ra3.4xlarge
Amazon Redshift ra3.4xlarge-Infrastructure Infrastructure 06-Data Exploration AWS Amazon Redshift ra3.4xlarge 2-Large ra3.4xlarge
Amazon EMR-Infrastructure Infrastructure 07-Data Lake AWS Amazon EMR 1-Medium r5.4xlarge
Amazon EMR-Software Software 07-Data Lake AWS Amazon EMR 1-Medium EMR on r5.4xlarge
Amazon EMR-Infrastructure Infrastructure 07-Data Lake AWS Amazon EMR 2-Large r5.4xlarge
Amazon EMR-Software Software 07-Data Lake AWS Amazon EMR 2-Large EMR on r5.4xlarge
Amazon Quicksight Readers-Licenses Licenses 08-Business Intelligence AWS Amazon Quicksight Readers 1-Medium User-month
Amazon Quicksight Readers-Licenses Licenses 08-Business Intelligence AWS Amazon Quicksight Readers 2-Large User-month
Amazon Quicksight Authors-Licenses Licenses 08-Business Intelligence AWS Amazon Quicksight Authors 1-Medium User-month
Amazon Quicksight Authors-Licenses Licenses 08-Business Intelligence AWS Amazon Quicksight Authors 2-Large User-month
Amazon SageMaker-Infrastructure Infrastructure 09-Machine Learning AWS Amazon SageMaker 1-Medium ml.r5.2xlarge
Amazon SageMaker-Software Software 09-Machine Learning AWS Amazon SageMaker 1-Medium ml.r5.2xlarge
Amazon SageMaker-Infrastructure Infrastructure 09-Machine Learning AWS Amazon SageMaker 2-Large ml.r5.2xlarge
Amazon SageMaker-Software Software 09-Machine Learning AWS Amazon SageMaker 2-Large ml.r5.2xlarge
Amazon IAM-Licenses Licenses 10-Identity Management AWS Amazon IAM 1-Medium Included
Amazon IAM-Licenses Licenses 10-Identity Management AWS Amazon IAM 2-Large Included
AWS Glue Data Catalog-Software Software 11-Data Catalog AWS AWS Glue Data Catalog 1-Medium 100K objects
AWS Glue Data Catalog-Software Software 11-Data Catalog AWS AWS Glue Data Catalog 2-Large 100K objects
Stack Cost by Use Case for Medium-Sized Enterprises
• 1st Year of Project
• 1st Large Scale ML Project
• $1.3M – $3.2M
Stack Cost by Use Case for Large-Sized Enterprises
• 1st Year of Project
• 1st Large Scale ML Project
• $3.4M – $8.5M
Project ROI & TCO
ROI = Benefit / TCO
TCO = Infrastructure + Software + FTE + Consulting
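As a worked illustration of this slide, the sketch below computes TCO as the sum of the four cost components (infrastructure, software, FTE, consulting) and ROI as benefit divided by TCO, which is the ratio the slide shows. The class and function names and the sample figures are hypothetical, not results from the presentation.

```python
from dataclasses import dataclass


@dataclass
class TCO:
    """Total cost of ownership, split into the slide's four components."""
    infrastructure: float
    software: float
    fte: float
    consulting: float

    def total(self) -> float:
        return self.infrastructure + self.software + self.fte + self.consulting


def roi(benefit: float, tco: TCO) -> float:
    """ROI as the ratio of benefit to TCO, following the slide's formula."""
    total = tco.total()
    if total <= 0:
        raise ValueError("TCO must be positive")
    return benefit / total


if __name__ == "__main__":
    # Made-up figures in $M, for illustration only.
    costs = TCO(infrastructure=2.0, software=1.5, fte=3.0, consulting=1.5)
    print(costs.total(), roi(12.0, costs))
```

With this ratio form, a value above 1.0 means the benefit exceeds the total cost; the stack cost ranges quoted earlier cover the infrastructure and software terms only, with labor on top.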
Summary
• For large enterprise projects, the stack cost of deploying an ML-based project into production
typically ranges from $3.4M to $8.5M, in addition to labor expenses.
• The total cost of ownership of cloud analytics platforms scales up as the demand for analytics
at your company grows over time.
• Snowflake uses a consumption-based pricing model: customers are charged for the compute
(warehouse run time) and storage they consume, so costs rise with usage.
• Redshift offers both provisioned clusters and serverless options to cater to different business
requirements.
• Synapse dedicated compute is purchased in Data Warehouse Units (DWUs), a bundle of CPU,
memory, and IO that can be scaled to meet the needs of the organization.
• BigQuery slots operate as virtual CPUs to ensure efficient data processing and analysis.
• While there are numerous technology stacks available, the ones mentioned here are just a few
examples.
• Dedicated Compute, Storage, Data Integration, Streaming, Spark Analytics, Data Lake,
Business Intelligence, Machine Learning, Identity Management, and Data Catalog are all
essential components of a modern data management and analytics ecosystem.
• Estimating the costs of building a technology stack can be a complex task and requires careful
consideration of various factors.
• It is recommended to seek reliable performance at a predictable price to ensure the
successful implementation of data management and analytics projects.
• The true measure of project efficacy is Return on Investment (ROI), and organizations should
strive to achieve positive ROI in their data management and analytics endeavors.

More Related Content

Similar to Architecture, Products, and Total Cost of Ownership of the Leading Machine Learning Stacks

Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
Amazon Web Services
 
cloud-training-pricing-billing.pdf
cloud-training-pricing-billing.pdfcloud-training-pricing-billing.pdf
cloud-training-pricing-billing.pdf
Abhi850745
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Altinity Ltd
 
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
Amazon Web Services
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Sql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su AzureSql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su Azure
Marco Obinu
 
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEIDATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
Big Data Week
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
DATAVERSITY
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the field
Stéphane Dorrekens
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
Amazon Web Services
 
Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0
Cloudian
 
Google file system
Google file systemGoogle file system
Google file system
Ankit Thiranh
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
Anubhav Kale
 
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
RightScale
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real Time
SingleStore
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Zilliz
 
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
Amazon Web Services
 
ADV Slides: Comparing the Enterprise Analytic Solutions
ADV Slides: Comparing the Enterprise Analytic SolutionsADV Slides: Comparing the Enterprise Analytic Solutions
ADV Slides: Comparing the Enterprise Analytic Solutions
DATAVERSITY
 

Similar to Architecture, Products, and Total Cost of Ownership of the Leading Machine Learning Stacks (20)

Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
 
cloud-training-pricing-billing.pdf
cloud-training-pricing-billing.pdfcloud-training-pricing-billing.pdf
cloud-training-pricing-billing.pdf
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Sql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su AzureSql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su Azure
 
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEIDATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the field
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
 
Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0
 
Google file system
Google file systemGoogle file system
Google file system
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
 
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real Time
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
 
ADV Slides: Comparing the Enterprise Analytic Solutions
ADV Slides: Comparing the Enterprise Analytic SolutionsADV Slides: Comparing the Enterprise Analytic Solutions
ADV Slides: Comparing the Enterprise Analytic Solutions
 

More from DATAVERSITY

Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
DATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
DATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
DATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
DATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
DATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
DATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
DATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
DATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
DATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
DATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
DATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
DATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
DATAVERSITY
 
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
DATAVERSITY
 

More from DATAVERSITY (20)

Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Architecture, Products, and Total Cost of Ownership of the Leading Machine Learning Stacks

  • 1. Architecture, Products and Total Cost of Ownership of the Leading Machine Learning Stacks Presented by: William McKnight “#1 Global Influencer in Big Data” Thinkers360 President, McKnight Consulting Group A 2-time Inc. 5000 Company linkedin.com/in/wmcknight/ www.mcknightcg.com (214) 514-1444 Second Thursday of Every Month, at 2:00 ET With William McKnight
  • 2. TELECOMMUNICATIONS PHARMACEUTICAL EDUCATION CONSUMER PRODUCTS/RETAIL FINANCIAL INSURANCE/HEALTHCARE GOVERNMENT AND UTILITIES OTHER PUBLISHING McKnight Consulting Group Partial Client List
  • 3.
  • 4. Performance Features • Micro-partitions • Clustering Keys • Clustering Depth • Multi-Clusters • Transparent Materialized Views • Search Optimization Service • Query Acceleration Service
  • 5. Individual Query Performance Feature Comparison Improves Clustering Materialized Views Search Opt. Service Equality searches X X X Range searches X X X Sort operations X X Substring and Regex X VARIANT searches X Geospatial X Extra Costs Compute X X X Storage X X
  • 6. Usability Features • External Tables • Dynamic Data Masking • Time Travel and Fail Safe • Semi-Structured Data • Snowpipe • Snowsight Dashboards • Snowpark API
  • 7. Warehouses • 10 sizes • Available in Standard and Snowpark • New Snowpark-optimized with 16x the memory of Standard (open preview) Size XS S M L XL 2XL 3XL 4XL 5XL 6XL
  • 8. Pricing • Watch For: – Concurrency and price-per-performance – Effective Warehouses (Multi-clusters) – Add-on compute: • Automatic Clustering • Materialized View Refreshes • Search Optimization • Query Acceleration – Time travel storage • Discounts
  • 9. (A) Snowflake ML Stack
      Category | Product
      Dedicated Compute | Snowflake
      Storage | Snowflake
      Data Integration | AWS Glue
      Streaming | Kafka Confluent Cloud
      Spark Analytics | Amazon EMR + Kinesis Spark
      Data Lake | Snowflake External Tables
      Business Intelligence | Tableau
      Machine Learning | Amazon SageMaker
      Identity Management | Amazon IAM
      Data Catalog | Amazon Glue Data Catalog
  • 10. (A) Snowflake Machine Learning Stack Azure Kubernetes Services (AKS) Front-end E-Commerce Website Back-end Cart Profile Products Stock Deployed Recommender ML Model Training & Deployment Automatic Model deployment Databricks Databricks Transactional Database Cloud Firestore Data Loading Data Processing Cloud Data Fusion Snowflake Data Transformation Data Lake + Historical Data Data Marts Cloud Storage (data lake) MDM Database Talend Data Governance: • Partner Solutions • Marketplace solutions
  • 11.
  • 12. Performance Features • Redshift Advisor • Workload Management • Concurrency Scaling • Transparent Materialized Views • Short Query Acceleration
  • 13. Usability Features • Redshift Spectrum (External Tables) • Automated Materialized Views (AutoMV) • Dynamic Data Masking • Federated Queries • Semi-Structured and SUPER Type • Streaming Ingest with Kinesis • Python UDF • Redshift ML
  • 14. Provisioned Clusters vs. Serverless
      Aspect | Provisioned | Serverless
      Managed | Self managed | Fully managed
      Compute | Choose node type and cluster size | Workgroup
      Storage | Provisioned disk capacity | Namespace
      WLM | User configured | Not applicable
      Concurrency scaling | User enabled | Not applicable
      Scale out/up/down | User-initiated cluster resize | Not applicable
      Pause/resume | Manual | Automatic
      Compute billing | Per second when not paused; $/hour rate | Per second when workloads run; RPU-hour rate
      Storage billing | $ per managed storage amount | $ per GB-month used
      More detailed comparison: https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-console-comparison.html
  • 15. Cluster Sizes
      AWS Type | CPU/RAM | Node Range | Price Per Node
      dc2.large | 2 / 15 GB | 1 – 32 | $0.25
      dc2.8xlarge | 32 / 244 GB | 2 – 128 | $4.80
      ra3.xlplus | 4 / 32 GB | 1 – 32 | $1.09
      ra3.4xlarge | 12 / 96 GB | 2 – 32 | $3.26
      ra3.16xlarge | 48 / 384 GB | 2 – 128 | $13.04
      Serverless (Base & Max RPUs) | ? | 32 – 512 RPUs* | $0.36
      * Redshift Processing Units are available in units of 8 (32, 40, 48, and so on, up to 512)
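The cluster-size table above can be turned into a quick sizing check. The following is a minimal sketch, not an official calculator: it assumes the slide's "Price Per Node" figures are hourly rates and that the $0.36 serverless figure is a per-RPU-hour rate (the slide does not state the time unit). The function names are hypothetical.

```python
# Figures copied from the slide. ASSUMPTION: prices are per node-hour,
# and the serverless price is per RPU-hour.
PRICE_PER_NODE = {
    "dc2.large": 0.25,
    "dc2.8xlarge": 4.80,
    "ra3.xlplus": 1.09,
    "ra3.4xlarge": 3.26,
    "ra3.16xlarge": 13.04,
}
NODE_RANGE = {
    "dc2.large": (1, 32),
    "dc2.8xlarge": (2, 128),
    "ra3.xlplus": (1, 32),
    "ra3.4xlarge": (2, 32),
    "ra3.16xlarge": (2, 128),
}
RPU_PRICE = 0.36
RPU_MIN, RPU_MAX, RPU_STEP = 32, 512, 8


def provisioned_hourly_cost(node_type: str, nodes: int) -> float:
    """Hourly cost of a provisioned cluster, enforcing the slide's node range."""
    low, high = NODE_RANGE[node_type]
    if not low <= nodes <= high:
        raise ValueError(f"{node_type} supports {low}-{high} nodes, got {nodes}")
    return round(PRICE_PER_NODE[node_type] * nodes, 2)


def is_valid_rpu(rpus: int) -> bool:
    """RPUs come in units of 8, from 32 up to 512."""
    return RPU_MIN <= rpus <= RPU_MAX and rpus % RPU_STEP == 0


def serverless_hourly_cost(rpus: int) -> float:
    """Hourly cost while a serverless workload runs at the given RPU level."""
    if not is_valid_rpu(rpus):
        raise ValueError(f"invalid RPU setting: {rpus}")
    return round(RPU_PRICE * rpus, 2)
```

Serverless is billed only while workloads run, so comparing the two options fairly also needs an estimate of busy hours per month, which this sketch leaves to the caller.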
  • 16. Pricing • Price-per-performance • Watch For: – Concurrency Scaling – Serverless RPU Usage – SageMaker costs for Redshift ML • Discounts
  • 17. Redshift ML Stack
      Category | Product
      Dedicated Compute | Amazon Redshift RA3
      Storage | Amazon Redshift Managed Storage
      Data Integration | AWS Glue
      Streaming | Amazon Kinesis Data Analytics
      Spark Analytics | Amazon EMR + Kinesis Spark
      Data Lake | Amazon Redshift Spectrum
      Business Intelligence | Amazon QuickSight
      Machine Learning | Amazon SageMaker
      Identity Management | Amazon IAM
      Data Catalog | Amazon Glue Data Catalog
  • 18. Amazon Elastic Kubernetes services (Amazon EKS) Front-end E-Commerce Website Back-end Cart Profile Products Stock Deployed Recommender ML Model Training & Deployment Automatic Model deployment SageMaker model endpoint Amazon SageMaker Transactional Database Amazon Dynamo DB Data Loading Amazon Glue Data Processing Amazon Redshift Data Lake + Historical Data S3 (data lake) Data Governance: • AWS Partner Solutions • AWS Marketplace solutions MDM Database Talend AWS Machine Learning Stack
  • 19.
  • 20. Performance Features • Workload Management • Estimated query plan (coming soon) • Transparent materialized views • Adaptive caching (recently used data on NVMe) • Azure Advisor
  • 21. Usability Features • Dynamic Data Masking • External Data Sources • Synapse Link • SynapseML
  • 22. Data Warehouse Units (DWU) • Official: “a collection of analytic resources…defined as a combination of CPU, memory, and IO…[which] represents an abstract, normalized measure of compute resources and performance.” • Increasing DWUs linearly improves performance DWUs 100 200 300 400 500 1000 1500 2000 2500 3000 5000 6000 7500 10000 15000 30000
  • 23. Pricing
      DWUs | Price/hr
      100 | $1.20
      200 | $2.40
      300 | $3.60
      400 | $4.80
      500 | $6
      1000 | $12
      1500 | $18
      2000 | $24
      2500 | $30
      3000 | $36
      5000 | $60
      6000 | $72
      7500 | $90
      10000 | $120
      15000 | $180
      30000 | $360
      Component | Price
      Serverless | $5/TB processed
      Dedicated | $/hour (see DWU table)
      1-year Reserved | 37% discount
      3-year Reserved | 65% discount
      Storage | $23/TB-month
      • Additional charges (per vCore-hour) for Synapse Link, Data Explorer, and Spark Pools
      • Pipelines priced by DIU-hour, runtime-hour, and per activity run
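Every row of the DWU price table works out to $1.20 per 100 DWUs per hour, so the table reduces to a single linear rate. The sketch below encodes that observation. Applying the reserved discounts as a flat reduction of the hourly rate is an assumption made here for illustration, and the function names are hypothetical.

```python
# Rate derived from the slide: every listed row equals DWUs * $0.012 per hour.
RATE_PER_DWU_HOUR = 1.20 / 100

# DWU levels listed on the slide.
VALID_DWUS = (100, 200, 300, 400, 500, 1000, 1500, 2000, 2500,
              3000, 5000, 6000, 7500, 10000, 15000, 30000)

# Discounts from the slide. ASSUMPTION: they apply as a flat
# reduction of the dedicated hourly rate.
RESERVED_DISCOUNT = {None: 0.0, 1: 0.37, 3: 0.65}

SERVERLESS_PER_TB = 5.00    # $ per TB processed
STORAGE_PER_TB_MONTH = 23.00


def dedicated_hourly_price(dwus, reserved_years=None):
    """Hourly price of a dedicated SQL pool at a listed DWU level."""
    if dwus not in VALID_DWUS:
        raise ValueError(f"{dwus} is not a DWU level listed on the slide")
    if reserved_years not in RESERVED_DISCOUNT:
        raise ValueError("reserved_years must be None, 1, or 3")
    return dwus * RATE_PER_DWU_HOUR * (1 - RESERVED_DISCOUNT[reserved_years])


def serverless_cost(tb_processed):
    """Serverless cost for a given volume of data processed."""
    return tb_processed * SERVERLESS_PER_TB
```

For example, `dedicated_hourly_price(1000)` reproduces the table's $12 row, and passing `reserved_years=3` shows what the 65% discount would do to that rate under the stated assumption.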
  • 24. Microsoft Synapse ML Stack
      Category | Product
      Dedicated Compute | Azure Synapse Analytics Workspace
      Storage | Azure Synapse Analytics SQL Pool
      Data Integration | Azure Data Factory (ADF)
      Streaming | Azure Stream Analytics (for Analytics) and Azure Event Hubs
      Spark Analytics | Big Data Analytics with Apache Spark
      Data Lake | Amazon Redshift Spectrum
      Business Intelligence | Amazon Quicksight
      Machine Learning | Amazon Sagemaker
      Identity Management | Amazon IAM
      Data Catalog | Microsoft Purview
  • 25. Azure Kubernetes Services (AKS) Front-end E-Commerce Website Back-end Cart Profile Products Stock Deployed Recommender ML Model Runtime Azure ML managed online endpoint Azure Machine Learning Transactional Database Azure Cosmos DB Core API Analytical Store (HTAP) Azure Cosmos DB Analytical Store (Parquet) Cognitive Services Sentiment analysis on product reviews to enhance the recommender model Synapse Link Enables automatic sync to analytical store (no ETL) Data Processing Azure Synapse Analytics Data Lake + Historical Data ADL Gen2 Data Lake: HTAP data, sentiment data, historical order data Automatic Model deployment (MLOps) Data Transformation & ML Model Training Azure Databricks Delta Live Tables SparkML Microsoft Purview Data Management & Governance Discover, classify, track lineage, and protect sensitive data (customer profiles, etc.) MDM Database Talend Azure Machine Learning Stack
  • 26.
  • 27. Performance Features • BQ Architecture and Slots • Clustering and Partitioning • Transparent Materialized Views • BI Engine
  • 28. Usability Features • BigQuery Omni – External Tables • Time Travel • Migration Service – SQL Translation • Looker Studio • Colab Notebooks • BigQuery ML
  • 29. Pricing
      Compute | BigQuery | Omni
      On-demand | $5 per TB | $5 per TB
      Flex | $4.00/hr per 100 slots | $5.00/hr per 100 slots
      Monthly Commit* | $2.74/hr per 100 slots | $3.42/hr per 100 slots
      Annual Commit* | $2.33/hr per 100 slots | $2.91/hr per 100 slots
      BI Engine | $0.0416/hr per GB | N/A
      Storage (1) | Logical (2) | Physical (3)
      Active | $0.02/GB-month | $0.04/GB-month
      Long-term (4) | $0.01/GB-month | $0.02/GB-month
      Batch loading | FREE
      Streaming inserts | $0.01 per 200 MB
      Storage API | $0.025 per 1 GB
      (1) You get to choose logical or physical billing
      (2) Logical = Uncompressed size (Time travel free)
      (3) Physical = Compressed size + Time travel
      (4) Table not modified in 90 days
      * Comes with some free BI Engine
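One way to read the compute rows above is as a break-even question: how many TB per month must be scanned on-demand before an always-on slot reservation costs less? The sketch below uses the BigQuery column of the slide. The 730-hour month, the requirement that slots be bought in multiples of 100, and the function names are assumptions for illustration.

```python
# Rates copied from the BigQuery column of the slide.
ON_DEMAND_PER_TB = 5.00
SLOT_RATE_PER_100 = {"flex": 4.00, "monthly": 2.74, "annual": 2.33}

# ASSUMPTION: slots run continuously for an average 730-hour month.
HOURS_PER_MONTH = 730


def on_demand_cost(tb_scanned):
    """On-demand cost for a volume of data scanned."""
    return tb_scanned * ON_DEMAND_PER_TB


def slot_cost(slots, plan, hours=HOURS_PER_MONTH):
    """Cost of holding `slots` slots for `hours` under a pricing plan."""
    # ASSUMPTION: slots are purchased in multiples of 100,
    # mirroring the "per 100 slots" unit on the slide.
    if slots <= 0 or slots % 100 != 0:
        raise ValueError("slots must be a positive multiple of 100")
    return slots / 100 * SLOT_RATE_PER_100[plan] * hours


def break_even_tb(slots, plan, hours=HOURS_PER_MONTH):
    """TB scanned per period at which on-demand cost equals the slot cost."""
    return slot_cost(slots, plan, hours) / ON_DEMAND_PER_TB
```

Below the break-even volume on-demand is cheaper; above it, the reservation wins, provided the reserved slots can actually sustain the workload, which this arithmetic does not model.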
  • 30. Google BigQuery ML Stack
      Category | Product
      Dedicated Compute | Google BigQuery
      Storage | Google BigQuery Storage
      Data Integration | Google Dataflow (Batch)
      Streaming | Google Dataflow (Streaming)
      Spark Analytics | Google Dataproc
      Data Lake | Google BigQuery On-Demand Infrastructure
      Business Intelligence | Google BigQuery BI Engine
      Machine Learning | Google BigQuery ML
      Identity Management | Google Cloud IAM
      Data Catalog | Google Data Catalog
  • 31. Azure Kubernetes Services (AKS) Front-end E-Commerce Website Back-end Cart Profile Products Stock Deployed Recommender ML Model Training & Deployment Automatic Model deployment Vertex AI Prediction Vertex AI Data Governance • Google Dataplex Transactional Database Cloud Firestore Data Loading Data Processing Cloud Data Fusion BigQuery Data Transformation Data Lake + Historical Data Cloud Dataprep Cloud Dataflow Cloud Storage (data lake) MDM Database Talend Google Machine Learning Stack
  • 33. Sample Stack Cost Breakout
  • 34. Line Item Pricing (AWS)
      Lookup | CostCenter | Category | Platform | Product | Size | UnitNode
      Amazon Redshift ra3.4xlarge-Infrastructure | Infrastructure | 01-Dedicated Compute | AWS | Amazon Redshift ra3.4xlarge | 1-Medium | ra3.4xlarge
      Amazon Redshift ra3.16xlarge-Infrastructure | Infrastructure | 01-Dedicated Compute | AWS | Amazon Redshift ra3.16xlarge | 2-Large | ra3.16xlarge
      Amazon Redshift Managed Storage-Storage | Storage | 02-Storage | AWS | Amazon Redshift Managed Storage | 1-Medium | GB-month
      Amazon Redshift Managed Storage-Storage | Storage | 02-Storage | AWS | Amazon Redshift Managed Storage | 2-Large | GB-month
      AWS Glue-Software | Software | 03-Data Integration | AWS | AWS Glue | 1-Medium | DPU-Hour
      AWS Glue-Software | Software | 03-Data Integration | AWS | AWS Glue | 2-Large | DPU-Hour
      Amazon Kinesis Data Analytics-Infrastructure | Infrastructure | 04-Streaming | AWS | Amazon Kinesis Data Analytics | 1-Medium | KPU-Hour
      Amazon Kinesis Data Analytics-Infrastructure | Infrastructure | 04-Streaming | AWS | Amazon Kinesis Data Analytics | 2-Large | KPU-Hour
      Amazon Kinesis Data Analytics-Storage | Storage | 04-Streaming | AWS | Amazon Kinesis Data Analytics | 1-Medium | GB-month
      Amazon Kinesis Data Analytics-Storage | Storage | 04-Streaming | AWS | Amazon Kinesis Data Analytics | 2-Large | GB-month
      Amazon EMR-Infrastructure | Infrastructure | 05-Spark Analytics | AWS | Amazon EMR | 1-Medium | r5.4xlarge
      Amazon EMR-Software | Software | 05-Spark Analytics | AWS | Amazon EMR | 1-Medium | EMR on r5.4xlarge
      Amazon EMR-Infrastructure | Infrastructure | 05-Spark Analytics | AWS | Amazon EMR | 2-Large | r5.4xlarge
      Amazon EMR-Software | Software | 05-Spark Analytics | AWS | Amazon EMR | 2-Large | EMR on r5.4xlarge
      Amazon Kinesis-Shards | Shards | 05-Spark Analytics | AWS | Amazon Kinesis | 1-Medium | Shard-hour
      Amazon Kinesis-Shards | Shards | 05-Spark Analytics | AWS | Amazon Kinesis | 2-Large | Shard-hour
      Amazon Redshift Spectrum-Software | Software | 06-Data Exploration | AWS | Amazon Redshift Spectrum | 1-Medium | TB-month
      Amazon Redshift Spectrum-Software | Software | 06-Data Exploration | AWS | Amazon Redshift Spectrum | 2-Large | TB-month
      Amazon Redshift ra3.4xlarge-Infrastructure | Infrastructure | 06-Data Exploration | AWS | Amazon Redshift ra3.4xlarge | 1-Medium | ra3.4xlarge
      Amazon Redshift ra3.4xlarge-Infrastructure | Infrastructure | 06-Data Exploration | AWS | Amazon Redshift ra3.4xlarge | 2-Large | ra3.4xlarge
      Amazon EMR-Infrastructure | Infrastructure | 07-Data Lake | AWS | Amazon EMR | 1-Medium | r5.4xlarge
      Amazon EMR-Software | Software | 07-Data Lake | AWS | Amazon EMR | 1-Medium | EMR on r5.4xlarge
      Amazon EMR-Infrastructure | Infrastructure | 07-Data Lake | AWS | Amazon EMR | 2-Large | r5.4xlarge
      Amazon EMR-Software | Software | 07-Data Lake | AWS | Amazon EMR | 2-Large | EMR on r5.4xlarge
      Amazon Quicksight Readers-Licenses | Licenses | 08-Business Intelligence | AWS | Amazon Quicksight Readers | 1-Medium | User-month
      Amazon Quicksight Readers-Licenses | Licenses | 08-Business Intelligence | AWS | Amazon Quicksight Readers | 2-Large | User-month
      Amazon Quicksight Authors-Licenses | Licenses | 08-Business Intelligence | AWS | Amazon Quicksight Authors | 1-Medium | User-month
      Amazon Quicksight Authors-Licenses | Licenses | 08-Business Intelligence | AWS | Amazon Quicksight Authors | 2-Large | User-month
      Amazon SageMaker-Infrastructure | Infrastructure | 09-Machine Learning | AWS | Amazon SageMaker | 1-Medium | ml.r5.2xlarge
      Amazon SageMaker-Software | Software | 09-Machine Learning | AWS | Amazon SageMaker | 1-Medium | ml.r5.2xlarge
      Amazon SageMaker-Infrastructure | Infrastructure | 09-Machine Learning | AWS | Amazon SageMaker | 2-Large | ml.r5.2xlarge
      Amazon SageMaker-Software | Software | 09-Machine Learning | AWS | Amazon SageMaker | 2-Large | ml.r5.2xlarge
      Amazon IAM-Licenses | Licenses | 10-Identity Management | AWS | Amazon IAM | 1-Medium | Included
      Amazon IAM-Licenses | Licenses | 10-Identity Management | AWS | Amazon IAM | 2-Large | Included
      AWS Glue Data Catalog-Software | Software | 11-Data Catalog | AWS | AWS Glue Data Catalog | 1-Medium | 100K objects
      AWS Glue Data Catalog-Software | Software | 11-Data Catalog | AWS | AWS Glue Data Catalog | 2-Large | 100K objects
  • 35. Stack Cost by Use Case for Medium-Sized Enterprises • 1st Year of Project • 1st Large Scale ML Project • $1.3M – $3.2M
  • 36. Stack Cost by Use Case for Large-Sized Enterprises • 1st Year of Project • 1st Large Scale ML Project • $3.4M – $8.5M
  • 37. Project ROI & TCO
      ROI = Benefit / TCO
      TCO = Infrastructure + Software + FTE + Consulting
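The ROI slide expresses ROI as benefit divided by TCO, with TCO built from infrastructure, software, FTE, and consulting costs. A minimal sketch of that arithmetic follows. The class and function names are hypothetical, and any figures passed in are the caller's own estimates, not numbers from the deck.

```python
from dataclasses import dataclass


@dataclass
class ProjectCosts:
    """The four TCO components named on the slide, in the same currency."""
    infrastructure: float
    software: float
    fte: float
    consulting: float

    @property
    def tco(self) -> float:
        # TCO = Infrastructure + Software + FTE + Consulting
        return self.infrastructure + self.software + self.fte + self.consulting


def roi(benefit: float, costs: ProjectCosts) -> float:
    """ROI as the slide defines it: benefit divided by total cost of ownership."""
    if costs.tco <= 0:
        raise ValueError("TCO must be positive to compute ROI")
    return benefit / costs.tco
```

Under this ratio form, a value above 1.0 means the benefit exceeds the total cost. Some organizations instead report ROI as (benefit − cost) / cost, so it is worth stating which convention a figure uses.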
  • 38. Summary
      • For large enterprise projects, the stack cost typically ranges from $3.4M to $8.5M, in addition to labor expenses, to deploy ML-based projects into production successfully.
      • The total cost of ownership of cloud analytics platforms scales up as demand for analytics at your company grows over time.
      • Snowflake uses a usage-based (consumption-based) pricing model: users are charged for the resources they consume, so higher usage means higher cost.
      • Redshift offers both provisioned clusters and serverless options to suit different business requirements.
      • Synapse is purchased in Data Warehouse Units (DWUs), a collection of analytic resources that can be scaled to the needs of the organization.
      • BigQuery slots operate as virtual CPUs for data processing and analysis.
      • Many technology stacks are available; the ones covered here are just a few examples.
      • Dedicated Compute, Storage, Data Integration, Streaming, Spark Analytics, Data Lake, Business Intelligence, Machine Learning, Identity Management, and Data Catalog are all essential components of a modern data management and analytics ecosystem.
      • Estimating the cost of building a technology stack is complex and requires careful consideration of many factors.
      • Seek reliable performance at a predictable price.
      • The true measure of project efficacy is Return on Investment (ROI); organizations should aim for positive ROI in their data management and analytics projects.