While designing and building Data & AI platforms, you need to evaluate the options available: whether your platform will run on-premises, use cloud services, or take a hybrid approach.
In any case, you will need to evaluate various tools and services for your ingestion, storage, processing/analysis, and serving layers.
In this post, I have mapped popular open-source tools to their managed cloud service counterparts to make the evaluation process a bit easier.
Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)
| Layer | Category | Open Source | AWS | Azure | GCP | Description |
|---|---|---|---|---|---|---|
| Ingest | Streaming | Apache Kafka | Kinesis Streams/Firehose | Azure Event Hubs | Cloud Pub/Sub | Services that allow the mass ingestion of small data inputs, typically from devices and sensors, to process and route the data. |
| Ingest | IoT | Kaa | AWS IoT | Azure IoT | Cloud IoT Core | A cloud gateway for managing bidirectional communication with billions of IoT devices, securely and at scale. |
| Ingest | Messages | Apache ActiveMQ | Amazon SQS | Azure Service Bus | Cloud Pub/Sub | Supports a set of cloud-based, message-oriented middleware technologies including reliable message queuing and durable publish/subscribe messaging. |
| Ingest | Batch | Apache Spark | Data Pipeline | Azure Data Factory | Cloud Data Transfer | Processes and moves data between different compute and storage services, as well as on-premises data sources at specified intervals. Create, schedule, orchestrate, and manage data pipelines. |
| Store | InMemory | Redis | Amazon ElastiCache | Azure Redis Cache | Cloud Memorystore | An in-memory-based, distributed caching service that provides a high-performance store typically used to offload nontransactional work from a database. |
| Store | SQL OLTP | MySQL | Amazon RDS/Aurora | Azure SQL Database | Cloud SQL/Cloud Spanner | Managed relational database service where resiliency, scale, and maintenance are primarily handled by the platform. |
| Store | NoSQL-Key Value | Redis | Amazon DynamoDB | Table Storage | Cloud Bigtable | A globally distributed, multi-model database that natively supports multiple data models: key-value, documents, graphs, and columnar. |
| Store | NoSQL-Indexed | Apache Cassandra | Amazon SimpleDB | Azure Cosmos DB | Firestore | |
| Store | Object | MinIO | Amazon S3 | Azure Data Lake Storage | Cloud Storage | Object storage service, for use cases including cloud applications, content distribution, backup, archiving, disaster recovery, and big data analytics. |
| Store | Cool | | S3 IA | Azure Storage cool tier | Coldline Storage | Cool storage is a lower-cost tier for storing data that is infrequently accessed and long-lived. |
| Store | Archive | | Amazon Glacier | Azure Storage archive access tier | Archive Storage | Archive storage has the lowest storage cost and higher data retrieval costs compared to hot and cool storage. |
| Store | Backup | | AWS Backup | Azure Backup | Nearline Storage | Back up and recover files and folders from the cloud, and provide offsite protection against data loss. |
| Process | MapReduce | Hadoop/Spark | Amazon EMR | Azure HDInsight/Databricks | Cloud Dataproc | Managed Hadoop/Spark service. |
by ankitrathi.com
| Process | Data Movement | Airflow | AWS Data Pipeline | Azure Data Factory | Cloud Dataprep | Processes and moves data between different compute and storage services, as well as on-premises data sources at specified intervals. Create, schedule, orchestrate, and manage data pipelines. |
| Process | Batch Computing | Apache NiFi | AWS Batch | Azure Batch | Cloud Dataflow | Run large-scale parallel and high-performance computing applications efficiently in the cloud. |
| Process | Serverless Computing | Kubeless | AWS Lambda | Azure Functions | Cloud Functions | Runs code in response to events and automatically manages the computing resources required by that code. |
| Analyze | Interactive | Presto | Amazon Athena | Data Lake Analytics | Cloud Datalab | Provides a serverless interactive query service that uses standard SQL for analyzing databases. |
| Analyze | SQL OLAP | Apache Kylin | Amazon Redshift | Azure Synapse Analytics | BigQuery analytics | Cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. |
| Analyze | AI/ML | scikit-learn/TensorFlow | Amazon SageMaker | Azure Machine Learning | AI Platform | A cloud service to train, deploy, automate, and manage machine learning models. |
| Analyze | Stream Analytics | Apache Flink | Amazon Kinesis Analytics | Stream Analytics | Cloud Dataflow | Storage and analysis platforms that create insights from large quantities of data, or data that originates from many sources. |
| Analyze | Search Analytics | Elasticsearch | Amazon Elasticsearch | Azure Search | Cloud Search | Delivers full-text search and related search analytics and capabilities. |
| AI/ML | Speech | Simon/Kaldi | Amazon Transcribe/Polly | Cognitive Services - Speech | Speech-to-Text | Enables both speech-to-text and text-to-speech capabilities. |
| AI/ML | Vision | OpenCV | Amazon Rekognition | Cognitive Services - Computer Vision | Cloud Vision | Computer Vision: Extract information from images to categorize and process visual data. Face: Detect, identify, and analyze faces in photos. Emotions: Recognize emotions in images. |
| AI/ML | NLP | NLTK/OpenNLP | Amazon Comprehend | Cognitive Services - Language | Cloud Natural Language API | |
| AI/ML | Translation | OpenNMT | Amazon Translate | | Cloud Translation | |
| AI/ML | Conversational Interface | RASA | Amazon Lex | | Dialogflow Enterprise Edition | |
| AI/ML | Video Intelligence | | Amazon Rekognition Video | Video Indexer | Video AI | |
| AI/ML | Auto-generated Models | TPOT/AutoKeras | AutoGluon | Automated Machine Learning | AutoML | |
| AI/ML | Fully Managed ML | scikit-learn/TensorFlow | Amazon SageMaker | Azure Machine Learning | AI Platform | |
| Visualize | BI & Reporting | BIRT | Amazon QuickSight | Power BI | DataStudio, Google Sheets | Business intelligence tools that build visualizations, perform ad hoc analysis, and develop business insights from data. |
| Govern | Access Control | OpenIAM | AWS IAM | MS Identity Platform | Cloud IAM | Allows users to securely control access to services and resources while offering data security and protection. Create and manage users and groups, and use permissions to allow and deny access to resources. |
| Govern | Monitoring | Nagios | Amazon CloudWatch | Azure Monitor | Cloud Monitoring | Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments. |
| Govern | Logging | Logstash/Graylog | Amazon CloudWatch Logs | Log Analytics | Cloud Logging | |
| Govern | Data Catalog | TrueDat, Hive Metastore | AWS Glue, Amazon Athena Catalog | Data Catalog | Data Catalog | A fully managed service that serves as a system of registration and system of discovery for enterprise data sources. |
| Manage | Workflow Orchestration | Airflow | AWS Glue | Azure Logic Apps | Cloud Composer | Cloud technology to build distributed applications using out-of-the-box connectors to reduce integration challenges. Connect apps, data, and devices on-premises or in the cloud. |
| Manage | Deployment | Terraform | AWS CloudFormation | Azure Resource Manager | Cloud Deployment Manager | Provides a way for users to automate the manual, long-running, error-prone, and frequently repeated IT tasks. |
| Manage | API Management | API Umbrella/APIman | API Gateway | API Management | Apigee/Cloud Endpoints | A turnkey solution for publishing APIs to external and internal consumers. |
| Manage | DevOps | Gradle/Jenkins | AWS CodeBuild/CodeCommit/CodeDeploy/CodePipeline | Azure DevOps | DevOps | Fully managed build service that supports continuous integration and deployment. |
| Compute | IaaS | OpenStack | Amazon EC2 | Virtual Machines | Compute Engine | Virtual servers allow users to deploy, manage, and maintain OS and server software. Instance types provide combinations of CPU/RAM. Users pay for what they use with the flexibility to change sizes. |
| Compute | Containers | Kubernetes | Amazon Elastic Container Service | Azure Kubernetes Service/Azure Service Fabric | Google Kubernetes Engine | Azure Container Instances is the fastest and simplest way to run a container in Azure, without having to provision any virtual machines or adopt a higher-level orchestration service. |
| Compute | Auto Scaling | KEDA/Nomad | AWS Auto Scaling | Virtual Machine Scale Sets | Autoscaling | Allows you to automatically change the number of VM instances. You set the metrics and thresholds that determine whether the platform adds or removes instances. |
| Compute | Load Balancing | Seesaw/LoadMaster | Application Load Balancer | Application Gateway | Load balancing | Application Gateway is a layer 7 load balancer. It supports SSL termination, cookie-based session affinity, and round robin for load-balancing traffic. |
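
To use the mapping during an evaluation, it can help to keep it as data rather than as a picture. The following is a minimal Python sketch, not part of the original post: the `Row` class and the `equivalents` and `stack_for` helpers are hypothetical names introduced here for illustration, and only a handful of rows from the table are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of the comparison table (description column omitted)."""
    layer: str
    category: str
    open_source: str
    aws: str
    azure: str
    gcp: str


# A few rows copied from the table; extend with the remaining rows as needed.
ROWS = [
    Row("Ingest", "Streaming", "Apache Kafka", "Kinesis Streams/Firehose",
        "Azure Event Hubs", "Cloud Pub/Sub"),
    Row("Store", "Object", "MinIO", "Amazon S3",
        "Azure Data Lake Storage", "Cloud Storage"),
    Row("Process", "Serverless Computing", "Kubeless", "AWS Lambda",
        "Azure Functions", "Cloud Functions"),
    Row("Compute", "IaaS", "OpenStack", "Amazon EC2",
        "Virtual Machines", "Compute Engine"),
]

PROVIDERS = ("open_source", "aws", "azure", "gcp")


def equivalents(category: str) -> dict[str, str]:
    """Return every provider's option for one category (case-insensitive)."""
    for row in ROWS:
        if row.category.lower() == category.lower():
            return {p: getattr(row, p) for p in PROVIDERS}
    raise KeyError(f"no row for category {category!r}")


def stack_for(provider: str) -> dict[str, str]:
    """Return a category -> service mapping for a single provider column."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    return {row.category: getattr(row, provider) for row in ROWS}


if __name__ == "__main__":
    print(equivalents("streaming"))
    print(stack_for("gcp"))
```

Calling `equivalents("streaming")` lists the four options side by side, while `stack_for("azure")` gives a single-provider view of the same rows, which is handy when comparing an all-in-one-cloud design against an open-source or hybrid one.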