
Practical Cloud


Talk on public cloud architectures, particularly for data pipelines. Includes Serverless and Server-based architectures built on AWS and GCP.

Published in: Technology

Practical Cloud

  1. AWS Dominates: 5–15x market share of all others (US market) • 80% EC2++: AWS cloud bills are for ‘old services’ (EC2, RDS) • 50% FAIL: first-time cloud projects outcome rate*
  2. 1 Definitions • 2 Vendors • 3 Patterns • 4 Lessons
  3. 10 Million Executions = ~$500 USD
  4. Files: AWS S3 • RDBMS: AWS RDS • VMs: AWS EC2
  5. Use image files -> “BIG” Application PLUS OS • Coordinate with Load Balancers • Mature Technology | Use Dockerfiles -> “small” Application • Coordinate with Container Managers • New Technology
  6. Just write code -> “tiny” Function / Method • Auto-scales • Very New Technology | Use Dockerfiles -> “small” Application • Coordinate via Container Mgr. • New Technology
  7. Drive to Lambda – Save Money • Cost: VM 100, Container 10, Lambda 1
  8. Use VMs – Keep Control • Control: Lambda 5, Container 50, VM 95
  9. “But why are Compute AND File Storage commodities on Azure, AWS AND GCP?”
  10. Compute: EC2, Containers, Lambda • Files: S3, Glacier • Data: RDS, DynamoDB • Other: Machine Learning, Kinesis • Serverless
  11. Alpha: some parts may work; service may be changed; service may be discontinued • Beta: many parts should work; service may be changed • Year One: most parts should work; can include some service integrations • Year Two: all parts should work; patterns and scripts emerge • Year Three: service is stable; tools and partners emerge
  12. $$ $ $$$
  13. Which Vendor? Functions • Logic Apps • No Code • Generates JSON
  14. Server-based Solutions
  15. Server-based Solutions: High Availability • Core Security
  16. Server-based Solutions: High Availability • Security
  17. Server-based Solutions: High Availability • Security • Scalability • Cost Control
  18. Google Compute Engine: Very fast to start …globally • Automatically discounted for sustained use • Easier to size via the ‘slider’
  19. Transform: Hadoop/Spark • Visualization Client • Data Lake • Exploratory: ANSI SQL • Warehouse: ANSI SQL
  20. Kappa Architecture on the Cloud – Servers?
  21. ETL: PySpark on Glue • Visualization Client: QuickSight • Data Lake: S3 • Explore SQL: Athena • MPP SQL: Spectrum
  22. AWS
  23. AWS
  24. ETL: Beam on Dataflow • Visualization Client: DataStudio • Data Lake: GCS • Explore SQL: BigQuery • MPP SQL: BigQuery
  25. Servers / IaaS? PaaS? Serverless? Integration testing? Orchestration? Deployment?
  26. “My” Programming Language? Debugging? Unit testing? Integration testing? Orchestration? Deployment?
  27. AWS X-Ray
  28. Reduce Attack Surface • Test external connections • Minimal permissions • Granular policies • Unique credentials
  29. Service Costs • Training Costs • Tooling Costs • Migration Costs • Learning Costs
  30. Service Type | Servers (or Containers) | Serverless
      Compute | EC2 | Lambda
      Files | File Services on EC2 | S3
      SQL on Relational Data | RDBMS on EC2 or RDS | Athena / Redshift Spectrum
      Data Pipeline | Kafka cluster on EC2 | Kinesis
      Machine Learning | EMR with Spark ML or Hadoop on EC2 | Machine Learning API
      IoT MQTT Message Broker | RabbitMQ on EC2 | IoT Broker
      NoSQL | MongoDB… on EC2 | DynamoDB
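
The cost figures on slide 7 (VM 100, Container 10, Lambda 1) and the control figures on slide 8 (Lambda 5, Container 50, VM 95) pull in opposite directions. The sketch below is one hypothetical way to combine them: pick the cheapest compute option that still meets a required level of control. The figures come from the slides and are treated here as unitless relative scores, which is an assumption; the names `ComputeOption` and `cheapest_with_control` are illustrative and not from the talk.

```python
from typing import NamedTuple, Optional


class ComputeOption(NamedTuple):
    name: str
    relative_cost: int  # slide 7 figure (assumed to be a relative score)
    control: int        # slide 8 figure (assumed to be a relative score)


# Figures copied from slides 7 and 8.
OPTIONS = [
    ComputeOption("VM", relative_cost=100, control=95),
    ComputeOption("Container", relative_cost=10, control=50),
    ComputeOption("Lambda", relative_cost=1, control=5),
]


def cheapest_with_control(min_control: int) -> Optional[ComputeOption]:
    """Return the lowest-cost option whose control score is at least
    min_control, or None if no option qualifies."""
    candidates = [o for o in OPTIONS if o.control >= min_control]
    return min(candidates, key=lambda o: o.relative_cost, default=None)


if __name__ == "__main__":
    for need in (0, 50, 90, 100):
        choice = cheapest_with_control(need)
        print(f"control >= {need}: {choice.name if choice else 'no option'}")
```

With no control requirement the selection lands on Lambda ("Drive to Lambda – Save Money"); as the required control rises it moves to Container and then to VM ("Use VMs – Keep Control").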