AWS for the Data Professional

Core AWS services for the data professional - EC2, RDS, S3, Kinesis and more

  • https://console.aws.amazon.com/console/home
  • https://www.windowsazure.com/en-us/home/features/overview/
  • http://aws.amazon.com/rds/sqlserver/ and http://aws.amazon.com/rds/faqs/#4
    RDS can scale up to larger instances, takes backups, can restore to a point in time up to the last 5 minutes, works with all the usual tools, and patching is managed for you
  • C:\Program Files (x86)\AWS Tools\Documentation\AWSToolsForWindows.html

    How to use PowerShell: http://docs.aws.amazon.com/powershell/latest/userguide/pstools-welcome.html
  • Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
  • http://aws.amazon.com/aws-free-usage-tier0/
  • S3 = $0.12 / GB / month -> about $150 / yr for 100 GB
    EBS = $0.10 / GB / month -> about $120 / yr for 100 GB
    EC2 = $0.12 / hr (Small, on-demand, Windows) -> $1,051 to run all year (up to $3.85 / hr, down to $0.01 / hr for spot instances), plus charges for other services, e.g. CloudWatch…
    RDS = $0.14 / hr (Small, on-demand, SQL Server 2008 Standard) -> $1,226 to run all year (up to $3.85 / hr, down to $0.05 / hr with heavy-utilization reserved instances), plus data transfer in/out charges
    DynamoDB = $0.01 / hr per 10 units of write capacity & $0.01 / hr per 50 units of read capacity, plus data transfer in/out charges
    Elastic Beanstalk / Windows = starter package $42 / month -> $504 / yr
  • http://aws.amazon.com/usergroups/ & http://aws.amazon.com/aws-training/

    1. Amazon Web Services for the SQL Server Professional • Lynn Langit • Architect • Level: Intermediate
    2. What and Why AWS? • AWS is Amazon's cloud: a set of services for compute, data, and more • Market leader: in the market longest, usually cheapest, most often used in production
    3. Amazon Web Services
    4. EC2 – VMs for training, test & production. Pricing • On-demand • Spot • Reserved
    5. Demo – EC2 • Virtual Machines
    6. S3 and Glacier
    7. About EC2 storage
        S3 • 10 GB max (for S3-backed AMIs) • 3 copies • Usually for data storage
        EBS – expand / snapshot, etc. • Can store AMIs (persistent) • Can ‘stop’ EC2 instances and ‘re-start’ – saves $$$ • Costs more • Can expand • One copy only (faster)
        SSD – optional • For high performance • Provisioned IOPS
    8. Demo – S3 • File Storage
    9. Demo – Glacier • Archival Storage
    10. RDS – Managed Relational Data
    11. Demo – RDS • SQL Server as a service
    12. RDS vs. EC2 for SQL Server • Provisioned IOPS – performance guarantees • Scheduled backups • Point-in-time restores • Scheduled maintenance windows • Full use of all SQL tools: SSMS, Profiler, DTA, etc. • Supports Availability Groups (requires SQL Server 2012 Enterprise) • Why RDS costs more
    13. Redshift – $999 / TB / year
    14. Demo – Redshift • Data Warehousing as a Service
    15. DynamoDB for fast NoSQL with SSDs
    16. Demo – DynamoDB • NoSQL on SSD
    17. Elastic MapReduce for easy Hadoop
    18. Demo – MapReduce • Hadoop on AWS
    19. Kinesis for real-time Big Data Streams
    20. Demo – Kinesis • Real-time streaming for Big Data
    21. Data Pipeline – automated data transfer
    22. Demo – Data Pipeline • Build data flows on AWS
    23. Integration with Visual Studio – AWS SDK. See also: • AWS Tools for Windows Developers • Includes AWS PowerShell
    24. AWS SDK includes AWS PowerShell
    25. Demo – AWS SDK • Add-in for Visual Studio and .NET
    26. Cloud Database Services by Vendor (AWS / Google / Microsoft)
        • RDBMS VMs – AWS: EC2 AMIs with SQL Server, etc.; Google: GCE with MySQL; Microsoft: Azure VM images with SQL Server
        • Managed RDBMS – AWS: RDS (SQL Server, MySQL); Google: Cloud SQL (MySQL); Microsoft: SQL Azure
        • NoSQL buckets/databases – AWS: S3, EBS, Glacier, DynamoDB; Google: Cloud Storage, HR Datastore on GAE; Microsoft: Azure Blobs & Tables
        • Pipelines – AWS: Data Pipelines; Google: Data Pipelines (beta); Microsoft: SSIS?
        • Streaming / Machine Learning – AWS: Kinesis or custom EC2; Google: BigQuery & Prediction API; Microsoft: StreamInsight, Azure Machine Learning
        • Document – AWS: MongoDB on EC2; Google: MongoDB on GCE; Microsoft: MongoDB on Windows Azure
        • Hadoop – AWS: MapReduce; Google: BigQuery (Dremel); Microsoft: HDInsight
        • Other – AWS: Redshift (data warehouse), WorkSpaces & Zocalo; Google: Managed VMs, GAE; Microsoft: Azure Marketplace (premium data)
    27. Costs – Free Tier for Database Services
    28. How much does it cost? Tip: when testing, use Billing Alerts to make sure you've turned off test services!
    29. Creative Financing
        Regular pricing • Use what you need and no more, e.g. instance size, storage size • Watch for price drops – RDS price decrease this week
        Smart EC2 instance usage • Pause (stop) EC2 instances to reduce compute charges • Delete EC2 instances to reduce storage charges
        Vanity pricing • Set pricing alerts • Use spot pricing • Re-sell compute / storage
    30. Usage Summary
        • Compute – EC2: dev & test, training, production
        • Storage – S3: raw storage; Glacier: archiving
        • Data services – RDS: partially managed RDBMS, HA SQL Server; Redshift: data warehousing; DynamoDB: fast NoSQL on SSDs; EMR: on-demand MapReduce; Kinesis: streaming; Data Pipelines: automation
    32. Keep Learning • Connect: @LynnLangit, www.youtube.com/user/SoCalDevGal • Get started: sign up for AWS and use the ‘Free Tier’; email me to get a $100 AWS usage credit
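Each cost estimate in the speaker notes multiplies a unit rate by usage. Below is a minimal Python sketch of that arithmetic. The rates are the historical figures quoted in the notes, not current AWS prices. The helpers `annual_hourly_cost` and `annual_storage_cost` are hypothetical names and are not part of any AWS SDK. The last entry models the "Pause EC2 instances" tip by assuming the instance runs only 8 hours a day, 5 days a week.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def annual_hourly_cost(rate_per_hour: float, hours: float = HOURS_PER_YEAR) -> float:
    """Cost of an hourly-billed resource (e.g. EC2, RDS) for `hours` of running time."""
    return rate_per_hour * hours


def annual_storage_cost(rate_per_gb_month: float, gb: float) -> float:
    """Cost of keeping `gb` gigabytes in per-GB-month storage (e.g. S3, EBS) for 12 months."""
    return rate_per_gb_month * gb * 12


# Historical rates quoted in the speaker notes; illustrative only.
estimates = {
    "S3, 100 GB": annual_storage_cost(0.12, 100),
    "EBS, 100 GB": annual_storage_cost(0.10, 100),
    "EC2 small Windows, always on": annual_hourly_cost(0.12),
    "RDS small SQL Server, always on": annual_hourly_cost(0.14),
    # Hypothetical schedule: stopped outside 8 h/day, 5 days/week, 52 weeks
    "EC2 small Windows, business hours": annual_hourly_cost(0.12, 8 * 5 * 52),
}

for name, cost in estimates.items():
    print(f"{name:35s} ${cost:,.2f}")
```

Running EC2 and RDS around the clock works out to 0.12 × 8,760 ≈ $1,051 and 0.14 × 8,760 ≈ $1,226, which matches the notes. The business-hours EC2 line comes to about $250, which shows why stopping idle instances matters. Real bills also include data transfer and other services, and this sketch ignores them.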

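Elastic MapReduce can run Hadoop Streaming jobs. In a Streaming job, the mapper and reducer are ordinary scripts: each reads lines from stdin and writes tab-separated key/value lines to stdout. Hadoop sorts the mapper output by key before the reducer sees it. The sketch below simulates a word count in that style on a single machine, with no cluster or AWS call involved. `mapper`, `reducer`, and `run_locally` are illustrative names. On a real cluster, the mapper and reducer would be separate scripts submitted as a streaming step.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word, as a Streaming mapper would print."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word; assumes records arrive sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


sample = ["the quick brown fox", "The lazy dog", "the end"]
for record in run_locally(sample):
    print(record)
```

Sorting whole `word\t1` lines is enough here. Identical words produce identical lines, so they sort next to each other, which mimics Hadoop's shuffle-and-sort.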
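Kinesis routes each record to a shard in two steps. It hashes the record's partition key with MD5 into a 128-bit integer, then chooses the shard whose hash key range contains that value. The sketch below rests on two assumptions. First, it assumes the stream's shards split the 128-bit space into equal, contiguous ranges. Second, it treats the digest as a big-endian integer. After resharding (splitting or merging shards) the ranges are no longer even, so real code should read each shard's `HashKeyRange` from ListShards or DescribeStream. `partition_key_hash` and `shard_index` are hypothetical helpers, not SDK functions.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 digests are 128 bits


def partition_key_hash(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as an unsigned big-endian 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range holds the key's hash.

    Assumption: the shards divide the hash key space into `shard_count`
    equal-width, contiguous ranges, in order.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be >= 1")
    return partition_key_hash(partition_key) * shard_count // HASH_KEY_SPACE


for key in ["sensor-1", "sensor-2", "sensor-3"]:
    print(key, shard_index(key, 4))
```

The mapping is deterministic, so all records with the same partition key land on the same shard and stay ordered there until the stream is resharded. As a result, a single high-traffic key can turn its shard into a hot spot.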