AWS for Big Data Experts

presentation for BigDataCampLA

Published in: Technology

  1. AWS for Big Data Experts @LynnLangit Nov 2013
  2. Data Expertise / Lynn Langit
      Practicing Architect
        • Cloud Deployments (Azure, AWS, Google)
      Technical author / trainer
        • Google Cloud Developer Series
        • SQL Server 2012 Developer Series
        • Cloudera Certified Developer
        • 2 books on SQL Server BI
      Industry awards
        • Microsoft – MVP for SQL Server
        • Google – GDE for Cloud Platform
        • 10Gen – Master for MongoDB
      Former MSFT FTE
        • 4 years
  3. What and Why AWS?
      AWS: Amazon’s cloud
      Large set of services
        • Compute
        • Data
        • More
      Market leader
        • In market longest
        • Usually cheapest
        • Most often used in production
  4. Amazon Web Services
  5. How to Work with AWS
        • Web Console
        • Command Line Tools
        • AWS SDK and IDE Tools
  6. EC2 – Virtual Machines (AMIs)
  7. EC2 – VMs (AMIs) from AWS Marketplace
  8. Demo - EC2 Virtual Machines
  9. Understanding EC2 storage options
  10. S3 -- Storage
  11. S3 – bucket properties
  12. Demo – S3 Storage
  13. Glacier -- storage & archiving
  14. Demo – Glacier Archival Storage
  15. RDS – partially managed SQL Server and more…
  16. Demo – RDS: Partially managed MySQL, Oracle or SQL Server
  17. RDS vs. EC2 for SQL Server
      Why RDS costs more
        • Provisioned IO – performance guarantees
        • Scheduled backups
        • Point in time restores
        • Scheduled maintenance windows
        • Full use of all SQL tools, SSMS, Profiler, DTA, etc…
        • Supports Availability Groups (requires 2012 Enterprise)
        • Cross-regional snapshots
  18. Redshift – Warehouse as a Service
  19. Demo – Redshift: Data Warehousing with PostgreSQL
  20. DynamoDB for fast NoSQL with SSDs
  21. Demo – DynamoDB: NoSQL (wide-column store) on SSD
  22. Elastic MapReduce for easy Hadoop
  23. Demo – MapReduce: Hadoop on AWS
  24. New Services - AWS re:Invent
        • Kinesis – real-time processing of streaming Big Data (into
        • AppStream – deliver streaming applications to clients from AWS
        • CloudTrail – capture AWS API calls
        • RDS addition – now supports PostgreSQL
        • Workspaces – Virtual Desktops for PC or Mac
  25. Data Pipelines – automated data transfer
  26. Demo – Data Pipeline: Build data flows on AWS
  27. Elastic Beanstalk for application scalability
  28. Demo – Beanstalk: PaaS on AWS
  29. AWS SDK for Visual Studio
  30. Demo – AWS SDK: Add-in for Visual Studio and .NET
  31. Cloud Database Services by Vendor
      Virtual Machines
        • AWS: EC2
        • Google: GCE – Linux only
        • Microsoft: Azure VM
      Cloud RDBMS
        • AWS: RDS - SQL Server, MySQL, Oracle; Redshift - Postgres
        • Google: mySQL > MariaDB
        • Microsoft: SQL Azure
      NoSQL buckets / Key-Value stores
        • AWS: EBS, S3, Glacier; DynamoDB
        • Google: Cloud Storage; HR Datastore on GAE
        • Microsoft: Azure Blobs; Azure Tables
      Pipelines
        • AWS: Data Pipelines
        • Google: Via APIs only
        • Microsoft: SSIS (on-premises)
      Document
        • AWS: MongoDB on EC2
        • Google: None
        • Microsoft: MongoDB on Windows Azure
      Hadoop MapReduce or Dremel
        • AWS: MapReduce on EC2 using S3
        • Google: Big Query
        • Microsoft: HDInsight (HDFS)
      Other (Datasets, Streaming, Machine Learning)
        • AWS: Kinesis; EBS volumes w/datasets
        • Google: Freebase; Translation API; Full-text search; Prediction API
        • Microsoft: StreamInsight; Azure Marketplace
  32. How much does it cost?
  33. Getting Started – Free Tier
  34. Creative Financing
      Regular Pricing
        • Use what you need and no more, i.e. instance size, storage size…
        • Watch for price drops – RDS price decrease this week
      Smart EC2 Instance Usage
        • Pause EC2 instances to reduce compute charges
        • Delete EC2 instances to reduce storage charges
      Vanity Pricing
        • Set pricing alerts
        • Use spot pricing
        • Re-selling compute / storage
  35. Example: EC2 Spot Pricing
  36. Example: EC2 Reserved Pricing
  37. Tip: Use AWS ‘Trusted Advisor’
  38. Tip: Use Pricing Calculators
      Example – from RightScale ‘PlanForCloud’
  39. Conclusions
        • EC2 for testing, training and production (IaaS)
        • S3 for archiving R/W
        • Glacier for archiving W fast & cheap, R slow & expensive
        • RDS for HA SQL Server
        • Redshift for Data Warehousing on demand
        • DynamoDB for fast NoSQL – on SSDs
        • Elastic Map Reduce for easy Hadoop MapReduce
  40. www.TeachingKidsProgramming.org
        • recipes)
        • Free Courseware (Java, SmallBasic or C# / Pluralsight)
        • Do a Recipe  Teach a Kid (Ages 10 ++)
        • Dec 2013 – Code.org – ‘Hour of Code’ education partner
  41. Keep Learning
      Twitter: @LynnLangit
      YouTube: http://www.youtube.com/user/SoCalDevGal
      Hire me
        • To help build your BI/Big Data solution
        • To teach your team next gen BI
        • To learn more about using NoSQL solutions
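
The "Smart EC2 Instance Usage" advice on the Creative Financing slide (pause instances to cut compute charges, delete them to cut storage charges) can be illustrated with a little arithmetic. The sketch below is not from the deck: the `InstancePlan` type, the `monthly_cost` function, and every rate in it are hypothetical placeholders, not AWS prices. It assumes an EBS-backed instance, where a stopped instance no longer accrues instance-hour charges but its attached volume is still billed until it is deleted.

```python
from dataclasses import dataclass


@dataclass
class InstancePlan:
    """Hypothetical cost inputs; none of these numbers are real AWS prices."""
    compute_rate_per_hour: float      # assumed $/hour, charged only while running
    storage_rate_per_gb_month: float  # assumed $/GB-month for the attached volume
    storage_gb: float                 # size of the attached volume in GB


def monthly_cost(plan: InstancePlan, running_hours: float,
                 volume_kept: bool = True) -> float:
    """Estimate one month's bill for a single instance.

    Compute is charged per running hour. Storage is charged for the whole
    month as long as the volume still exists, whether or not the instance
    is running.
    """
    compute = plan.compute_rate_per_hour * running_hours
    storage = (plan.storage_rate_per_gb_month * plan.storage_gb
               if volume_kept else 0.0)
    return compute + storage


if __name__ == "__main__":
    plan = InstancePlan(compute_rate_per_hour=0.10,
                        storage_rate_per_gb_month=0.05,
                        storage_gb=100)
    print("always on:       ", monthly_cost(plan, running_hours=730))
    print("paused off-hours:", monthly_cost(plan, running_hours=160))
    print("deleted:         ", monthly_cost(plan, running_hours=0,
                                            volume_kept=False))
```

With these made-up rates, pausing outside working hours removes most of the compute charge, while the storage charge only disappears once the instance and its volume are deleted, which matches the two bullets on the slide.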
