AWS for Big Data Experts
@LynnLangit

Nov 2013

presentation for BigDataCampLA

Published in: Technology

  1. AWS for Big Data Experts
     @LynnLangit
     Nov 2013
  2. Data Expertise / Lynn Langit
     Practicing Architect
     • Cloud Deployments (Azure, AWS, Google)
     Technical author / trainer
     • Google Cloud Developer Series
     • SQL Server 2012 Developer Series
     • Cloudera Certified Developer
     • 2 books on SQL Server BI
     Industry awards
     • Microsoft – MVP for SQL Server
     • Google – GDE for Cloud Platform
     • 10Gen – Master for MongoDB
     Former MSFT FTE
     • 4 years
  3. What and Why AWS?
     AWS: Amazon’s cloud
     Large set of services
     • Compute
     • Data
     • More
     Market leader
     • In market longest
     • Usually cheapest
     • Most often used in production
  4. Amazon Web Services
  5. How to Work with AWS
     • Web Console
     • Command Line Tools
     • AWS SDK and IDE Tools
  6. EC2 – Virtual Machines (AMIs)
  7. EC2 – VMs (AMIs) from AWS Marketplace
  8. Demo – EC2: Virtual Machines
  9. Understanding EC2 storage options
  10. S3 – Storage
  11. S3 – bucket properties
  12. Demo – S3: Storage
  13. Glacier – storage & archiving
  14. Demo – Glacier: Archival Storage
  15. RDS – partially managed SQL Server and more…
  16. Demo – RDS: Partially managed MySQL, Oracle or SQL Server
  17. RDS vs. EC2 for SQL Server
      Why RDS costs more
      • Provisioned IO – performance guarantees
      • Scheduled backups
      • Point in time restores
      • Scheduled maintenance windows
      • Full use of all SQL tools, SSMS, Profiler, DTA, etc…
      • Supports Availability Groups (requires 2012 Enterprise)
      • Cross-regional snapshots
  18. Redshift – Warehouse as a Service
  19. Demo – Redshift: Data Warehousing with PostgreSQL
  20. DynamoDB for fast NoSQL with SSDs
  21. Demo – DynamoDB: NoSQL (wide-column store) on SSD
  22. Elastic MapReduce for easy Hadoop
  23. Demo – MapReduce: Hadoop on AWS
  24. New Services – AWS re:Invent
      • Kinesis – real-time processing of streaming Big Data (into …)
      • AppStream – deliver streaming applications to clients from AWS
      • CloudTrail – capture AWS API calls
      • RDS addition – now supports PostgreSQL
      • Workspaces – Virtual Desktops for PC or Mac
  25. Data Pipelines – automated data transfer
  26. Demo – Data Pipeline: Build data flows on AWS
  27. Elastic Beanstalk for application scalability
  28. Demo – Beanstalk: PaaS on AWS
  29. AWS SDK for Visual Studio
  30. Demo – AWS SDK: Add-in for Visual Studio and .NET
  31. Cloud Database Services by Vendor
      • Virtual Machines. AWS: EC2. Google: GCE (Linux only). Microsoft: Azure VM.
      • Cloud RDBMS. AWS: RDS (SQL Server, MySQL, Oracle), Redshift (Postgres). Google: mySQL > MariaDB. Microsoft: SQL Azure.
      • NoSQL buckets, Key-Value stores. AWS: EBS, S3, Glacier, DynamoDB. Google: Cloud Storage, HR Datastore on GAE. Microsoft: Azure Blobs, Azure Tables.
      • Pipelines. AWS: Data Pipelines. Google: via APIs only. Microsoft: SSIS (on-premises).
      • Document. AWS: MongoDB on EC2. Google: none. Microsoft: MongoDB on Windows Azure.
      • Hadoop MapReduce or Dremel. AWS: MapReduce on EC2 using S3. Google: Big Query. Microsoft: HDInsight (HDFS).
      • Other (Datasets, Streaming, Machine Learning). AWS: Kinesis, EBS volumes w/datasets. Google: Freebase, Translation API, Full-text search, Prediction API. Microsoft: StreamInsight, Azure Marketplace.
  32. How much does it cost?
  33. Getting Started – Free Tier
  34. Creative Financing
      Regular Pricing
      • Use what you need and no more, i.e. instance size, storage size…
      • Watch for price drops – RDS price decrease this week
      Smart EC2 Instance Usage
      • Pause EC2 instances to reduce compute charges
      • Delete EC2 instances to reduce storage charges
      Vanity Pricing
      • Set pricing alerts
      • Use spot pricing
      • Re-selling compute / storage
  35. Example: EC2 Spot Pricing
  36. Example: EC2 Reserved Pricing
  37. Tip: Use AWS ‘Trusted Advisor’
  38. Tip: Use Pricing Calculators
      Example – from RightScale ‘PlanForCloud’
  39. Conclusions
      • EC2 for testing, training and production (IaaS)
      • S3 for archiving R/W
      • Glacier for archiving: W fast & cheap, R slow & expensive
      • RDS for HA SQL Server
      • Redshift for Data Warehousing on demand
      • DynamoDB for fast NoSQL – on SSDs
      • Elastic MapReduce for easy Hadoop MapReduce
  40. • recipes)
      www.TeachingKidsProgramming.org
      • Free Courseware (Java, SmallBasic or C# / Pluralsight)
      • Do a Recipe  Teach a Kid (Ages 10 ++)
      • Dec 2013 – Code.org – ‘Hour of Code’ education partner
  41. Keep Learning
      Twitter: @LynnLangit
      YouTube: http://www.youtube.com/user/SoCalDevGal
      Hire me
      • To help build your BI/Big Data solution
      • To teach your team next gen BI
      • To learn more about using NoSQL solutions
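
A rough illustration of the cost points from the "Creative Financing" and "EC2 Reserved Pricing" slides: pausing an instance removes compute charges while storage keeps billing, deleting it removes both, and a reserved instance only pays off past a certain number of running hours. This is a minimal Python sketch, not an AWS pricing tool. Every price, size, and name in it (`InstancePlan`, `reserved_break_even_hours`, the dollar rates, the 730-hour month) is a hypothetical assumption chosen for round arithmetic, not a figure from the talk or from AWS.

```python
from dataclasses import dataclass

# Assumption: an average month has about 730 hours.
HOURS_PER_MONTH = 730


@dataclass
class InstancePlan:
    """Hypothetical cost model for one VM with attached storage."""
    hourly_compute_usd: float        # assumed on-demand rate per running hour
    storage_gb: float                # assumed size of the attached volume
    storage_usd_per_gb_month: float  # assumed storage rate

    def monthly_cost(self, running_hours: float, deleted: bool = False) -> float:
        """Compute is billed per running hour; storage is billed until deletion."""
        if deleted:
            return 0.0
        compute = self.hourly_compute_usd * running_hours
        storage = self.storage_gb * self.storage_usd_per_gb_month
        return compute + storage


def reserved_break_even_hours(upfront_usd: float,
                              reserved_hourly_usd: float,
                              on_demand_hourly_usd: float) -> float:
    """Running hours after which a reserved instance is cheaper than on-demand."""
    saving_per_hour = on_demand_hourly_usd - reserved_hourly_usd
    if saving_per_hour <= 0:
        raise ValueError("reserved rate must be lower than the on-demand rate")
    return upfront_usd / saving_per_hour


# Hypothetical numbers: $0.10/hour compute, 100 GB at $0.10 per GB-month.
plan = InstancePlan(hourly_compute_usd=0.10, storage_gb=100,
                    storage_usd_per_gb_month=0.10)

always_on = plan.monthly_cost(HOURS_PER_MONTH)   # running all month
office_hours = plan.monthly_cost(8 * 22)         # paused outside 8h x 22 days
stopped = plan.monthly_cost(0)                   # paused all month: storage only
deleted = plan.monthly_cost(0, deleted=True)     # deleted: nothing billed

for label, cost in [("always on", always_on), ("office hours", office_hours),
                    ("stopped", stopped), ("deleted", deleted)]:
    print(f"{label:>12}: ${cost:.2f}/month")

# Hypothetical reserved offer: $300 upfront, $0.04/hour instead of $0.10/hour.
hours = reserved_break_even_hours(300.0, 0.04, 0.10)
print(f"reserved break-even after about {hours:.0f} running hours")
```

With these made-up rates, the paused instance still costs its storage fee each month, which is why the slide treats pausing and deleting as separate levers. A pricing calculator such as the ones mentioned on slide 38 would replace the assumed rates with current ones.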
