AWS Roadshow 2013
Above the Clouds: Free Your IT
Data Analytics and Business Intelligence
Michael Hanisch
Mgr. Solut...
Overview

1. Introducing Big Data
2. From data to actionable information
3. Analytics and Cloud Computing
1

Introducing Big Data
Generation

Collection & storage

Analytics & computation

Collaboration & sharing
The cost of data generation
is falling
The volume of data
is increasing
Lower cost,
higher throughput

Generation

Collection & storage

Analytics & computation

Collaboration & sharing
Lower cost,
higher throughput

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Highly
...
Data volume

Generated data

Available for analysis

Gartner: User Survey Analysis: Key Trends Shaping the Future of Data ...
Elastic and highly scalable
+
No upfront capital expense
+
Only pay for what you use
+
Available on-demand

=

Remove
cons...
Lower cost,
higher throughput

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Highly
...
Generation

Collection & storage
Accelerated

Analytics & computation

Collaboration & sharing
Big Data
Technologies and techniques for
working productively with data,
at any scale.
2

From data to
actionable information
“Who buys video games?”
Per day:
3.5 billion records
13 TB of click stream logs

71 million unique cookies
Results:
500% return on ad spend
From 2 months procurement time
to a few minutes
“Who is using our service?”
Finding signal in the noise of logs
Identified early mobile usage
Invested heavily in mobile development
In January 2013
9,432,061 unique mobile devices
used the Yelp mobile app.
4 million+ calls. 5 million+ directions.
3

Analytics and
Cloud Computing
Generation

Collection & storage

Analytics & computation

Collaboration & sharing
Generation

Collection & storage

Analytics & computation

Collaboration & sharing

S3, Glacier,
Storage Gateway,
DynamoDB...
Generation

Collection & storage

Analytics & computation

Collaboration & sharing

EC2 &
Elastic MapReduce
Generation

Collection & storage

Analytics & computation

Collaboration & sharing

EC2 & S3,
CloudFormation,
Elastic MapR...
Generation

Collection & storage
AWS Data Pipeline

Analytics & computation

Collaboration & sharing

S3, Glacier,
Storage...
Simple Storage Service

S3
Elastic MapReduce

EMR
Hadoop-as-a-service
Map-Reduce engine

Integrated with tools

What is EMR?
Massively parallel

Integrated with AWS services
...
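
The Map-Reduce model named on this slide can be illustrated with a minimal single-process sketch in Python. This is not EMR or Hadoop code, only the shape of the computation that such a cluster spreads over many nodes. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input standing in for log lines stored in S3.
    sample = ["GET /index GET /cart", "POST /cart"]
    print(word_count(sample))
```

In a real cluster the map and reduce phases run in parallel on separate nodes and the shuffle moves data over the network; the sketch keeps everything in one process to show only the data flow.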
How does it work?

1. Put the data
into S3 (or HDFS)

S3

EMR Cluster

EMR

3. Get the
results

2. Launch your cluster.
Ch...
How does it work?

EMR Cluster
S3

EMR

You can
easily resize
the cluster
How does it work?

EMR Cluster
S3

EMR

Use Spot
nodes to
save time
and money
How does it work?

EMR Cluster
S3

EMR

Launch parallel clusters
against the same data
source (tune for the
workload)
How does it work?

S3

EMR Cluster

When the work is complete,
you can terminate the cluster
(and stop paying)
How does it work?

EMR Cluster

You can store
everything in HDFS
(local disk)

High Storage nodes
= 48 TB/node
How does it work?

EMR Cluster

Launch in a Virtual
Private Cloud for
extra security
Thousands of Customers, 5+ Million Clusters
Integrates with Hadoop
Ecosystem

EMR
Integrates with Hadoop
Ecosystem

EMR
Give it a try:
aws.amazon.com/elasticmapreduce

Cost to run a 100-node EMR cluster:
EUR 6.15/hour
($8/h)
+
Photos: renee_mcgurk https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/
Calgary Reviews https://www.f...
What if all I want is a
database?
Customers asked us for a data warehouse the AWS way:
Easy to provision and scale up massively

No upfront costs, pay as yo...
Amazon Redshift Is:
A fast and powerful, petabyte-scale data warehouse that is

A Lot Faster

A Lot Cheaper

Amazon Redshi...
Amazon Redshift Dramatically Reduces IO

Id    Age   State
123   20    CA
345   25    WA
678   40    FL
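
One way to read the IO claim on this slide is through column-oriented storage, which Amazon Redshift uses. The following Python sketch, using the slide's three sample records, compares how many values a query for the average age has to read under a row layout and under a column layout. The function names and the "values read" counter are illustrative assumptions, not Redshift internals.

```python
# Sample records from the slide: (Id, Age, State).
rows = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]

# Column-oriented layout: the values of each column are stored together.
columns = {
    "Id": [r[0] for r in rows],
    "Age": [r[1] for r in rows],
    "State": [r[2] for r in rows],
}


def avg_age_row_store(records):
    """Scan whole records; return (average age, number of values read)."""
    values_read = 0
    total = 0
    for record in records:
        values_read += len(record)  # the full record is fetched
        total += record[1]
    return total / len(records), values_read


def avg_age_column_store(cols):
    """Scan only the Age column; return (average age, number of values read)."""
    ages = cols["Age"]
    return sum(ages) / len(ages), len(ages)


if __name__ == "__main__":
    print(avg_age_row_store(rows))
    print(avg_age_column_store(columns))
```

Both functions return the same average, but the column layout reads 3 values instead of 9. On wide tables with many columns the difference grows accordingly.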
Amazon Redshift parallelizes and distributes everything
Common BI Tools

Query

JDBC/ODBC

Leader
Node

Load
Backup
Resto...
Amazon Redshift Runs on Optimized Hardware
HS1.8XL:
128GB RAM, 16 Cores, 24 Spindles, 16TB Storage, 2GB/sec scan rate

HS1...
Redshift lets you start small and grow big
Extra Large Node (XL)
8 Extra Large Node (8XL)
3 spindles, 2TB, 15GiB RAM
2 vir...
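
The cluster size ranges given in the transcript for this slide (2-32 XL nodes for 4TB to 64TB, 2-100 8XL nodes for 32TB to 1.6PB) follow from multiplying the node count by the per-node storage. A small Python sketch of that arithmetic; the names `TB_PER_NODE`, `NODE_RANGE` and `cluster_capacity_tb` are hypothetical, and 1 PB is taken as 1,000 TB.

```python
# Per-node storage and multi-node cluster size limits, as stated on the slide.
TB_PER_NODE = {"XL": 2, "8XL": 16}
NODE_RANGE = {"XL": (2, 32), "8XL": (2, 100)}


def cluster_capacity_tb(node_type, nodes):
    """Return the raw storage of a multi-node cluster in TB."""
    low, high = NODE_RANGE[node_type]
    if not low <= nodes <= high:
        raise ValueError(f"{node_type} clusters have {low}-{high} nodes")
    return nodes * TB_PER_NODE[node_type]


if __name__ == "__main__":
    for node_type, (low, high) in NODE_RANGE.items():
        smallest = cluster_capacity_tb(node_type, low)
        largest = cluster_capacity_tb(node_type, high)
        print(f"{node_type}: {smallest} TB to {largest} TB")
```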
Priced to Analyze All the
Customer’s Data
Price Per Hour for HS1.XL
Single Node

Effective Hourly Price Per
TB

Effective ...
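
The effective per-TB prices on this slide can be reproduced from the HS1.XL node prices listed in the transcript ($0.850 on-demand, $0.500 with a 1 year reservation, $0.228 with a 3 year reservation) and the 2TB of storage per HS1.XL node. The sketch below assumes 8,760 hours per year, which matches the slide's annual figures; the function and variable names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, an assumption that matches the slide
TB_PER_HS1_XL_NODE = 2

# Price per hour for a single HS1.XL node, in USD, from the pricing slide.
node_price_per_hour = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_prices(price_per_node_hour, tb_per_node=TB_PER_HS1_XL_NODE):
    """Return (hourly price per TB, annual price per TB) for one node price."""
    per_tb_hour = price_per_node_hour / tb_per_node
    per_tb_year = per_tb_hour * HOURS_PER_YEAR
    return per_tb_hour, per_tb_year


def cluster_cost_per_hour(nodes, price_per_node_hour):
    """Simple pricing from the slide: number of nodes x cost per hour."""
    return nodes * price_per_node_hour


if __name__ == "__main__":
    for plan, price in node_price_per_hour.items():
        per_tb_hour, per_tb_year = effective_prices(price)
        print(f"{plan}: ${per_tb_hour:.3f}/TB/hour, ${per_tb_year:,.0f}/TB/year")
```

The leader node is not included in `cluster_cost_per_hour` because the slide states that there is no charge for it.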
Amazon Redshift Simplifies
Provisioning
•

Create a cluster in minutes

•

Automatically patch your OS and data warehouse ...
Amazon Redshift Simplifies
Operations

(Optional) SSL

Continuous, Automatic Backup
Streaming Restore

Clients

Amazon Red...
Initial Pilot Results

Current production environment
32 nodes, 128 CPUs, 4.2TB RAM, 1.6 PB disk
Tested 2B row data set, 6...
Amazon Redshift Integrates
With All Data Sources
Amazon EC2
Amazon
DynamoDB

Amazon Relational
Database Service (RDS)

Ama...
Integrates With Existing BI Tools

JDBC/ODBC

Amazon Redshift

Connect your tools to Amazon Redshift using standard
driver...
On-Premises Integration

OLTP
ERP

Reporting
and BI

RDBMS

Redshift

Data
Integration
Partners*
Cloud ETL for Big Data

Reporting
and BI

S3
Elastic MapReduce

Redshift

• Maintain online SQL access to your histor...
Thanks.
glez@amazon.de

Learn More: aws.amazon.com/bigdata
Thank you!
glez@amazon.de
AWS Data Pipeline

Data-intensive orchestration and automation
Reliable and scheduled
Easy to use, drag and drop
Execution...
Anatomy of a pipeline
Additional checks and notifications
Arbitrarily complex pipelines
AWS Roadshow Autumn 2013: Data Analytics and Business Intelligence

Talk from the AWS Roadshow Autumn 2013

Published in: Technology, Business

  1. 1. AWS Roadshow 2013. Above the Clouds: Free Your IT. Data Analytics and Business Intelligence. Michael Hanisch, Mgr. Solutions Architecture; Matthias Jung, Solutions Architect; Constantin Gonzalez, Solutions Architect
  2. 2. Overview 1. Introducing Big Data 2. From data to actionable information 3. Analytics and Cloud Computing
  3. 3. 1 Introducing Big Data
  4. 4. Generation Collection & storage Analytics & computation Collaboration & sharing
  5. 5. The cost of data generation is falling
  6. 6. The volume of data is increasing
  7. 7. Lower cost, higher throughput Generation Collection & storage Analytics & computation Collaboration & sharing
  8. 8. Lower cost, higher throughput Generation Collection & storage Analytics & computation Collaboration & sharing Highly constrained
  9. 9. Data volume Generated data Available for analysis Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  10. 10. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  11. 11. Lower cost, higher throughput Generation Collection & storage Analytics & computation Collaboration & sharing Highly constrained
  12. 12. Generation Collection & storage Accelerated Analytics & computation Collaboration & sharing
  13. 13. Big Data Technologies and techniques for working productively with data, at any scale.
  14. 14. 2 From data to actionable information
  15. 15. “Who buys video games?”
  16. 16. Per day: 3.5 billion records 13 TB of click stream logs 71 million unique cookies
  17. 17. Results: 500% return on ad spend From 2 months procurement time to a few minutes
  18. 18. “Who is using our service?”
  19. 19. Finding signal in the noise of logs Identified early mobile usage Invested heavily in mobile development
  20. 20. In January 2013 9,432,061 unique mobile devices used the Yelp mobile app. 4 million+ calls. 5 million+ directions.
  21. 21. 3 Analytics and Cloud Computing
  22. 22. Generation Collection & storage Analytics & computation Collaboration & sharing
  23. 23. Generation Collection & storage Analytics & computation Collaboration & sharing S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase
  24. 24. Generation Collection & storage Analytics & computation Collaboration & sharing EC2 & Elastic MapReduce
  25. 25. Generation Collection & storage Analytics & computation Collaboration & sharing EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift
  26. 26. Generation Collection & storage AWS Data Pipeline Analytics & computation Collaboration & sharing S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase EC2 & Elastic MapReduce EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift
  27. 27. Simple Storage Service S3
  28. 28. Elastic MapReduce EMR
  29. 29. What is EMR? Hadoop-as-a-service. Map-Reduce engine. Integrated with tools. Massively parallel. Integrated with AWS services. Cost effective AWS wrapper
  30. 30. How does it work? 1. Put the data into S3 (or HDFS). 2. Launch your cluster. Choose: • Hadoop distribution • How many nodes • Node type (hi-CPU, hi-memory, etc.) • Hadoop apps (Hive, Pig, HBase). 3. Get the results. (Diagram labels: S3, EMR, EMR Cluster)
  31. 31. How does it work? EMR Cluster S3 EMR You can easily resize the cluster
  32. 32. How does it work? EMR Cluster S3 EMR Use Spot nodes to save time and money
  33. 33. How does it work? EMR Cluster S3 EMR Launch parallel clusters against the same data source (tune for the workload)
  34. 34. How does it work? S3 EMR Cluster When the work is complete, you can terminate the cluster (and stop paying)
  35. 35. How does it work? EMR Cluster You can store everything in HDFS (local disk) High Storage nodes = 48 TB/node
  36. 36. How does it work? EMR Cluster Launch in a Virtual Private Cloud for extra security
  37. 37. Thousands of Customers, 5+ Million Clusters
  38. 38. Integrates with Hadoop Ecosystem EMR
  39. 39. Integrates with Hadoop Ecosystem EMR
  40. 40. Give it a try: aws.amazon.com/elasticmapreduce Cost to run a 100-node EMR cluster: EUR 6.15/hour ($8/h)
  41. 41. + Photos: renee_mcgurk https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/ Calgary Reviews https://www.flickr.com/photos/calgaryreviews/6328302248/in/photostream/
  42. 42. What if all I want is a database?
  43. 43. Customers asked us for a data warehouse the AWS way: Easy to provision and scale up massively No upfront costs, pay as you go Really fast performance at a really low price Open and flexible with support for popular tools
  44. 44. Amazon Redshift Is: A fast and powerful, petabyte-scale data warehouse that is A Lot Faster A Lot Cheaper Amazon Redshift A Whole Lot Simpler
  45. 45. Amazon Redshift Dramatically Reduces IO
      Id    Age   State
      123   20    CA
      345   25    WA
      678   40    FL
  46. 46. Amazon Redshift parallelizes and distributes everything Common BI Tools Query JDBC/ODBC Leader Node Load Backup Restore Resize 10GigE Mesh Compute Node Compute Node Compute Node
  47. 47. Amazon Redshift Runs on Optimized Hardware HS1.8XL: 128GB RAM, 16 Cores, 24 Spindles, 16TB Storage, 2GB/sec scan rate HS1.XL: 16GB RAM, 2 Cores, 3 Spindles, 2TB Storage Optimized for I/O intensive workloads High disk density Runs in HPC - fast network HS1.8XL available on Amazon EC2
  48. 48. Redshift lets you start small and grow big. Extra Large Node (XL): 3 spindles, 2TB, 15GiB RAM, 2 virtual cores, 10GigE; Single Node (2TB); Cluster 2-32 Nodes (4TB – 64TB). 8 Extra Large Node (8XL): 24 spindles, 16TB, 120GiB RAM, 16 virtual cores, 10GigE; Cluster 2-100 Nodes (32TB – 1.6PB). (Diagram: grids of XL and 8XL nodes)
  49. 49. Priced to Analyze All the Customer’s Data
      Plan                 Price Per Hour for HS1.XL Single Node   Effective Hourly Price Per TB   Effective Annual Price per TB
      On-Demand            $0.850                                  $0.425                          $3,723
      1 Year Reservation   $0.500                                  $0.250                          $2,190
      3 Year Reservation   $0.228                                  $0.114                          $999
      Simple Pricing: Number of Nodes x Cost per Hour. No charge for Leader Node. Pay as you grow.
  50. 50. Amazon Redshift Simplifies Provisioning • Create a cluster in minutes • Automatically patch your OS and data warehouse software • Scale up to 1.6PB with a few clicks and no downtime
  51. 51. Amazon Redshift Simplifies Operations (Optional) SSL Continuous, Automatic Backup Streaming Restore Clients Amazon Redshift *SSL, Amazon VPC, AES-256 (Hardware Accelerated) Amazon S3
  52. 52. Initial Pilot Results. Current production environment: 32 nodes, 128 CPUs, 4.2TB RAM, 1.6 PB disk. Tested a 2B row data set and 6 representative queries on a 2-node Amazon Redshift cluster: queries ran > 10x faster
  53. 53. Amazon Redshift Integrates With All Data Sources Amazon EC2 Amazon DynamoDB Amazon Relational Database Service (RDS) Amazon Redshift Corporate Data Center Amazon Elastic MapReduce Amazon Simple Storage Service (S3) AWS Storage Gateway Service
  54. 54. Integrates With Existing BI Tools JDBC/ODBC Amazon Redshift Connect your tools to Amazon Redshift using standard drivers from PostgreSQL.org
  55. 55. On-Premises Integration OLTP ERP Reporting and BI RDBMS Redshift Data Integration Partners*
  56. 56. Cloud ETL for Big Data Reporting and BI S3 Elastic MapReduce Redshift • Maintain online SQL access to your historical data • Transformation and enrichment with EMR • Longer history ensures better insight
  57. 57. Thanks. glez@amazon.de Learn More: aws.amazon.com/bigdata
  58. 58. Thank you! glez@amazon.de
  59. 59. AWS Data Pipeline Data-intensive orchestration and automation Reliable and scheduled Easy to use, drag and drop Execution and retry logic Map data dependencies Create and manage temporary compute resources
  60. 60. Anatomy of a pipeline
  61. 61. Additional checks and notifications
  62. 62. Arbitrarily complex pipelines
