Big Data and the Cloud a Best Friend Story

Amazon Web Services
Big Data and the Cloud: A Best Friend Story

Joe Ziegler
Technical Evangelist
zieglerj@amazon.com @jiyosub

죠 지글러
테크니컬 에벤젤리스트
zieglerj@amazon.com @jiyosub

Characteristics of
Big Data

How the Cloud Is
Big Data’s Best Friend

Big Data on the Cloud
In the Real World

Characteristics of
Big Data

BIG DATA
When your data sets become
so large that you have to start
innovating how to collect, store,
organize, analyze and share it

Bigger Data
is
Better Data

Bigger Data
is
Harder Data

Big Data is Getting Bigger
Unconstrained data growth

95% of the 1.2 zettabytes of
ZB data in the digital universe is
unstructured
70% of of this is user-
EB generated content
Unstructured data growth
explosive, with estimates of
PB compound annual growth
(CAGR) at 62% from 2008 –
GB TB 2012.
Source: IDC

Big Data is Hard
and getting harder

Changing Data Requirements
Faster response time of fresher data
Sampling is not good enough & history is important
Increasing complexity of analytics
Users demand inexpensive experimentation

Where is it Coming From?

Computer Generated Human Generated
• Application server logs • Twitter “Fire Hose” 50m
(web sites, games) tweets/day 1,400% growth
• Sensor data per year
(weather, water, smart • Blogs/Reviews/Emails/Pict
grids) ures
• Images/videos • Social Graphs: Facebook,
(traffic, security cameras) Linked-in, Contacts

Storage Big Data Compute
Big Data
How quickData has gravity it?
do you need to read

App Data App

http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/

Big Data
…and inertia atto read
quick do you need volume…
How…and inertia at volume… it?

Data


Big Data
…easierquick inertiaapplications to the data
to move need to read
How…and do youat volume… it?

Data


The Role of Data
is Changing

Until now, Questions you ask drove Data model

New model is collect as much data as possible
– “Data-First Philosophy”

Data is the new raw material for
Data is theanyraw material for on business on par
new
business any
par with
with capital, people, labor
capital, people, labor

We Need Tools Built Specifically
for Big Data

Hadoop

• Scale out Easily • Solves some Problems
• Parallel Computing • Complex to Run
• Commodity Hardware • Special Skills to Maintain

How the Cloud Is

How do we define the cloud?
By Benefits!

No Cap Ex
Pay Per
Elasticity Use

Cloud
Fast Time to Market Focus on core
competency

Why is the Cloud

We know we want collect, store,
organize, analyze and share it.
But we have limited resources.

The Cloud Optimizes
Precious IT Resources
i.e. Skilled People

“Over the next decade, the number of ﬁles or containers
that encapsulate the information in the digital universe
will grow by 75x.
While the pool of IT staff available to manage them will
grow only slightly. At 1.5x”
- 2011 IDC Digital Universe Study

Deploying a Hadoop cluster is hard

Cloud computing

30% 70%

The Old Using Big Managing All of the
IT World Data “Undifferentiated Heavy Lifting”

Cloud computing

30% 70%

The Old Using Big Managing All of the
IT World Data “Undifferentiated Heavy Lifting”
Configuring
Cloud-Based
Analyzing and Using Big Data Cloud
Infrastructure
Assets
70% 30%

The Cloud
Reduces Cost
For Experimentation

Managed
Reusability Services

Scale Innovation

The Cloud Optimizes
Capacity Resources

Elastic Compute Capacity

On and Off Fast Growth

Variable peaks Predictable peaks

WASTE

On and Off Fast Growth

Variable peaks Predictable peaks

CUSTOMER DISSATISFACTION


Capacity Traditional
IT capacity
Elastic cloud capacity
Time
Your IT needs

The Cloud
Empowers Users to Balance
Cost and Time

1 instance for 500 hours
=
500 instances for 1 hour

Big Data
From one instance…
How quick do you need to read it?

Big Data
…to thousands
How quick do you need to read it?

AMAZON ELASTIC MAPREDUCE
• Managed Hadoop offering in the cloud
• Integration with other AWS services
• Thousands of customers ran over 2 million clusters on EMR
over the last year

Prod Cluster
S3 (EMR)

EMR
HDFS

Data streamed directly from S3 to the cluster

Prod Cluster
S3 (EMR)

EMR
HDFS

Results streamed back to S3

Recommendation Ad-hoc
Engine Analysis Personalization

Prod Cluster
S3 (EMR)

EMR

Data consumed in multiple ways

Prod Cluster
(EMR)
S3

EMR

Wide range of processing languages used

The Cloud
Enables Collection and Storage
of Big Data

Simple Storage Service
1 Trillion
1000.000

750.000

500.000

250.000

0.000

650k+ peak transactions per second

Global Accessibility

Region
US-WEST (N. California) EU-WEST (Ireland)
GOV CLOUD ASIA PAC (Tokyo)

US-EAST (Virginia)

US-WEST (Oregon)

ASIA PAC
(Singapore)
SOUTH AMERICA (Sao Paulo)

Amazon DynamoDB
DynamoDB is a fully managed NoSQL database service
that provides extremely fast and predictable performance
with seamless scalability

Zero Administration

Low Latency SSD’s

Reserved Capacity
Unlimited Potential Storage and
Throughput

We know we want
collect, store, organize, analyze and
share it.
But we have limited resources.

Big Data on the Cloud
In the Real World

Big Data Verticals

Social
Media/Adverti Financial
Oil & Gas Retail Life Sciences Security Network/Gami
sing Services
ng

User
Anti-virus
Targeted Monte Carlo Demographics
Recommend
Advertising Simulations

Seismic Genome Fraud
Usage analysis
Analysis Analysis Detection

Image and
Transactions
Video Risk Analysis
Analysis Image In-game
Processing
Recognition metrics

Netflix Web Services
(Honu) S3

8 TB of event data per day

S3

Legacy Data

Legacy data from on-premise
Netflix Data Center
data center

Customer dimension data stored
in Cassandra

S3

~1 PB of data stored in Amazon S3

Bank – Monte Carlo Simulations
“The AWS platform was a good fit for its
unlimited and flexible computational power to

23 Hours to our risk-simulation process requirements.

With AWS, we now have the power to decide

20 Minutes how fast we want to obtain simulation
results, and, more importantly, we have the
ability to run simulations not possible before
due to the large amount of infrastructure
required.” – Castillo, Director, Bankinter

Recommendations

The Taste Test
http://www.etsy.com/tastetest

Recommendations

Gift Ideas for Facebook Friends

etsy.com/gifts

Click Stream Analysis

User recently
purchased a Targeted Ad
sports movie and
(1.7 Million per day)
is searching for
video games

Big Data and the Cloud a Best Friend Story

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Big Data and the Cloud a Best Friend Story

Similar to Big Data and the Cloud a Best Friend Story (20)

More from Amazon Web Services

More from Amazon Web Services (20)

Recently uploaded

Recently uploaded (20)

Big Data and the Cloud a Best Friend Story

Editor's Notes