Since Amazon Redshift launched last year, it has been adopted by a wide variety of companies for data warehousing. In this session, learn how customers NASDAQ, HauteLook, and Roundarch Isobar are taking advantage of Amazon Redshift for three unique use cases: enterprise, big data, and SaaS. Learn about their implementations and how they made data analysis faster, cheaper, and easier with Amazon Redshift.
Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges that IT teams struggle with. This session will show how to build business intelligence applications on AWS, from raw data ingestion, storage, and consumption through to producing actionable information. We will also cover best practices for services such as Amazon Redshift and Amazon RDS, and how to use applications such as SAP HANA, Jaspersoft, and others.
(ISM303) Migrating Your Enterprise Data Warehouse to Amazon Redshift | Amazon Web Services
Learn how Boingo Wireless and online media provider Edmunds gained substantial business insights and saved money and time by migrating to Amazon Redshift. Get an inside look into how they accomplished their migration from on-premises solutions. Learn how they tuned their schema and queries to take full advantage of the columnar MPP architecture in Amazon Redshift, how they leveraged third party solutions, and how they met their business intelligence needs in record time.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a customer about how their use case takes advantage of fast performance on enormous datasets by leveraging economies of scale on the AWS platform.
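The columnar-storage advantage described above can be illustrated with a small simulation. This is a toy model in plain Python, not Redshift's actual storage engine, and the table dimensions are made up: an analytic query that references one column out of many reads far fewer bytes from a column-oriented layout than from a row-oriented one.

```python
# Toy illustration of why columnar layouts cut I/O for analytic queries.
# Dimensions are hypothetical; this is not Redshift's real storage engine.

ROWS, COLS, BYTES_PER_VALUE = 1_000_000, 20, 8

def bytes_scanned_row_store(cols_needed):
    # A row store must read every column of every row it scans.
    return ROWS * COLS * BYTES_PER_VALUE

def bytes_scanned_column_store(cols_needed):
    # A column store reads only the columns the query references.
    return ROWS * cols_needed * BYTES_PER_VALUE

# A query like SELECT SUM(price) FROM sales touches 1 of 20 columns:
row_io = bytes_scanned_row_store(1)
col_io = bytes_scanned_column_store(1)
print(f"row store: {row_io:,} bytes; column store: {col_io:,} bytes "
      f"({row_io // col_io}x less I/O)")
```

With 20 columns and one referenced, the column store scans 20x fewer bytes, which is the intuition behind the query-performance claims above.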
Speakers:
Ian Meyers, AWS Solutions Architect
Toby Moore, Chief Technology Officer, Space Ape
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent... | Amazon Web Services
Learn about architecture best practices for combining AWS storage and database technologies. We outline AWS storage options (Amazon EBS, Amazon EC2 Instance Storage, Amazon S3, and Amazon Glacier) along with AWS database options including Amazon ElastiCache (in-memory data store), Amazon RDS (SQL database), Amazon DynamoDB (NoSQL database), Amazon CloudSearch (search), Amazon EMR (Hadoop), and Amazon Redshift (data warehouse). Then we discuss how to architect your database tier by using the right database and storage technologies to achieve the required functionality, performance, availability, and durability, at the right cost.
In this session, you get an overview of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service. We'll cover how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also discuss new features, architecture best practices, and share how customers are using Amazon Redshift for their Big Data workloads.
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift | Amazon Web Services
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that costs less than $1,000 a TB a year, under a tenth the price of most traditional data warehousing solutions. Learn how Yahoo! uses Amazon Redshift to build a billion-event-a-day infrastructure that is fast, easy, and cost-effective. Dive into how Yahoo! performs advanced user retention and cohort analysis to make near-real-time product and marketing decisions.
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r... | Amazon Web Services
Amazon DynamoDB is a fully-managed, zero-admin, high-speed NoSQL database service. Amazon DynamoDB was built to support applications at any scale. With the click of a button, you can scale your database capacity from a few hundred I/Os per second to hundreds of thousands of I/Os per second or more. You can dynamically scale your database to keep up with your application's requirements while minimizing costs during low-traffic periods. The service has no limit on storage. You also learn about Amazon DynamoDB's design principles and history.
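The scaling behavior described above, dialing provisioned capacity up and down to track demand instead of holding peak capacity around the clock, can be sketched with a toy cost model. The traffic numbers and the 20% headroom factor below are invented for illustration; this is not DynamoDB's pricing model.

```python
# Toy model of scaling DynamoDB provisioned capacity to track demand,
# as described above. Traffic figures and headroom are hypothetical.

hourly_demand = [200, 150, 100, 100, 5000, 80000, 80000, 5000]  # reads/sec

def capacity_hours_fixed(demand):
    # Provision for the peak during every hour of the window.
    return max(demand) * len(demand)

def capacity_hours_tracking(demand, headroom=1.2):
    # Re-provision each hour with 20% headroom over observed demand.
    return sum(int(d * headroom) for d in demand)

fixed = capacity_hours_fixed(hourly_demand)
tracked = capacity_hours_tracking(hourly_demand)
print(f"fixed: {fixed}, tracking: {tracked}, saved: {1 - tracked / fixed:.0%}")
```

Even in this crude sketch, tracking demand cuts provisioned capacity-hours by more than two thirds for a bursty workload, which is the cost argument the abstract makes.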
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi... | Amazon Web Services
In this chalk talk, we take a deep dive on Amazon Redshift architecture and the latest performance enhancements that give you faster insights into your data. We also cover Amazon Redshift Spectrum, a feature of Amazon Redshift that enables you to analyze data across Amazon Redshift and your Amazon S3 data lake to deliver unique insights not possible by analyzing independent data silos.
Powering Interactive Data Analysis at Pinterest by Amazon Redshift | Jie Li
In the last six months, we have set up Amazon Redshift to power our interactive data analysis at Pinterest. It has tremendously improved the speed of analyzing our data.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitment or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
In this Masterclass presentation we will:
• Explore the architecture and fundamental characteristics of Amazon Redshift
• Show you how to launch Redshift clusters and to load data into them
• Explain how to use the AWS Console to monitor and manage Redshift clusters
• Help you to discover best practices and other resources to help you get the most from Redshift
Watch the recording here: http://youtu.be/-FmCWcxRvXY
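The data-loading step in the Masterclass outline above is typically done with Redshift's COPY command pulling files from Amazon S3. As a sketch, here is a small Python helper that composes such a statement; the table, bucket, and IAM role names are hypothetical, and actually executing the statement requires a connection to a live cluster.

```python
# Sketch of composing a Redshift COPY statement for bulk loading from S3.
# Table, bucket, and IAM role names below are hypothetical placeholders.

def build_copy_statement(table, s3_path, iam_role, fmt="CSV", gzip=True):
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role}'",
        fmt,
    ]
    if gzip:
        parts.append("GZIP")  # compressed input files reduce transfer time
    return "\n".join(parts) + ";"

stmt = build_copy_statement(
    table="sales",
    s3_path="s3://example-bucket/sales/2016/",
    iam_role="arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(stmt)
```

Loading via COPY from S3, rather than row-by-row INSERTs, lets all compute nodes ingest files in parallel, which is why it is the load path these sessions emphasize.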
Best Practices for Migrating Your Data Warehouse to Amazon Redshift | Amazon Web Services
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process.
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series | Amazon Web Services
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
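One reason the sort-key and schema-optimization topics above matter is block pruning: Redshift keeps per-block min/max metadata (zone maps) and skips blocks that cannot contain a query's predicate. The toy model below, with made-up data and a tiny block size, shows why sorted data prunes far better than unsorted data; it is a simplification, not Redshift's actual implementation.

```python
# Toy sketch of block pruning with per-block min/max metadata ("zone maps"),
# one reason sort keys matter for scan performance. Simplified model.

def make_blocks(values, block_size=4):
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]  # (zone-map min, max, rows)

def scan_equal(blocks, target):
    # Only read blocks whose [min, max] range could contain the target.
    read, hits = 0, []
    for lo, hi, rows in blocks:
        if lo <= target <= hi:
            read += 1
            hits.extend(v for v in rows if v == target)
    return read, hits

sorted_blocks = make_blocks(sorted(range(16)))  # data sorted on the key
unsorted_blocks = make_blocks([7, 0, 12, 3, 9, 1, 14, 5,
                               11, 2, 15, 4, 8, 6, 13, 10])

print("blocks read (sorted):", scan_equal(sorted_blocks, 9)[0])
print("blocks read (unsorted):", scan_equal(unsorted_blocks, 9)[0])
```

On sorted data the equality predicate touches a single block; on unsorted data every block's min/max range overlaps the target and all must be read.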
Redshift is a petabyte-scale data warehouse that is a lot faster, a lot less expensive, and a whole lot simpler to use. How can you get your data into Amazon Redshift? In this webinar, hear from representatives of Attunity (an Amazon Redshift Partner) and AWS as they present many of the options available for data integration. Whether your data is in an on-premises platform or a cloud-based database like DynamoDB, we will show you how you can easily load your data into Redshift.
Reasons to attend:
• Learn about best practices to efficiently integrate data into Redshift
• Attend a Q&A session with Redshift experts
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. We will also explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service. Learn the fundamentals of DynamoDB and see the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift | Amazon Web Services
No matter the industry, leading organizations need to closely integrate, deploy, secure, and scale diverse technologies to support workloads while containing costs. Nasdaq, Inc.—a leading provider of trading, clearing, and exchange technology—is no exception.
After migrating more than 1,100 tables from a legacy data warehouse into Amazon Redshift, Nasdaq, Inc. is now implementing a fully-integrated, big data architecture that also includes Amazon S3, Amazon EMR, and Presto to securely analyze large historical data sets in a highly regulated environment. Drawing from this experience, Nasdaq, Inc. shares lessons learned and best practices for deploying a highly secure, unified, big data architecture on AWS.
Attendees learn:
Architectural recommendations to extend an Amazon Redshift data warehouse with Amazon EMR and Presto.
Tips to migrate historical data from an on-premises solution and Amazon Redshift to Amazon S3, making it consumable.
Best practices for securing critical data and applications leveraging encryption, SELinux, and VPC.
Amgen discovers, develops, manufactures, and delivers innovative human therapeutics, helping millions of people in the fight against serious illnesses. In 2014, Amgen implemented a solution to offload ETL data across a diverse data set (U.S. pharmaceutical prescriptions and claims) using Amazon EMR. The solution has transformed the way Amgen delivers insights and reports to its sales force. To support Amgen’s entry into a much larger market, the ETL process had to scale to eight times its existing data volume. We used Amazon EC2, Amazon S3, Amazon EMR, and Amazon Redshift to generate weekly sales reporting metrics.
This session discusses highlights in Amgen's journey to leverage big data technologies and lay the foundation for future growth: benefits of ETL offloading in Amazon EMR as an entry point for big data technologies; benefits and challenges of using Amazon EMR vs. expanding on-premises ETL and reporting technologies; and how to architect an ETL offload solution using Amazon S3, Amazon EMR, and Impala.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct | Amazon Web Services
As data volumes grow, managing and scaling data pipelines for ETL and batch processing can be daunting. With more than 13.5 million learners worldwide, hundreds of courses, and thousands of instructors, Coursera manages over a hundred data pipelines for ETL, batch processing, and new product development.
In this session, we dive deep into AWS Data Pipeline and Dataduct, an open source framework built at Coursera to manage pipelines and create reusable patterns to expedite developer productivity. We share the lessons learned during our journey: from basic ETL processes, such as loading data from Amazon RDS to Amazon Redshift, to more sophisticated pipelines to power recommendation engines and search services.
Attendees learn:
Do's and don’ts of Data Pipeline
Using Dataduct to streamline your data pipelines
How to use Data Pipeline to power other data products, such as recommendation systems
What’s next for Dataduct
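At its core, the pipeline management discussed above comes down to running ETL steps in dependency order and passing results downstream. A minimal, self-contained sketch of that pattern follows; the step names and the tiny executor are illustrative and are not Dataduct's or AWS Data Pipeline's actual API.

```python
# Minimal sketch of dependency-ordered ETL steps, the core idea behind
# pipeline frameworks like those discussed above. Not Dataduct's real API.

from graphlib import TopologicalSorter  # Python 3.9+

def run_pipeline(steps, deps):
    """steps: name -> callable(results); deps: name -> set of upstream names."""
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        results[name] = steps[name](results)  # each step sees upstream output
    return order, results

steps = {
    "extract": lambda r: [3, 1, 2],
    "transform": lambda r: sorted(r["extract"]),
    "load": lambda r: f"loaded {len(r['transform'])} rows",
}
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

order, results = run_pipeline(steps, deps)
print(order, "->", results["load"])
```

Real frameworks add retries, scheduling, and backfills on top, but the reusable-pattern idea is the same: declare the DAG once and let the executor handle ordering.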
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
This webinar will provide an overview of Redshift with an emphasis on the many changes we recently introduced. In particular, we will address the newly released DW2 instance types and what you can do with them.
This content is designed for database developers and architects interested in Amazon Redshift.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- Amazon Aurora, the latest relational database engine: MySQL-compatible and highly available, providing up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
Near Real-Time Data Analysis With FlyData | FlyData Inc.
This document describes our products. FlyData makes it easy to load data automatically and continuously into Amazon Redshift. You can also refer to our homepage ( http://flydata.com/ ) for more information.
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I... | Amazon Web Services
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
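The talk's "choose the right technology per stage based on criteria" idea can be sketched as a simple rule table. The mapping below is a rough caricature for illustration only, with invented thresholds; it is not an authoritative AWS selection guide.

```python
# Rough caricature of "right tool per stage" selection by criteria, as
# described above. Thresholds and mappings are illustrative, not official.

def suggest_store(data_structure, query_latency_ms):
    if data_structure == "key-value" and query_latency_ms < 10:
        return "Amazon DynamoDB"      # low-latency key-value access
    if data_structure == "relational" and query_latency_ms < 100:
        return "Amazon RDS"           # transactional SQL workloads
    if data_structure == "columnar-analytic":
        return "Amazon Redshift"      # large scans, aggregations
    return "Amazon S3"                # durable default landing zone

print(suggest_store("key-value", 5))
print(suggest_store("columnar-analytic", 2000))
```

The real decision weighs more axes (request rate, item size, durability, cost), but encoding even a few criteria this way makes the "data bus" stage choices concrete.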
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ... | Amazon Web Services
In this workshop, you migrate a sample sporting event and ticketing database from Oracle or Microsoft SQL Server to Amazon Aurora or PostgreSQL using the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS). The workshop includes the migration of tables, indexes, procedures, functions, constraints, views, and more. We run SCT on an Amazon EC2 Windows instance, so bring a laptop with Remote Desktop (or some other method of connecting to the Windows instance). Ideally, you should be familiar with relational databases, especially Oracle or SQL Server and PostgreSQL or Aurora, to get the most from this session. Additionally, attendees should be familiar with SCT and DMS. Familiarity with SQL Developer and pgAdmin III will be helpful but is not required.
Prerequisites:
- Participants should have an AWS account established and available for use during the workshop.
- Please bring your own laptop.
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac... | Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Implementation of linear regression and logistic regression on Spark | Dalei Li
This presentation was developed for a course project at Technical University of Madrid. The course is massively parallel machine learning supervised by Alberto Mozo and Bruno Ordozgoiti.
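The linear-regression half of the project above rests on a single update rule, batch gradient descent on squared error, which can be sketched in plain Python without Spark. The toy data below is generated from y = 2x + 1, so the fit should recover those coefficients; this is a minimal serial sketch, not the course's parallel Spark implementation.

```python
# The gradient-descent update behind linear regression, in plain Python
# rather than Spark: w <- w - lr * dL/dw for mean squared-error loss.

def fit_linear(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noise-free data from y = 2x + 1; the fit should recover w≈2, b≈1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_linear(xs, ys)
print(round(w, 2), round(b, 2))
```

In a "massively parallel" setting like the course's, the per-point gradient terms in the two sums are what gets distributed across workers and then summed, since the gradient is a sum over data points.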
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Amazon Web Services
Amazon DynamoDB is a fully-managed, zero-admin, high-speed NoSQL database service. Amazon DynamoDB was built to support applications at any scale. With the click of a button, you can scale your database capacity from a few hundred I/Os per second to hundreds of thousands of I/Os per second or more. You can dynamically scale your database to keep up with your application's requirements while minimizing costs during low-traffic periods. The service has no limit on storage. You also learn about Amazon DynamoDB's design principles and history.
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Amazon Web Services
In this chalk talk, we take a deep dive on Amazon Redshift architecture and the latest performance enhancements that give you faster insights into your data. We also cover Amazon Redshift Spectrum, a feature of Amazon Redshift that enables you to analyze data across Amazon Redshift and your Amazon S3 data lake to deliver unique insights not possible by analyzing independent data silos.
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftJie Li
In the last six month, we have set up Amazon Redshift to power our interactive data analysis at Pinterest. It has tremendously improved the speed of analyzing our data.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitment or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
In this Masterclass presentation we will:
• Explore the architecture and fundamental characteristics of Amazon Redshift
• Show you how to launch Redshift clusters and to load data into them
• Explain out how to use the AWS Console to monitor and manage Redshift clusters
• Help you to discover best practices and other resources to help you get the most from Redshift
Watch the recording here: http://youtu.be/-FmCWcxRvXY
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process.
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesAmazon Web Services
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
Redshift is a petabyte-scale data warehouse that is a lot faster, a lot less expensive and a whole lot simpler to use. How can you get your data into Amazon Redshift? In this webinar, hear from representatives of Attunity (Amazon Redshift Partner), and AWS as they present many of the options available for data integration. Whether your data is in an on premise platform or a cloud based database like DynamoDB, we will show you how you can easily load your data in to Re
dshift.
Reasons to attend: - Learn about best practices to efficiently integrate data into Redshift. - Attend Q&A session with Redshift experts
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. We will also explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service. Learn the fundamentals of DynamoDB and see the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon RedshiftAmazon Web Services
"No matter the industry, leading organizations need to closely integrate, deploy, secure, and scale diverse technologies to support workloads while containing costs. Nasdaq, Inc.—a leading provider of trading, clearing, and exchange technology—is no exception.
After migrating more than 1,100 tables from a legacy data warehouse into Amazon Redshift, Nasdaq, Inc. is now implementing a fully-integrated, big data architecture that also includes Amazon S3, Amazon EMR, and Presto to securely analyze large historical data sets in a highly regulated environment. Drawing from this experience, Nasdaq, Inc. shares lessons learned and best practices for deploying a highly secure, unified, big data architecture on AWS.
Attendees learn:
Architectural recommendations to extend an Amazon Redshift data warehouse with Amazon EMR and Presto.
Tips to migrate historical data from an on-premises solution and Amazon Redshift to Amazon S3, making it consumable.
Best practices for securing critical data and applications leveraging encryption, SELinux, and VPC."
"Amgen discovers, develops, manufactures, and delivers innovative human therapeutics, helping millions of people in the fight against serious illnesses. In 2014, Amgen implemented a solution to offload ETL data across a diverse data set (U.S. pharmaceutical prescriptions and claims) using Amazon EMR. The solution has transformed the way Amgen delivers insights and reports to its sales force. To support Amgen’s entry into a much larger market, the ETL process had to scale to eight times its existing data volume. We used Amazon EC2, Amazon S3, Amazon EMR, and Amazon Redshift to generate weekly sales reporting metrics.
This session discusses highlights in Amgen's journey to leverage big data technologies and lay the foundation for future growth: benefits of ETL offloading in Amazon EMR as an entry point for big data technologies; benefits and challenges of using Amazon EMR vs. expanding on-premises ETL and reporting technologies; and how to architect an ETL offload solution using Amazon S3, Amazon EMR, and Impala."
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & DataductAmazon Web Services
"As data volumes grow, managing and scaling data pipelines for ETL and batch processing can be daunting. With more than 13.5 million learners worldwide, hundreds of courses, and thousands of instructors, Coursera manages over a hundred data pipelines for ETL, batch processing, and new product development.
In this session, we dive deep into AWS Data Pipeline and Dataduct, an open source framework built at Coursera to manage pipelines and create reusable patterns to expedite developer productivity. We share the lessons learned during our journey: from basic ETL processes, such as loading data from Amazon RDS to Amazon Redshift, to more sophisticated pipelines to power recommendation engines and search services.
Attendees learn:
Do's and don’ts of Data Pipeline
Using Dataduct to streamline your data pipelines
How to use Data Pipeline to power other data products, such as recommendation systems
What’s next for Dataduct"
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
This webinar will provide an overview of Redshift with an emphasis on the many changes we recently introduced. In particular, we will address the newly released DW2 instance types and what you can do with them.
This content is designed for database developers and architects interested in Amazon Redshift.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora: MySQL-compatible, highly available, and up to five times faster than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak, Sr. Manager of Software Development
Near Real-Time Data Analysis with FlyData (FlyData Inc.)
This document describes our products. FlyData makes it easy to load data automatically and continuously into Amazon Redshift. You can also visit our homepage ( http://flydata.com/ ) for more information.
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:Invent (Amazon Web Services)
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
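The ingest/store/process/visualize data bus described above can be sketched, purely illustratively, as four chained Python functions; the event shape and the latency metric are invented for the example, and each stage stands in for a real AWS service (a stream, an object store, an analytics engine, a dashboard):

```python
from statistics import mean


def ingest(events):
    """Ingest stage: accept raw events (e.g., from a stream)."""
    return list(events)


def store(records, datastore):
    """Store stage: persist records (a list stands in for S3/DynamoDB)."""
    datastore.extend(records)
    return datastore


def process(datastore):
    """Process stage: aggregate average latency per endpoint."""
    by_endpoint = {}
    for rec in datastore:
        by_endpoint.setdefault(rec["endpoint"], []).append(rec["latency_ms"])
    return {ep: mean(vals) for ep, vals in by_endpoint.items()}


def visualize(summary):
    """Visualize stage: render a plain-text report."""
    return "\n".join(f"{ep}: {avg:.1f} ms" for ep, avg in sorted(summary.items()))


events = [
    {"endpoint": "/home", "latency_ms": 120},
    {"endpoint": "/home", "latency_ms": 80},
    {"endpoint": "/cart", "latency_ms": 200},
]
report = visualize(process(store(ingest(events), [])))
print(report)
```

The point of the data-bus framing is that each stage can be swapped independently based on the selection criteria the session lists (latency, cost, volume, durability, and so on).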
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ... (Amazon Web Services)
In this workshop, you migrate a sample sporting event and ticketing database from Oracle or Microsoft SQL Server to Amazon Aurora or PostgreSQL using the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS). The workshop includes the migration of tables, indexes, procedures, functions, constraints, views, and more. We run SCT on an Amazon EC2 Windows instance; bring a laptop with Remote Desktop (or some other method of connecting to the Windows instance). Ideally, you should be familiar with relational databases, especially Oracle or SQL Server and PostgreSQL or Aurora, to get the most from this session. Additionally, attendees should be familiar with SCT and DMS. Familiarity with SQL Developer and pgAdmin III is helpful but not required.
Prerequisites:
- Participants should have an AWS account established and available for use during the workshop.
- Please bring your own laptop.
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac... (Amazon Web Services)
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Implementation of linear regression and logistic regression on Spark (Dalei Li)
This presentation was developed for a course project at the Technical University of Madrid, for the Massively Parallel Machine Learning course supervised by Alberto Mozo and Bruno Ordozgoiti.
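The presentation itself targets Spark, but the underlying algorithm can be illustrated with a minimal pure-Python batch-gradient-descent logistic regression on a toy one-dimensional dataset. This is a sketch of the technique under invented data, not the presenters' Spark code:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent for 1-D logistic regression (weight + bias)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error drives both gradients
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy separable data: label is 1 when x > 2
xs = [0.0, 1.0, 1.5, 2.5, 3.0, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
predict = lambda x: 1 if sigmoid(w * x + b) >= 0.5 else 0
print([predict(x) for x in xs])
```

In a Spark implementation the inner loop over examples becomes a distributed aggregation of per-partition gradient sums, which is what makes the algorithm "massively parallel".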
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304) (Amazon Web Services)
In this session, you learn about the latest and hottest features of Amazon Redshift. Join Vidhya Srinivasan, General Manager of Amazon Redshift, to take a deep dive into the architecture and inner workings of Amazon Redshift. You discover how the recent availability, performance, and manageability improvements we’ve made can significantly enhance your end user experience. You also get a glimpse of what we are working on and our plans for the future.
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5 (Cloudera, Inc.)
Inefficient data workloads are all too common across enterprises - causing costly delays, breakages, hard-to-maintain complexity, and ultimately lost productivity. For a typical enterprise with multiple data warehouses, thousands of reports, and hundreds of thousands of ETL jobs being executed every day, this loss of productivity is a real problem. Add to all of this the complex handwritten SQL queries, and there can be nearly a million queries executed every month that desperately need to be optimized, especially to take advantage of the benefits of Apache Hadoop. How can enterprises dig through their workloads and inefficiencies to easily see which are the best fit for Hadoop and what’s the fastest path to get there?
Cloudera Navigator Optimizer is the solution: it analyzes existing SQL workloads to provide instant insight and turns that insight into an intelligent optimization strategy so you can unlock peak performance and efficiency with Hadoop. As the newest addition to Cloudera's enterprise Hadoop platform, now available in limited beta, Navigator Optimizer has helped customers profile over 1.5 million queries and ultimately save millions by optimizing for Hadoop.
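Workload profiling of this kind rests on normalizing queries so that structurally identical statements group together, then ranking the resulting patterns by frequency. The toy Python sketch below (my own illustration, not Navigator Optimizer's actual method) collapses literals before counting:

```python
import re
from collections import Counter


def normalize(sql):
    """Collapse string and numeric literals so structurally identical
    queries group together (a crude stand-in for workload normalization)."""
    sql = re.sub(r"'[^']*'", "'?'", sql)   # 'NYC' -> '?'
    sql = re.sub(r"\b\d+\b", "?", sql)     # 42    -> ?
    return re.sub(r"\s+", " ", sql.strip().lower())


query_log = [
    "SELECT * FROM orders WHERE id = 42",
    "SELECT * FROM orders WHERE id = 7",
    "SELECT name FROM users WHERE city = 'NYC'",
]
profile = Counter(normalize(q) for q in query_log)
top_pattern, count = profile.most_common(1)[0]
print(top_pattern, count)
```

The highest-frequency patterns are the natural candidates to evaluate first for offload to Hadoop.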
AWS Lambda is a new compute service that runs your code in response to events and automatically manages the compute resources for you. AWS Lambda enables powerful application architectures that simplify and accelerate development of connected applications. Together with Amazon Cognito, Amazon SNS push notifications, and Amazon DynamoDB, AWS Lambda is a powerful tool in your arsenal for developing IoT/mobile apps, and beyond. This session will show you how to get started quickly by covering key architectural design concepts and demonstrating the use of the AWS SDKs to simplify creating powerful applications for the always-on world that connects beyond the desktop.
Speaker: Adam Larter, Solutions Architect, Amazon Web Services
This presentation was delivered 14 times (in various forms) by AWS Evangelist Jeff Barr as part of his 2013 AWS Road Trip.
After introducing AWS, it covers the basics of S3, EC2, RDS, DynamoDB, Elastic Block Storage, Auto Scaling, Elastic Load Balancing, Redshift, the AWS Trusted Advisor, and more.
Are you looking to automate backup and archiving of your business-critical data workloads? Attend this session to understand key use cases, best practices, and considerations for protecting your data with AWS and CommVault. This session will feature lessons learned from CommVault customers that have: migrated onsite backup data into Amazon S3 to reduce hardware footprint and improve recoverability; implemented data-tiering and archived data in Amazon Glacier for long term retention and compliance; performed snapshot-based protection and recovery for applications running in Amazon EC2; and, provisioned and managed VMs in Amazon EC2.
Speaker: Michael Porfirio, Director Systems Engineering, CommVault
STP205 Making it Big Without Breaking the Bank - AWS re:Invent 2012 (Amazon Web Services)
Join Ray Bradford from Kleiner Perkins in a frank discussion with Yelp Engineering Manager Jim Blomo and Flipboard Chief Architect Greg Scallan, as they explore how they are optimizing their costs with AWS and how they think about owning vs. renting hardware as they grow. Ray will also share observations and trends on how successful VC-funded companies think about IT costs and the right things to be spending money on.
40, 1173 & 516. What do these numbers mean? Since inception, AWS has introduced more than 40 major new services and released over 1173 new services and features, with 516 announced in 2014 alone. How you used the AWS platform last year may be very different from how you utilise it today to maximise innovation and outcomes and remain competitive. In this advanced technical session, an AWS Solutions Architect will address technical requirements for successfully deploying and managing applications on the AWS platform, how solutions were typically architected previously, both off-cloud and on-cloud, and some of the best-practice recommendations on AWS today.
Speaker: Dean Samuels, Solutions Architect, Amazon Web Services
The AWS cloud infrastructure has been architected to be one of the most flexible and secure cloud computing environments available today. In this session, we’ll provide a practical understanding of the assurance programs that AWS provides; such as HIPAA, FedRAMP(SM), PCI DSS Level 1, MPAA, and many others. We’ll also address the types of business solutions that these certifications enable you to deploy on the AWS Cloud, as well as the tools and services AWS makes available to customers to secure and manage their resources.
In this session, learn how to move your existing database applications to the cloud. We cover the best practices for planning your migrations, moving your data over, sizing your AWS deployment appropriately, and minimizing downtime. You also hear from some of our customers who have successfully migrated their applications about the techniques they used and the reasons they moved onto the cloud.
Webinar: Delivering Static and Dynamic Content Using CloudFront (Amazon Web Services)
In this presentation from our webinar titled “Delivering Static and Dynamic Content using Amazon CloudFront”, we provide an overview on how you can use Amazon CloudFront to help architect your site to deliver both static and dynamic content (portions of your site that change for each end-user). Andy Rosenbaum, Director of Desktop Development at Earth Networks, also joined and presented on why Earth Networks chose Amazon CloudFront to deliver their dynamic weather content.
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish... (Amazon Web Services)
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. In this session we'll give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift (Amazon Web Services)
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
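To see why schema choices like the distribution key matter, here is a simplified Python model of how an MPP warehouse hashes rows across compute slices so that each slice can aggregate its own share in parallel. The slice count and hashing details are illustrative only, not Redshift's internals:

```python
import hashlib


def slice_for(dist_key, num_slices):
    """Hash a distribution-key value to a slice, the way an MPP warehouse
    spreads rows across compute slices (a simplified model)."""
    digest = hashlib.md5(str(dist_key).encode()).hexdigest()
    return int(digest, 16) % num_slices


NUM_SLICES = 4
rows = [{"customer_id": cid, "amount": cid * 10} for cid in range(12)]

# Distribute rows: the same key always lands on the same slice,
# so joins and aggregations on that key need no data movement.
slices = {s: [] for s in range(NUM_SLICES)}
for row in rows:
    slices[slice_for(row["customer_id"], NUM_SLICES)].append(row)

# Each slice scans only its own rows, in parallel; partial sums are combined.
partials = {s: sum(r["amount"] for r in rs) for s, rs in slices.items()}
total = sum(partials.values())
print(total)
```

A skewed distribution key would pile rows onto a few slices and stall the parallel scan, which is why key selection is a recurring best-practice topic.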
Amazon Web Services provides a broad range of services to help you build and deploy big data analytics applications quickly and easily. AWS offers fast access to flexible, low-cost IT resources, letting you rapidly scale virtually any big data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL processing, serverless computing, and Internet of Things processing. With AWS you do not need to make large upfront investments of time or money to build and maintain infrastructure. Instead, you can provision exactly the right type and size of resources you need to power your big data analytics applications. You can access as many resources as you need, almost instantly, and pay only for what you use.
With AWS you can choose the right database technology and software for the job. Given the myriad of choices, from relational databases to non-relational stores, this session provides details and examples of some of the choices available to you. This session also provides details about real-world deployments from customers using Amazon RDS, Amazon ElastiCache, Amazon DynamoDB, and Amazon Redshift.
Using real-time big data analytics for competitive advantage (Amazon Web Services)
Many organisations find it challenging to successfully perform real-time data analytics using their own on premise IT infrastructure. Building a system that can adapt and scale rapidly to handle dramatic increases in transaction loads can potentially be quite a costly and time consuming exercise.
Most of the time, infrastructure is under-utilised, and it is nearly impossible for organisations to forecast the amount of computing power they will need in the future to serve their customers and suppliers.
To overcome these challenges, organisations can instead utilise the cloud to support their real-time data analytics activities. Scalable, agile and secure, cloud-based infrastructure enables organisations to quickly spin up infrastructure to support their data analytics projects exactly when it is needed. Importantly, they can ‘switch off’ infrastructure when it is not.
BluePi Consulting and Amazon Web Services (AWS) are giving you the opportunity to discover how organisations are using real time data analytics to gain new insights from their information to improve the customer experience and drive competitive advantage.
The DoneDeal AWS Data Analytics Platform was built using AWS products: EMR, Data Pipeline, S3, Kinesis, Redshift, and Tableau. A custom-built ETL layer was written using PySpark.
Antoine Genereux takes us on a detailed overview of the Database solutions available on the AWS Cloud, addressing the needs and requirements of customers at all levels. He also discusses Business Intelligence and Analytics solutions.
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent ... (Amazon Web Services)
Big data technologies let you work with any velocity, volume, or variety of data in a highly productive environment. Join the General Manager of Amazon EMR, Peter Sirota, to learn how to scale your analytics, use Hadoop with Amazon EMR, write queries with Hive, develop real world data flows with Pig, and understand the operational needs of a production data platform.
5 Things that Make Hadoop a Game Changer
Webinar by Elliott Cordo, Caserta Concepts
There is much hype and mystery surrounding Hadoop's role in analytic architecture. In this webinar, Elliott presented, in detail, the services and concepts that make Hadoop a truly unique solution, a game changer for the enterprise. He talked about the real benefits of a distributed file system, the multi-workload processing capabilities enabled by YARN, and the three other important things you need to know about Hadoop.
To access the recorded webinar, visit the event site: https://www.brighttalk.com/webcast/9061/131029
For more information on the services and solutions that Caserta Concepts offers, please visit http://casertaconcepts.com/
In this talk, Ian will discuss Amazon Redshift, a managed petabyte-scale data warehouse, give an overview of its integration with Amazon Elastic MapReduce, a managed Hadoop environment, and cover some exciting new developments in the analytics space.
Learn more about the tools, techniques and technologies for working productively with data at any scale. This presentation introduces the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
Jon Einkauf, Senior Product Manager, Elastic MapReduce, AWS
Alan Priestley, Marketing Manager, Intel and Bob Harris, CTO, Channel 4
Over 90% of today’s data has been generated in the last two years, and growth rates continue to climb. In this session, we’ll step through challenges and best practices with data capturing, how to derive meaningful insights to help predict the future, and common pitfalls in data analysis.
Come discover how integrated solutions involving Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon Machine Learning/Deep Learning result in effective data systems for data scientists and business users, alike.
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
How to build Forecasting services using ML and deep learning algorithms (Amazon Web Services)
Forecasting is an important process for very many companies and is used in a variety of areas to try to accurately predict the growth and distribution of a product, the resources needed on production lines, financial projections, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that contains a temporal component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to create Big Data applications in Serverless mode (Amazon Web Services)
The variety and quantity of data created every day keeps accelerating and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment accessible only to established companies. But the elasticity of the Cloud and, in particular, Serverless services let us break through these limits.
Let's see, then, how it is possible to develop Big Data applications quickly, without worrying about the infrastructure, dedicating all our resources instead to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. In that time we learned how changing our approach to application development allowed us to greatly increase agility and release velocity and, ultimately, enabled us to build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only the application architecture but also the organizational structure, the development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances (Amazon Web Services)
The use of containers keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can all take advantage of Spot Instances, leading to average savings of 70% compared to On-Demand Instances. In this session we will explore the characteristics of Spot Instances and how they can easily be used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of various kinds, in production, at a fraction of the on-demand cost!
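The economics behind claims like these are simple to model. The illustrative Python sketch below blends On-Demand and Spot capacity; the ~70% discount figure comes from the session description, while the instance price and fleet size are invented:

```python
ON_DEMAND_PRICE = 0.10   # $/hour, hypothetical instance price
SPOT_DISCOUNT = 0.70     # ~70% average saving vs On-Demand (figure from the talk)


def monthly_cost(instances, hours=730, spot_fraction=0.0):
    """Blend On-Demand and Spot capacity and return the monthly bill."""
    spot_price = ON_DEMAND_PRICE * (1 - SPOT_DISCOUNT)
    on_demand = instances * (1 - spot_fraction) * hours * ON_DEMAND_PRICE
    spot = instances * spot_fraction * hours * spot_price
    return on_demand + spot


all_on_demand = monthly_cost(10)
mostly_spot = monthly_cost(10, spot_fraction=0.9)
saving = 1 - mostly_spot / all_on_demand
print(f"{saving:.0%}")
```

Running 90% of a ten-instance fleet on Spot at a 70% discount yields a blended saving of about 63%, which is why stateless, interruption-tolerant container workloads are the natural fit for Spot.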
In recent months, many customers have been asking us how to monetise Open APIs, simplify Fintech integrations, and accelerate adoption of various Open Banking business models. AWS and FinConecta would therefore like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda :
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Learning services (Amazon Web Services)
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative components built ad hoc.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your own offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, also through a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployment of... (Amazon Web Services)
With the traditional approach to IT, for many years it was difficult to implement DevOps techniques, which until now have often involved manual activities, occasionally leading to application downtime and interrupting user operations. With the advent of the cloud, DevOps techniques are now within everyone's reach, at low cost, for any kind of workload, guaranteeing greater system reliability and resulting in significant improvements in business continuity.
AWS provides AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Learn how to leverage AWS OpsWorks to guarantee the reliability of your application installed on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads (Amazon Web Services)
Want to know your options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we will discuss options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment into the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis leveraging artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore the possibilities offered by AWS services for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are organizing a free virtual event next Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, taking full advantage of the AWS cloud while protecting your existing VMware investments.
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, compounded by performance risks that can be introduced when moving applications out of on-premises data centers.
Build your first serverless ledger-based app with QLDB and NodeJS (Amazon Web Services)
Many companies today build applications with ledger-type functionality, for example to verify the history of credits and debits in banking transactions, or to track the flow of their products through the supply chain.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB eliminates the need to build custom, complex systems by providing a fully managed serverless ledger database.
In this session we will see how to build a complete serverless application that uses QLDB's capabilities.
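The core idea of a cryptographically verifiable transaction log can be sketched in a few lines of Python using a hash chain. This is a toy model of the concept, not how Amazon QLDB is actually implemented:

```python
import hashlib
import json


def append_entry(ledger, data):
    """Append a record whose hash chains to the previous entry,
    mimicking a ledger's verifiable journal."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    body = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    entry = {"data": data, "prev": prev_hash,
             "hash": hashlib.sha256(body.encode()).hexdigest()}
    ledger.append(entry)
    return ledger


def verify(ledger):
    """Recompute every hash in order; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in ledger:
        body = json.dumps({"data": entry["data"], "prev": prev}, sort_keys=True)
        if entry["prev"] != prev or entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True


ledger = []
append_entry(ledger, {"debit": 50, "account": "A-1"})
append_entry(ledger, {"credit": 50, "account": "B-2"})
print(verify(ledger))              # True: the chain is intact
ledger[0]["data"]["debit"] = 5000  # tamper with history
print(verify(ledger))              # False: the chain is broken
```

Because each entry's hash covers the previous entry's hash, rewriting any historical record invalidates everything after it, which is what makes the log verifiable without trusting the writer.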
With the rise of microservices architectures and rich mobile and web applications, APIs are more important than ever for delivering an excellent user experience to end users. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dive into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data-update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Databases and VMware Cloud™ on AWS: myths to debunk (Amazon Web Services)
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, compounded by performance risks that can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to ease and streamline the migration of Oracle workloads, accelerating the transformation toward the cloud; they dive into the architecture and show how to take full advantage of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies the management of Docker containers through an orchestration layer controlling deployment and lifecycle. In this session we will present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report was prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
3. Amazon Redshift architecture
• Leader Node (JDBC/ODBC client connections)
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes (10 GigE HPC interconnect)
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3 (ingestion, backup, restore)
– Parallel load from Amazon DynamoDB
• Single node version available
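Since loads flow through Amazon S3 and are split across the compute nodes, ingestion is usually driven by a single COPY statement. As a minimal sketch of composing such a statement (the table, bucket, and IAM role names here are hypothetical placeholders, not from the deck):

```python
# Sketch: composing a Redshift COPY statement for a parallel load from S3.
# All identifiers below are invented for illustration.

def build_copy_statement(table, s3_prefix, iam_role, delimiter="|"):
    """Return a COPY command; Redshift's compute nodes divide the files
    under the S3 prefix among their slices and load them in parallel."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"DELIMITER '{delimiter}'\n"
        f"GZIP;"
    )

stmt = build_copy_statement(
    table="quotes",
    s3_prefix="s3://example-bucket/quotes/2013-08/",
    iam_role="arn:aws:iam::123456789012:role/RedshiftLoadRole",
)
print(stmt)
```

Splitting input into many files under one prefix is what lets every slice load concurrently instead of funneling rows through the leader node.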
4. Amazon Redshift is priced to let you analyze all your data

                      Price per hour      Effective hourly   Effective annual
                      (HS1.XL, 1 node)    price per TB       price per TB
On-Demand             $0.850              $0.425             $3,723
1 Year Reservation    $0.500              $0.250             $2,190
3 Year Reservation    $0.228              $0.114             $999

Simple pricing:
• Number of nodes x cost per hour
• No charge for the leader node
• No upfront costs
• Pay as you go
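The effective prices in the table follow from simple arithmetic (an HS1.XL node stores 2 TB, which is why the per-TB rate is half the per-node rate); a quick check:

```python
# Effective annual price per TB = hourly per-TB price x 24 hours x 365 days.
# The leader node is free of charge, so only compute nodes are billed.
HOURS_PER_YEAR = 24 * 365  # 8,760

for plan, hourly_per_tb in [("On-Demand", 0.425),
                            ("1 Year Reservation", 0.250),
                            ("3 Year Reservation", 0.114)]:
    annual = hourly_per_tb * HOURS_PER_YEAR
    print(f"{plan}: ${annual:,.0f} per TB per year")

# Cluster cost is simply: number of compute nodes x cost per node-hour.
nodes, hourly_rate = 6, 0.850  # example: a 6-node on-demand XL cluster
print(f"Example cluster: ${nodes * hourly_rate:.2f}/hour")
```

This reproduces the table's $3,723 / $2,190 / $999 figures (the last rounding up from $998.64).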
6. Where innovation meets action
• Our technology is used to power more than 70 marketplaces in 50 countries
• We own and operate 26 markets, 3 clearinghouses, and 5 central securities depositories
• Our global platform can handle more than 1 million messages/second at a median speed of sub-55 microseconds
• We power 1 in 10 of the world's securities transactions
• More than 5,500 structured products are tied to our global indexes, with a notional value of at least $1 trillion
• We list ~3,300 global companies worth $6 trillion in market cap, representing diverse industries and many of the world's most well-known and innovative brands
7. What I do
New data and analytics platforms to store and serve data to internal and external customers.
8. The Challenge
• Archiving market data – a classic "Big Data" problem
• Power surveillance and business intelligence/analytics
• Minimize cost
– Not only infrastructure, but development/IT labor costs too
• Empower the business for self-service
9. SIP Total Monthly Message Volumes (OPRA, UQDF and CQS)
Market data is Big Data. Total monthly message volumes, Aug-12 through Aug-13:

Date     OPRA (monthly)    OPRA (avg daily)   UQDF (monthly)   CQS (monthly)    UQDF+CQS (avg daily)
Aug-12   80,600,107,361    3,504,352,494      2,317,804,321    8,241,554,280    459,102,548
Sep-12   77,303,404,427    4,068,600,233      1,948,330,199    7,452,279,225    494,768,917
Oct-12   98,407,788,187    4,686,085,152      1,016,336,632    7,452,279,225    403,267,422
Nov-12   104,739,265,089   4,987,584,052      2,148,867,295    9,552,313,807    557,199,100
Dec-12   81,363,853,339    4,068,192,667      2,017,355,401    8,052,399,165    503,487,728
Jan-13   82,227,243,377    3,915,583,018      2,099,233,536    7,474,101,082    455,873,077
Feb-13   87,207,025,489    4,589,843,447      1,969,123,978    7,531,093,813    500,011,463
Mar-13   93,573,969,245    4,678,698,462      2,010,832,630    7,896,498,260    495,366,545
Apr-13   123,865,614,055   5,630,255,184      2,447,109,450    9,805,224,566    556,924,273
May-13   134,587,099,561   6,117,595,435      2,400,946,680    9,430,865,048    537,809,624
Jun-13   162,771,803,250   8,138,590,163      2,601,863,331    11,062,086,463   683,197,490
Jul-13   120,920,111,089   5,496,368,686      2,142,134,920    8,266,215,553    473,106,840
Aug-13   136,237,441,349   6,192,610,970      2,188,338,764    9,079,813,726    512,188,750

[Chart: NASDAQ Exchange Daily Peak Messages, Jan-13 through Sep-13, with daily peaks in the 0–600,000,000 range. OPRA annual increase: 69%; CQS annual increase: 10%; UQDF annual decrease: 6%.]

Charts courtesy of the Financial Information Forum. Redistribution without permission from FIF prohibited; email: fifinfo@fif.com
10. Our legacy solution
• On-premises MPP DB
– Relatively expensive, finite storage
– Required periodic additional expenses to add more storage
– Ongoing IT (administrative) human costs
• Legacy BI tool
– Requires developer involvement for new data sources, reports, dashboards, etc.
11. New Solution: Amazon Redshift
• Cost Effective
– Redshift is 43% of the cost of legacy
• Assuming equal storage capacities
– Doesn’t include IT ongoing costs!
• Performance
– Easily outperforms our legacy BI/DB solution
– Insert 550K rows/second on a 2 node 8XL cluster
• Elastic
– Add additional capacity on demand, easy to grow our cluster
12. New Solution: Pentaho BI/ETL
• Amazon Redshift partner
– http://aws.amazon.com/redshift/partners/pentaho/
• Self service
– Tools empower BI users to integrate new data sources, create their own analytics, dashboards, and reports without requiring development involvement
• Cost effective
13. Net Result
• New solution is cheaper, faster, and offers capabilities that our business didn't have before
– Empowers our business users to explore data like they never could before
– Reduces IT and development as bottlenecks
– Margin improvement (expense reduction and supports business decisions to grow revenue)
15. Who am I? Kevin Diamond
• CTO of HauteLook, a Nordstrom Company
• Oversee all technology, infrastructure, data, engineering, etc.
• Major focus on great customer experience and the analytics to provide it
16. What is HauteLook?
• Private sale, members-only limited-time sale events
• Premium fashion and lifestyle brands at exclusive prices of 50-75% off
• Over 20 new sale events begin each morning at 8am PST
• Over 14 million members
• Acquired by Nordstrom in 2011
17. Why a Data Warehouse?
• Centralized storage of multiple data sources
• Singular reporting consistency for all departments
• Data model that supports analytics not transactions
• Operational reports vs. analytical reports
– Real-time vs. previous day
18. Why Amazon Redshift?
• Looked at some competitors:
– Ranged from $ to $$$
– All required Software, Implementation and BIG Hardware
• Skipped the RFP
• Jumped into the public beta of Amazon Redshift and never looked back
19. How We Implemented Amazon Redshift
• ETL from MySQL and Microsoft SQL Server into AWS across a Direct Connect line, storing on S3
• Also used S3 to dump flat files (iTunes Connect data, web analytics dumps, log files, etc.)
• Used AWS Data Pipeline for executing Sqoop and Hadoop running on EC2 to load data into Amazon Redshift
• Redshift data model based on a star schema, which looks something like …
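The schema diagram itself isn't reproduced in this transcript. As a generic illustration of what a star schema looks like – one fact table joined to surrounding dimension tables – here is a small runnable sketch. All table and column names are invented for this example (it is not HauteLook's actual model), and SQLite stands in for Redshift:

```python
import sqlite3

# Illustrative star schema: a fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_member (member_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_event  (event_id  INTEGER PRIMARY KEY, brand  TEXT);
CREATE TABLE fact_order (
    order_id  INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES dim_member(member_id),
    event_id  INTEGER REFERENCES dim_event(event_id),
    amount    REAL
);
INSERT INTO dim_member VALUES (1, 'West'), (2, 'East');
INSERT INTO dim_event  VALUES (10, 'BrandA'), (11, 'BrandB');
INSERT INTO fact_order VALUES
    (100, 1, 10, 59.0), (101, 2, 10, 41.0), (102, 2, 11, 30.0);
""")

# An analytical slice (revenue by region and brand) - the kind of query
# a star schema keeps simple: join out to dimensions, then GROUP BY.
rows = conn.execute("""
    SELECT m.region, e.brand, SUM(f.amount) AS revenue
    FROM fact_order f
    JOIN dim_member m ON m.member_id = f.member_id
    JOIN dim_event  e ON e.event_id  = f.event_id
    GROUP BY m.region, e.brand
    ORDER BY m.region, e.brand
""").fetchall()
print(rows)
```

The fact table holds the transactions; the dimensions hold descriptive attributes, which is what makes the model analytics-friendly rather than transaction-friendly.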
21. Usage with Business Intelligence
• Already selected a BI Tool
• Had difficulty deploying in the cloud
• But worked great on-premises
• Easily tied into Amazon Redshift using ODBC Drivers
• BUT, metadata for reports had to live in MSSQL
• Ported many SSIS/SSRS reports over
– But only the analytical reports!
23. Amazon Redshift Instances
• We use a little under 2TB
• Thought to use 2 big 8XL instances to get great performance (in passive failover mode)
• Cost us $$$
• Then we tested using 6 XL instances in a cluster
• Performed better and allowed for more concurrency of queries in all but a handful of cases that really needed the 8XL power
• Cost us $
• Duh! That's why we do distributed everything else!!
24. Some First Hand Experience
• ETL was the hardest part
• Amazon Redshift performs awesome
• Someone needs to make a great client SQL tool
• MicroStrategy works great on it (just wished it loved running in EC2)
• Saving a ton, thanks to:
– No hardware costs
– No maintenance/overhead (rack + power)
– Annual costs are equivalent to just the annual maintenance of some of the cheaper DW on-premises options
25. Conclusion/Last Advice
• Only use 8XL instances if you need >2TB of space
– Otherwise distribute on a bunch of XL nodes
• Buy reserved instances (we still need to do this!) since you likely will have this always on
• Although we haven't yet, the idea of a flexible scale-up/down DW is crazy awesome – maybe during Holiday we will
• Probably could have used Elastic MapReduce instead of Hadoop – wasn't sure how it would play with Sqoop
• Almost all BI tools play with Amazon Redshift now, so choose what is right for your business, and make sure it works in EC2 before just putting it there
• Communication between AWS and your DC is easy and fast, but I recommend a Direct Connect
• Passed our rigorous information security standards, but used in a VPC
27. roundarch isobar
Our services across bought, owned and earned media:
• Strategies – we digitally transform business processes and disrupt industries
• Campaigns – we create, measure and optimize digitally-focused campaigns
• Experiences – we produce joyful experiences that inspire consumer interaction
• Platforms – we design and build flexible and scalable technology solutions
• Products – we invent digital products that generate new revenue streams
Capabilities include:
• Audience insight
• Research: competitive, segmentation, persona development, heuristics
• Business planning: competitive & industry analysis, business cases, maturity models, roadmaps
• Strategies: brand, interactive, multi-channel, social, content
• Communications planning
• Creative: advertising, visual design, content creation, studio production
• Optimization: analytics, monitoring, SEO, MVT, media ROI analysis
• Requirements and specifications: content analysis and specs, functional requirements, functional specifications
• User experience design: information architecture, taxonomy and metadata, interaction design, mobile
• Platforms: content management, search, portals, mobile, front-end technology, internet-enabled devices/wearables, social apps, web services, security, big data, hosting
• Digital products
• Digital product extensions
• Brand as a service
28. U.S. Air Force
We have served the U.S. Air Force since 2001, building their enterprise portal and many mission-critical applications.
Key metrics for our USAF work include:
• 900,000+ registered users
• Portal availability over 99.9% of time
• 700,000+ PK-E users
• 28 production enterprise services
• Response time worldwide: 3 seconds for 80% of all pages
• Over 300 applications available
• Over 1.2 million logins/week
• Public-facing and secure private instances (NIPR & SIPR)
• 124,000 unique daily users
• 4-5+ million pages daily (40-70 Mbit/sec)
• Portal support for over 5,000 "Communities of Interest"
29. New York Jets
Transforming in-stadium operations through a touch-screen command center.
Our executive touch-screen environment provides real-time stadium and game data, allowing the Jets owner, Woody Johnson, to monitor the fan experience during game time and make operational decisions that help maximize sales. The command center provides summary-level and drill-down views of stadium operations such as tickets, parking and concessions. It also creates predictive algorithms that help identify pinch points and open revenue opportunities.
"We brought the big picture close enough to identify new, better ways to do business."
30. William Blair | Investment Research Management System
Through a joint venture with Copia Capital, we created a new product offering for William Blair.
• Facilitates collaboration between portfolio managers and analysts
• Provides a holistic view of a company/stock – what is everything our organization knows about AAPL?
• Digitizes PDF/Excel tools and reports to enable rich, dynamic interactions
• Simplifies content creation; e.g., comments, recommendation reports, document upload
• Rich charting and visualization of analytics
Technology:
• JavaScript, HTML5, CSS3
• jQuery, JavaScriptMVC, Less
• JSON web services
• Java, Spring, JPA, MongoDB
User comment: "We love how fast it is!"
31. What is the focus of your CMO today?
Optimize marketing spend across all channels (bought, earned and owned).
33. Marketing effectiveness stages
Stages – Learn, Analyze, Optimize – supported by tools including DLP, Scorecard (real-time and non-real-time), Sonar, AMNET, and Compass:
• Centralized cross-channel Big Data platform
• Standardized cross-channel reporting tools
• Discovery tools to identify channel optimization opportunities
• Modeling solutions
• Channel experience enhancements
• Improved media buying, planning & reporting functions
• Real-time integration into DSP
• A/B-testing-based micro-segment adjustments
34. So what have we accomplished?
Built the Marketing Analytics Platform – Radar:
• 200+ feeds (1TB/week) with varying frequency, granularity and classification
• In-time analytics, reporting and optimization for multiple clients with customized metrics
• Scalable multi-tenant SaaS platform on Amazon, with first launch in 3 months
36. Scorecard logical architecture
[Diagram: channel data sources feed the Scorecard App and detailed analytic reports, consumed by the media team, planners, the client team and client stakeholders.]
• Display: Google DFA
• Paid search: Google, Bing, Marin
• Organic search: Google, Bing, custom
• Digital video: Google
• Site metrics: Omniture
• Sales: TBD
• TV, radio, print, OOH: DDS
• Earned social: Facebook, Twitter, competitive, custom
• Paid social: Facebook, Twitter
37. Data sources
Voluminous data, spanning wide variety and granularity:
• Digital, CRM and research data: surveys, demographics, campaigns, search, mobile, attribution, site, social, display, cookie level, UGC, geospatial, weather, sales, competitive
39. ETL
• Extract: files loaded on Amazon S3 / Amazon Glacier
• Transform: utilize Pig on Amazon EMR (Hadoop) to cleanse, standardize and validate the data
• Load: use COPY to load Pig output into Redshift
Feeds include radio, display ads, search and social.
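In the deck's pipeline the cleanse/standardize/validate work is done with Pig on Amazon EMR; the same three steps can be sketched in plain Python. Field names and validation rules here are invented for illustration:

```python
import csv, io

# Minimal stand-in for the transform step: cleanse, standardize and
# validate raw feed rows before they are COPY-loaded into the warehouse.
RAW_FEED = """channel,date,impressions
display, 2013-08-01 ,1200
search,2013-08-01,
DISPLAY,2013-08-02,950
"""

def transform(raw):
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw)):
        # Cleanse/standardize: trim whitespace, lowercase the channel name.
        row = {k: v.strip() for k, v in row.items()}
        row["channel"] = row["channel"].lower()
        # Validate: impressions must be a non-empty integer.
        if row["impressions"].isdigit():
            row["impressions"] = int(row["impressions"])
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected

valid, rejected = transform(RAW_FEED)
print(len(valid), len(rejected))
```

Keeping rejected rows separate (rather than dropping them silently) makes data-quality problems in a 1TB/week feed visible early.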
40. Data warehouse
• Performance: handles humongous aggregations quickly
• Cheap, fast, easily scalable
• ODBC and JDBC access for BI / ad-hoc analysis (Tableau and other BI tools, analysts)
41. Aggregation
• Mapping: join performance data (radio, display ads, search, social) with metadata (product, campaign)
• Multi-step aggregation in Amazon Redshift using SQL (views, clicks, CTR, CPC, etc.)
• Load aggregates into MySQL (RDS) for sub-second web response
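The multi-step aggregation can be sketched as follows, with SQLite standing in for both Redshift (where the SQL aggregation runs) and MySQL (which serves the small aggregate table). All table and column names are invented for this illustration:

```python
import sqlite3

# Sketch of the slide's flow: join performance data with metadata,
# roll up raw counts, derive rate metrics, keep only the small result.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ad_events (campaign_id INTEGER, views INTEGER, clicks INTEGER);
CREATE TABLE campaign_meta (campaign_id INTEGER PRIMARY KEY, product TEXT);
INSERT INTO ad_events VALUES (1, 1000, 50), (1, 500, 25), (2, 2000, 20);
INSERT INTO campaign_meta VALUES (1, 'shoes'), (2, 'bags');

-- Step 1: join performance data with metadata and sum raw counts.
-- Step 2: derive CTR from the summed aggregates.
CREATE TABLE aggregates AS
SELECT m.product,
       SUM(e.views)  AS views,
       SUM(e.clicks) AS clicks,
       ROUND(1.0 * SUM(e.clicks) / SUM(e.views), 3) AS ctr
FROM ad_events e
JOIN campaign_meta m ON m.campaign_id = e.campaign_id
GROUP BY m.product;
""")

# The compact aggregate table is what would be copied to MySQL
# so the web tier gets sub-second reads.
rows = db.execute("SELECT * FROM aggregates ORDER BY product").fetchall()
print(rows)
```

The design point: heavy scans and GROUP BYs stay in the columnar warehouse; only the small, pre-computed result moves to the serving database.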
42. Data workflow
• Jenkins for client+channel ETL, with a job control dashboard
• Ruby for provisioning, job flow, and data intake/extract
• Amazon DynamoDB for state management
• On-demand, job-initiated Amazon EMR (Hadoop) clusters
• Data moves across S3, EMR, Redshift and MySQL RDS
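The slide names Amazon DynamoDB for state management. A hedged sketch of the idea, with a plain dict standing in for the DynamoDB table (the item layout and state names are invented, not from the deck):

```python
import time

# Job-state tracking for an ETL workflow: record state before spinning up
# an on-demand EMR cluster, update it when the load completes.
state_table = {}  # stand-in for a DynamoDB table keyed by client#channel

def start_job(client, channel):
    key = f"{client}#{channel}"
    state_table[key] = {"state": "RUNNING", "started_at": time.time()}
    return key

def finish_job(key, ok=True):
    state_table[key]["state"] = "SUCCEEDED" if ok else "FAILED"

key = start_job("client1", "display")
finish_job(key, ok=True)
print(state_table[key]["state"])
```

Externalizing state this way is what lets the orchestration layer (Jenkins plus Ruby scripts, per the slide) resume or retry jobs without re-running completed steps.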
43. SaaS dashboard
• Designed for redundancy: hardware and location
• Multi-tenant: client1.com, client2.com
• Managed services: ElastiCache, MySQL RDS, EC2, Elastic Beanstalk, load balancing, DNS
• Automated stack provisioning for clients
44. AWS advantages
• Innovate quickly with reduced risk
• Time to market
• Lower operational overhead
• Highly scalable
[Diagram: the split between us and Amazon – our developers (Java, Ruby, Python) and DevOps build on top of AWS operations.]
45. learnings
Metadata is more important than the data
Design for scalability upfront
Always explore better ways to aggregate
Cost management is very important
Build Agile: Perform early end-to-end validation on smaller dataset
Separate data visualization, data cleansing, storage & data aggregation
Be smart about implementing data aggregation routines across multiple granularities
46. Please give us your feedback on this presentation (DAT205)
As a thank you, we will select prize winners daily for completed surveys!