Fast Cycle, Multi-Terabyte Data Analysis
ClearStory Data Solution on Amazon Redshift
Today’s Speakers
Tina Adams
Senior Product Manager
Amazon Web Services
Andrew Yeung
Director, Product Marketing
ClearStory Data
Scott Anderson
Senior Sales Engineer
ClearStory Data
Agenda
•  Overview of Amazon Redshift
•  Fast Cycle Data Analysis with ClearStory Data on Amazon Redshift
•  Demo
•  Q&A
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Amazon Redshift
Amazon Redshift Architecture
•  Leader Node
–  SQL endpoint
–  Stores metadata
–  Coordinates query execution
•  Compute Nodes
–  Local, columnar storage
–  Execute queries in parallel
–  Load, back up, and restore via Amazon S3; load from Amazon DynamoDB or from remote hosts via SSH
•  Two hardware platforms
–  Optimized for data processing
–  DW1: HDD; scale from 2TB to 1.6PB
–  DW2: SSD; scale from 160GB to 256TB
Diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node, which coordinates multiple compute nodes over a 10 GigE (HPC) interconnect; each node is shown with 128GB RAM, 16TB disk, and 16 cores. Ingestion, backup, and restore flow between the compute nodes and Amazon S3 / DynamoDB / SSH.
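The leader/compute split above follows a scatter-gather pattern: the leader node plans the query, each compute node scans its own local, columnar slice in parallel, and the leader merges the partial results. The Python sketch below models that pattern for a simple SUM ... GROUP BY query. It is a conceptual illustration under stated assumptions, not Redshift internals: the hash distribution on a key, the node count, threads standing in for nodes, and the helper names `distribute_rows`, `partial_sum`, and `run_query` are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute_rows(rows, num_nodes, key):
    """Hash-distribute rows to hypothetical compute nodes; each slice is stored column by column."""
    slices = [defaultdict(list) for _ in range(num_nodes)]
    for row in rows:
        node = hash(row[key]) % num_nodes
        for column, value in row.items():
            slices[node][column].append(value)
    return slices


def partial_sum(column_slice, group_col, value_col):
    """Work on one compute node: read only the two columns the query touches."""
    totals = defaultdict(float)
    for group, value in zip(column_slice[group_col], column_slice[value_col]):
        totals[group] += value
    return totals


def run_query(slices, group_col, value_col):
    """Leader node: send the same step to every node, then merge the partial results."""
    merged = defaultdict(float)
    # Threads stand in for compute nodes working in parallel.
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        for partial in pool.map(lambda s: partial_sum(s, group_col, value_col), slices):
            for group, subtotal in partial.items():
                merged[group] += subtotal
    return dict(merged)


sales = [
    {"region": "west", "amount": 120.0},
    {"region": "east", "amount": 80.0},
    {"region": "west", "amount": 30.0},
    {"region": "north", "amount": 50.0},
]
slices = distribute_rows(sales, num_nodes=3, key="region")
print(sorted(run_query(slices, "region", "amount").items()))
# [('east', 80.0), ('north', 50.0), ('west', 150.0)]
```

Storing each slice column by column is what lets `partial_sum` read only the columns a query needs, which is the main benefit of columnar storage for analytic scans.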
Amazon Redshift is priced to let you analyze all your data
•  Number of nodes x cost per hour
•  No charge for leader node
•  No upfront costs
•  Pay as you go
DW1 (HDD)
Pricing               Price per hour (DW1.XL single node)    Effective annual price per TB
On-Demand             $0.850                                 $3,723
1-Year Reservation    $0.500                                 $2,190
3-Year Reservation    $0.228                                 $999

DW2 (SSD)
Pricing               Price per hour (DW2.L single node)     Effective annual price per TB
On-Demand             $0.250                                 $13,688
1-Year Reservation    $0.161                                 $8,794
3-Year Reservation    $0.100                                 $5,498
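The "effective annual price per TB" column follows from the hourly rate: hourly price x 8,760 hours per year / storage per node. The sketch below assumes 2TB of storage per DW1.XL node and 160GB (0.16TB) per DW2.L node, matching the smallest cluster sizes on the architecture slide. Because the reserved hourly rates in the table are rounded (and presumably blend upfront and hourly charges), recomputed values can differ slightly from the slide, for example about $8,815 versus $8,794 for a 1-year DW2.L reservation. The `cluster_cost` helper is a hypothetical illustration of the slide's "number of nodes x cost per hour" formula, with the leader node not billed.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

# Assumed storage per node in TB (smallest DW1 / DW2 cluster sizes on the slide).
NODE_STORAGE_TB = {"DW1.XL": 2.0, "DW2.L": 0.16}


def effective_annual_price_per_tb(price_per_hour, node_type):
    """Annual cost of one node divided by the storage that node provides."""
    return price_per_hour * HOURS_PER_YEAR / NODE_STORAGE_TB[node_type]


def cluster_cost(num_compute_nodes, price_per_hour, hours):
    """Number of compute nodes x cost per hour x hours; the leader node is not billed."""
    return num_compute_nodes * price_per_hour * hours


print(round(effective_annual_price_per_tb(0.850, "DW1.XL")))  # 3723
print(round(effective_annual_price_per_tb(0.228, "DW1.XL")))  # 999
print(round(effective_annual_price_per_tb(0.250, "DW2.L")))   # 13688

# Hypothetical example: four On-Demand DW1.XL compute nodes for 30 days.
print(f"${cluster_cost(4, 0.850, 24 * 30):,.2f}")  # $2,448.00
```

The same arithmetic shows why the 3-year DW1 reservation is the basis for the "less than $1,000/TB/Year" headline.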
Common Customer Use Cases
Traditional Enterprise DW
•  Reduce costs by extending DW rather than adding HW
•  Migrate completely from existing DW systems
•  Respond faster to business needs
Companies with Big Data
•  Improve performance by an order of magnitude
•  Make more data available for analysis
•  Access business data via standard reporting tools
SaaS Companies
•  Add analytic functionality to applications
•  Scale DW capacity as demand grows
•  Reduce HW & SW costs by an order of magnitude
Selected Amazon Redshift Customers
Amazon Redshift integrates with multiple data sources
•  Amazon S3
•  Amazon EMR
•  Amazon DynamoDB
•  Amazon RDS
•  Corporate datacenter
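Bulk loading from Amazon S3 is done with Redshift's SQL `COPY` command, issued through the leader node's SQL endpoint. The sketch below only builds the statement text, using the `IAM_ROLE` authorization form and CSV input; it does not connect to a cluster. The schema, table, bucket, and role ARN are placeholders, and the helper names `quote_literal` and `build_s3_copy` are hypothetical. The resulting string could be run with any PostgreSQL-compatible driver, such as psycopg2.

```python
import re

# Accepts "table" or "schema.table" made of simple identifier characters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def quote_literal(value):
    """Wrap a string in single quotes, doubling embedded quotes as SQL requires."""
    return "'" + value.replace("'", "''") + "'"


def build_s3_copy(table, s3_uri, iam_role_arn, csv=True):
    """Build a COPY statement that bulk-loads every file under an S3 prefix."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("COPY from S3 expects an s3:// URI")
    parts = [
        f"COPY {table}",
        f"FROM {quote_literal(s3_uri)}",
        f"IAM_ROLE {quote_literal(iam_role_arn)}",
    ]
    if csv:
        parts.append("FORMAT AS CSV")
    return "\n".join(parts) + ";"


sql = build_s3_copy(
    "analytics.game_events",
    "s3://example-bucket/events/2014-06-01/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(sql)
```

Loading from DynamoDB uses the same command with a `dynamodb://` table source and a `READRATIO` setting; check the COPY reference before relying on either form in production.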
ClearStory Data Solution for Amazon Redshift
Consider the Following Question…
CPG/Retail
“Are daily product sales being impacted by restocking rate, product freshness, store merchandising, competitor pricing or demographic buying patterns?”
Or…
Consider the Following Question…
Consumer Internet
“Who are my users, how long are they on the
system, what features are they accessing, how
do they decide what purchases to make?”
How would you find an answer, or uncover new insight, on a fast cycle?
Hurdles to Fast-Cycle Data Analysis
Limitations of Traditional Solutions
•  Data refresh velocity restrictions
•  Limited data scale & data formats
•  Slow decision times
•  Skills gap
•  Rigid dashboards
•  Sampling of data
Proliferation of inconsistent, siloed views
Resulting Line-of-Business Pains
•  Lengthy round trip to ask new questions
•  Resort to point solutions, spreadsheets or desktop visualization tools
•  Increased blind spots & slow decisions
•  No traceability to validate insights
ClearStory Data Solution Overview
More LOB Users
•  Interactive StoryBoards for fast answers for LOB users
More Speed
•  Reduce data manipulation
•  Automate data blending
•  Fast exploration
More Sources
•  More internal sources/formats
•  Direct access to external data
Diagram: ClearStory platform layers. Application layer: Data Access (Data Stewards), Analysis/Exploration (Story Authors), and StoryBoards (Business Users), with Collaboration and User & Data Governance. Platform layer: Harmonization plus Data Inference & Metadata, which recognizes data types (Date & Time, Location, Text, Currency, Categories, Numbers) and semantic fields (Product Name, Product SKU, Product Cat, Product Brand, Zip Code, County, State). Sources: Internal Data (semi-structured, structured, files) and External Data (API / Web, premium, public), plus Amazon Redshift.
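The Data Inference & Metadata layer recognizes what kind of data each column holds (Date & Time, Currency, Location, Numbers, Categories, Text) before sources are harmonized. ClearStory's actual inference method is not described here; the Python sketch below is a hypothetical rule-based heuristic showing the general idea. The date formats, the regular expressions, the use of the column name as a hint for ZIP codes, and the `category_ratio` threshold are all assumptions.

```python
import re
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed formats to try
CURRENCY_RE = re.compile(r"^\$-?(\d{1,3}(,\d{3})+|\d+)(\.\d{2})?$")
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")


def looks_like_date(value):
    """True if the value parses with any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values, column_name="", category_ratio=0.5):
    """Guess a semantic type for a column from its non-empty string values."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return "Text"
    if all(looks_like_date(v) for v in sample):
        return "Date & Time"
    if all(CURRENCY_RE.match(v) for v in sample):
        return "Currency"
    # Five-digit codes also look numeric, so the column name breaks the tie.
    if "zip" in column_name.lower() and all(ZIP_RE.match(v) for v in sample):
        return "Location"
    if all(NUMBER_RE.match(v) for v in sample):
        return "Numbers"
    # Few distinct values relative to row count suggests a categorical column.
    if len(set(sample)) <= category_ratio * len(sample):
        return "Categories"
    return "Text"


print(infer_column_type(["2014-06-01", "06/02/2014"]))                 # Date & Time
print(infer_column_type(["$1,299.00", "$12.50"]))                      # Currency
print(infer_column_type(["94105", "10001"], column_name="zip_code"))   # Location
print(infer_column_type(["Dairy", "Dairy", "Bakery", "Dairy"]))        # Categories
```

Rule order matters: the more specific types are tested first so that, for example, ZIP codes are not swallowed by the generic Numbers rule.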
Why ClearStory for Amazon Redshift?
•  Scalability: Scale out as data volume grows, with no constraints
•  Aggregation: Less pre-processing and data aggregation
•  Governance: Data governance, user governance, lineage and traceability
•  Speed: Speed of analysis, enabled by ClearStory’s underlying Spark-based in-memory data processing
•  Simplicity: Ease of use on the front end for any user; less reliance on users with specialized skill sets
Consumer Internet, Online Gaming
Need: Intra-Day Analysis on Large Volume Data Sets
Diagram: Data captured on the gaming platform (event-based game data, user profiles, awards & promotions, in-app purchases) flows into Amazon Redshift as the centralized data store; ClearStory Data runs intra-day, multi-terabyte analysis on it, with collaboration among Business Analysts, Executives, and the Partner Network.
•  Understand user behavior based on usage patterns in the online game.
•  Analyze drivers of in-app purchase revenue by partner source and user profile.
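Analyzing in-app purchase revenue "by partner source and user profile" amounts to joining purchase events to user profile records and grouping the revenue by the chosen attributes. The Python sketch below shows that step on a few in-memory records; the field names (`user_id`, `amount`, `partner_source`, `segment`), the sample values, and the helper `revenue_by` are hypothetical, and in practice this aggregation would run over the full data set held in Amazon Redshift.

```python
from collections import defaultdict


def revenue_by(purchases, users, dimensions):
    """Sum purchase revenue grouped by the given attributes, highest total first."""
    totals = defaultdict(float)
    for p in purchases:
        # Blend each purchase event with its user's profile record.
        record = {**users.get(p["user_id"], {}), **p}
        key = tuple(record.get(d, "unknown") for d in dimensions)
        totals[key] += p["amount"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


users = {
    "u1": {"partner_source": "partner_a", "segment": "new"},
    "u2": {"partner_source": "partner_b", "segment": "returning"},
    "u3": {"partner_source": "partner_a", "segment": "returning"},
}
purchases = [
    {"user_id": "u1", "amount": 4.99},
    {"user_id": "u2", "amount": 9.99},
    {"user_id": "u3", "amount": 0.99},
    {"user_id": "u1", "amount": 1.99},
]

for key, total in revenue_by(purchases, users, ["partner_source"]).items():
    print(key, round(total, 2))
# ('partner_b',) 9.99
# ('partner_a',) 7.97
```

Passing `["partner_source", "segment"]` instead breaks the same revenue down by both dimensions at once.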
Leader in Dairy Products
How Are We Performing Daily by Grocery Store and Why?
Diagram: 10+ data sources from the internal supply chain and the retailer’s systems (Inventory, Demand Planning, Logistics, VMI, Point-of-Sale, Warehouse, Store Shelves, Fill Rate, Syndicated Retail Sales Data) are blended daily for daily, fast-cycle analysis, with collaboration among Business Analysts, Executives, and Retailers / Grocers.
•  Holistic customer analysis
•  Impacts of promos, placement, price, packaging
•  Collaborative insight for key stakeholders and grocers
Converge Disparate Data
Data Platform
•  Converge data silos across the entire supply chain
•  Spot sales opportunities and competitive threats
•  Speed of execution driven by business need
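Blending sources daily "by grocery store" means aligning each source on a shared key, such as (date, store), and then deriving metrics across them. The Python sketch below outer-joins three hypothetical sources (orders, deliveries, point-of-sale) and computes a fill rate. It assumes the unit fill rate definition (units delivered / units ordered); other fill-rate definitions exist, and the record layout, sample values, and helper names `blend_daily` and `unit_fill_rate` are illustrative only.

```python
from collections import defaultdict


def blend_daily(sources):
    """Outer-join several {(date, store_id): {metric: value}} sources on their shared key."""
    blended = defaultdict(dict)
    for source in sources:
        for key, metrics in source.items():
            blended[key].update(metrics)
    return dict(blended)


def unit_fill_rate(row):
    """Units delivered / units ordered; None when nothing was ordered."""
    ordered = row.get("units_ordered", 0)
    if not ordered:
        return None
    return row.get("units_delivered", 0) / ordered


orders = {("2014-06-01", "store_17"): {"units_ordered": 120},
          ("2014-06-01", "store_42"): {"units_ordered": 80}}
deliveries = {("2014-06-01", "store_17"): {"units_delivered": 114},
              ("2014-06-01", "store_42"): {"units_delivered": 80}}
pos_sales = {("2014-06-01", "store_17"): {"units_sold": 109},
             ("2014-06-01", "store_42"): {"units_sold": 71}}

blended = blend_daily([orders, deliveries, pos_sales])
for (day, store), row in sorted(blended.items()):
    print(day, store, f"fill rate {unit_fill_rate(row):.0%}", f"units sold {row['units_sold']}")
# 2014-06-01 store_17 fill rate 95% units sold 109
# 2014-06-01 store_42 fill rate 100% units sold 71
```

Because the join is an outer join, a store that appears in only one source still gets a row, and missing metrics simply stay absent rather than dropping the store.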
Demo
Proprietary & Confidential
Summary
1. More Data
- More Internal/External sources and diverse data formats
- Plus direct access to Amazon Redshift
2. More Speed
- Eliminate data manipulation
- Automate data blending for fast answers
3. More Business Consumption of Data
- New simple user model for any skillset
- Interactive StoryBoards for fast answers for line-of-business
Q&A