Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) | AWS re:Invent 2013
 


Since Amazon Redshift launched last year, it has been adopted by a wide variety of companies for data warehousing. In this session, learn how customers NASDAQ, HauteLook, and Roundarch Isobar are taking advantage of Amazon Redshift for three unique use cases: enterprise, big data, and SaaS. Learn about their implementations and how they made data analysis faster, cheaper, and easier with Amazon Redshift.


Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) | AWS re:Invent 2013 Presentation Transcript

  • 1. DAT 205 - Amazon Redshift in Action Enterprise, Big Data, and SaaS Use Cases November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Amazon Redshift Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
  • 3. Amazon Redshift architecture: the leader node is the JDBC/ODBC SQL endpoint; it stores metadata and coordinates query execution. Compute nodes communicate over 10 GigE (HPC), use local columnar storage, and execute queries in parallel; ingestion, backup, and restore run via Amazon S3, and parallel load is supported from Amazon DynamoDB. A single-node version is available. (Illustrative load example below.)
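As an illustration of the parallel-load path the slide mentions, the following is a minimal sketch of a COPY from Amazon DynamoDB; the table name, credentials placeholders, and read ratio are hypothetical, not from the presentation.

    -- Hypothetical parallel load of a DynamoDB table into a Redshift table.
    -- Table names and credential placeholders are illustrative only.
    COPY trades
    FROM 'dynamodb://prod-trades'
    CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
    READRATIO 50;  -- use at most 50% of the DynamoDB table's provisioned read throughput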
  • 4. Amazon Redshift is priced to let you analyze all your data (HS1.XL single node): On-Demand $0.850/hour ($0.425 effective hourly per TB, $3,723 effective annual per TB); 1-year reservation $0.500/hour ($0.250 per TB-hour, $2,190 per TB-year); 3-year reservation $0.228/hour ($0.114 per TB-hour, $999 per TB-year). Simple pricing: number of nodes × cost per hour; no charge for the leader node; no upfront costs; pay as you go. (Worked arithmetic below.)
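The effective per-TB figures on the slide follow from simple arithmetic, assuming each HS1.XL node provides 2 TB of local storage (a storage figure not stated on the slide); for the 3-year reservation:

    \frac{\$0.228/\text{node-hr}}{2\ \text{TB/node}} = \$0.114/\text{TB-hr},
    \qquad \$0.114/\text{TB-hr} \times 8{,}760\ \text{hr/yr} \approx \$999/\text{TB-yr}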
  • 5. Data Warehousing for Capital Markets Jason Timmes, AVP of Software Development, NASDAQ OMX November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 6. Where innovation meets action: We own and operate 26 markets, 3 clearinghouses, and 5 central securities depositories. Our technology is used to power more than 70 marketplaces in 50 countries. We power 1 in 10 of the world's securities transactions. Our global platform can handle more than 1 million messages/second at a median speed of sub-55 microseconds. More than 5,500 structured products are tied to our global indexes, with a notional value of at least $1 trillion. We list ~3,300 global companies worth $6 trillion in market cap, representing diverse industries and many of the world's most well-known and innovative brands.
  • 7. What I do New data and analytics platforms to store and serve data to internal and external customers.
  • 8. The Challenge • Archiving Market Data – classic “Big Data” problem • Power Surveillance and Business Intelligence/Analytics • Minimize cost – Not only infrastructure, but development/IT labor costs too • Empower the business for self-service
  • 9. Market Data Is Big Data: SIP total monthly message volumes for OPRA, UQDF, and CQS (charts courtesy of the Financial Information Forum; redistribution without permission from FIF prohibited, email: fifinfo@fif.com). Annual change: OPRA increase 69%, CQS increase 10%, UQDF decrease 6%. Also charted: NASDAQ Exchange daily peak messages, Jan-13 through Sep-13.
    Total monthly message volume (UQDF | CQS | combined UQDF+CQS average daily):
    Aug-12: 2,317,804,321 | 8,241,554,280 | 459,102,548
    Sep-12: 1,948,330,199 | 7,452,279,225 | 494,768,917
    Oct-12: 1,016,336,632 | 7,452,279,225 | 403,267,422
    Nov-12: 2,148,867,295 | 9,552,313,807 | 557,199,100
    Dec-12: 2,017,355,401 | 8,052,399,165 | 503,487,728
    Jan-13: 2,099,233,536 | 7,474,101,082 | 455,873,077
    Feb-13: 1,969,123,978 | 7,531,093,813 | 500,011,463
    Mar-13: 2,010,832,630 | 7,896,498,260 | 495,366,545
    Apr-13: 2,447,109,450 | 9,805,224,566 | 556,924,273
    May-13: 2,400,946,680 | 9,430,865,048 | 537,809,624
    Jun-13: 2,601,863,331 | 11,062,086,463 | 683,197,490
    Jul-13: 2,142,134,920 | 8,266,215,553 | 473,106,840
    Aug-13: 2,188,338,764 | 9,079,813,726 | 512,188,750
    Total monthly message volume (OPRA | average daily):
    Aug-12: 80,600,107,361 | 3,504,352,494
    Sep-12: 77,303,404,427 | 4,068,600,233
    Oct-12: 98,407,788,187 | 4,686,085,152
    Nov-12: 104,739,265,089 | 4,987,584,052
    Dec-12: 81,363,853,339 | 4,068,192,667
    Jan-13: 82,227,243,377 | 3,915,583,018
    Feb-13: 87,207,025,489 | 4,589,843,447
    Mar-13: 93,573,969,245 | 4,678,698,462
    Apr-13: 123,865,614,055 | 5,630,255,184
    May-13: 134,587,099,561 | 6,117,595,435
    Jun-13: 162,771,803,250 | 8,138,590,163
    Jul-13: 120,920,111,089 | 5,496,368,686
    Aug-13: 136,237,441,349 | 6,192,610,970
  • 10. Our legacy solution • On-premises MPP DB – Relatively expensive, finite storage – Required periodic additional expenses to add more storage – Ongoing IT (administrative) human costs • Legacy BI tool – Requires developer involvement for new data sources, reports, dashboards, etc.
  • 11. New Solution: Amazon Redshift • Cost Effective – Redshift is 43% of the cost of legacy • Assuming equal storage capacities – Doesn’t include IT ongoing costs! • Performance – Easily outperforms our legacy BI/DB solution – Insert 550K rows/second on a 2 node 8XL cluster • Elastic – Add additional capacity on demand, easy to grow our cluster
  • 12. New Solution: Pentaho BI/ETL • Amazon Redshift partner – http://aws.amazon.com/redshift/partners/pentaho/ • Self Service – Tools empower BI users to integrate new data sources, create their own analytics, dashboards, and reports without requiring development involvement • Cost effective
  • 13. Net Result • New solution is cheaper, faster, and offers capabilities that our business didn’t have before – Empowers our business users to explore data like they never could before – Reduces IT and development as bottlenecks – Margin improvement (expense reduction and supports business decisions to grow revenue)
  • 14. HauteLook + Amazon Redshift A Case Study Kevin Diamond, HauteLook November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 15. Who am I? Kevin Diamond • CTO of HauteLook, a Nordstrom Company • Oversee all technology, infrastructure, data, engineering, etc. • Major focus on great customer experience and the analytics to provide it
  • 16. What is HauteLook? • Private sale, members-only limited-time sale events • Premium fashion and lifestyle brands at exclusive prices of 50-75% off • Over 20 new sale events begin each morning at 8am PST • Over 14 million members • Acquired by Nordstrom in 2011
  • 17. Why a Data Warehouse? • Centralized storage of multiple data sources • Singular reporting consistency for all departments • Data model that supports analytics not transactions • Operational reports vs. analytical reports – Real-time vs. previous day
  • 18. Why Amazon Redshift? • Looked at some competitors: – Ranged from $ to $$$ – All required Software, Implementation and BIG Hardware • Skipped the RFP • Jumped into the Public Beta of Amazon Redshift and never looked back
  • 19. How We Implemented Amazon Redshift • ETL from MySQL and Microsoft SQL Server into AWS across a Direct Connect line storing on S3 • Also used S3 to dump flat files (iTunes Connect Data, Web Analytics dumps, log files, etc) • Used AWS Data Pipeline for executing Sqoop and Hadoop running on EC2 to load data into Amazon Redshift • Redshift Data Model based on Star Schema which looks something like …
  • 20. Example of Star Schema
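The star schema on this slide is shown as a diagram; the DDL below is a rough, hypothetical sketch of that style of model on Redshift (table and column names are invented for illustration, not HauteLook's actual schema), with small dimension tables joined to a fact table that carries a distribution key and sort key.

    -- Hypothetical star-schema sketch; names are illustrative only.
    CREATE TABLE dim_member (
        member_key   INTEGER NOT NULL,
        member_email VARCHAR(256),
        join_date    DATE
    );

    CREATE TABLE dim_sale_event (
        event_key    INTEGER NOT NULL,
        brand        VARCHAR(128),
        event_start  TIMESTAMP
    );

    CREATE TABLE fact_order_line (
        order_date   DATE    NOT NULL,
        member_key   INTEGER NOT NULL,
        event_key    INTEGER NOT NULL,
        quantity     INTEGER,
        net_revenue  DECIMAL(12,2)
    )
    DISTKEY (member_key)   -- co-locate a member's rows on one node slice for joins
    SORTKEY (order_date);  -- keep scans range-restricted by date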
  • 21. Usage with Business Intelligence • Already selected a BI Tool • Had difficulty deploying in the cloud • But worked great on-premises • Easily tied into Amazon Redshift using ODBC Drivers • BUT, metadata for reports had to live in MSSQL • Ported many SSIS/SSRS reports over – But only the analytical reports!
  • 22. And it all looks like this
  • 23. Amazon Redshift Instances • We use a little under 2TB • Thought to use 2 big 8XL instances to get great performance (in passive failover mode) • Cost us $$$ • Then we tested using 6 XL instances in a cluster • Performed better and allowed for more concurrency of queries in all but a handful of cases that really needed the 8XL power • Cost us $ • Duh! That’s why we do distributed everything else!!
  • 24. Some First Hand Experience • ETL was hardest part • Amazon Redshift performs awesome • Someone needs to make a great client SQL tool • MicroStrategy works great on it (just wished it loved running in EC2) • Saving a ton, thanks to: – No hardware costs – No maintenance/overhead (rack + power) – Annual costs are equivalent to just the annual maintenance of some of the cheaper DW on-premises options
  • 25. Conclusion/Last Advice • Only use 8XL instances if you need >2TB of space – Otherwise distribute on a bunch of XL nodes • Buy reserved instances (we still need to do this!) since you likely will have this always on • Although we haven’t yet, the idea of a flexible scale-up/down DW is crazy awesome – maybe during Holiday we will • Probably could have used Elastic MapReduce instead of Hadoop – wasn’t sure how it would play with Sqoop • Almost all BI tools play with Amazon Redshift now, so choose what is right for your business, and make sure it works in EC2 before just putting it there • Communication between AWS and your DC is easy and fast, but I recommend a Direct Connect • Passed our rigorous information security standards, but used in a VPC
  • 26. Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases Parag Thakker – VP, Roundarch Isobar Colin McGuigan – Architect, Roundarch Isobar November 15th, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 27. roundarch isobar: our services across bought, owned, and earned media. Strategies (we digitally transform business processes and disrupt industries), Campaigns (we create, measure, and optimize digitally-focused campaigns), Experiences (we produce joyful experiences that inspire consumer interaction), Platforms (we design and build flexible and scalable technology solutions), and Products (we invent digital products that generate new revenue streams). Capabilities include audience insight; research (competitive, segmentation, persona development, heuristics); business planning (competitive & industry analysis, business cases, maturity models, roadmaps); strategies (brand, interactive, multichannel, social, content); communications planning; creative (advertising, visual design, content creation, studio production); optimization (analytics, monitoring, SEO, MVT, media ROI analysis); requirements and specifications (content analysis and specs, functional requirements, functional specifications); user experience design (information architecture, taxonomy and metadata, interaction design, mobile); platforms (content management, search, portals, mobile, front-end technology, internet-enabled devices/wearables, social apps, web services, security, big data, hosting); digital products; digital product extensions; and brand as a service.
  • 28. We have served the U.S. Air Force since 2001, building their enterprise portal and many mission-critical applications. Key metrics for our USAF work include: • 900,000+ registered users • Portal availability over 99.9% of time • 700,000+ PK-E users • 28 production enterprise services • Response time worldwide: 3 seconds for 80% of all pages • Over 300 applications available • Over 1.2 million logins/week • Public-facing and secure private instances (NIPR & SIPR) • 124,000 unique daily users • 4-5+ million pages daily (40-70 Mbit/sec) • Portal support for over 5,000 “Communities of Interest”
  • 29. New York Jets: transforming in-stadium operations through a touch-screen command center. Our executive touch-screen environment provides real-time stadium and game data, allowing the Jets owner, Woody Johnson, to monitor the fan experience during game time and make operational decisions that help maximize sales. The command center provides summary-level and drill-down views of stadium operations such as tickets, parking and concessions. It also creates predictive algorithms that help identify pinch points and open revenue opportunities. “We brought the big picture close enough to identify new, better ways to do business.”
  • 30. William Blair | Investment Research Management System: through a joint venture with Copia Capital, we created a new product offering for William Blair. • Facilitates collaboration between portfolio managers and analysts • Provides a holistic view of a company/stock – what is everything our organization knows about AAPL • Digitizes PDF/Excel tools and reports to enable rich, dynamic interactions • Simplifies content creation; e.g., comments, recommendation reports, document upload • Rich charting and visualization of analytics. Technology: jQuery, JavaScriptMVC, Less; JavaScript, HTML5, CSS3; JSON web services; Java, Spring, JPA, MongoDB. User comment: “We love how fast it is!”
  • 31. What is the focus of your CMO today? Optimize marketing spend across all channels (bought, earned, and owned).
  • 32. The domain: marketing spend in the billions; media channels in the dozens (Web, Mobile, Display Ads, Affiliate, Search, Email, TV, Radio, Print, Social, Sales); hundreds of data sources; data sizes of multiple terabytes; multiple clients.
  • 33. Marketing effectiveness stages: Analyze, Learn, Optimize (real-time and non-real-time), supported by tools including DLP, Scorecard, Sonar, AMNET, and Compass. • Centralized cross-channel big data platform • Standardized cross-channel reporting tools • Discovery tools to identify channel optimization opportunities • Modeling solutions • Channel experience enhancements • Improved media buying, planning & reporting functions • Real-time integration into DSP • A/B testing based micro-segment adjustments
  • 34. So what have we accomplished? Built a marketing analytics platform, Radar, with 200+ feeds (1 TB/week) of varying frequency, granularity, and classification, to enable in-time analytics, reporting, and optimization for multiple clients with customized metrics, as a scalable multi-tenant SaaS platform on Amazon, with first launch in 3 months.
  • 35. Scorecard dashboard
  • 36. Scorecard logical architecture: media-team data across Display, Paid Search, Organic Search, Digital Video, Site Metrics, Sales, TV, Radio, Print, OOH, Earned Social, Paid Social, and Competitive channels, sourced from Google DFA, Google, Bing, Marin, Omniture, DDS, Facebook, Twitter, and custom feeds, flows into the Scorecard app and detailed analytic reports for the media team, planners, client team, and client stakeholders.
  • 37. Data sources: voluminous data spanning digital, CRM, and research sources (surveys, demographics, campaigns, search, mobile, attribution, site, social, display, cookie-level, UGC, geospatial, weather, sales, competitive), varying in volume, variety, and granularity.
  • 38. Tech architecture: channel feeds (radio, web, display ads, search, social) land in S3, are processed with Hadoop on Amazon EMR, and are loaded into Amazon Redshift; a SaaS reporting platform on EC2/Elastic Beanstalk with MySQL on RDS serves BI tools, analysts, and clients.
  • 39. ETL: Extract – files loaded on Amazon S3/Amazon Glacier; Transform – utilize Pig on Amazon EMR to cleanse, standardize, and validate the data; Load – use COPY to load the Pig output into Amazon Redshift. Channel feeds include radio, display ads, search, and social. (Sketch of the load step below.)
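A minimal sketch of the load step, assuming Pig writes pipe-delimited, gzip-compressed output to an S3 prefix (the bucket, prefix, table name, and options below are hypothetical):

    -- Hypothetical COPY of Pig output from S3 into Redshift; names are placeholders.
    COPY channel_metrics
    FROM 's3://example-etl-output/display_ads/2013-11-15/'
    CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
    DELIMITER '|'
    GZIP
    MAXERROR 100;  -- tolerate a bounded number of malformed rows before failing the load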
  • 40. Data warehouse: Amazon Redshift handles humongous aggregation quickly; it is cheap, fast, and easily scalable, with ODBC and JDBC access for BI / ad hoc analysis from Tableau, other BI tools, and analysts.
  • 41. Aggregation: join channel performance data (radio, display ads, search, social) with mapping metadata (product, campaign); run multi-step aggregation SQL in Amazon Redshift to produce views, clicks, CTR, CPC, etc.; load the aggregates into MySQL (RDS) for sub-second web response. (Sketch below.)
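As a sketch of what such a multi-step aggregation might look like (the tables and columns below are hypothetical, not Roundarch Isobar's actual schema): the first step joins performance data with campaign metadata and rolls it up inside Redshift; the second exports the aggregates to S3 with UNLOAD, from where they can be loaded into MySQL for sub-second dashboards.

    -- Hypothetical multi-step aggregation; table and column names are illustrative.
    -- Step 1: join performance data with metadata, aggregate per product/campaign/day.
    CREATE TABLE agg_campaign_daily AS
    SELECT m.product,
           m.campaign,
           p.event_date,
           SUM(p.impressions) AS views,
           SUM(p.clicks)      AS clicks,
           SUM(p.clicks)::FLOAT / NULLIF(SUM(p.impressions), 0) AS ctr,
           SUM(p.spend)          / NULLIF(SUM(p.clicks), 0)     AS cpc
    FROM   perf_display_ads p
    JOIN   campaign_metadata m ON m.placement_id = p.placement_id
    GROUP  BY m.product, m.campaign, p.event_date;

    -- Step 2: export the aggregates to S3, then load them into MySQL (RDS) for serving.
    UNLOAD ('SELECT * FROM agg_campaign_daily')
    TO 's3://example-aggregates/campaign_daily/'
    CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
    DELIMITER '|';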
  • 42. Data workflow: Jenkins for client+channel ETL job control and dashboard; Ruby for provisioning, job flow, and data intake/extract; Amazon DynamoDB for state management; on-demand, job-initiated Amazon EMR clusters; data moves through S3, Hadoop on EMR, Redshift, and MySQL on RDS.
  • 43. SaaS dashboard: multi-tenant (Client1.com, Client2.com), designed for redundancy in hardware and location; managed services (DNS, ElastiCache, MySQL RDS, EC2/Elastic Beanstalk, load balancing); automated stack provisioning for clients.
  • 44. AWS advantages: innovate quickly with reduced risk; faster time to market; lower operational overhead; highly scalable; work is split between us (developers and DevOps using Java, Ruby, Python) and Amazon (AWS operations).
  • 45. Learnings: metadata is more important than the data; design for scalability upfront; always explore better ways to aggregate; cost management is very important; build agile: perform early end-to-end validation on a smaller dataset; separate data visualization, data cleansing, storage, and data aggregation; be smart about implementing data aggregation routines across multiple granularities.
  • 46. Please give us your feedback on this presentation DAT205 As a thank you, we will select prize winners daily for completed surveys!