Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with Aggregate Knowledge

In today's world, consumer habits change fast, and marketing decisions need to be made within seconds, not days. Delivering engaging advertising experiences requires real-time, high-performance architectures that let digital advertisers measure and improve the performance of their campaigns and tie them more closely to corporate goals. The insights gleaned from the massive amounts of data collected can then be used to dynamically adjust media spend and creative execution for optimal performance. The AWS Cloud enables you to deliver marketing content and advertisements with the levels of availability, performance, and personalization that your customers expect, while lowering your costs. Join us to learn how big data and low-latency, high-performance architectures are changing the game for digital advertising.


  1. Amazon Redshift, Customer Acquisition Cost & Advertising ROI (October 3, 2013)
     Rahul Pathak, AWS (@rahulpathak); Timon Karnezos, Aggregate Knowledge
  2. AWS Database Services
     • Amazon RDS: fully managed SQL database service for OLTP workloads
     • Amazon DynamoDB: fully managed NoSQL service for massively scalable, high-throughput, low-latency workloads
     • Amazon Redshift: fully managed, fast and powerful, petabyte-scale data warehouse service
     • Amazon ElastiCache: fully managed, Memcached-compliant in-memory caching service
  3. Data Warehousing the AWS Way
     • Deploy
     • Easy to provision
     • Pay as you go, no upfront costs
     • Fast, cheap, easy to use
     • SQL
  4. Progress since launch on Feb 14, 2013
     • Fastest-growing service in AWS history
     • 1,000+ customers; adding over a hundred a week
     • Over 20 partners; adding one a week
     • SOC1, SOC2, and PCI certifications obtained, with more on the way
     • Available in US East (N. Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo), with more regions coming soon
  5. New features since launch on Feb 14, 2013
     • LZO/LZOP compression support
     • JSON, regex, and cursors
     • 4-byte UTF-8 characters and invalid-character substitution
     • CRC32, SHA1, MD5
     • Statement and workload queue timeouts
     • Time zone support
     • JDBC fetch size
     • UNLOAD encrypted files
     Full list: http://docs.aws.amazon.com/redshift/latest/dg/doc-history.html
  6. Common Customer Use Cases
     • Traditional Enterprise DW
       – Reduce costs by extending DW rather than adding HW
       – Migrate completely from existing DW systems
       – Respond faster to business; provision in minutes
     • Companies with Big Data
       – Improve performance by an order of magnitude
       – Make more data available for analysis
       – Access business data via standard reporting tools
     • SaaS Companies
       – Add analytic functionality to applications
       – Scale DW capacity as demand grows
       – Reduce HW & SW costs by an order of magnitude
  7. Digital marketing and advertising use cases
     • Customer acquisition
       – Ad spend
       – Traffic sources
     • Customer behavior
       – Clickstream
       – Referrals, sharing
       – Actions taken
     • Lifetime value
       – Conversions
       – Churn rate
     Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, Amazon DynamoDB, Amazon RDS, JDBC/ODBC
  8. Amazon Redshift Customers
     • “[Amazon Redshift] took an industry famous for its opaque pricing, high TCO and unreliable results and completely turned it on its head.”
     • “Redshift is twenty times faster than Hive… The cost saving is even more impressive… Our analysts like [it] so much they don’t want to go back.”
     • “Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.”
     • “We saw 50% reduction in costs with 2x improvement in query times.”
     • “We use Redshift anytime we need fast, interactive analysis.”
  9. Amazon Redshift Customers
     • “When we want to answer a question with Redshift, we just write a SQL query and get an answer within a few minutes – if not seconds.”
     • “[We] run queries up to 50 times faster than our current OLAP solution.”
     • “Customers can get consistent, accurate, and useful data fast - in weeks not months or years.”
     • “Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.”
     • “Team played with Redshift today and concluded it is ****** awesome. Un-indexed complex queries returning in < 10s.”
  10. Amazon Redshift architecture
     • Leader node
       – SQL endpoint
       – Stores metadata
       – Coordinates query execution
     • Compute nodes
       – Local, columnar storage
       – Execute queries in parallel
       – Load, backup, restore via Amazon S3
       – Parallel load from Amazon DynamoDB
     • Single-node version available
     Diagram labels: 10 GigE (HPC), Ingestion, Backup, Restore, JDBC/ODBC
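To make the leader/compute split concrete, here is a minimal Python sketch of two-phase aggregation, the general pattern MPP databases use for queries like COUNT(*) ... GROUP BY: each compute node aggregates only its local rows, and the leader merges the partial results. The node slices and (site_id, cookie_id) rows are hypothetical, and this is a simplified model, not Redshift's actual executor.

```python
from collections import Counter
from typing import Iterable

# Hypothetical rows: (site_id, cookie_id). Each inner list stands for the
# slice of the table stored on one compute node.
Row = tuple[str, str]


def partial_counts(node_rows: Iterable[Row]) -> Counter:
    """Runs on a compute node: count impressions per site in local data only."""
    return Counter(site for site, _cookie in node_rows)


def leader_merge(partials: Iterable[Counter]) -> Counter:
    """Runs on the leader node: add the per-node partial counts together."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing
    return total


node_slices: list[list[Row]] = [
    [("site_a", "c1"), ("search", "c1")],
    [("site_a", "c2"), ("site_a", "c3")],
    [("search", "c4")],
]

result = leader_merge(partial_counts(rows) for rows in node_slices)
print(dict(result))  # {'site_a': 3, 'search': 2}
```

Plain counts are additive, so merging per-node totals is exact. COUNT(DISTINCT cookie_id) is only additive this way when each cookie is stored on a single node, which is what distributing on cookie ID (slide 38) provides.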
  11. Amazon Redshift runs on optimized hardware
     • Optimized for I/O-intensive workloads
     • High disk density
     • Runs in HPC (fast network)
     • HS1.8XL available on Amazon EC2
     HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate
     HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed customer storage
  12. Amazon Redshift parallelizes and distributes everything
     • Query
     • Load
     • Backup/Restore
     • Resize
  13. Amazon Redshift parallelizes and distributes everything: Query, Load
     • Load in parallel from Amazon S3 or Amazon DynamoDB
     • Columnar storage, automatic compression
     • Data automatically distributed and sorted according to DDL
     • Scales linearly with number of nodes
  14. Amazon Redshift parallelizes and distributes everything: Backup/Restore
     • Backups to Amazon S3 are automatic, continuous, and incremental
     • Configurable system snapshot retention period
     • Take user snapshots on demand
     • Streaming restores enable you to resume querying faster
  15. Amazon Redshift parallelizes and distributes everything: Resize
     • Resize while remaining online
     • Provision a new cluster in the background
     • Copy data in parallel from node to node
     • Only charged for source cluster
  16. Amazon Redshift parallelizes and distributes everything: Resize (continued)
     • Automatic SQL endpoint switchover via DNS
     • Decommission the source cluster
     • Simple operation via AWS Console or API
  17. Amazon Redshift lets you start small and grow big
     • Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
       – Single node (2 TB)
       – Cluster of 2–32 nodes (4 TB – 64 TB)
     • Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
       – Cluster of 2–100 nodes (32 TB – 1.6 PB)
     Note: nodes not to scale
  18. Amazon Redshift is priced to let you analyze all your data
     HS1.XL single node:
       Term               | Price per hour | Effective hourly price per TB | Effective annual price per TB
       -------------------+----------------+-------------------------------+------------------------------
       On-Demand          | $0.850         | $0.425                        | $3,723
       1 Year Reservation | $0.500         | $0.250                        | $2,190
       3 Year Reservation | $0.228         | $0.114                        | $999
     Simple pricing:
     • Number of nodes × cost per hour
     • No charge for leader node
     • No upfront costs
     • Pay as you go
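The arithmetic behind this table and the capacity ranges on slide 17 can be checked in a few lines. The node prices and per-node storage come from the slides; HOURS_PER_YEAR = 24 × 365 is an assumption that happens to reproduce the slide's annual column, and the helper names are hypothetical.

```python
# Figures from slides 17-18. HOURS_PER_YEAR is an assumption (24 x 365)
# that reproduces the slide's "effective annual price per TB" column.
HOURS_PER_YEAR = 24 * 365  # 8,760

HS1_XL_TB_PER_NODE = 2
HS1_8XL_TB_PER_NODE = 16

HS1_XL_PRICE_PER_HOUR = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_prices(price_per_node_hour: float, tb_per_node: float) -> tuple[float, float]:
    """Return (price per TB per hour, price per TB per year)."""
    per_tb_hour = price_per_node_hour / tb_per_node
    return per_tb_hour, per_tb_hour * HOURS_PER_YEAR


def cluster_price_per_hour(nodes: int, price_per_node_hour: float) -> float:
    """Number of nodes x cost per hour; the leader node is not billed."""
    return nodes * price_per_node_hour


def cluster_capacity_tb(nodes: int, tb_per_node: float) -> float:
    """Compressed user storage for a cluster of identical nodes."""
    return nodes * tb_per_node


for term, price in HS1_XL_PRICE_PER_HOUR.items():
    per_tb_hour, per_tb_year = effective_prices(price, HS1_XL_TB_PER_NODE)
    print(f"{term}: ${per_tb_hour:.3f}/TB/hour, ${per_tb_year:,.0f}/TB/year")
# On-Demand: $0.425/TB/hour, $3,723/TB/year
# 1 Year Reservation: $0.250/TB/hour, $2,190/TB/year
# 3 Year Reservation: $0.114/TB/hour, $999/TB/year

print(cluster_capacity_tb(32, HS1_XL_TB_PER_NODE))    # 64
print(cluster_capacity_tb(100, HS1_8XL_TB_PER_NODE))  # 1600 (1.6 PB)
```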
  19. Amazon Redshift is easy to use
     • Provision in minutes
     • Monitor query performance
     • Point-and-click resize
     • Built-in security
     • Automatic backups
  20. Amazon Redshift continuously backs up your data and recovers from failures
     • Replication within the cluster and backup to Amazon S3 maintain multiple copies of data at all times
     • Backups to Amazon S3 are continuous, automatic, and incremental
       – Designed for eleven nines of durability
     • Continuous monitoring and automated recovery from drive and node failures
     • Snapshots can be restored to any Availability Zone within a region
  21. Amazon Redshift has security built in
     • SSL to secure data in transit
     • Encryption to secure data at rest
       – AES-256; hardware accelerated
       – All blocks on disks and in Amazon S3 encrypted
     • No direct access to compute nodes
     • Amazon VPC support
     Diagram labels: 10 GigE (HPC), Ingestion, Backup, Restore, Customer VPC, Internal Security Group, JDBC/ODBC
  22. MTA and Redshift: Understanding the Cost of Customer Acquisition and Marketing ROI
     Timon Karnezos | @timonk
     The Future of Digital Advertising with Cloud Computing, San Francisco, CA – 10/03/2013
  23. MTA vs. LTA (multi-touch vs. last-touch attribution): framing MTA
  24. Browsing the Web – Monday: tracking impression on Site A
  25. Browsing the Web – Tuesday: tracking impression on Site A
  26. Search – Wednesday: tracking impression on Search
  27. Convert – Wednesday: conversion
  28. View chains by site: one timeline each for Site A and Search, ending at the Wednesday conversion
  29. Properties of the Conversion Chains
     • Site A (Monday): position 3, day 2
     • Site A (Tuesday): position 2, day 1
     • Search (Wednesday): position 1, day 0 (same day as the conversion)
     Conversion: Wednesday; chain length: 3 touches; chain age: 2 days
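A minimal sketch of how the chain properties on this slide could be computed for one converting user. Touch, chain_properties, and the 2013 calendar dates standing in for Monday to Wednesday are illustrative assumptions; touches are assumed to arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Touch:
    site: str
    day: date


def chain_properties(touches: list[Touch], conversion_day: date):
    """Annotate a conversion chain the way slide 29 does.

    Assumes `touches` is in chronological order. Returns a list of
    (site, position, day) tuples, most recent first, plus the chain length
    (number of touches) and chain age (days from first touch to conversion).
    """
    # Only touches on or before the conversion belong to the chain.
    chain = [t for t in touches if t.day <= conversion_day]
    chain.reverse()  # most recent first, so position 1 is the last touch
    annotated = [
        (t.site, position, (conversion_day - t.day).days)
        for position, t in enumerate(chain, start=1)
    ]
    length = len(annotated)
    age = annotated[-1][2] if annotated else 0
    return annotated, length, age


# Hypothetical dates for the Monday/Tuesday/Wednesday example.
mon, tue, wed = date(2013, 9, 30), date(2013, 10, 1), date(2013, 10, 2)
touches = [Touch("site_a", mon), Touch("site_a", tue), Touch("search", wed)]
annotated, length, age = chain_properties(touches, conversion_day=wed)
print(annotated)    # [('search', 1, 0), ('site_a', 2, 1), ('site_a', 3, 2)]
print(length, age)  # 3 2
```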
  30. Last-Touch Attribution (LTA)
     • Only position 1 (Search, day 0, same day as the conversion) receives credit
     • Site A: 0 / 1 = 0.0 conversions
     • Search: 1 / 1 = 1.0 conversions
  31. Multi-Touch Attribution – 2 positions (touches)
     • Credited touches: position 2 (Site A, day 1) and position 1 (Search, day 0)
     • Site A: 1 / 2 = 0.5 conversions
     • Search: 1 / 2 = 0.5 conversions
  32. Multi-Touch Attribution – 3 positions (touches)
     • Credited touches: positions 3 and 2 (Site A, days 2 and 1) and position 1 (Search, day 0)
     • Site A: 2 / 3 = 0.67 conversions
     • Search: 1 / 3 = 0.33 conversions
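Slides 30 to 32 split one conversion evenly across the last N touches, with N = 1 giving last-touch attribution. The sketch below reproduces those numbers; the equal-split rule is read off the slides, and attribute() is a hypothetical helper rather than Aggregate Knowledge's production model, which the slides do not describe.

```python
from collections import Counter


def attribute(chain: list[str], positions: int) -> dict[str, float]:
    """Split one conversion equally across the last `positions` touches.

    `chain` lists the site of each touch, most recent first (position 1).
    Sites outside the window still appear, with 0.0 credit.
    """
    if positions < 1:
        raise ValueError("positions must be >= 1")
    if not chain:
        return {}
    window = chain[:positions]
    counts = Counter(window)  # missing sites count as 0
    # dict.fromkeys keeps each site once, in order of first appearance.
    return {site: counts[site] / len(window) for site in dict.fromkeys(chain)}


chain = ["search", "site_a", "site_a"]  # slide 29, most recent first
for k in (1, 2, 3):
    print(k, {site: round(c, 2) for site, c in attribute(chain, k).items()})
# 1 {'search': 1.0, 'site_a': 0.0}
# 2 {'search': 0.5, 'site_a': 0.5}
# 3 {'search': 0.33, 'site_a': 0.67}
```

Whatever N is chosen, the credits for one conversion always sum to 1.0, so site totals remain comparable with last-touch conversion counts.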
  33. Why do we sell MTA?
     • Richer model space…
     • More nuanced…
     • Adaptable to client’s business!
  34. Why is MTA hard?
     1. Build user sessions (chains) by site
     2. Window over report period
     3. Assign credit to sites
     4. Aggregate by day
     Daily scale: ~10^9 impressions and cookies, ~10^7 conversions, ~10^4 sites (× 90 per report)
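The four steps above can be sketched end to end on toy data. Everything here (record shapes, the [start, end) report window, equal credit over the last `positions` touches, aggregation by conversion day) is an assumption chosen to match slides 29 to 32, not a description of Aggregate Knowledge's pipeline.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record shapes for this sketch.
Impression = tuple[str, str, date]  # (user_id, site_id, day)
Conversion = tuple[str, date]       # (user_id, day)


def mta_report(
    impressions: list[Impression],
    conversions: list[Conversion],
    start: date,
    end: date,
    positions: int,
) -> dict[tuple[date, str], float]:
    # 1. Build user chains: each user's touches, oldest first.
    touches_by_user: dict[str, list[tuple[date, str]]] = defaultdict(list)
    for user, site, day in sorted(impressions, key=lambda r: r[2]):
        touches_by_user[user].append((day, site))

    report: dict[tuple[date, str], float] = defaultdict(float)
    for user, conv_day in conversions:
        # 2. Window over the report period [start, end).
        if not start <= conv_day < end:
            continue
        # Touches up to the conversion, most recent first, cut to N positions.
        eligible = [t for t in touches_by_user[user] if t[0] <= conv_day]
        chain = eligible[::-1][:positions]
        # 3. Assign equal credit to each touch in the chain.
        for _day, site in chain:
            report[(conv_day, site)] += 1 / len(chain)
    # 4. Aggregate by day: totals keyed by (conversion day, site).
    return dict(report)


mon, tue, wed, thu = (date(2013, 9, 30), date(2013, 10, 1),
                      date(2013, 10, 2), date(2013, 10, 3))
impressions = [
    ("u1", "site_a", mon), ("u1", "site_a", tue), ("u1", "search", wed),
    ("u2", "search", tue),
    ("u3", "site_a", mon),
]
conversions = [("u1", wed), ("u2", tue), ("u3", thu)]  # u3 falls outside the window

report = mta_report(impressions, conversions, start=mon, end=thu, positions=3)
for (day, site), credit in sorted(report.items()):
    print(day, site, round(credit, 2))
# 2013-10-01 search 1.0
# 2013-10-02 search 0.33
# 2013-10-02 site_a 0.67
```

Simplifications worth noting: there is no lookback limit on chain age, a user's chain is not reset after a conversion, and same-day touches keep their input order. At the daily scale quoted on the slide, this per-user loop is exactly the work that benefits from being spread across many nodes.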
  35. Why is MTA easier with Redshift (RS)? Redshift is MPP
     • Fast columnar scans: ~10^9 rows in ~10 s
     • Sorting as main index: 100k/s/$ load
     • Even work distribution: cookie ID shards well
  36. Why is MTA easier with RS? Redshift is MPP: fast columnar scans (~10^9 rows in ~10 s)
  37. Why is MTA easier with RS? Redshift is MPP: sorting as main index (100k/s/$ load)
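Redshift has no conventional indexes; instead it keeps the minimum and maximum value of every 1 MB block (zone maps) and skips blocks that cannot match a predicate, which works best when data is stored in sort-key order. That is one reading of "sorting as main index". The toy model below uses 4-row blocks and integer day numbers as hypothetical stand-ins to show why sorted data lets a range scan read far fewer blocks.

```python
from dataclasses import dataclass

BLOCK_ROWS = 4  # toy block size; real Redshift blocks are 1 MB


@dataclass
class Block:
    lo: int  # zone map: smallest value in the block
    hi: int  # zone map: largest value in the block
    values: list[int]


def build_blocks(column: list[int], block_rows: int = BLOCK_ROWS) -> list[Block]:
    blocks = []
    for i in range(0, len(column), block_rows):
        chunk = column[i:i + block_rows]
        blocks.append(Block(min(chunk), max(chunk), chunk))
    return blocks


def range_scan(blocks: list[Block], lo: int, hi: int) -> tuple[list[int], int]:
    """Return values in [lo, hi] and how many blocks had to be read."""
    hits, blocks_read = [], 0
    for block in blocks:
        if block.hi < lo or block.lo > hi:
            continue  # min/max prove no row here can match: skip the block
        blocks_read += 1
        hits.extend(v for v in block.values if lo <= v <= hi)
    return hits, blocks_read


days = list(range(16))                      # e.g. record_date as day numbers, sorted
shuffled = [d * 5 % 16 for d in range(16)]  # same values, no sort order
print(range_scan(build_blocks(days), 4, 7))      # ([4, 5, 6, 7], 1)
print(range_scan(build_blocks(shuffled), 4, 7))  # ([5, 4, 7, 6], 4)
```

Both scans return the same rows, but the sorted column touches one block instead of four; at billions of rows that difference is what makes date-range queries over a report window fast.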
  38. Why is MTA easier with RS? Redshift is MPP: even work distribution (cookie ID shards well)
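Why cookie ID "shards well": hashing a high-cardinality key spreads rows evenly across slices, and every row for the same cookie lands on the same slice, so each slice can build its users' chains locally. In Redshift this would correspond to declaring the cookie or user ID column as the table's DISTKEY. The sketch below uses MD5 as a stand-in hash and a hypothetical 8-slice cluster; Redshift's internal hash function is not reproduced here.

```python
import hashlib
from collections import Counter, defaultdict


def slice_for(cookie_id: str, n_slices: int) -> int:
    """Map a cookie ID to a slice. MD5 is a stand-in, not Redshift's own hash."""
    digest = hashlib.md5(cookie_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_slices


N_SLICES = 8  # hypothetical number of slices in the cluster

# Evenness: 10,000 hypothetical cookies spread across the slices.
cookies = [f"cookie-{i}" for i in range(10_000)]
cookies_per_slice = Counter(slice_for(c, N_SLICES) for c in cookies)
print(sorted(cookies_per_slice.values()))  # each value close to 10,000 / 8 = 1,250

# Co-location: every impression for a cookie lands on the same slice, so each
# slice can build complete conversion chains without shuffling data.
impressions = [("cookie-1", "site_a"), ("cookie-2", "search"), ("cookie-1", "search")]
by_slice: dict[int, list[tuple[str, str]]] = defaultdict(list)
for cookie, site in impressions:
    by_slice[slice_for(cookie, N_SLICES)].append((cookie, site))
```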
  39. Why is MTA easier with RS? Redshift is SQL
     • Logical decomposition: CTEs cut complexity
     • Powerful aggregates: COUNT DISTINCT works
     • Window functions: market basket is easy
  40. Why is MTA easier with RS? Redshift is SQL: logical decomposition (CTEs cut complexity)
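  41. Example: CTE
        -- Number each user's impressions per advertiser, most recent first.
        -- DENSE_RANK gives impressions with the same record_date the same position.
        WITH chains AS (
          SELECT campaign_id,
                 site_id,
                 DENSE_RANK() OVER (PARTITION BY user_id, advertiser_id
                                    ORDER BY record_date DESC) AS position
          FROM impressions
          WHERE record_date >= 'YYYY-MM-DD'  -- report window (placeholders)
            AND record_date <  'YYYY-MM-DD'
        )
        SELECT campaign_id, site_id, position, COUNT(*) AS ct
        FROM chains
        GROUP BY 1, 2, 3;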
  42. Why is MTA easier with RS? Redshift is SQL: powerful aggregates (COUNT DISTINCT works)
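Why exact COUNT DISTINCT matters for this kind of reporting: distinct counts are not additive, so reach for a site over a report period cannot be computed by summing daily uniques. The sketch below mimics SELECT ..., COUNT(DISTINCT cookie_id) ... GROUP BY on hypothetical rows; count_distinct is an illustrative helper.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical impressions: (day, site_id, cookie_id).
impressions = [
    ("mon", "site_a", "c1"),
    ("mon", "site_a", "c2"),
    ("tue", "site_a", "c1"),
    ("tue", "search", "c1"),
]


def count_distinct(rows, group_key, value_key):
    """Like SELECT group, COUNT(DISTINCT value) FROM rows GROUP BY group."""
    seen = defaultdict(set)
    for row in rows:
        seen[group_key(row)].add(value_key(row))
    return {group: len(values) for group, values in seen.items()}


cookie = itemgetter(2)
reach_by_site = count_distinct(impressions, itemgetter(1), cookie)
reach_by_site_day = count_distinct(impressions, itemgetter(1, 0), cookie)
print(reach_by_site)      # {'site_a': 2, 'search': 1}
print(reach_by_site_day)  # {('site_a', 'mon'): 2, ('site_a', 'tue'): 1, ('search', 'tue'): 1}
```

Summing site_a's daily values gives 3, but only two distinct cookies (c1 and c2) saw site_a, which is why the report needs a distinct count over the raw rows for the whole period.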
  43. Why is MTA easier with RS? Redshift is SQL: window functions (market basket is easy)
  44. Example: Window Function. The same query as slide 41, highlighting DENSE_RANK() OVER (PARTITION BY user_id, advertiser_id ORDER BY record_date DESC), which numbers each user's impressions per advertiser from the most recent (position 1) backward.
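The query relies on DENSE_RANK(); this Python sketch mimics its semantics on hypothetical rows so the tie behavior is visible. Impressions sharing a record_date get the same position, and the next date gets the next integer with no gap (RANK() would skip one). If record_date holds only a date, position therefore counts distinct days rather than individual touches.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def dense_rank_desc(rows, partition_key, order_key):
    """Mimic DENSE_RANK() OVER (PARTITION BY ... ORDER BY ... DESC).

    Returns (row, rank) pairs. Ties share a rank and ranks have no gaps.
    """
    ranked = []
    # groupby only groups adjacent rows, so sort by the partition key first.
    for _partition, group in groupby(sorted(rows, key=partition_key), key=partition_key):
        ordered = sorted(group, key=order_key, reverse=True)  # stable for ties
        rank = 0
        for i, row in enumerate(ordered):
            if i == 0 or order_key(row) != order_key(ordered[i - 1]):
                rank += 1  # a new distinct order value starts the next rank
            ranked.append((row, rank))
    return ranked


# Hypothetical rows: (user_id, advertiser_id, site_id, record_date).
rows = [
    ("u1", "adv1", "site_a", date(2013, 9, 30)),
    ("u1", "adv1", "site_a", date(2013, 10, 1)),
    ("u1", "adv1", "search", date(2013, 10, 2)),
    ("u1", "adv1", "site_b", date(2013, 10, 2)),
    ("u2", "adv1", "search", date(2013, 10, 1)),
]
for row, position in dense_rank_desc(rows, itemgetter(0, 1), itemgetter(3)):
    print(row[0], row[2], row[3], position)
# u1 search 2013-10-02 1
# u1 site_b 2013-10-02 1
# u1 site_a 2013-10-01 2
# u1 site_a 2013-09-30 3
# u2 search 2013-10-01 1
```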
  45. Bigger picture: why is X easier with RS? Redshift is EASY
     • Credit card = eval: no reps, no PoC
     • Operations made simple: dashboards rock
     • Integrations are outstanding: S3 means no pain
  46. Redshift changed the game. http://bit.ly/rs_ak
