AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift

Customizing the customer experience based on user behavior is a constant challenge for today’s consumer apps. Business intelligence helps analyze and model large amounts of data. Looker offers a modern, AWS-based approach to BI that is fast, agile, and easy to manage. Join this webinar to learn how MessageMe, which provides emotionally engaging messaging apps to consumers, uses Looker business intelligence software and the Amazon Redshift data warehouse service to analyze billions of rows of customer data in seconds.

Webinar topics include:
• How MessageMe turns billions of rows of customer data stored in Amazon Redshift into actionable insights
• How Looker connects directly to Amazon Redshift in just a few clicks, enabling MessageMe to build modern big data analytics in the cloud

Who should attend:
• Information or Solution Architects, Data Analysts, BI Directors, DBAs, Development Leads, Developers, or Technical IT Leaders.

Presenters:
• Justin Rosenthal, CTO, MessageMe
• Keenan Rice, VP, Marketing & Alliances, Looker
• Tina Adams, Senior Product Manager, AWS


Transcript

  • 1. Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift
  • 2. Welcome: Maya Cabassi, Partner Marketing Manager, Amazon Web Services
  • 3. Webinar Overview: Submit your questions using the Q&A tool. A copy of today’s presentation will be made available on the AWS SlideShare channel (http://www.slideshare.net/AmazonWebServices/) and the AWS Webinar channel on YouTube (http://www.youtube.com/channel/UCTnPlVzJI-ccQXlxjSvJmw).
  • 4. Introducing: Keenan Rice, VP, Marketing & Alliances, Looker; Justin Rosenthal, Chief Technology Officer, MessageMe; Tina Adams, Senior Product Manager, Amazon Web Services
  • 5. What We’ll Cover: an overview of the Amazon Redshift data warehouse; how Looker integrates with Amazon Redshift to enable big data analytics in the cloud; how MessageMe turns application metrics stored in Amazon Redshift into actionable insights with Looker BI; Q&A
  • 6. Amazon Redshift: fast, simple, petabyte-scale data warehousing for less than $1,000/TB/year. Tina Adams, Senior Product Manager | tinaadam@amazon.com
  • 7. We set out to build a fast and powerful, petabyte-scale data warehouse that is a lot faster, a lot cheaper, and a lot simpler: Amazon Redshift
  • 8. Data warehousing done the AWS way: easy to provision; pay as you go, no up-front costs; fast, cheap, and easy to use; SQL
  • 9. Common Customer Use Cases
    – Traditional enterprise DW: reduce costs by extending the DW rather than adding HW; migrate completely from existing DW systems; respond faster to the business and provision in minutes
    – Companies with big data: improve performance by an order of magnitude; make more data available for analysis; access business data via standard reporting tools
    – SaaS companies: add analytic functionality to applications; scale DW capacity as demand grows; reduce HW & SW costs by an order of magnitude
  • 10. Amazon Redshift Customers Channel
  • 11. Feature Delivery (release dates as month/day): Service Launch (2/14); PDX (4/2); Temp Credentials (4/11); DUB (4/25); SOC1/2/3 (5/8); NRT (6/5); JDBC Fetch Size (6/27); Unload logs (7/5); SHA1 Builtin (7/15); Sharing snapshots (7/18); 4 byte UTF-8 (7/18); Statement Timeout (7/22); Timezone, Epoch, Autoformat (7/25); WLM Timeout/Wildcards (8/1); Resource Level IAM (8/9); CRC32 Builtin, CSV, Restore Progress (8/9); PCI (8/22); UTF-8 Substitution (8/29); JSON, Regex, Cursors (9/10); Split_part, Audit tables (10/3); SIN/SYD (10/8); Unload Encrypted Files, HSM Support (11/11); Kinesis, EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts (11/13); Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13); EIP Support for VPC Clusters (12/28)
  • 12. Amazon Redshift architecture
    – Leader node: SQL endpoint (JDBC/ODBC); stores metadata; coordinates query execution
    – Compute nodes: local, columnar storage; execute queries in parallel; load, backup, and restore via Amazon S3; parallel load from Amazon S3, DynamoDB, and EMR/HDFS/SSH; Amazon Kinesis integration
    – Hardware optimized for data processing, with a 10 GigE (HPC) interconnect
    – Scale while remaining online, from a single node to a 100-node, 1.6 PB cluster
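The leader/compute split on slide 12 follows a scatter-gather pattern: the leader node fans a query out, each compute node works only on its local data, and the leader merges the partial results. The Python sketch below illustrates that dataflow with a per-country message count. The round-robin partitioning, the threads standing in for nodes, and every function name are hypothetical; the sketch says nothing about how Redshift actually plans or executes queries.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes):
    """Spread rows round-robin across compute nodes (a hypothetical distribution choice)."""
    return [rows[i::num_nodes] for i in range(num_nodes)]


def compute_node_count(local_rows):
    """One compute node: count messages per country using only its local rows."""
    return Counter(country for country, _message_type in local_rows)


def leader_node_count(rows, num_nodes=4):
    """The leader fans the work out to every node, then merges the partial counts."""
    slices = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(compute_node_count, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


# Hypothetical rows: (country, message_type)
rows = [("US", "doodle"), ("BR", "sticker"), ("US", "text"), ("DE", "doodle"), ("US", "doodle")]
result = leader_node_count(rows, num_nodes=2)
```

Because counts are additive, merging per-node partial counts gives the same answer as a single-node scan; non-additive aggregates such as an exact COUNT(DISTINCT) need more care when partial results are merged.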
  • 13. Amazon Redshift is priced to let you analyze all your data
    – On-Demand: $0.850 effective hourly price (single node); $0.425 effective hourly price per TB; $3,723 effective annual price per TB
    – 1 Year Reservation: $0.500 per hour (single node); $0.250 per hour per TB; $2,190 per TB per year
    – 3 Year Reservation: $0.228 per hour (single node); $0.114 per hour per TB; $999 per TB per year
    – Simple pricing: number of nodes × cost per hour; no charge for the leader node; no upfront costs; pay as you go
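The pricing columns on slide 13 are related by simple arithmetic: the per-TB hourly price is the node price divided by the node's storage, and the annual price per TB is that hourly figure times the hours in a year. The sketch below reproduces the annual column. The 2 TB per node figure is an assumption inferred from the slide's own ratio ($0.850 vs. $0.425), not a value the slide states, and the function names are illustrative.

```python
# Effective hourly price of a single node, from the slide (USD).
NODE_HOURLY_PRICE = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}
TB_PER_NODE = 2            # assumption: inferred from $0.850 vs. $0.425 per TB on the slide
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annual_price_per_tb(node_hourly_price: float, tb_per_node: float = TB_PER_NODE) -> float:
    """Effective annual price per TB = (hourly node price / TB per node) * hours per year."""
    return node_hourly_price / tb_per_node * HOURS_PER_YEAR


def cluster_hourly_cost(num_nodes: int, node_hourly_price: float) -> float:
    """Simple pricing from the slide: number of nodes x cost per hour (leader node is free)."""
    return num_nodes * node_hourly_price


if __name__ == "__main__":
    for plan, price in NODE_HOURLY_PRICE.items():
        print(f"{plan}: ${annual_price_per_tb(price):,.0f} per TB per year")
```

Running it reproduces the slide's annual column ($3,723, $2,190, and $999, the last rounded from $998.64), which is where the "less than $1,000/TB/year" headline on slide 6 comes from: the 3-year reservation.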
  • 14. Amazon Redshift has security built-in
    – SSL to secure data in transit
    – Encryption to secure data at rest: AES-256, hardware accelerated; all blocks on disks and in Amazon S3 encrypted; HSM/CloudHSM
    – No direct access to compute nodes
    – Amazon VPC support
    – SOC1/2/3, PCI level 1, and others coming soon
    (Diagram labels: Customer VPC, JDBC/ODBC, Internal Security Group, 10 GigE (HPC), Ingestion, Backup, Restore)
  • 15. Amazon Redshift integrates with multiple data sources: Corporate Datacenter, Amazon RDS, Amazon S3, Amazon Kinesis, Amazon DynamoDB, and Amazon EMR (JDBC/ODBC also shown)
  • 16. Analytics for Today’s Data-Driven Organizations: Keenan Rice, Vice President, Marketing & Alliances, 1.28.14
  • 17. Agenda: The New Data Landscape; The Missed Innovation Cycle; The Next Generation; Innovative Customers; MessageMe Intro
  • 18. New Data Landscape: ridiculous quantities of event and business data; new MPP databases alongside ETL and the data warehouse; data analysts, a new breed of data experts doing data modeling; business users, a new curious generation with limited data discovery that expects immediate results
  • 19. Missed Innovation Cycle: BI is a relic of the old (expensive) data landscape. Event and business application data now sit in new MPP databases, but traditional BI software offers no direct data access, no reusability, cubes and simple models, one-time-use queries, and heavy desktop apps. Data analysts (a new breed of data experts) are back to handcoding SQL, while business users (a new curious generation) expect immediate results.
  • 20. Looker: The Next Generation. Modern analytics, built for the new data landscape: load, query, transform; flexible delivery and agile modeling for data analysts; a web-based BI app; high-resolution discovery plus sharing & collaboration for business users
  • 21. Looker Inside: near real-time access to your Redshift data; exploit the computing power of the AWS cloud and Redshift; no need to re-architect or cube data
  • 22. Looker Intelligence: extend the power of your data analysts; fold data as complex as necessary without any database effort; use Git for agile team development
  • 23. Looker Everywhere: powerful data discovery for anyone; access all data in an interactive web application; share, save, and collaborate
  • 24. A New Perspective: changing the way organizations make decisions. Founded in 2012 in Santa Cruz, California; $18M from Redpoint, First Round Capital & Pivot North; 1,200 hours per month spent in Looker per customer; 50+ customers changing how they run their businesses. Lloyd Tabb: created the first app server (Netscape), founder of Mozilla.org, LiveOps, etc. Frank Bien: proven software exec (Greenplum, EMC). Marc Randolph: founder and first CEO of Netflix. © 2014 Looker Inc. All Rights Reserved.
  • 25. Who’s Lookering? Data-driven organizations realizing the power of Looker
  • 26. Powering Analytics @ MessageMe 1. Redshift + Looker 2. Example Looker Report & Model 3. MessageMe Data Storage 4. Analytics Strategies 5. DynamoDB → Redshift
  • 27. Redshift + Looker Empower your team to answer their own questions. • What types of Stickers are sent the most? • How do event/holiday themed-packs perform? • Which SMS provider is most cost-effective? Internal dashboards and Looker link-sharing are commonplace. Looker makes the data accessible and Redshift makes it fast.
  • 28. Redshift + Looker
  • 29. Redshift + Looker
  • 30. Data Storage: Why Redshift? At launch: DynamoDB for all application data; MySQL for all statistics data. RDS config (March 2013): master db.m1.xlarge (15GB), slave db.m1.xlarge (15GB). RDS config (April 2013): master db.m1.xlarge (15GB), slave db.m2.4xlarge (68GB). 90% of writes were via LOAD DATA INFILE, so write IOPS were not a problem. However, index sizes were growing quickly…
  • 31. Data Storage: Why Redshift? MySQL status (April 2013), slave on a db.m2.4xlarge (68GB):
    – `event`: engine InnoDB; index width 48 bytes/row; row count ~3 billion; index size 144 GB
    – `message`: engine InnoDB; index width 32 bytes/row; row count ~2 billion; index size 64 GB
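The index sizes on slide 31 follow directly from width × rows. The sketch below reproduces that arithmetic and compares the total with the slave's memory. It assumes decimal gigabytes, and the link to the slowdown (InnoDB lookups degrade once indexes no longer fit in RAM) is a common rule of thumb rather than something the slide measures.

```python
GB = 10**9          # decimal gigabytes, matching the slide's arithmetic
SLAVE_RAM_GB = 68   # db.m2.4xlarge memory, per the slide

# table -> (index width in bytes per row, approximate row count), from the slide
TABLES = {
    "event": (48, 3_000_000_000),
    "message": (32, 2_000_000_000),
}


def index_size_gb(width_bytes: int, rows: int) -> float:
    """Index size is roughly bytes per index entry times the number of rows."""
    return width_bytes * rows / GB


total_index_gb = sum(index_size_gb(width, rows) for width, rows in TABLES.values())

if __name__ == "__main__":
    for name, (width, rows) in TABLES.items():
        print(f"{name}: {index_size_gb(width, rows):.0f} GB")
    print(f"total: {total_index_gb:.0f} GB of index vs. {SLAVE_RAM_GB} GB of RAM")
```

Together the two indexes come to about 208 GB, roughly three times the 68 GB on the larger slave, which fits the "couldn't get it back out" problem described next.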
  • 32. Data Storage: Why Redshift? We could put data in, but we couldn’t get it back out! Possible solutions: 1. Summarize (pro: data compression; con: data loss) 2. Shard (pro: no data loss; con: difficult to query) 3. Redshift?
  • 33. Data Storage: Current System
    – Redshift (90%): append-only tables; delayed, bulk inserts OK. Examples: `event`, `message`, `user_demographic`
    – MySQL (10%): inline inserts; non-negotiable uniqueness requirements (ON DUPLICATE KEY UPDATE). Examples: `purchase`, `user_demographic`
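The split on slide 33 comes down to write semantics: append-only tables tolerate delayed bulk loads where a repeated row is simply another row, while some tables need each key to stay unique on every insert. The sketch below contrasts the two using Python's built-in sqlite3. SQLite's ON CONFLICT ... DO UPDATE (SQLite 3.24 or later) stands in for MySQL's ON DUPLICATE KEY UPDATE, and the table columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Append-only, Redshift-style table: no uniqueness constraint (hypothetical columns).
    CREATE TABLE event (user_id INTEGER, name TEXT, ts INTEGER);
    -- Table where uniqueness is non-negotiable (hypothetical columns).
    CREATE TABLE purchase (purchase_id TEXT PRIMARY KEY, user_id INTEGER, amount_cents INTEGER);
    """
)

# Delayed bulk insert: a batch of events is loaded at once; a repeated event is just another row.
events = [(1, "send_sticker", 100), (2, "send_doodle", 101), (1, "send_sticker", 100)]
conn.executemany("INSERT INTO event VALUES (?, ?, ?)", events)

# Inline insert with a uniqueness guarantee: a retried purchase updates the existing row
# instead of creating a duplicate (SQLite's analogue of ON DUPLICATE KEY UPDATE).
UPSERT = """
    INSERT INTO purchase (purchase_id, user_id, amount_cents) VALUES (?, ?, ?)
    ON CONFLICT(purchase_id) DO UPDATE SET amount_cents = excluded.amount_cents
"""
conn.execute(UPSERT, ("p-1", 1, 99))
conn.execute(UPSERT, ("p-1", 1, 199))

event_rows = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
purchases = conn.execute("SELECT purchase_id, amount_cents FROM purchase").fetchall()
```

The event table ends up with three rows, duplicates included, while the purchase table keeps a single row for `p-1` holding the latest amount.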
  • 34. Analytics Strategies w/ Billions of Rows: deep-dive queries with row-level specifics vs. super-fast top-line metrics and aggregates. You get the deep-dive queries out of the box with Redshift; how do we get the top-line metrics really fast? 1. Summarization 2. Cached Derived Tables
  • 35. Analytics Strategies: Summarization
    – `message` columns: `id`, `sender_id`, `recipient_user_id`, `recipient_room_id`, `message_type`, `country`, `os_family`, `os_version`, `app_version`, `timestamp`; rows/day: 10–100 million
    – `sm_message` columns: `send_hour`, `recipient_type`, `message_type`, `country`, `os_family`, `send_count`; rows/day: 10–100 thousand (1,000:1 compression)
    – How many doodles were sent each day in the US since we launched? 100 seconds against `message` vs. 3 seconds against `sm_message`
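Summarization replaces one row per message with one row per combination of the dimensions you actually filter on, plus a count. The sketch below builds an `sm_message`-style table from a tiny `message` table in SQLite and answers the doodles-per-day question from both tables. The sample rows, the Unix-seconds `timestamp`, and the rule that `recipient_type` is 'room' when `recipient_room_id` is set are all assumptions for illustration, not MessageMe's actual job.

```python
import sqlite3

DAY1 = 1390867200  # 2014-01-28 00:00:00 UTC, an arbitrary sample date

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE message (
        id INTEGER PRIMARY KEY, sender_id INTEGER,
        recipient_user_id INTEGER, recipient_room_id INTEGER,
        message_type TEXT, country TEXT, os_family TEXT,
        os_version TEXT, app_version TEXT,
        timestamp INTEGER  -- assumption: Unix epoch seconds
    )
    """
)
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 20, None, "doodle", "US", "iOS", "7.0", "1.2", DAY1 + 3600),
        (2, 11, 21, None, "doodle", "US", "iOS", "7.0", "1.2", DAY1 + 3700),
        (3, 12, None, 5, "doodle", "US", "Android", "4.4", "1.2", DAY1 + 7200),
        (4, 13, 22, None, "sticker", "US", "iOS", "7.0", "1.2", DAY1 + 7300),
        (5, 14, 23, None, "doodle", "BR", "iOS", "7.0", "1.2", DAY1 + 86400 + 60),
        (6, 15, 24, None, "doodle", "US", "iOS", "7.1", "1.3", DAY1 + 86400 + 120),
    ],
)

# Roll raw rows up to one row per (hour, recipient type, message type, country, OS family).
conn.execute(
    """
    CREATE TABLE sm_message AS
    SELECT strftime('%Y-%m-%d %H:00', timestamp, 'unixepoch') AS send_hour,
           CASE WHEN recipient_room_id IS NOT NULL THEN 'room' ELSE 'user' END AS recipient_type,
           message_type, country, os_family,
           COUNT(*) AS send_count
    FROM message
    GROUP BY send_hour, recipient_type, message_type, country, os_family
    """
)

# Doodles sent per day in the US, answered from the summary table...
from_summary = conn.execute(
    """
    SELECT substr(send_hour, 1, 10) AS day, SUM(send_count) FROM sm_message
    WHERE message_type = 'doodle' AND country = 'US'
    GROUP BY day ORDER BY day
    """
).fetchall()

# ...and from the raw table, which must give the same answer.
from_raw = conn.execute(
    """
    SELECT date(timestamp, 'unixepoch') AS day, COUNT(*) FROM message
    WHERE message_type = 'doodle' AND country = 'US'
    GROUP BY day ORDER BY day
    """
).fetchall()
```

The trade-off from slide 32 shows up here: dropped columns such as `os_version` and `sender_id` can no longer be queried from `sm_message`, which is why the raw table stays available for deep dives.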
  • 36. Analytics Strategies: Cached Derived Tables. Some important queries will be complex and demand row-specific data. Summarizing is not an option, so what do we do? Build cached derived tables: turn long-running, complex queries into flat tables.
  • 37. Analytics Strategies: Cached Derived Tables. Example: retention by cohort.
    SELECT … INTO TABLE `sm_retention_day` FROM (SELECT … FROM `user` JOIN `message` JOIN `user_source`), (SELECT … FROM `user` JOIN `user_source`)
    Resulting `sm_retention_day` columns: `join_day`, `nday`, `country`, `os_family`, `os_version`, `traffic_source`, `active_users`, `signups`
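The slide elides the real retention query, so the sketch below does not reconstruct it. Instead it shows the general technique in SQLite: run the expensive cohort query once, materialize it as a flat table, and let dashboards read that table. The schema is a hypothetical simplification that keeps only `country` of the slide's dimensions and has no `user_source` join. SQLite's CREATE TABLE ... AS stands in for SELECT … INTO TABLE, and rebuilding by dropping and recreating is one possible refresh strategy, not necessarily the one MessageMe used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, join_day TEXT, country TEXT);
    CREATE TABLE message (sender_id INTEGER, send_day TEXT);
    INSERT INTO "user" VALUES (1, '2014-01-27', 'US'), (2, '2014-01-27', 'US'), (3, '2014-01-28', 'BR');
    INSERT INTO message VALUES
        (1, '2014-01-27'), (2, '2014-01-27'), (1, '2014-01-28'), (1, '2014-01-28'), (3, '2014-01-30');
    """
)


def rebuild_retention_table(conn):
    """Materialize the expensive cohort query once; dashboards then read the flat table."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS sm_retention_day;
        CREATE TABLE sm_retention_day AS
        SELECT a.join_day, a.nday, a.country, a.active_users, s.signups
        FROM (
            -- active users per cohort, per day since joining (nday)
            SELECT u.join_day,
                   CAST(julianday(m.send_day) - julianday(u.join_day) AS INTEGER) AS nday,
                   u.country,
                   COUNT(DISTINCT u.id) AS active_users
            FROM "user" u JOIN message m ON m.sender_id = u.id
            GROUP BY u.join_day, nday, u.country
        ) a
        JOIN (
            -- cohort sizes
            SELECT join_day, country, COUNT(*) AS signups
            FROM "user" GROUP BY join_day, country
        ) s ON s.join_day = a.join_day AND s.country = a.country;
        """
    )


rebuild_retention_table(conn)
us_cohort = conn.execute(
    "SELECT nday, active_users, signups FROM sm_retention_day "
    "WHERE join_day = '2014-01-27' AND country = 'US' ORDER BY nday"
).fetchall()
```

For the 2014-01-27 US cohort this yields two active users out of two signups on day 0 and one out of two on day 1, so day-1 retention is 50%, read from a small table instead of a join over every message.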
  • 38. DynamoDB → Redshift: stats tables are homogeneous and compact, while application data can be heterogeneous and heavy (a mixture of numbers, strings, binary, etc.). How many users signed up this week with a .edu email address? COPY dynamodb://user
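Loading heterogeneous DynamoDB items into a fixed-column table means deciding which attributes survive. The sketch below is a rough emulation in Python and SQLite. As a simplification, it assumes attributes are matched to columns by name, with unmatched attributes left behind and missing ones stored as NULL; check the Redshift COPY documentation for the exact matching rules. The column set, sample items, and the "this week" window are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical target columns for the `user` table.
COLUMNS = ("id", "email", "signup_day")

# Hypothetical DynamoDB items: attribute sets differ from item to item.
ITEMS = [
    {"id": 1, "email": "ana@cs.stanford.edu", "signup_day": "2014-01-27", "avatar": b"\x89PNG"},
    {"id": 2, "email": "bo@example.com", "signup_day": "2014-01-28"},
    {"id": 3, "email": "cy@mit.edu", "signup_day": "2014-01-20", "phone": "+15550100"},
    {"id": 4, "email": "di@berkeley.EDU", "signup_day": "2014-01-28"},
    {"id": 5, "signup_day": "2014-01-28"},  # no email attribute at all
]


def project(item: dict) -> tuple:
    """Keep attributes whose names match a column; others are dropped, missing ones become NULL."""
    return tuple(item.get(col) for col in COLUMNS)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER, email TEXT, signup_day TEXT)')
conn.executemany('INSERT INTO "user" VALUES (?, ?, ?)', [project(item) for item in ITEMS])

week_start = date(2014, 1, 27)  # hypothetical "this week", starting on a Monday
week_end = week_start + timedelta(days=7)
(edu_signups,) = conn.execute(
    """
    SELECT COUNT(*) FROM "user"
    WHERE LOWER(email) LIKE '%.edu' AND signup_day >= ? AND signup_day < ?
    """,
    (week_start.isoformat(), week_end.isoformat()),
).fetchone()
```

Once the data is in a flat table, the slide's question becomes a one-line filter; here it counts two signups, since the `mit.edu` user joined the previous week and the item without an email never matches.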
  • 39. Questions Contacts: Looker: http://www.looker.com/request-demo MessageMe: https://messageme.com AWS: aws.amazon.com/contact-us tinaadam@amazon.com
  • 40. We’d like your feedback. Please respond to a short survey: https://aws.asia.qualtrics.com/SE/?SID=SV_1yUN9wjaZX960kd