Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Deriving Value with Next Gen Analytics and ML Architectures

1,206 views

Published on

This presentation was delivered on March 19, 2019at Gartner's Data and Analytics Summit in Orlando, FL. Rahul Pathak, GM at AWS discusses Deriving Value with Next Gen Analytics and ML Architectures on AWS.

  • Be the first to comment

Deriving Value with Next Gen Analytics and ML Architectures

  1. 1. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Deriving Value with Next Gen Analytics and ML Architectures Rahul Pathak, GM Big Data & Data Lakes March 19, 2019
  2. 2. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential 125+ million players Data provides a constant feedback loop for game designers Up-to-the-minute analysis of gamer satisfaction to drive gamer engagement Resulting in the most popular game played in the world 30 PB+ data lake in S3 growing at 2PB every month Fortnite
  3. 3. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Customers want more value from their data Growing exponentially From new sources Increasingly diverse Used by many people Analyzed by many applications
  4. 4. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Cloud data lakes are the future Data Lake Customers want: To move to a single store; i.e., a data lake in the cloud To store data securely in standard formats To grow to any scale, with low costs To analyze their data in a variety of ways To democratize data access and analysis
  5. 5. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Why choose AWS for data lakes and analytics? Most comprehensive Most secure Easiest to build Most cost-effective Most customers & partners
  6. 6. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Most comprehensive Broadest and deepest portfolio, purpose-built for builders Migration & Streaming Services Infrastructure Data Catalog & ETL Security & Management Dashboards Predictive Analytics Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Serverless Data processing Visualization & Machine Learning Data Movement Analytics Data Lake Infrastructure & Management
  7. 7. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Data Movement Analytics Most comprehensive Broadest and deepest portfolio, purpose-built for builders + 10 more Redshift EMR (Spark & Hadoop) Athena Elasticsearch Service Kinesis Data Analytics Glue (Spark & Python) S3/Glacier GlueLake Formation Visualization & Machine Learning QuickSight SageMaker Comprehend Lex Polly Rekognition Translate Transcribe Deep Learning AMIs Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka Data Lake Infrastructure & Management
  8. 8. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Amazon SageMaker Frameworks Interfaces EC2 P3 & P3dn EC2 C5 FPGASs GreenGrass Elastic Inference The Amazon ML stack Broadest & deepest set of capabilities AI Services ML Frameworks & Infrastructure Rekognition Image Polly Transcribe Translate Comprehend & Comprehend Medical Rekognition Video Textract Forecast PersonalizeLex Vision Speech ChatbotsLanguage Forecasting Recommendations Infrastructure Pre-built algorithms & notebooks Data labeling (Ground Truth) One-click model training & tuning Optimization (NEO) One-click deployment & hosting Reinforcement learningAlgorithms & models (AWS Marketplace for ML) Train DeployBuild ML Services
  9. 9. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Most secure Services for security and governance Compliance AWS Artifact Amazon Inspector Amazon Cloud HSM Amazon Cognito AWS CloudTrail Security Amazon GuardDuty AWS Shield AWS WAF Amazon Macie VPC Encryption AWS Certification Manager AWS Key Management Service Encryption at rest Encryption in transit Bring your own keys, HSM support Identity AWS IAM AWS SSO Amazon Cloud Directory AWS Directory Service AWS Organizations Customers need to have multiple levels of security, identity and access management, encryption, and compliance to secure their data lake
  10. 10. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Most secure — Certifications CSA Cloud Security Alliance Controls ISO 9001 Global Quality Standard ISO 27001 Security Management Controls ISO 27017 Cloud Specific Controls ISO 27018 Personal Data Protection PCI DSS Level 1 Payment Card Standards SOC 1 Audit Controls Report SOC 2 Security, Availability, & Confidentiality Report SOC 3 General Controls Report Global United States CJIS Criminal Justice Information Services DoD SRG DoD Data Processing FedRAMP Government Data Standards FERPA Educational Privacy Act FIPS Government Security Standards FISMA Federal Information Security Management GxP Quality Guidelines and Regulations ISO FFIEC Financial Institutions Regulation HIPPA Protected Health Information ITAR International Arms Regulations MPAA Protected Media Content NIST National Institute of Standards and Technology SEC Rule 17a-4(f) Financial Data Standards VPAT/Section 508 Accountability Standards Asia Pacific FISC [Japan] Financial Industry Information Systems IRAP [Australia] Australian Security Standards K-ISMS [Korea] Korean Information Security MTCS Tier 3 [Singapore] Multi-Tier Cloud Security Standard My Number Act [Japan] Personal Information Protection Europe C5 [Germany] Operational Security Attestation Cyber Essentials Plus [UK] Cyber Threat Protection G-Cloud [UK] UK Government Standards IT-Grundschutz [Germany] Baseline Protection Methodology X P G
  11. 11. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Most cost effective Decouple compute and storage, choice of PAYG analytics services Storage S3 tiers & intelligent tiering From $0.023 per GB/mo to as low as $0.004 per GB/mo Compute Spot & reserved instances Save up to 90% off on-demand prices EMR Autoscaling 57% less than on-premises per IDC report Redshift less than a tenth of the cost of traditional solutions. Athena & QuickSight Serverless pay only for what is used
  12. 12. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential More data lakes and analytics than anywhere else More than 10,000 data lakes on AWS
  13. 13. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Most partners to complement AWS offerings
  14. 14. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Data movement solutions Migration & Streaming Services Data Movement
  15. 15. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Most ways to move data to the data lake Data movement from on-premises datacenters Dedicated network connection Secure appliances Ruggedized shipping containers Database migration Gateway that lets applications write to the cloud Data movement from real-time sources Connect devices to AWS Real-time data streams Real-time video streams Data movement from real-time sources Data movement from your on-premises datacenters Amazon S3 Amazon Glacier AWS Glue Synchronizing data across environments Professional services and partners to help migration Data Movement
  16. 16. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Data lake infrastructure & management solutions Infrastructure Data Catalog & ETL Security & Management Data lake infrastructure & management
  17. 17. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential S3 Lake Formation & Glue Snowball Kinesis Data Streams Snowmobile Kinesis Data Firehose Redshift EMR Athena Kinesis Elasticsearch Service Robust data lake infrastructure SageMaker Comprehend Rekognition Durable and available; exabyte scale Secure, compliant, auditable Object-level controls for fine-grain access Fast performance by retrieving subsets of data Decoupling of compute and storage On-demand resources, tiering, cost choices Data lake infrastructure & management
  18. 18. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Build on robust data lake infrastructure with Amazon S3 ✔ 99.99999999999% durability ✔ Global replication capabilities ✔ Management features ✔ Cost-effective storage classes ✔ Most partner integrations Data lake infrastructure & management
  19. 19. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential “Zestimates are more up-to- date and accurate, because they’re built with the absolute latest data. That’s a huge benefit for our users, who depend on this information to influence their buying or selling decisions.” —Jasjeet Thind, Vice President of Data Science and Engineering, Zillow Group Data lake infrastructure & management
  20. 20. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Set up a catalog, ETL, and data prep with AWS Glue Serverless provisioning, configuration, and scaling to run your ETL jobs on Apache Spark Pay only for the resources used for jobs Crawl your data sources, identify data formats and suggest schemas and transformations Automates the effort in building, maintaining and running ETL jobs Data lake infrastructure & management
  21. 21. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential “Beeswax uses Amazon S3 and AWS Glue Data Catalog to build a highly reliable data lake that is fully managed by AWS. Our platform leverages the AWS Glue Data Catalog integration with Amazon EMR in Hive and SparkSQL applications to deliver reporting and optimization features to our customers.” —Ram Kumar Rengaswamy, CTO, Beeswax Data lake infrastructure & management
  22. 22. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Challenges to making a secure data lake Typical steps of building a data lake Move data2 Cleanse, prep, and catalog data 3 Configure and enforce security and compliance policies 4 Make data available for analytics5 Setup storage1 Data lake infrastructure & management
  23. 23. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Build a secure data lake in days with AWS Lake Formation Move, store, catalog, and clean your data faster Move, store, catalog, and clean your data faster with Machine Learning Enforce security policies across multiple services Enforce security policies across multiple services Gain and manage new insights Empower analyst and data scientist to gain and manage new insights Data lake infrastructure & management
  24. 24. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Data lake infrastructure & management “With an enterprise-ready option like Lake Formation, we will be able to spend more time deriving value from our data rather than doing the heavy lifting involved in manually setting up and managing our data lake.” —Joshua Couch, VP Engineering at Fender Digital
  25. 25. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Analytics solutions Data Warehousing Big Data Processing Interactive Query Operational Analytics Real time Analytics Serverless Data processing
  26. 26. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Big data processing with Apache Spark & Hadoop with Amazon EMR Easy to use notebooks Low cost vs on-premises Elastic autoscaling Reliable 99.9% SLA Secure with encryption and keys Flexible, open source choice Analytics Enterprise-grade Easy Lowest cost
  27. 27. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Analytics FINRA’s legacy system did not scale to handle 130 billion events per day. They needed to run complex surveillance queries over 40+ PB of data FINRA migrated their big data appliance to a S3 Data Lake and uses EMR for ingestion and processing
  28. 28. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Data warehouse for business reporting with Amazon RedShift Fast—up to 10x faster than traditional data warehouses Easy to setup, deploy and manage Cost-effective Scale on-demand for large data volume and high query concurrency Query data in open formats directly from the data lake Analytics
  29. 29. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Analytics “20 percent of our queries now complete in less than one second. Best of all, we didn’t have to change anything to get this speed-up with Redshift, which supports our mission-critical workloads.” —Greg Rokita, Executive Director of Technology, Edmunds
  30. 30. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Challenge Needed to analyze data to find insights, identify opportunities and evaluate business performance The Oracle DW did not scale, was difficult to maintain and costly Solution Deployed a data lake with Amazon S3, and run analytics with Amazon Redshift, Amazon Redshift Spectrum, and Amazon EMR Result: They doubled the data stored (100PB), lowered costs, and was able to gain insights faster 50 PB of data 600,000 analytics jobs/day Analytics
  31. 31. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Migrated from on-premises data warehouse Built a data warehouse with Redshift and a data lake with S3 Analytics on data lake with Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR Report delivery went from months to days Analytics
  32. 32. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Real-time analytics for timely insights with Amazon Kinesis Make streaming data available to multiple real-time analytics applications Run streaming applications without managing any infrastructure Durable to reduce the probability of data loss Scalable to process data from hundreds of thousands of sources with low latencies Analytics
  33. 33. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Analytics “Amazon Kinesis makes it simple to scale our solution end to end, including the capture, processing, and delivery of actionable insights. This empowers our customers to better understand their user base.” — Indu Narayan, Director of Data, Yieldmo
  34. 34. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Operational analytics for logs and search with Amazon Elasticsearch Fully managed; deploy production-ready cluster in minutes Direct access to Elasticsearch open-source APIs, Logstash and Kibana VPC support; at-rest and in-transit encryption Scale up and down easily Analytics
  35. 35. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Analytics “Ultimately, we are improving our software products and offering better service to our customers because of the real-time visibility we’re getting into log data.” “Amazon Elasticsearch Service enables data forensic activities to take place and help find and fix application problems faster.” —Tommy Li, Senior Software Architect, Autodesk
  36. 36. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Interactive analysis with Amazon Athena Interactive query service to analyze data in Amazon S3 using standard SQL No infrastructure to set up or manage and no data to load Ability to run SQL queries on data archived in Amazon Glacier (coming soon) Analytics
  37. 37. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Analytics “One of the big attractions of Amazon Athena is that it’s serverless and purely consumption-based.” —Matt Chesler, director of DevOps at Movable Ink “We only pay when we’re actually querying the data, and we don’t have to keep a cluster running all the time. Using Amazon Athena, we’re able to query seven years’ worth of data—adding up to hundreds of terabytes— get results at least 50 percent faster, and save nearly $15,000 per month.”
  38. 38. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Serverless analytics Deliver on-demand analytics on the data lake S3 Data lake Glue (ETL & Data Catalog) Athen a QuickSight Serverless. Zero infrastructure. Zero administration Never pay for idle resources $ Availability and fault tolerance built in Automatically scales resources with usage AWS IoT AI/ML Devices Web Sensors Social Analytics
  39. 39. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Visualization & machine learning solutions Dashboards Predictive Analytics Visualization & Machine Learning
  40. 40. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Visual insights for everyone with Amazon QuickSight Pay only for what you use Scale to tens of thousands of users Embedded analytics Build end-to-end BI solutions Visualization & Machine Learning
  41. 41. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Priced to allow access to everyone Create and publish dashboards Secure access to dashboards anytime, anywhere $18 /user/month Billed annually $0.30 /session* up to a max of $5 /user/month ReadersAuthors Visualization & Machine Learning
  42. 42. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Introducing ML Insights ML Anomaly Detection ML Forecasting Auto Narratives Visualization & Machine Learning
  43. 43. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential “ Customer Use Case – Embedding "Amazon QuickSight will allow us to quickly build fast, interactive dashboards that will seamlessly integrate with our Next Gen Stats applications. With the Amazon QuickSight Readers and pay-per-session pricing, we are able to extend these secure, customized and easy to use dashboards for each Club without having to provision servers or manage infrastructure – all while only paying for actual usage. We love the direction, and look forward to expanding use of Amazon QuickSight.” Matt Swensson, VP Emerging Products, NFL Use case: 500+ users (NFL teams, broadcasters, internal research team) Previous tools: Custom-built web application Auth: SAML-based SSO ” Visualization & Machine Learning
  44. 44. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Visualization & Machine Learning With over 20,000 Rio Tinto CRM users globally, QuickSight is providing an interactive solution to explore thousands of data points quickly and to ensure safety in every decision
  45. 45. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Visual insights for everyone With AWS ML & AI services Frameworks and interfaces for machine learning practitioners Platform services that make it easy for any developer to get started and get deep with ML Application services that enable developers to plug-in pre-built AI functionality into their apps Visualization & Machine Learning Amazon S3 Raw Data Initial training data is annotated by human labelers Active learning model is trained from human labeled data Ambiguous data is sent to human labelers for annotation Human labeled data is then sent back to retrain and improve the machine learning model Training data the model understands is labeled automatically An accurate training data set is ready for use in Amazon SageMaker
  46. 46. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Visualization & Machine Learning Using Amazon Translate, Lionbridge is able to scale machine translation in order to localize content faster and in more languages. Using Translate, Lionbridge was able to reduce translation costs by 20 percent.
  47. 47. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Near real-time fraud detection in Turbo Tax  Detecting account take-over and Identity theft detection Develop machine learning models that not only detect fraud offline but also enable the product to block it online Visualization & Machine Learning
  48. 48. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Closing Thoughts… • Data is growing 10x every 5 years; plan for scale and plan for change • Use open data formats to maximize your technical agility • Clean, well-governed data is the foundation for machine learning • AWS provides composable services that make it easy to build data-driven applications that drive business value
  49. 49. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential And, finally… Do your taxes by 4/15! (and no cheating if you’re using TurboTax )
  50. 50. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Thank you! Rahul Pathak rapathak@amazon.com

×