AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS

Customer sharing: Steven Hsieh

Published in: Technology

  1. AWS Summit: Building Serverless Analytics on AWS. Ivan Cheng, Solutions Architect, AWS; Steven Hsieh, Engineer, Trend Micro. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Data processing: Collect, Store, Process/Analyze, Consume (data in, answers out). Measures: time to answer (latency), throughput, cost. Start here with a business case.
  3. To answer new questions quickly, we look to a modern data architecture design. Massive upfront costs, overprovisioned capacity, and long implementation times versus pay as you go, for what you use; decoupled pipelines and engines; experimentation platform. Stages: Ingest/Collect, Store, Process/Analyze, Consume/Visualize.
  4. Data Is Changing, Analytics Are Adapting. Capture and store new data at PB-EB scale. Do new types of analytics in a cost-effective way: • Machine learning • Big data processing • Real-time analytics • Full-text search
  5. More data lakes and analytics than anywhere else. More than 10,000 data lakes on AWS.
  6. AWS Analytics Portfolio: broadest and deepest portfolio, purpose-built for builders (+ 10 more). Analytics: Redshift, EMR (Spark & Hadoop), Athena, Elasticsearch Service, Kinesis Data Analytics, Glue (Spark & Python). Data Lake Infrastructure & Management: S3/Glacier, Glue, Lake Formation. Visualization, Engagement, & Machine Learning: QuickSight, SageMaker, Comprehend, Lex, Polly, Rekognition, Translate, Transcribe, Deep Learning AMIs. Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka.
  7. Agility and Innovation Are Key. Data Lake on AWS: Storage | Archival Storage | Data Catalog. Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight. Machine learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly. On-premises data movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service. Real-time data movement: AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams.
  8. Robust Infrastructure. Services shown: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, S3, Redshift, EMR, Athena, Kinesis, Elasticsearch Service, Kinesis Video Streams, AI Services, QuickSight. Properties: • Durable and available; exabyte scale • Secure, compliant, auditable • Rapid ingest and transformation • Schema on read • Decoupling of compute and storage • On-demand resources, tiering, cost choices
  9. Your choice of Amazon S3 storage classes, ordered by access frequency from frequent to infrequent:
     • S3 Standard: active, frequently accessed data; milliseconds access; ≥ 3 AZ; $0.0210/GB
     • S3 INT: data with changing access patterns; milliseconds access; ≥ 3 AZ; $0.0210 to $0.0125/GB; monitoring fee per object; min storage duration
     • S3 S-IA: infrequently accessed data; milliseconds access; ≥ 3 AZ; $0.0125/GB; retrieval fee per GB; min storage duration; min object size
     • S3 Z-IA: re-creatable, less accessed data; milliseconds access; 1 AZ; $0.0100/GB; retrieval fee per GB; min storage duration; min object size
     • Amazon Glacier: archive data; select minutes or hours; ≥ 3 AZ; $0.0040/GB; retrieval fee per GB; min storage duration; min object size
  10. Data Analytics Pipeline. Data sources: web logs / cookies, ERP, connected devices. Ingest: Amazon Kinesis, Database Migration Service, AWS Snowball, Amazon MSK. Store (shown twice in the pipeline): Amazon S3. Catalog: AWS Glue. Process & Analyze: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch. Consume: BI tools, Jupyter Notebooks, Amazon API Gateway, Amazon QuickSight.
  11. Cloud Services Evolution: virtual machines, then managed services, then serverless.
  12. Serverless analytics: deliver on-demand analytics on the data lake. Sources: devices, web, sensors, social. Components: Kinesis Data Firehose, S3 data lake, Glue (ETL & Data Catalog), Athena, QuickSight, AI/ML. • Serverless: zero infrastructure, zero administration • Never pay for idle resources • Availability and fault tolerance built in • Automatically scales resources with usage
  13. Amazon Athena: Interactive Analysis. Interactive query service to analyze data in Amazon S3 using standard SQL. No infrastructure to set up or manage and no data to load. Supports multiple data formats; define schema on demand. • Start querying instantly: Athena is serverless. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor. • Open, powerful, standard: Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. • Pay per query: with Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries. • Fast, really fast: interactive performance even for large datasets. Athena automatically executes queries in parallel, so most results come back within seconds.
  14. Amazon S3, Amazon Athena, and the data catalog, maintained by the data engineer. Data consumers (user, analyst, data scientist) reach the data through AWS Tools and SDKs, the AWS Management Console, Amazon QuickSight, and Amazon SageMaker. Use PyAthena to query Athena tables directly from Amazon SageMaker notebooks.
  15. Data consumption: Automated Reporting. athena.startQueryExecution("SELECT * FROM business_view") 1. Schedule query 2. Track QueryID for status 3. Query results to Amazon S3 4. New file trigger 5. Job complete notification (email notification)
  16. Athena Workgroups. Athena workgroups are used to isolate queries between different teams, workloads, or applications, and to set limits on the amount of data each query or the entire workgroup can process. • Workload isolation • Query metrics • Cost controls
  17. Workgroups: Cost Controls • Per-query data-scanned threshold; a query that exceeds it is cancelled • Trigger alarms to notify of increasing usage and cost • Disable the workgroup when all queries exceed a maximum threshold • Any Athena metric: successful/failed and total queries, query run time, etc.
  18. Visualize your data with your favorite tools. Featured Athena Partners. Amazon QuickSight.
  19. Amazon QuickSight is a fully managed, serverless, cloud business intelligence system.
  20. Why QuickSight • Scalable: from 10 users to 10,000, QuickSight seamlessly grows with you with no need for additional servers or infrastructure. • No servers to manage: QuickSight is a fully managed cloud service. There is no infrastructure to maintain or upgrade and no upfront costs. • Fully integrated: QuickSight integrates with your other AWS services and data sources, giving you everything you need to build an end-to-end cloud analytics solution. • Pay for what you use: instead of buying costly licenses for all of your users, QuickSight allows you to share dashboards and reports and only pay when users access them.
  21. Connect to your data, wherever it is. QuickSight allows you to connect to AWS data sources, private VPC subnets, on-premises and hosted databases, and third-party business applications. • On-premises: securely connect to on-premises databases and flat files like Excel and CSV (Excel, CSV, Teradata, MySQL, SQL Server, PostgreSQL) • In the cloud: connect to hosted databases, big data formats, and secure VPCs (Redshift, RDS, S3, Athena, Aurora, Teradata, MySQL, Presto, Spark, SQL Server, PostgreSQL, MariaDB, Snowflake, IoT Analytics) • Applications: connect directly to third-party business applications (Salesforce, Square, Adobe Analytics, Jira, ServiceNow, Twitter, GitHub)
  22. Embedding Dashboards in Your Application. QuickSight allows you to seamlessly integrate interactive dashboards and analytics into your own applications. • Enhance your applications with rich analytics and dashboards • Easy maintenance, no servers to manage • Fast: no custom development or domain expertise needed • Leverage new features as we add them • Uses pay-per-session pricing
  24. Demo scenario: Amazon S3 (processed data), Glue Data Catalog, Amazon Athena, Amazon QuickSight.
  25. Building AWS Multi-Account Cost Analytics Solution at Scale. Steven Hsieh, Engineer, Trend Micro.
  27. About Me: Steven Hsieh
  28. Background
  29. Pillars of
  30. Design Principles for Cost Optimization • Adopt a consumption model • Measure overall efficiency • Stop spending money on data center operations • Analyze and attribute expenditure • Use managed services to reduce cost of ownership. Pay as you go / need.
  31. Challenges. Large-scale accounts: • Almost 400 accounts • Hard to manage via the AWS console. Multiple data sources: • Billing data • Utilization data of AWS services (e.g., EC2, S3)
  32. Challenges. Permission management: • Multiple teams • Authorization for different teams. Insight for better design: • Finding insights for design improvement • Providing utilization visibility for design changes
  33. Other solutions we have tried. AWS Billing Console: • Hard to use at large scale • Single data source. Amazon Redshift: • Cost model • ETL. Third-party BI tool: • Expensive license fee • Additional operation cost
  34. Ideas • Data persistence in Amazon S3 • Data querying via Amazon Athena • Dashboard / reporting via Amazon QuickSight
  35. Challenges
  36. Global Accelerator
  37. • Using SQS to trigger parallel tasks • Lambda limitations: timeout 15 minutes; /tmp 512 MB • Spot instance interruptions • Fargate limitations: container storage 10 GB; run-task 10 • Using assume role to collect data across accounts
  38. • Using SNS to trace data upload results • Preprocessing data before uploading to S3 • Only the creator can modify datasets in QuickSight • Create views in Athena
  39. Global Accelerator • Web application hosted in Fargate • Lambda integration with QuickSight for the embedded URL • Using ALB to handle all HTTPS interaction • Permissions & metadata in DynamoDB • ADFS federation using Cognito • Performance improvement via AWS Global Accelerator • Web security enhancement via AWS WAF
  40. Quick Development & Evaluation
  41. Low Utilization & Right Sizing • Trusted Advisor checks: low-utilization EC2 instances, where CPU was 10% or less and network I/O was 5 MB or less on 4 or more days during the last 14 days • Right sizing: analyze metric data to recommend the proper instance type and size; be aware of NIC driver and Linux virtualization type issues
  42. Saving Polar Bear • Analyzing the CPU utilization pattern • Turning off non-production instances can save almost 70% of the cost
  43. Recap • A cost-effective way to build the end-to-end BI solution: 2 power users $36 + ALB $18 = $54 • A flexible reporting architecture that integrates with multiple data sources • Quick wins & timely data-driven decisions • Validating innovation ideas (e.g., the potential saving of the polar bear project)
  44. Summary • More organizations are building data lakes in the cloud to stay competitive • AWS provides the broadest and deepest portfolio of database and analytics services, including machine learning • Serverless analytics helps you build a modern data pipeline with increased agility and lower cost • Learn more at: https://aws.amazon.com/big-data/datalakes-and-analytics/
  45. Thank you! Ivan Cheng, Solutions Architect, AWS; Steven Hsieh, Engineer, Trend Micro.
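
Slide 9 lists a per-GB price for each S3 storage class. As a rough illustration of how those figures compare, the sketch below multiplies a data size by the slide's 2019 prices. It assumes the prices are per GB per month, ignores retrieval, monitoring, and minimum-duration fees, and leaves out S3 INT because the slide gives it a price range rather than a single figure. The function and dictionary names are made up for this example.

```python
# Per-GB prices copied from slide 9 (2019 figures, assumed to be per month).
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.0210,
    "S3 S-IA": 0.0125,
    "S3 Z-IA": 0.0100,
    "Amazon Glacier": 0.0040,
}


def monthly_storage_cost(size_gb: float, storage_class: str) -> float:
    """Storage-only estimate; retrieval and other fees are not modelled."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


if __name__ == "__main__":
    # Compare the classes for a hypothetical 10 TB (10,240 GB) data set.
    for name in PRICE_PER_GB_MONTH:
        cost = monthly_storage_cost(10_240, name)
        print(f"{name:15s} ${cost:,.2f}")
```

Current prices differ by region and change over time, so treat the output as a relative comparison only.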
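
Slide 13 quotes Athena pricing of $5 per terabyte scanned. A minimal sketch of that arithmetic follows. It assumes a binary terabyte (1024^4 bytes) and ignores any per-query minimum or rounding that the service applies; `athena_query_cost` is a hypothetical helper name, not part of any AWS SDK.

```python
PRICE_PER_TB_USD = 5.0      # figure quoted on slide 13
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost of one query from the bytes it scanned."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    scanned = 200 * 1024 ** 3  # a hypothetical query scanning 200 GB
    print(f"${athena_query_cost(scanned):.4f}")
```

Because the charge depends on bytes scanned, anything that reduces the scan (columnar formats such as Parquet or ORC, which the slide lists) reduces the estimate proportionally.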
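
Slide 15 outlines automated reporting: start a query, track its query ID for status, and let results land in Amazon S3. The sketch below covers those first three steps by polling. It is not the presenters' implementation: the `run_report` name, the polling interval, and the poll limit are assumptions, and the `athena` argument is expected to behave like `boto3.client("athena")`, whose `start_query_execution` and `get_query_execution` calls are used here. Steps 4 and 5 on the slide (file trigger and notification) are event-driven and are not shown.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_report(athena, query, database, output_location,
               poll_seconds=2.0, max_polls=150):
    """Start an Athena query and poll until it reaches a terminal state.

    Returns a (query_execution_id, final_state) tuple.
    """
    started = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        # Athena writes the result file under this S3 prefix.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

A call might look like `run_report(boto3.client("athena"), "SELECT * FROM business_view", "my_database", "s3://my-bucket/reports/")`, where the database and bucket names are placeholders.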
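
Slides 16 and 17 describe Athena workgroups with a per-query data-scanned threshold. One way this could be expressed is as the request passed to the Athena `create_work_group` API. The sketch below only builds the request dictionary so it can be inspected without AWS access. The helper name and the example values are assumptions, and the 10 MB lower bound reflects my understanding of the API's documented minimum, so check it against current documentation.

```python
MIN_CUTOFF_BYTES = 10_000_000  # assumed API minimum for the per-query cutoff


def workgroup_request(name, output_location, per_query_limit_bytes):
    """Build keyword arguments for an Athena create_work_group call."""
    if per_query_limit_bytes < MIN_CUTOFF_BYTES:
        raise ValueError("per-query limit is below the assumed 10 MB minimum")
    return {
        "Name": name,
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": output_location},
            # Force workgroup settings on every query run in the workgroup.
            "EnforceWorkGroupConfiguration": True,
            # Publish query metrics so alarms can be set on usage.
            "PublishCloudWatchMetricsEnabled": True,
            # Queries scanning more than this many bytes are cancelled.
            "BytesScannedCutoffPerQuery": per_query_limit_bytes,
        },
    }


if __name__ == "__main__":
    request = workgroup_request("team-a", "s3://my-bucket/team-a/", 100 * 1024 ** 3)
    print(request["Configuration"]["BytesScannedCutoffPerQuery"])
```

With boto3 installed, the dictionary could be passed as `boto3.client("athena").create_work_group(**request)`.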
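
Slide 37 mentions using SQS to trigger parallel tasks and assuming a role to collect data across accounts (almost 400 of them, per slide 31). The talk does not show how the messages were built, so the following is only a sketch of one plausible fan-out: one message per account, grouped into batches of 10 because the SQS `send_message_batch` call accepts at most 10 entries. The role name, message fields, and function names are hypothetical.

```python
import json
from typing import Dict, Iterator, List

SQS_BATCH_LIMIT = 10  # send_message_batch accepts at most 10 entries per call


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_sqs_batches(account_ids: List[str],
                      role_name: str) -> List[List[Dict[str, str]]]:
    """Return batches of SQS entries, one entry per AWS account."""
    batches = []
    for group in chunked(account_ids, SQS_BATCH_LIMIT):
        batches.append([
            {
                "Id": account_id,
                "MessageBody": json.dumps({
                    "account_id": account_id,
                    # The worker would pass this ARN to STS AssumeRole.
                    "role_arn": f"arn:aws:iam::{account_id}:role/{role_name}",
                }),
            }
            for account_id in group
        ])
    return batches
```

Each batch could then be sent with `sqs.send_message_batch(QueueUrl=..., Entries=batch)`, and each worker would assume the role named in its message before collecting data.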
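
Slide 41 states the Trusted Advisor low-utilization rule: CPU at 10% or less and network I/O at 5 MB or less on 4 or more days during the last 14 days. The rule can be sketched directly, assuming the input is a list of per-day `(cpu_percent, network_io_mb)` pairs ordered oldest first. That input shape and the function name are assumptions for illustration.

```python
from typing import Sequence, Tuple


def is_low_utilization(daily_metrics: Sequence[Tuple[float, float]],
                       cpu_threshold_pct: float = 10.0,
                       network_threshold_mb: float = 5.0,
                       min_low_days: int = 4,
                       window_days: int = 14) -> bool:
    """Apply the slide-41 rule to (cpu_percent, network_io_mb) pairs per day."""
    recent = list(daily_metrics)[-window_days:]  # keep only the last 14 days
    low_days = sum(
        1 for cpu, net in recent
        if cpu <= cpu_threshold_pct and net <= network_threshold_mb
    )
    return low_days >= min_low_days


if __name__ == "__main__":
    busy, idle = (55.0, 800.0), (3.0, 1.2)
    print(is_low_utilization([busy] * 10 + [idle] * 4))  # four idle days
    print(is_low_utilization([busy] * 11 + [idle] * 3))  # three idle days
```

Both conditions must hold on the same day for that day to count, which is how the slide words the check.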
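
Slide 42 says turning off non-production instances can save almost 70% of the cost. The talk does not give the schedule behind that number. As a plausibility check, the sketch below assumes instances run only 10 hours on each of 5 weekdays, that cost is proportional to running hours, and that storage charges are ignored. All of these are assumptions for illustration.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def schedule_saving(on_hours_per_weekday: float, weekdays: int = 5) -> float:
    """Fraction of compute cost saved versus running 24 hours a day, 7 days a week."""
    on_hours = on_hours_per_weekday * weekdays
    return 1 - on_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    # 10 h x 5 days = 50 of 168 hours running.
    print(f"{schedule_saving(10):.1%}")
```

Under this hypothetical schedule the instances run 50 of 168 hours, so the saving is about 70%, which is in line with the figure on the slide.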
