
Building Serverless Analytics on AWS


  1. 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building Serverless Analytics on AWS. Ivan Cheng, Solutions Architect, AWS. Steven Hsieh, Engineer, TrendMicro.
  2. 2. Data Processing: start here with a business case. Stages: Collect, Store, Process/Analyze, Consume, turning data into answers. Trade-offs: time to answer (latency), throughput, cost.
  3. 3. To answer new questions quickly, we look to a modern data architecture design. Moving away from: massive upfront costs, overprovisioned capacity, long implementation times. Moving toward: pay as you go, for what you use; decoupled pipelines and engines; an experimentation platform. Stages: Ingest/Collect, Store, Process/Analyze, Consume/Visualize.
  4. 4. Data Is Changing, Analytics Are Adapting. Capture and store new data at PB-EB scale. Do new types of analytics in a cost-effective way. New types of analytics: • Machine learning • Big data processing • Real-time analytics • Full-text search
  5. 5. More data lakes and analytics than anywhere else: more than 10,000 data lakes on AWS.
  6. 6. AWS Analytics Portfolio: broadest and deepest portfolio, purpose-built for builders (+ 10 more). Analytics: Redshift, EMR (Spark & Hadoop), Athena, Elasticsearch Service, Kinesis Data Analytics, Glue (Spark & Python). Data Lake Infrastructure & Management: S3/Glacier, Glue, Lake Formation. Visualization, Engagement, & Machine Learning: QuickSight, SageMaker, Comprehend, Lex, Polly, Rekognition, Translate, Transcribe, Deep Learning AMIs. Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka.
  7. 7. Data Lake: Robust Infrastructure. Services shown: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, S3, Redshift, EMR, Athena, Kinesis, Elasticsearch Service, Kinesis Video Streams, AI Services, QuickSight. Properties: • Durable and available; exabyte scale • Secure, compliant, auditable • Rapid ingest and transformation • Schema on read • Decoupling of compute and storage • On-demand resources, tiering, cost choices
  8. 8. Data Analytics Pipeline. Data sources: web logs / cookies, ERP, connected devices. Ingest: Amazon Kinesis, Database Migration Service, AWS Snowball, Amazon MSK. Store: Amazon S3. Catalog: AWS Glue. Process & Analyze: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch. Consume: BI tools, Jupyter Notebooks, Amazon API Gateway, Amazon QuickSight.
  9. 9. Cloud Services Evolution: virtual machines, then managed services, then serverless.
  10. 10. Serverless analytics: deliver on-demand analytics on the data lake. Sources: devices, web, sensors, social. Services: Kinesis Data Firehose, S3 data lake, Glue (ETL & Data Catalog), Athena, QuickSight, AI/ML. • Serverless: zero infrastructure, zero administration • Never pay for idle resources • Availability and fault tolerance built in • Automatically scales resources with usage
  11. 11. Amazon Athena: Interactive Analysis. Interactive query service to analyze data in Amazon S3 using standard SQL. No infrastructure to set up or manage and no data to load. Supports multiple data formats: define schema on demand. • Start Querying Instantly: Athena is serverless. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor. • Open. Powerful. Standard: Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. • Fast. Really Fast: Interactive performance even for large datasets. Athena automatically executes queries in parallel, so most results come back within seconds. • Pay Per Query: With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries.
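The pay-per-query pricing on this slide can be turned into a quick estimate. This is a minimal sketch: the $5 per terabyte figure comes from the slide, while the function name, the decimal definition of a terabyte, and the example byte counts are assumptions made for illustration. It also ignores any per-query minimum or rounding that the service may apply.

```python
# Rough Athena cost estimate based on the slide's "$5 per terabyte scanned".
# ASSUMPTION: one terabyte is taken as 10**12 bytes; check the pricing page
# for the exact definition and for any per-query minimum charge.
BYTES_PER_TB = 10**12
PRICE_PER_TB_USD = 5.0  # figure quoted on the slide


def athena_query_cost(bytes_scanned: int,
                      price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated cost in USD for scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical comparison: a full scan of 1 TB of raw data versus a query
# that only needs to read 250 GB of it (for example, fewer columns).
full_scan = athena_query_cost(10**12)
partial_scan = athena_query_cost(250 * 10**9)
print(f"full scan: ${full_scan:.2f}, partial scan: ${partial_scan:.2f}")
```

Because the charge depends on bytes scanned rather than on rows returned, anything that reduces the data read by a query (compression, columnar formats, partitioning) lowers the bill in direct proportion.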
  12. 12. Components: Amazon S3, Amazon Athena, data catalog. Roles: data engineer; data consumers (user, analyst, data scientist). Access paths: AWS Tools and SDKs, AWS Management Console, Amazon QuickSight, Amazon SageMaker.
  13. 13. Data consumption: Automated Reporting. athena.startQueryExecution("SELECT * FROM business_view") Steps: 1. Schedule query 2. Track QueryID for status 3. Query results to Amazon S3 4. New file trigger 5. Job complete notification. Diagram labels: Query_ID, email notification.
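Steps 1 to 3 of the automated reporting flow can be sketched with the boto3 Athena client. This is only an illustration, not the presenters' implementation: the function name `run_report`, the polling interval, and the bucket name in the usage note are hypothetical. The client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
import time

# Terminal states reported by Athena for a query execution.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_report(athena, sql, output_location, poll_seconds=2.0, max_polls=150):
    """Start a query, track its QueryExecutionId, and wait for completion.

    `athena` is expected to behave like boto3.client("athena").
    Athena writes the result file to `output_location` (an s3:// URI);
    a new-object trigger on that location could then drive steps 4 and 5.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"query {query_id} did not finish in time")
```

A scheduled job might call it as `run_report(boto3.client("athena"), "SELECT * FROM business_view", "s3://my-report-bucket/results/")`, where the bucket name is a placeholder.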
  14. 14. Athena Workgroups. Athena Workgroups are used to isolate queries between different teams, workloads, or applications, and to set limits on the amount of data each query or the entire workgroup can process. • Workload Isolation • Query Metrics • Cost Controls
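The per-query data limit mentioned on this slide maps onto the workgroup configuration accepted by the boto3 `create_work_group` call. The sketch below only builds that configuration dictionary. The helper name, the workgroup name, and the bucket in the usage note are hypothetical, and the 10 MB lower bound reflects my understanding of the API's valid range rather than anything stated in the slides.

```python
# ASSUMPTION: the API is understood to reject cutoffs below 10 MB
# (10,000,000 bytes); verify against the current API reference.
MIN_CUTOFF_BYTES = 10_000_000


def workgroup_config(output_location: str, per_query_limit_bytes: int) -> dict:
    """Build a Configuration dict for athena.create_work_group."""
    if per_query_limit_bytes < MIN_CUTOFF_BYTES:
        raise ValueError("per-query limit is below the assumed 10 MB minimum")
    return {
        # Where queries in this workgroup write their results.
        "ResultConfiguration": {"OutputLocation": output_location},
        # Stop clients from overriding the workgroup settings.
        "EnforceWorkGroupConfiguration": True,
        # Query metrics: publish per-workgroup metrics to CloudWatch.
        "PublishCloudWatchMetricsEnabled": True,
        # Cost control: cancel any query that scans more than this.
        "BytesScannedCutoffPerQuery": per_query_limit_bytes,
    }
```

With a real client this could be applied as `athena.create_work_group(Name="finance-team", Configuration=workgroup_config("s3://my-results-bucket/finance/", 10**11))`. A limit for the entire workgroup is a separate mechanism and is not covered by this sketch.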
  15. 15. Visualize your data with your favorite tools: featured Athena partners and Amazon QuickSight.
  16. 16. Why QuickSight • Scalable: From 10 users to 10,000, QuickSight seamlessly grows with you with no need for additional servers or infrastructure. • No Servers to Manage: QuickSight is a fully managed cloud service. There is no infrastructure to maintain or upgrade and no upfront costs. • Fully Integrated: QuickSight integrates with your other AWS services and data sources, giving you everything you need to build an end-to-end cloud analytics solution. • Pay For What You Use: Instead of buying costly licenses for all of your users, QuickSight allows you to share dashboards and reports and only pay when users access them.
  17. 17. Connect to your data, wherever it is. QuickSight allows you to connect to AWS data sources, private VPC subnets, on-premises and hosted databases, and third-party business applications. • On-premises: Securely connect to on-premises databases and flat files like Excel and CSV (Excel, CSV, Teradata, MySQL, SQL Server, PostgreSQL). • In the cloud: Connect to hosted databases, big data formats, and secure VPCs (Redshift, RDS, S3, Athena, Aurora, Teradata, MySQL, Presto, Spark, SQL Server, PostgreSQL, MariaDB, Snowflake, IoT Analytics). • Applications: Connect directly to third-party business applications (Salesforce, Square, Adobe Analytics, Jira, ServiceNow, Twitter, GitHub).
  18. 18.
  19. 19. Demo Scenario: Amazon S3 (Processed Data), Glue Data Catalog, Amazon Athena, Amazon QuickSight.
  20. 20. Building an AWS Multi-Account Cost Analytics Solution at Scale. Steven Hsieh, Engineer, TrendMicro.
  21. 21.
  22. 22. About Me Steven Hsieh
  23. 23. Background
  24. 24. Pillars of
  25. 25. Design Principles for Cost Optimization • Adopt a consumption model • Measure overall efficiency • Stop spending money on data center operations • Analyze and attribute expenditure • Use managed services to reduce cost of ownership. Pay as you go / pay as you need.
  26. 26. Challenges. Large Scale Accounts: • Almost 400 accounts • Hard to manage via the AWS console. Multiple Data Sources: • Billing data • Utilization data of AWS services (e.g., EC2, S3)
  27. 27. Challenges. Permission Management: • Multiple teams • Authorization for different teams. Insight for Better Design: • Finding insights for design improvement • Providing utilization visibility for design changes
  28. 28. Other solutions we have tried… AWS Billing Console: • Hard to use at large scale • Single data source. Amazon Redshift: • Cost model • ETL. 3rd Party BI Tool: • Expensive license fee • Additional operation cost
  29. 29. Ideas: • Data persistence in Amazon S3 • Data querying via Amazon Athena • Dashboard / Reporting via Amazon QuickSight
  30. 30. Challenges
  31. 31. Global Accelerator
  32. 32. • Using SQS to trigger parallel tasks • Lambda limitations: timeout of 15 minutes; /tmp of 512 MB • Spot instance interruptions • Fargate limitations: container storage of 10 GB; run-task: 10 • Using assume role to collect data across accounts
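The limits listed on this slide suggest a simple routing rule for each collection task. The sketch below is one hypothetical way to encode it, not the presenters' design: the numeric limits come from the slide, while the function names, the "ec2" fallback, and the reading of "run-task: 10" as a cap of 10 tasks per launch call are assumptions.

```python
# Limits quoted on the slide.
LAMBDA_TIMEOUT_SECONDS = 15 * 60
LAMBDA_TMP_MB = 512
FARGATE_STORAGE_MB = 10 * 1024
# ASSUMPTION: "run-task: 10" is read as at most 10 tasks per launch call.
RUN_TASK_BATCH_SIZE = 10


def choose_runtime(estimated_seconds: float, scratch_mb: float) -> str:
    """Pick the lightest runtime whose limits fit the task estimate."""
    if estimated_seconds <= LAMBDA_TIMEOUT_SECONDS and scratch_mb <= LAMBDA_TMP_MB:
        return "lambda"
    if scratch_mb <= FARGATE_STORAGE_MB:
        return "fargate"
    # Hypothetical fallback for tasks too large for both (for example, a
    # Spot instance, which must then tolerate interruptions).
    return "ec2"


def batches(items, size=RUN_TASK_BATCH_SIZE):
    """Split a list of tasks into launch batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical example: 25 account IDs to collect from, launched in batches.
account_ids = [f"account-{n:03d}" for n in range(25)]
print(choose_runtime(estimated_seconds=120, scratch_mb=200))   # small task
print(choose_runtime(estimated_seconds=3600, scratch_mb=200))  # long task
print([len(batch) for batch in batches(account_ids)])
```

Each batch element would typically become one queue message, and the worker would assume a role in the target account before collecting its data.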
  33. 33. • Using SNS to trace data uploading results • Preprocessing data before uploading to S3 • Only the creator can modify datasets in QuickSight • Create views in Athena
  34. 34. • Web application hosted in Fargate • Lambda integration with QuickSight for embedded URL • Using ALB to handle all HTTPS interaction • Permissions & metadata in DynamoDB • ADFS federation using Cognito • Performance improvement via AWS Global Accelerator • Web security enhancement via AWS WAF
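The "Lambda integration with QuickSight for embedded URL" item could look roughly like the handler below. This is a sketch under assumptions, not the presenters' code: it relies on the QuickSight `get_dashboard_embed_url` operation as exposed by boto3, and the factory name, the account ID, the dashboard ID, and the response shape are hypothetical. The client is injected so the handler can be exercised without AWS access.

```python
import json


def make_handler(quicksight, account_id: str, dashboard_id: str):
    """Return a Lambda-style handler that responds with an embed URL.

    `quicksight` is expected to behave like boto3.client("quicksight").
    """
    def handler(event, context):
        response = quicksight.get_dashboard_embed_url(
            AwsAccountId=account_id,
            DashboardId=dashboard_id,
            IdentityType="IAM",           # caller identity comes from IAM
            SessionLifetimeInMinutes=60,  # hypothetical session length
        )
        # Response shape suitable for a load balancer or API front end.
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"embedUrl": response["EmbedUrl"]}),
        }

    return handler
```

In a deployed function the module would create the client once, for example `handler = make_handler(boto3.client("quicksight"), "123456789012", "cost-dashboard")`, where both identifiers are placeholders. The permission check against DynamoDB mentioned on the slide would sit in front of this call.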
  35. 35. Quick Development & Evaluation
  36. 36. Low Utilization & Right Sizing • Trusted Advisor checks: low-utilization EC2 instances are those where CPU was 10% or less and network I/O was 5 MB or less on 4 or more days during the last 14 days • Right sizing: analyze metric data to recommend the proper instance type and size; be aware of NIC driver and Linux virtualization type issues
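The low-utilization rule quoted on this slide is easy to express as a predicate over daily metrics. This is a minimal sketch: the thresholds come from the slide, while the function name and the input shape (one `(cpu_percent, network_mb)` pair per day, oldest first) are assumptions made for illustration.

```python
def is_low_utilization(daily_metrics,
                       cpu_max_percent=10.0,
                       network_max_mb=5.0,
                       min_low_days=4,
                       window_days=14):
    """Apply the slide's rule to a list of (cpu_percent, network_mb) pairs.

    Only the most recent `window_days` entries are considered. A day counts
    as "low" when both CPU and network I/O are at or below the thresholds.
    """
    recent = daily_metrics[-window_days:]
    low_days = sum(
        1 for cpu, net in recent
        if cpu <= cpu_max_percent and net <= network_max_mb
    )
    return low_days >= min_low_days


# Hypothetical instance: busy for ten days, then nearly idle for four.
history = [(55.0, 120.0)] * 10 + [(3.0, 0.5)] * 4
print(is_low_utilization(history))
```

Running this over the utilization data collected for every account yields the candidate list that right sizing then examines in more detail.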
  37. 37. Saving Polar Bear • Analyzing the CPU utilization pattern • Turning off non-production instances can save almost 70% of the cost
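A figure of roughly 70% is what simple arithmetic gives for a working-hours schedule. The sketch below assumes, purely for illustration, that a non-production instance only needs to run 10 hours a day on 5 weekdays and that cost is proportional to running hours. The slides do not state the actual schedule, and storage that persists while an instance is stopped is ignored here.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def schedule_saving(hours_on_per_week: float) -> float:
    """Fraction of compute cost saved versus running around the clock.

    ASSUMPTION: cost scales linearly with running hours.
    """
    if not 0 <= hours_on_per_week <= HOURS_PER_WEEK:
        raise ValueError("hours_on_per_week must be between 0 and 168")
    return 1 - hours_on_per_week / HOURS_PER_WEEK


# Hypothetical schedule: 10 hours a day, Monday to Friday.
hours_on = 10 * 5
print(f"saving: {schedule_saving(hours_on):.1%}")  # 118/168, about 70%
```

Analyzing the CPU utilization pattern, as the slide describes, is what tells you which hours an instance is actually needed.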
  38. 38. Recap • Using a cost-effective way to build the end-to-end BI solution: 2 power users $36 + ALB $18 = $54 • Using a flexible reporting architecture to integrate with multiple data sources • Quick wins & timely data-driven decisions • Validating innovative ideas (e.g., the potential saving of the polar bear project)
  39. 39. Summary • More organizations are building data lakes on the cloud to stay competitive. • AWS provides the broadest and deepest portfolio of database and analytics services, including machine learning. • Serverless analytics helps you build modern data pipelines with increased agility and lower cost. • Learn more at: https://aws.amazon.com/big-data/datalakes-and-analytics/
  40. 40. Thank you! Ivan Cheng, Solutions Architect, AWS. Steven Hsieh, Engineer, TrendMicro.
