Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

ABD202_Best Practices for Building Serverless Big Data Applications

1,426 views

Published on

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this session, we show you how to incorporate serverless concepts into your big data architectures. We explore the concepts behind and benefits of serverless architectures for big data, looking at design patterns to ingest, store, process, and visualize your data. Along the way, we explain when and how you can use serverless technologies to streamline data processing, minimize infrastructure management, and improve agility and robustness and share a reference architecture using a combination of cloud and open source technologies to solve your big data problems. Topics include: use cases and best practices for serverless big data applications; leveraging AWS technologies such as Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Lambda, Amazon Athena, and Amazon EMR; and serverless ETL, event processing, ad hoc analysis, and real-time analytics.

  • Be the first to comment

ABD202_Best Practices for Building Serverless Big Data Applications

  1. 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ben Snively, Solutions Architect Data and Analytics N o v e m b e r , 2 0 1 7 AWS re:INVENT Best Practices for Building Serverless Big Data Applications
  2. 2. Agenda Serverless – what and why? Serverless – which service when? Common big data applications Fitting serverless into big data applications Next steps…
  3. 3. Serverless Analytics Evolution… Virtualized Managed Serverless Provision Servers Configure Clusters Run Analytics
  4. 4. No servers to provision or manage Scales with usage Never pay for idle Availability and fault-tolerance built in Serverless characteristics
  5. 5. Serverless nicely fits into big data platforms • Mix and match serverless, managed, and virtualized • Leverage services to easily • Rapidly ingest, categorize, and discover your data • Allow easy query and analysis of your data • Transform and Load data • Provide custom event based handlers • Serverless allows you to focuses more analytics and not on infrastructure or servers
  6. 6. Lambda • Run your code in the cloud—fully managed and highly available • Triggered through API or state changes in your setup • Scales automatically to match the incoming event rate • Node.js (JavaScript), Python, Java, and C# • Charged per 100-ms execution time Serverless compute
  7. 7. Amazon Athena Interactive query service • Query directly from Amazon S3 • Use ANSI SQL • Serverless • Multiple data formats • Pay per query
  8. 8. AWS Glue Serverless catalog and ETL/ELT service Data Catalog Job Authoring Job Execution Crawl, discover, and organize data Integration with managed and serverless analytics Serverless ETL – Pay for what you consume
  9. 9. Kinesis Streams • For technical developers • Build your own custom applications that process or analyze streaming data Kinesis Firehose • For all developers and data scientists • Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon ES Kinesis Analytics • For all developers and data scientists • Easily analyze data streams using standard SQL queries Amazon Kinesis: Streaming data made easy Services make it easy to capture, deliver, and process streams on AWS
  10. 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Artificial Intelligence Applying Serverless to Big Data Applications?
  11. 11. Characteristics of a big data applications Future Proof Flexible Access Dive in Anywhere Collect Anything
  12. 12. Components of big data applications Catalog & Search Protect & Secure Access & User InterfaceIngest & Store Prepare & Transform Analyze & Reason
  13. 13. Components of big data applications Catalog & Search Protect & Secure Access & User InterfaceIngest & Store S3 Prepare & Transform Analyze & Reason Amazon Kinesis
  14. 14. Components of big data applications Protect & Secure Access & User InterfaceIngest & Store S3 Prepare & Transform AWS Glue Data Catalog Analyze & Reason Amazon Kinesis
  15. 15. Components of big data applications Protect & Secure Access & User InterfaceIngest & Store Amazon Kinesis S3 AWS Lambda AWS Glue Prepare & Transform AWS Glue Data Catalog Analyze & Reason
  16. 16. Components of big data applications Protect & Secure Access & User Interface Prepare & TransformIngest & Store Amazon Kinesis S3 AWS Lambda AWS Glue Kinesis Analytics Amazon Athena AWS Glue Data Catalog Analyze & Reason
  17. 17. Components of big data applications Protect & Secure Access & User Interface Prepare & TransformIngest & Store Amazon Kinesis S3 AWS Lambda AWS Glue Kinesis Analytics Amazon Athena Amazon QuickSight AWS Glue Data Catalog Analyze & Reason
  18. 18. Components of big data applications Protect & Secure Access & User Interface Prepare & TransformIngest & Store Amazon Kinesis S3 AWS Lambda AWS Glue Kinesis Analytics Amazon Athena QuickSight Glue Data Catalog Analyze & Reason
  19. 19. Serverless real-time analytics Prepare & TransformIngest & Store Kinesis AWS Lambda Kinesis Analytics Preprocessing Logic to transform real- time data Serverless Streams and Firehose SQL-based analytics on streaming data QuickSight Analyze & Reason
  20. 20. Demonstration
  21. 21. Lambda pre-processing Transform, Enrich, Filter
  22. 22. Amazon Kinesis Analytics—SQL In this example: 19 Lines of SQL = Serverless Realtime Analytics
  23. 23. Larger picture of what we showed:
  24. 24. Fitting into existing real-time analytics Producer Apache Kafka KCL AWS Lambda Spark Streaming Apache Storm Amazon SNS Notifications Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES Alert Analytics Output KPI Serverless Managed DynamoDB Streams Kinesis Streams Virtualized Kinesis Analytics Apache FlinkSQS
  25. 25. Components of big data applications Protect & Secure Access & User Interface Prepare & TransformIngest & Store Amazon Kinesis S3 AWS Lambda AWS Glue Kinesis Analytics Amazon Athena Amazon QuickSight Glue Data Catalog Analyze & Reason
  26. 26. Serverless interactive analytics Prepare & TransformIngest & Store AWS Glue Amazon Athena QuickSight S3 AWS Glue Data Catalog Analyze & Reason
  27. 27. Demonstration
  28. 28. Interactive analytics Producer Amazon S3 Amazon Redshift Amazon EMR Presto Impala Spark Amazon Athena Serverless Managed Virtualized Amazon QuickSight
  29. 29. Components of big data applications Protect & Secure Access & User Interface Prepare & TransformIngest & Store Kinesis S3 AWS Lambda AWS Glue Kinesis Analytics Amazon Athena QuickSight AWS Glue Data Catalog Integrated Pipeline Analyze & Reason
  30. 30. Catalog & Search Access and search metadata Access & User Interface Give your users easy and secure access DynamoDB Elasticsearch API Gateway Identity & Access Management Cognito QuickSight Amazon AI EMR Redshift Athena Kinesis RDS Central Storage Secure, cost-effective Storage in Amazon S3 S3 Snowball Database Migration Service Kinesis Firehose Direct Connect Data Ingestion Get your data into S3 Quickly and securely Protect and Secure Use entitlements to ensure data is secure and users’ identities are verified Processing & Analytics Use of predictive and prescriptive analytics to gain better understanding Security Token Service CloudWatch CloudTrail Key Management Service Data lake reference architecture Catalog & Search Access and search metadata Access & User Interface Give your users easy and secure access DynamoDB Elasticsearch API Gateway Identity & Access Management Cognito QuickSight A Central Storage Secure, cost-effective Storage in Amazon S3 Glue ETL
  31. 31. The right tool for the right job Machine Learning/Deep Learning Business Reporting Data Scientists Data Engineer IDE Data Catalog Central Storage
  32. 32. What about existing Hadoop Clusters?
  33. 33. The right tool for the right job Machine Learning/Deep Learning Business Reporting Data Scientists Data Engineer IDE Data Catalog Central Storage
  34. 34. Serverless nicely fits into big data platforms • Mix and match serverless, managed, and virtualized services • Leverage services to easily • Rapidly ingest, categorize, and discover your data • Allow easy query and analysis of your data • Transform and Load data • Provide custom event based handlers • Serverless allows you to focuses more analytics and not on infrastructure or servers • Pay only for what you use
  35. 35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you! C L I C K T O A D D T E X T C L I C K T O A D D T E X T
  36. 36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you!

×