
ABD202_Best Practices for Building Serverless Big Data Applications

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this session, we show you how to incorporate serverless concepts into your big data architectures. We explore the concepts behind and benefits of serverless architectures for big data, looking at design patterns to ingest, store, process, and visualize your data. Along the way, we explain when and how you can use serverless technologies to streamline data processing, minimize infrastructure management, and improve agility and robustness and share a reference architecture using a combination of cloud and open source technologies to solve your big data problems. Topics include: use cases and best practices for serverless big data applications; leveraging AWS technologies such as Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Lambda, Amazon Athena, and Amazon EMR; and serverless ETL, event processing, ad hoc analysis, and real-time analytics.
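
As a rough illustration of the event-processing pattern mentioned above, the following is a minimal sketch of a Lambda function triggered by a Kinesis stream. It assumes the standard Kinesis event shape, in which each payload arrives base64-encoded under `Records[].kinesis.data`. The `event_type` field and the `_unknown` / `_unparseable` buckets are hypothetical names chosen for this example; they do not come from the session.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Count incoming Kinesis records per 'event_type' (a hypothetical field)."""
    counts = Counter()
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            message = json.loads(payload)
        except ValueError:
            # Covers both invalid JSON and payloads that are not valid UTF-8.
            counts["_unparseable"] += 1
            continue
        if not isinstance(message, dict):
            counts["_unknown"] += 1
            continue
        counts[message.get("event_type", "_unknown")] += 1
    return dict(counts)


def make_record(message):
    """Build a fake Kinesis record for local testing (illustration only)."""
    data = base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}
```

Because the handler is a plain function, it can be exercised locally by passing it a dictionary built with `make_record`, with no AWS account needed.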

  1. Best Practices for Building Serverless Big Data Applications. Ben Snively, Solutions Architect, Data and Analytics. AWS re:INVENT, November 2017. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Agenda • Serverless – what and why? • Serverless – which service when? • Common big data applications • Fitting serverless into big data applications • Next steps…
  3. Serverless Analytics Evolution… Virtualized; Managed; Serverless; Provision Servers; Configure Clusters; Run Analytics
  4. Serverless characteristics • No servers to provision or manage • Scales with usage • Never pay for idle • Availability and fault-tolerance built in
  5. Serverless nicely fits into big data platforms • Mix and match serverless, managed, and virtualized • Leverage services to: rapidly ingest, categorize, and discover your data; allow easy query and analysis of your data; transform and load data; provide custom event-based handlers • Serverless allows you to focus more on analytics and not on infrastructure or servers
  6. Lambda: serverless compute • Run your code in the cloud, fully managed and highly available • Triggered through API or state changes in your setup • Scales automatically to match the incoming event rate • Node.js (JavaScript), Python, Java, and C# • Charged per 100-ms execution time
  7. Amazon Athena: interactive query service • Query directly from Amazon S3 • Use ANSI SQL • Serverless • Multiple data formats • Pay per query
  8. AWS Glue: serverless catalog and ETL/ELT service • Data Catalog • Job Authoring • Job Execution • Crawl, discover, and organize data • Integration with managed and serverless analytics • Serverless ETL – Pay for what you consume
  9. Amazon Kinesis: Streaming data made easy. Services make it easy to capture, deliver, and process streams on AWS • Kinesis Streams: for technical developers; build your own custom applications that process or analyze streaming data • Kinesis Firehose: for all developers and data scientists; easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon ES • Kinesis Analytics: for all developers and data scientists; easily analyze data streams using standard SQL queries
  10. Artificial Intelligence. Applying Serverless to Big Data Applications?
  11. Characteristics of big data applications: Future Proof; Flexible Access; Dive in Anywhere; Collect Anything
  12. Components of big data applications: Catalog & Search; Protect & Secure; Access & User Interface; Ingest & Store; Prepare & Transform; Analyze & Reason
  13. Components of big data applications: Catalog & Search; Protect & Secure; Access & User Interface; Ingest & Store; S3; Prepare & Transform; Analyze & Reason; Amazon Kinesis
  14. Components of big data applications: Protect & Secure; Access & User Interface; Ingest & Store; S3; Prepare & Transform; AWS Glue Data Catalog; Analyze & Reason; Amazon Kinesis
  15. Components of big data applications: Protect & Secure; Access & User Interface; Ingest & Store; Amazon Kinesis; S3; AWS Lambda; AWS Glue; Prepare & Transform; AWS Glue Data Catalog; Analyze & Reason
  16. Components of big data applications: Protect & Secure; Access & User Interface; Prepare & Transform; Ingest & Store; Amazon Kinesis; S3; AWS Lambda; AWS Glue; Kinesis Analytics; Amazon Athena; AWS Glue Data Catalog; Analyze & Reason
  17. Components of big data applications: Protect & Secure; Access & User Interface; Prepare & Transform; Ingest & Store; Amazon Kinesis; S3; AWS Lambda; AWS Glue; Kinesis Analytics; Amazon Athena; Amazon QuickSight; AWS Glue Data Catalog; Analyze & Reason
  18. Components of big data applications: Protect & Secure; Access & User Interface; Prepare & Transform; Ingest & Store; Amazon Kinesis; S3; AWS Lambda; AWS Glue; Kinesis Analytics; Amazon Athena; QuickSight; Glue Data Catalog; Analyze & Reason
  19. Serverless real-time analytics (Ingest & Store; Prepare & Transform; Analyze & Reason): Kinesis (serverless Streams and Firehose); AWS Lambda (preprocessing logic to transform real-time data); Kinesis Analytics (SQL-based analytics on streaming data); QuickSight
  20. Demonstration
  21. Lambda pre-processing: Transform, Enrich, Filter
  22. Amazon Kinesis Analytics (SQL). In this example: 19 lines of SQL = serverless real-time analytics
  23. Larger picture of what we showed:
  24. Fitting into existing real-time analytics: Producer Apache Kafka KCL AWS Lambda Spark Streaming Apache Storm Amazon SNS Notifications Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES Alert Analytics Output KPI Serverless Managed DynamoDB Streams Kinesis Streams Virtualized Kinesis Analytics Apache Flink SQS
  25. Components of big data applications: Protect & Secure; Access & User Interface; Prepare & Transform; Ingest & Store; Amazon Kinesis; S3; AWS Lambda; AWS Glue; Kinesis Analytics; Amazon Athena; Amazon QuickSight; Glue Data Catalog; Analyze & Reason
  26. Serverless interactive analytics: Prepare & Transform; Ingest & Store; AWS Glue; Amazon Athena; QuickSight; S3; AWS Glue Data Catalog; Analyze & Reason
  27. Demonstration
  28. Interactive analytics: Producer; Amazon S3; Amazon Redshift; Amazon EMR; Presto; Impala; Spark; Amazon Athena; Serverless; Managed; Virtualized; Amazon QuickSight
  29. Components of big data applications: Protect & Secure; Access & User Interface; Prepare & Transform; Ingest & Store; Kinesis; S3; AWS Lambda; AWS Glue; Kinesis Analytics; Amazon Athena; QuickSight; AWS Glue Data Catalog; Integrated Pipeline; Analyze & Reason
  30. Data lake reference architecture • Catalog & Search: access and search metadata • Access & User Interface: give your users easy and secure access • Central Storage: secure, cost-effective storage in Amazon S3 • Data Ingestion: get your data into S3 quickly and securely • Protect and Secure: use entitlements to ensure data is secure and users’ identities are verified • Processing & Analytics: use of predictive and prescriptive analytics to gain better understanding • Services shown: DynamoDB, Elasticsearch, API Gateway, Identity & Access Management, Cognito, QuickSight, Amazon AI, EMR, Redshift, Athena, Kinesis, RDS, S3, Snowball, Database Migration Service, Kinesis Firehose, Direct Connect, Security Token Service, CloudWatch, CloudTrail, Key Management Service, Glue ETL
  31. The right tool for the right job: Machine Learning/Deep Learning; Business Reporting; Data Scientists; Data Engineer; IDE; Data Catalog; Central Storage
  32. What about existing Hadoop clusters?
  33. The right tool for the right job: Machine Learning/Deep Learning; Business Reporting; Data Scientists; Data Engineer; IDE; Data Catalog; Central Storage
  34. Serverless nicely fits into big data platforms • Mix and match serverless, managed, and virtualized services • Leverage services to: rapidly ingest, categorize, and discover your data; allow easy query and analysis of your data; transform and load data; provide custom event-based handlers • Serverless allows you to focus more on analytics and not on infrastructure or servers • Pay only for what you use
  35. Thank you!
  36. Thank you!
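
The transcript does not include the demo code for slide 21's Lambda pre-processing step (transform, enrich, filter). The sketch below shows one way such a function could look. It assumes the record-transformation contract used by Kinesis Firehose and Kinesis Analytics pre-processing: each input record carries a `recordId` and base64 `data`, and each output record reports a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The field names (`device_id`, `type`, `site`) and the lookup table are hypothetical.

```python
import base64
import json

# Hypothetical enrichment table; a real function might read this from a data store.
SITE_BY_DEVICE = {"AB-1": "plant-a"}


def _result(record, status, data=None):
    """Build one output record in the transformation response format."""
    return {
        "recordId": record["recordId"],
        "result": status,
        "data": record["data"] if data is None else data,
    }


def preprocess(event, context):
    """Filter, transform, and enrich each streaming record."""
    output = []
    for record in event["records"]:
        try:
            item = json.loads(base64.b64decode(record["data"]))
            if not isinstance(item, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            output.append(_result(record, "ProcessingFailed"))
            continue
        # Filter: drop heartbeat messages (hypothetical rule).
        if item.get("type") == "heartbeat":
            output.append(_result(record, "Dropped"))
            continue
        # Transform: normalize the device identifier.
        item["device_id"] = str(item.get("device_id", "")).upper()
        # Enrich: attach the site from the lookup table.
        item["site"] = SITE_BY_DEVICE.get(item["device_id"], "unknown")
        encoded = base64.b64encode(json.dumps(item).encode("utf-8")).decode("ascii")
        output.append(_result(record, "Ok", encoded))
    return {"records": output}
```

Returning one output record per input `recordId` is what lets the calling service match results to inputs, so filtered records are reported as `Dropped` rather than silently omitted.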
