Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ben Snively, AWS Sr. SA, Data & Analytics
April ...
Agenda
Cloud architecture evolution – Why serverless
Data and analytics flow
Key services overview
Design patterns
Call to...
Cloud architecture evolution
Virtualized Managed Serverless
Virtualized
servers
Managed
platforms
Serverless
analytics
No servers to provision
or manage
Scales with usage
Never pay for idle Availability and fault
tolerance built in
Serverles...
Data and analytics flow
Ingest/
Collect
Store
Analyze/
Process
Visualization/
Consume
Orchestrate/Transform
What Is the temperature of your data / access ?
Orchestration/Transform
AWS Big Data services
Ingest/ Collect Store Analyze/ Process
Visualization/
Consume
Batch
ETL/ELT
...
Orchestration/Transform
AWS Big Data services
Ingest/ Collect Store Analyze/ Process
Visualization/
Consume
= Serverless
S...
Orchestration/Transform
AWS Big Data services
EMR EC2
S3
Amazon
RedshiftDynamoDB
AWS DMS (CDC)
AWS Lambda
Kinesis Analytic...
Key Services Overview
Big Data storage for virtually all AWS services
Amazon S3
• Store anything
• Object storage
• Scalable
• 99.999999999% dur...
Amazon
DynamoDB
Fast & flexible NoSQL database service
• NoSQL database
• Seamless scalability
• Zero admin
• Single digit...
Amazon
Kinesis
Real-time streaming platform
• Streams, Firehose, Analytics
• Real-time processing
• High throughput, elast...
Amazon Kinesis
Streams
• For technical developers
• Build your own custom
applications that process
or analyze streaming
d...
AWS Lambda
• Run your code in the cloud - fully
managed and highly available
• Triggered through API or state
changes in y...
Amazon
Athena
Interactive query service
• Query directly from Amazon S3
• Use ANSI SQL
• Serverless
• Multiple data format...
AWS Glue
Fully managed ETL service
• Catalog data sources
• Identify data formats & data types
• Error handling
• Manage a...
AWS Glue: Services
Data catalog
 Hive metastore-compatible metadata repository of data sources.
 Crawls data source to i...
•Fast and cloud-powered
•Easy to use, no infrastructure to
manage
•Scales to hundreds of thousands of
users
•Quick calcula...
Serverless design patterns
Real-time analytics
Producer
Apache
Kafka
KCL
AWS
Lambda
Spark
Streaming
Apache
Storm
Amazon
SNS
Notifications
Amazon
Elas...
Interactive Queries
Ingest/ Collect Store Analyze/ Process
Visualization/
Consume
Producer Amazon S3
Amazon
Redshift
Amazo...
Data lake reference architecture
= Serverless
Amazon S3
Data Lake
Amazon Kinesis
Streams & Firehose
Hadoop / Spark
Streaming Analytics Tools
Amazon Redshift
Data Wareho...
Serverless ETL
Store Transform Store Analyze/ Process
Visualize/
Consume
Amazon S3
Apache
Kafka
Kinesis
Streams Amazon EMR...
Serverless nicely fits into big data platforms
• AWS serverless Big Data services
• Complements existing big data flows
• ...
Thank you!
Upcoming SlideShare
Loading in …5
×

BDA303 Serverless big data architectures: Design patterns and best practices

1,713 views

Published on

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. But how can you incorporate serverless concepts into your big data architectures?
In this session, we explore the key concepts and benefits of serverless architectures for big data, diving into design patterns to ingest, store, process, and visualize your data. Along the way, we explain when and how you can use serverless technologies to streamline data processing, minimize infrastructure management, and improve agility and robustness. We will share reference architectures using a combination of services that include AWS Lambda, Amazon Kinesis, Amazon Athena, Amazon QuickSight, and AWS Glue.

Published in: Technology

BDA303 Serverless big data architectures: Design patterns and best practices

  1. 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ben Snively, AWS Sr. SA, Data & Analytics April 19, 2017 Serverless Big Data Architectures Serverless Streaming Data Analytics
  2. 2. Agenda Cloud architecture evolution – Why serverless Data and analytics flow Key services overview Design patterns Call to action
  3. 3. Cloud architecture evolution Virtualized Managed Serverless Virtualized servers Managed platforms Serverless analytics
  4. 4. No servers to provision or manage Scales with usage Never pay for idle Availability and fault tolerance built in Serverless characteristics
  5. 5. Data and analytics flow Ingest/ Collect Store Analyze/ Process Visualization/ Consume Orchestrate/Transform
  6. 6. What Is the temperature of your data / access ?
  7. 7. Orchestration/Transform AWS Big Data services Ingest/ Collect Store Analyze/ Process Visualization/ Consume Batch ETL/ELT Realtime ETL/ELT Transactional / CDC B.I. Tools Data Science Notebooks Bulk Transport File/Object Upload Streaming Ingest Commits Transactional NoSQL Data Lake Streaming Storage Dashboards Batch Analytics Interactive Querying Machine Learning/ Deep Learning Realtime Analytics …
  8. 8. Orchestration/Transform AWS Big Data services Ingest/ Collect Store Analyze/ Process Visualization/ Consume = Serverless Serverless Managed Virtualized Batch ETL/ELT Realtime ETL/ELT Transactional / CDC B.I. Tools Data Science Notebooks Bulk Transport File/Object Upload Streaming Ingest Commits Transactional NoSQL Data Lake Streaming Storage Dashboards Batch Analytics Interactive Querying Machine Learning/ Deep Learning Realtime Analytics
  9. 9. Orchestration/Transform AWS Big Data services EMR EC2 S3 Amazon RedshiftDynamoDB AWS DMS (CDC) AWS Lambda Kinesis Analytics Amazon Athena Amazon QuickSight Aurora AWS Glue AWS Step Functions Kinesis Streams Ingest/ Collect Store Analyze/ Process Visualization/ Consume AWS Snowball ISV Connectors Kinesis Firehose S3 Transfer Acceleration = Serverless Amazon Elasticsearch
  10. 10. Key Services Overview
  11. 11. Big Data storage for virtually all AWS services Amazon S3 • Store anything • Object storage • Scalable • 99.999999999% durability • Extremely low cost
  12. 12. Amazon DynamoDB Fast & flexible NoSQL database service • NoSQL database • Seamless scalability • Zero admin • Single digit millisecond latency
  13. 13. Amazon Kinesis Real-time streaming platform • Streams, Firehose, Analytics • Real-time processing • High throughput, elastic • Easy to use • Integration with S3, EMR, Amazon Redshift, Amazon DynamoDB
  14. 14. Amazon Kinesis Streams • For technical developers • Build your own custom applications that process or analyze streaming data Amazon Kinesis Firehose • For all developers, data scientists • Easily load massive volumes of streaming data into S3, Amazon Redshift and Amazon Elasticsearch Service Amazon Kinesis Analytics • For all developers, data scientists • Easily analyze data streams using standard SQL queries Amazon Kinesis: Streaming data made easy Services make it easy to capture, deliver, and process streams on AWS
  15. 15. AWS Lambda • Run your code in the cloud - fully managed and highly available • Triggered through API or state changes in your setup • Scales automatically to match the incoming event rate • Node.js (JavaScript), Python, Java, and C# • Charged per 100ms execution time Serverless compute
  16. 16. Amazon Athena Interactive query service • Query directly from Amazon S3 • Use ANSI SQL • Serverless • Multiple data formats • Pay per query
  17. 17. AWS Glue Fully managed ETL service • Catalog data sources • Identify data formats & data types • Error handling • Manage and scale resources • Generate ETL code • Schedules & executes ETL jobs
  18. 18. AWS Glue: Services Data catalog  Hive metastore-compatible metadata repository of data sources.  Crawls data source to infer table, data type, partition format. Job execution  Runs jobs in Spark containers – automatic scaling based on SLA.  AWS Glue is serverless – only pay for the resources you consume. Job authoring  Generates Python code to move data from source to destination.  Edit with your favorite IDE; share code snippets using Git.
  19. 19. •Fast and cloud-powered •Easy to use, no infrastructure to manage •Scales to hundreds of thousands of users •Quick calculations with SPICE •1/10th the cost of legacy BI software Business Intelligence Amazon QuickSight
  20. 20. Serverless design patterns
  21. 21. Real-time analytics Producer Apache Kafka KCL AWS Lambda Spark Streaming Apache Storm Amazon SNS Notifications Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES Alert Analytics Output KPI Serverless Managed DynamoDB Streams Kinesis Streams Virtualized Kinesis Analytics Ingest/ Collect Store Analyze/ Process Visualization/ Consume Apache FlinkSQS
  22. 22. Interactive Queries Ingest/ Collect Store Analyze/ Process Visualization/ Consume Producer Amazon S3 Amazon Redshift Amazon EMR Presto Impala Spark Interactive Amazon Athena Serverless Managed Virtualized QuickSight
  23. 23. Data lake reference architecture = Serverless
  24. 24. Amazon S3 Data Lake Amazon Kinesis Streams & Firehose Hadoop / Spark Streaming Analytics Tools Amazon Redshift Data Warehouse Amazon DynamoDB NoSQL Database AWS Lambda Spark Streaming on EMR Amazon Elasticsearch Service Relational Database Amazon EMR Amazon Aurora Amazon Machine Learning Predictive Analytics Any Open Source Tool of Choice on EC2 Data Science Sandbox Visualization / Reporting Apache Storm on EMR Apache Flink on EMR Amazon Kinesis Analytics Serving Tier Clusterless SQL Query Amazon Athena DataSourcesTransactionalData AWS Glue Clusterless ETL Amazon ElastiCache Redis Data Lake and Real-time Analytics
  25. 25. Serverless ETL Store Transform Store Analyze/ Process Visualize/ Consume Amazon S3 Apache Kafka Kinesis Streams Amazon EMR Spark Flink AWS Glue AWS Lambda ISV Amazon S3 Apache Kafka Amazon Redshift Kinesis Streams Data CatalogAWS Glue DynamoDB Streams DynamoDB Hive M/D
  26. 26. Serverless nicely fits into big data platforms • AWS serverless Big Data services • Complements existing big data flows • Focus on the analytics and not on infrastructure or servers • Don’t focus on the scaling, availability, and undifferentiated heavy lifting • Pay only for what you use • Easily try out different tools, analytics, and solutions
  27. 27. Thank you!

×