AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Platform (BDA206)

Building big data applications often requires integrating a broad set of technologies to store, process, and analyze the increasing variety, velocity, and volume of data being collected by many organizations. In this session, we show how you can build entire big data applications using a core set of managed services including Amazon S3, Amazon Kinesis, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, and Amazon QuickSight.

We walk you through the steps of building and securing a big data application using the AWS Big Data Platform. We also share best practices and common use cases for AWS big data services, including tips to help you choose the best services for your specific application.

Published in: Technology

  1. Building Big Data Applications with the AWS Big Data Platform (BDA206). Matt Yanchyshyn, Sr. Manager, Solutions Architecture, AWS. November 30, 2016. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. [Diagram] Data; Ingest/Collect; Store; Process/analyze; Consume/visualize; Answers & insights. START HERE WITH A BUSINESS CASE.
  3. [Diagram] Collect, Store, Analyze: AWS Data Pipeline, AWS Database Migration Service, EMR, Amazon Glacier, S3, Amazon Kinesis, Direct Connect, Amazon Machine Learning, Amazon Redshift, DynamoDB, AWS IoT, AWS Snowball, QuickSight, Amazon Athena, EC2, Amazon Elasticsearch Service, Lambda.
  4. Building a Big Data Application: Build a data warehouse with Amazon Redshift. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift.
  5. Structured Data Processing: Amazon Redshift
     • Petabyte-scale relational, MPP data warehousing
     • Fully managed with SSD and HDD platforms
     • Built-in end-to-end security, including customer-managed keys
     • Fault-tolerant. Automatically recovers from disk and node failures
     • Data automatically backed up to Amazon S3 with cross-region backup capability for global disaster recovery
     • Over 140 new features added since launch
     • $1,000/TB/year; start at $0.25/hour. Provision in minutes; scale from 160 GB to 2 PB of compressed data with just a few clicks
  6. How do you get your (big) data into AWS?
  7. Building a Big Data Application: Migrate your data to AWS. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, AWS Database Migration Service, AWS Direct Connect, AWS Import/Export & Snowball.
  8. AWS Database Migration Service
     • Start your first migration in 10 minutes or less
     • Keep your apps running during the migration
     • Migrate to databases running on Amazon EC2, Amazon RDS, or Amazon Redshift
  9. AWS Snowball: PB-scale Data Transport
     • E-ink shipping label
     • Ruggedized case, “8.5G Impact”
     • All data encrypted end-to-end
     • 50 TB & 80 TB
     • 10G network
     • Rain & dust resistant
     • Tamper-resistant case & electronics
  10. Your CEO doesn’t want to look at raw SQL query output
  11. Business Intelligence: Amazon QuickSight
     • Fast and cloud-powered
     • Easy to use, no infrastructure to manage
     • Scales to 100s of thousands of users
     • Quick calculations with SPICE
     • 1/10th the cost of legacy BI software
  12. Building a Big Data Application: Visualize your data with Amazon QuickSight. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, AWS Database Migration Service, AWS Direct Connect, AWS Import/Export & Snowball.
  13. What if your data isn’t structured? What if you don’t need all the raw data? What if you need to combine multiple data sets?
  14. Serverless Event Processing: AWS Lambda
     • Serverless compute service that runs your code in response to events
     • Extend AWS services with user-defined custom logic
     • Write custom code in Node.js, Python, and Java
     • Pay only for the requests served and compute time required; billing in increments of 100 milliseconds
  15. Building a Big Data Application: Event-driven data transformations with AWS Lambda. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, AWS Lambda, structured data in Amazon S3, raw data in Amazon S3.
  16. How will this work at scale? What if the data processing exceeds the timeout?
  17. Semi-structured/Unstructured Data Processing: Amazon EMR
     • Hadoop, Hive, Presto, Spark, Tez, Impala, etc.
     • Release 5.2: Hadoop 2.7.3, Hive 2.1, Spark 2.0.2, Zeppelin, Presto, HBase 1.2.3 and HBase on S3, Phoenix, Tez, Flink
     • New applications added within 30 days of their open source release
     • Fully managed, Auto Scaling clusters with support for on-demand and spot pricing
     • Support for HDFS and S3 file systems, enabling separated compute and storage; multiple clusters can run against the same data in S3
     • HIPAA-eligible. Support for end-to-end encryption, IAM/VPC, S3 client-side encryption with customer-managed keys and AWS KMS
  18. Building a Big Data Application: Transform and explore your data at scale with Amazon EMR. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, Amazon EMR, structured data in Amazon S3, raw data in Amazon S3.
  19. What about ad-hoc queries when you are exploring new data?
  20. Serverless Query Processing: Amazon Athena
     • Serverless query service for querying data in S3 using standard SQL with no infrastructure to manage
     • No data loading required; query directly from Amazon S3
     • Use standard ANSI SQL queries with support for joins, JSON, and window functions
     • Support for multiple data formats, including text, CSV, TSV, JSON, Avro, ORC, Parquet
     • Pay per query, only when you’re running queries, based on data scanned. If you compress your data, you pay less and your queries run faster
  21. Building a Big Data Application: Extend your data warehouse to S3 with Amazon Athena. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, raw data in Amazon S3, Amazon Redshift, staging data in Amazon S3, Amazon QuickSight, Amazon EMR, Amazon Athena.
  22. Building a Big Data Application: Extend your data warehouse to S3 with Amazon Athena. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, Amazon EMR, ORC/Parquet in Amazon S3 (columnar data format), Amazon EMR, raw data in Amazon S3, staging data in Amazon S3, Amazon Athena.
  23. What if I want to run custom code or multiple frameworks?
  24. Building a Big Data Application: Extend your data warehouse to S3 with Presto, Spark SQL, etc. on Amazon EMR. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, ORC/Parquet in Amazon S3 (columnar data format), Amazon QuickSight, Amazon EMR, Amazon EMR, Amazon EMR, raw data in Amazon S3, staging data in Amazon S3.
  25. What about real-time data?
  26. Stream Processing: Amazon Kinesis
     • Real-time stream processing
     • High throughput; elastic
     • Highly available; data replicated across multiple Availability Zones with configurable retention
     • S3, Amazon Redshift, DynamoDB integrations
     • Amazon Kinesis Streams for custom streaming applications; Amazon Kinesis Firehose for easy integration with Amazon S3 and Amazon Redshift; Amazon Kinesis Analytics for streaming SQL
  27. Building a Big Data Application: Add a real-time layer with Amazon Kinesis + Spark on Amazon EMR. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, ORC/Parquet (columnar data format), Amazon QuickSight, Amazon Kinesis Streams, Amazon EMR, Amazon EMR, Amazon EMR, raw data in Amazon S3, staging data in Amazon S3, Amazon Athena.
  28. Building a Big Data Application: React to real-time data with Amazon Kinesis Analytics and AWS Lambda. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, Amazon Kinesis Firehose, Amazon Kinesis Analytics, AWS Lambda, Amazon Kinesis Streams, Amazon SNS, reference data in Amazon S3, Amazon Athena.
  29. Building a Big Data Application: React intelligently in real-time with Amazon Machine Learning. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, Amazon Kinesis Firehose, Amazon Kinesis Analytics, AWS Lambda, Amazon Kinesis Streams, reference data in Amazon S3, Amazon Machine Learning, Amazon SNS, Amazon Athena.
  30. What if you need encryption and network isolation to meet industry regulations?
  31. Building a Big Data Application: Add encryption at rest with AWS KMS. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, Amazon Kinesis Streams, AWS KMS, Amazon EMR, Amazon EMR, raw data in S3, staging data in S3, ORC/Parquet in Amazon S3 (columnar data).
  32. Building a Big Data Application: Protect data in transit & add network isolation. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, Amazon Kinesis Streams, AWS KMS, VPC subnet, SSL/TLS, SSL/TLS, raw data in S3, staging data in S3, ORC/Parquet in Amazon S3 (columnar data).
  33. Which customers are doing this?
  34. Redfin. [Diagram] Data; Ingest/Collect; Store; Process/analyze; Consume/visualize; Answers & insights. Services: Amazon S3 Data Lake, Amazon EMR, Amazon Kinesis, Amazon Redshift, Amazon DynamoDB. Diagram labels: Hot Homes Users Properties Agents User Profile Recommendation Hot Homes Similar Homes Agent Follow-up Agent Scorecard Marketing A/B Testing Real Time Data … BI / Reporting.
  35. NORDSTROM. [Diagram] Data; Ingest/Collect; Store; Process/analyze; Consume/visualize; Outcomes & insights. Diagram components: Mobile Users, Desktop Users, Analytics Tools, Online Stylist, Amazon Redshift, Amazon Kinesis, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon S3 Data Storage.
     • Personalized recommendations within seconds (from 15-20 min)
     • Scale the expertise of stylists to all shoppers
     • Reduce costs by 2X order of magnitude
     • …
  36. [Architecture diagram] Data Marts (Amazon Redshift), Query Cluster (EMR), Query Cluster (EMR), Auto Scaling EC2 Analytics App, Normalization ETL Clusters (EMR), Batch Analytic Clusters, Ad Hoc Query Cluster (EMR), Auto Scaling EC2 Analytics App, Users, Data Providers, Auto Scaling EC2 Data Ingestion Services, Optimization ETL Clusters (EMR), Shared Metastore (RDS), Query Optimized (S3), Auto Scaling EC2 Data Catalog & Lineage Services, Reference Data (RDS), Shared Data Services, Auto Scaling EC2 Cluster Mgt & Workflow Services, Source of Truth (S3). >5 PB, up to 75 billion events per day.
  37. <YOUR COMPANY NAME HERE>. Diagram components: web clients, mobile clients, DBMS, corporate data center, AWS Cloud, Amazon Redshift, Amazon QuickSight, Amazon Kinesis Firehose, Amazon Kinesis Analytics, AWS Lambda, Amazon Kinesis Streams, reference data in Amazon S3, Amazon Machine Learning, Amazon SNS, Amazon Athena.
  38. Thank you!
  39. Remember to complete your evaluations!
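
The two Amazon Redshift price points quoted on slide 5 ($0.25/hour and $1,000/TB/year) can be put on a common yearly footing with some simple arithmetic. This is only a rough sketch: the slide does not say which node types or pricing plans each figure refers to, and the function names and the decimal conversion of 1 PB = 1,000 TB are assumptions made here for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def yearly_cost_from_hourly(rate_per_hour: float) -> float:
    """Cost of running one hourly-billed resource nonstop for a year."""
    return rate_per_hour * HOURS_PER_YEAR


def yearly_cost_for_storage(terabytes: float, price_per_tb_year: float = 1000.0) -> float:
    """Cost at a flat per-TB yearly price (the slide's $1,000/TB/year figure)."""
    return terabytes * price_per_tb_year


# Entry point quoted on the slide: $0.25/hour, running all year.
entry_cost = yearly_cost_from_hourly(0.25)

# Upper bound quoted on the slide: 2 PB, assumed here to be 2,000 TB.
max_cost = yearly_cost_for_storage(2 * 1000)

print(f"Entry point, always on: ${entry_cost:,.0f}/year")
print(f"2 PB at $1,000/TB/year: ${max_cost:,.0f}/year")
```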
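
Slide 14 notes that AWS Lambda billed compute time in increments of 100 milliseconds at the time of the talk. A minimal sketch of what that rounding means for a single invocation follows; the function name is hypothetical, and the sketch models only the rounding rule stated on the slide, not actual prices.

```python
BILLING_INCREMENT_MS = 100  # increment stated on slide 14 (2016)


def billed_duration_ms(actual_ms: int) -> int:
    """Round an invocation's duration up to the next billing increment."""
    increments = (actual_ms + BILLING_INCREMENT_MS - 1) // BILLING_INCREMENT_MS
    return increments * BILLING_INCREMENT_MS


for duration in (1, 100, 101, 250):
    print(duration, "ms runs are billed as", billed_duration_ms(duration), "ms")
```

Under this rule a function that finishes in 101 ms is billed like one that takes 200 ms, so shaving a few milliseconds only matters when it crosses an increment boundary.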
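
The event-driven transformation on slide 15 (raw data in Amazon S3, AWS Lambda, structured data in Amazon S3) could be sketched roughly as below. The handler reads the bucket and key from the standard S3 event notification structure; everything else is an assumption for illustration: the pipe-delimited input format, the `to_structured` helper, and the bucket and key names in the sample event. Reading and writing the objects themselves (for example with an AWS SDK) is left out so the sketch runs without AWS access.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def to_structured(raw_line: str) -> str:
    """Hypothetical transformation: 'user|action|timestamp' -> JSON record."""
    user, action, timestamp = raw_line.strip().split("|")
    return json.dumps({"user": user, "action": action, "timestamp": timestamp})


def handler(event, context):
    """Lambda-style entry point; a real function would fetch each object,
    apply to_structured() per line, and write the result to another bucket."""
    return extract_s3_objects(event)


# Hypothetical sample event with made-up bucket and key names.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "raw-data-bucket"},
                "object": {"key": "logs/2016/11/30/part+0001.txt"},
            }
        }
    ]
}

print(handler(sample_event, None))
print(to_structured("alice|view|2016-11-30T10:00:00Z"))
```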
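
Slide 20 says Amazon Athena charges per query based on data scanned, so compressed data costs less to query. The arithmetic can be illustrated as follows. The slide gives no price, so the $5 per TB figure, the decimal terabyte, the 2 TB data set, and the 4:1 compression ratio are all assumptions chosen only to make the example concrete.

```python
BYTES_PER_TB = 10**12  # decimal terabyte, an assumption for this sketch


def query_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Cost of one query under a flat per-TB-scanned price."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


ASSUMED_PRICE_PER_TB = 5.0         # hypothetical price, not taken from the slides
raw_bytes = 2 * 10**12             # hypothetical 2 TB of uncompressed data
compressed_bytes = raw_bytes // 4  # hypothetical 4:1 compression ratio

print("Uncompressed scan:", query_cost(raw_bytes, ASSUMED_PRICE_PER_TB))
print("Compressed scan:  ", query_cost(compressed_bytes, ASSUMED_PRICE_PER_TB))
```

Columnar formats such as ORC or Parquet, which appear on slides 22 and 24, can reduce the bytes scanned further when a query touches only a few columns.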
