AWS big data platform
agenda overview
10:00 AM Registration
10:30 AM Introduction to Big Data @ AWS
12:00 PM Lunch + Registration for Technical Sessions
12:30 PM Data Collection and Storage
1:45PM Real-time Event Processing
3:00PM Analytics (incl Machine Learning)
4:30 PM Open Q&A Roundtable
global footprint
Over 1 million active customers
across 190 countries
800+ government agencies
3,000+ educational institutions
11 regions
28 availability zones
52 edge locations
Every day, AWS adds enough new server capacity to support
Amazon.com when it was a $7 billion global enterprise.
2014 IaaS Magic Quadrant
“AWS is the overwhelming
market share leader, with
more than 5X the compute
capacity in use than the
aggregate total of the
other 14 providers.”
Enterprise
Applications
Virtual Desktop Sharing & Collaboration
Platform
Services
Analytics
Hadoop
Real-time
Streaming Data
Data
Warehouse
Data
Pipelines
App Services
Queuing &
Notifications
Workflow
App streaming
Transcoding
Email
Search
Deployment & Management
One-click web
app deployment
Dev/ops resource
management
Resource
Templates
Mobile Services
Identity
Sync
Mobile
Analytics
Push
Notifications
Administration
& Security
Identity
Management
Access
Control
Usage
Auditing
Key
Storage
Monitoring
And Logs
Core
Services
Compute
(VMs, Auto-scaling
and Load Balancing)
Storage
(Object, Block
and Archival)
CDN
Databases
(Relational, NoSQL,
Caching)
Networking
(VPC, DX, DNS)
Infrastructure Regions Availability Zones Points of Presence
broad & deep core services
rich platform services
big data pipeline
Data Answers
Collect Process Analyze
Store
primitive patterns
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
primitive patterns
S3
Kinesis
DynamoDB
RDS (Aurora)
AWS Lambda
KCL Apps
EMR Redshift
Machine
Learning
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
primitive patterns
S3
Kinesis
DynamoDB
RDS (Aurora)
data collection and storage
File: media, log files (sets of records)
Stream: records (e.g., device stats)
Transactional: database reads/writes
Apps Devices Logging Frameworks
AWS services – data collection and storage
S3
Kinesis
DynamoDB
RDS (Aurora)
benefits of streamlined data collection
Increase velocity of data
• Upgrade existing applications to log records rather
than files – driven by need for greater agility
• Build new applications that are designed for
streaming data from the outset
Example:
S3
$0.030/GB-Mo
Redshift
Starts at
$0.25/hour
EC2
Starts at
$0.02/hour
Glacier
$0.010/GB-Mo
Kinesis
$0.015/hour per shard (1 MB/s in; 2 MB/s out)
$0.028/million puts
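500MM tweets/day = ~5,800 tweets/sec
2 KB/tweet is ~12 MB/sec (~1 TB/day), which needs 12 shards at 1 MB/s in each
$0.015/hour per shard, $0.028/million PUTs
Kinesis cost is $0.765/hour (12 shards × $0.015 = $0.18, plus ~20.9 million PUTs/hour × $0.028 = ~$0.585)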
Redshift cost is $0.850/hour (for a 2TB node)
S3 cost is $1.28/hour (no compression)
Total: $2.895/hour
cost &
scale
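The hourly figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the prices quoted on the slide (not current pricing); the constant and function names are illustrative, "2k/tweet" is read as 2,000 bytes, and the Redshift and S3 figures are taken from the slide as given rather than derived.

```python
import math

# Workload and prices as quoted on the slide (assumptions, not current pricing).
TWEETS_PER_DAY = 500_000_000
BYTES_PER_TWEET = 2_000            # "2k/tweet", read as 2,000 bytes
SHARD_PRICE_PER_HOUR = 0.015       # USD per shard-hour
PUT_PRICE_PER_MILLION = 0.028      # USD per million PUTs
SHARD_INGEST_MB_PER_SEC = 1.0      # ingest capacity of one shard
REDSHIFT_PER_HOUR = 0.850          # taken from the slide as-is
S3_PER_HOUR = 1.28                 # taken from the slide as-is


def kinesis_cost_per_hour(tweets_per_day, bytes_per_tweet):
    """Return (shards needed, hourly Kinesis cost in USD)."""
    tweets_per_sec = tweets_per_day / 86_400
    mb_per_sec = tweets_per_sec * bytes_per_tweet / 1_000_000
    # Size the stream by ingest bandwidth, rounding up to whole shards.
    shards = math.ceil(mb_per_sec / SHARD_INGEST_MB_PER_SEC)
    puts_per_hour_millions = tweets_per_sec * 3_600 / 1_000_000
    cost = (shards * SHARD_PRICE_PER_HOUR
            + puts_per_hour_millions * PUT_PRICE_PER_MILLION)
    return shards, cost


shards, kinesis = kinesis_cost_per_hour(TWEETS_PER_DAY, BYTES_PER_TWEET)
total = kinesis + REDSHIFT_PER_HOUR + S3_PER_HOUR
print(f"shards={shards} kinesis=${kinesis:.3f}/h total=${total:.3f}/h")
```

With unrounded rates the Kinesis figure comes out near $0.763/hour; the slide's $0.765 follows from rounding to 5,800 tweets/sec before multiplying.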
benefits of streamlined data collection
• Instrument existing applications
• Inject code to log activity – “new big data”
• Example: WAPO Labs Social Reader (now Trove)
Existing
Application
DynamoDB table(s)
GET calls & Queries
PUT calls
Query(…
PutItem(…
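One way to picture the "inject code to log activity" pattern is a thin helper that writes an activity record next to each call the application already makes. The sketch below is illustrative only: the attribute names and the `log_activity` helper are hypothetical, and the table object is passed in so the example runs without AWS credentials. With boto3, `boto3.resource("dynamodb").Table(name)` returns an object that exposes the same `put_item(Item=...)` method.

```python
import time
import uuid


def build_activity_item(user_id, action, now=None):
    """Shape one activity record; attribute names are hypothetical."""
    ts = int(now if now is not None else time.time() * 1000)
    return {
        "user_id": user_id,             # hash key (assumed)
        "event_ts": ts,                 # range key, epoch milliseconds (assumed)
        "event_id": str(uuid.uuid4()),
        "action": action,
    }


def log_activity(table, user_id, action, now=None):
    """Write one activity record through any object with put_item(Item=...)."""
    item = build_activity_item(user_id, action, now)
    table.put_item(Item=item)
    return item


class InMemoryTable:
    """Stand-in for a DynamoDB table so the sketch runs offline."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)


table = InMemoryTable()
logged = log_activity(table, "user-42", "read_article", now=1_700_000_000_000)
print(logged["user_id"], logged["event_ts"], logged["action"])
```

Passing the table in keeps the logging call separate from the application's existing GET and Query paths, which is what lets it be added to an existing application without restructuring it.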
benefits of streamlined data collection
Increase data granularity
Customers Devices Data Items Item Size Frequency
Challenge: compounding scale
Benefit: improved data quality
primitive patterns
AWS Lambda
KCL Apps
event processing – enabling capabilities
S3 Event Notifications
Kinesis stream
DynamoDB Streams
AWS Lambda
KCL Apps
real-time event processing
• Event-driven programming
• Trigger activities based on real-time input
Examples:
  - Proactively detect hardware errors in device logs
  - Identify fraud from activity logs
  - Monitor performance SLAs
  - Notify when inventory drops below a threshold
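The inventory example above could be handled by a small Lambda function reading from a Kinesis stream. This is a sketch, not a reference implementation: the `sku` and `stock` fields and the threshold are hypothetical. The one fixed detail is the event shape, where each entry in `Records` carries its payload base64-encoded under `kinesis.data`.

```python
import base64
import json

LOW_STOCK_THRESHOLD = 10  # hypothetical threshold


def handler(event, context):
    """Return the records whose stock level is below the threshold."""
    alerts = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)
        if item["stock"] < LOW_STOCK_THRESHOLD:
            alerts.append({"sku": item["sku"], "stock": item["stock"]})
    # A real function might publish each alert to SNS at this point.
    return alerts


def _kinesis_record(obj):
    """Build a record shaped like the ones Lambda receives from Kinesis."""
    data = base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {"Records": [
    _kinesis_record({"sku": "A-1", "stock": 3}),
    _kinesis_record({"sku": "B-2", "stock": 50}),
]}
alerts = handler(sample_event, None)
print(alerts)
```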
benefits of event processing
• Build / add real-time events
  - Take action between data collection and analytics
• Alerts and notifications, performance and security
• Automated data enrichment (e.g., aggregations)
• De-couple application modules
  - Streamline development and maintenance
  - Increase agility
• MVP + iterate on discrete components
Collect | Store | Analyze
Alert
primitive patterns
EMR Redshift
Machine
Learning
NASDAQ
• 5.5B Records are loaded to
Amazon Redshift every day
• Security Requirements for
Client Side Encryption
• Historical Data - HDFS became
too expensive
  - S3 + EMR to the Rescue
Retail and
POS Analytics
Process tens of TB
in hours vs. 2
weeks
80-90% reduction
in costs
big data use cases
Internet of Things
Digital Advertising
Online Gaming
Log Analytics
Customer Value Scoring
Personalization Engine
TempTracker
bee hive monitoring
in the AWS cloud
temperature
sensors
network card
waterproof
housing
Python (boto)
DynamoDB Kinesis
App
Kinesis
ingestion
dashboard
Lambda
event
source
SNS
TempTracker: IoT sensor ingestion example
DynamoDB schema
hash range attributes
internal temperature
outside temperature
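The slide names a hash key, a range key, and two temperature attributes, but not their concrete names. As a sketch under that assumption, the following builds one reading with the hypothetical key names `hive_id` and `reading_ts`, then packages it as arguments for a Kinesis PutRecord call. `StreamName`, `Data`, and `PartitionKey` are the PutRecord parameter names; the stream name `temptracker` is made up for the example.

```python
import json


def make_reading(hive_id, reading_ts, inside_c, outside_c):
    """One sensor reading; key and attribute names are hypothetical."""
    return {
        "hive_id": hive_id,                  # hash key (assumed name)
        "reading_ts": reading_ts,            # range key, ISO 8601 string (assumed)
        "internal_temperature": inside_c,
        "outside_temperature": outside_c,
    }


def to_put_record_args(reading, stream_name="temptracker"):
    """Arguments for a Kinesis PutRecord call; the stream name is hypothetical."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(reading, sort_keys=True).encode("utf-8"),
        # Records with the same partition key map to the same shard.
        "PartitionKey": reading["hive_id"],
    }


reading = make_reading("hive-1", "2015-05-01T10:00:00Z", 34.5, 18.2)
args = to_put_record_args(reading)
print(args["PartitionKey"], args["Data"].decode("utf-8"))
```

Using the hive identifier as both the partition key and the table's hash key keeps each hive's readings together, and the timestamp range key lets a dashboard query one hive over a time window.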
big data case study: Kaiten Sushiro
Kaiten Sushiro
• Kaiten Sushi Chain restaurant
• Gathering sensor data into Kinesis
Kaiten Sushiro data flow
2009 2010 2011 2012 2013
Move to AWS
cameras
Switch to
DynamoDB
IoT / connected devices
Simple video monitoring & security
Fast growth – “suddenly petabytes”
EC2 (live streaming)
S3 (CVR data)
DynamoDB (meta data)
CloudFront (CDN)
EMR (activity recognition)
applying analytics to
connected device data
VPC Subnet
MQTT Broker on
EC2 Instance
VPC
Internet
Gateway
EMR
Kinesis
DynamoDB
Redshift
Lambda
SNS
S3
Data Pipeline
backend analytics architecture for
connected device data
AWS big data ecosystem
S3
Kinesis
EMR
Redshift
Data Pipeline
DynamoDB
Collect Process & Analyze Visualize
AWS Professional Services
Partnering in Your Journey
Technical
Specialists
Specialty practices for
AWS skills transfer,
security, infrastructure
architecture,
application
optimization, analytics,
big data, and
operational integration
Advisory
Services
Portfolio strategy and
planning, cost/benefit
modeling, governance,
change management
and risk management
as it relates to
implementing the AWS
platform
Collaboration
Working together with
you and APN Premier
Partners you already
trust to provide you
with access to all
resources needed to
realize breakthrough
results
Proven
Process
Best practices and
patterns to help your
teams get the
foundation right, deploy
and migrate workloads,
and create a modern IT
operating model to
support your business
big data partner solutions
Solutions vetted by the AWS Partner Competency Program
Data
Enablement
Move, synchronize,
cleanse, and manage data
Data Analysis &
Visualization
Turn data into actionable
insight, enhance decision
making
Infrastructure
Intelligence
Harness data generated
from your systems and
infrastructure
Advanced
Analytics
Anticipate future events and
behaviors, conduct what-if
analysis
big data service offers
Service expertise vetted by the AWS Partner Competency Program
AWS marketplace
Advanced Analytics
Database and Data Enablement
Business Intelligence
1-click deployment to launch, on
multiple regions around the world
Pay-as-you-go pricing with no
long term contracts required
2,000+ product listings to
browse, test and buy software
Enterprise software store for business users who need simplified procurement
Amazon Machine
Learning
Amazon Aurora
smart applications
e-commerce: recommendations
made based on your past purchases
finance: alerts from your bank when
they suspect fraudulent transactions
retail: emails when items related to
things you typically buy are on sale
Amazon Machine Learning
1. Build & Train Model
  - Create a datasource object (connect to Redshift, RDS, S3)
  - Explore and understand your data
  - Transform and train your model
2. Evaluate the Model & Optimize
  - Assess model quality
  - Fine-tune the model
3. Retrieve Predictions
  - Batch: asynchronous, large volume prediction
  - Real-time: synchronous, single-item prediction
Amazon Machine Learning
example use cases
• Fraud detection
• Demand forecasting
• Predictive customer support
• Click prediction
• Content personalization
• Document classification
Amazon Machine Learning
Currently Available in US-East-1
Amazon Aurora
Amazon’s New Relational Database Engine
a service-oriented architecture applied to
the database
• Moved the logging and storage layer
into a multi-tenant, scale-out
database-optimized storage service
• Integrated with other AWS services
like Amazon EC2, Amazon VPC,
Amazon DynamoDB, Amazon SWF,
and Amazon Route 53 for control
plane operations
• Integrated with Amazon S3 for
continuous backup with
99.999999999% durability
Control Plane
Data Plane
Amazon
DynamoDB
Amazon SWF
Amazon Route 53
Logging + Storage
SQL
Transactions
Caching
Amazon S3
simplify data security
• Encryption to secure data at rest
  - AES-256; hardware accelerated
  - All blocks on disk and in Amazon S3 are encrypted
  - Key management via AWS KMS
• SSL to secure data in transit
• Network isolation via Amazon VPC by default
• No direct access to nodes
• Supports industry standard security and data
protection certifications
Storage
SQL
Transactions
Caching
Amazon S3
Application
simplify storage management
• Read replicas are available as failover targets—no data loss
• Instantly create user snapshots—no performance impact
• Continuous, incremental backups to S3
• Automatic storage scaling up to 64 TB—no performance or
availability impact
• Automatic restriping, mirror repair, hot spot management,
encryption
Aurora storage
Highly available by default
• 6-way replication across 3 AZs
• 4 of 6 write quorum
  - Automatic fallback to 3 of 4 if an AZ is unavailable
• 3 of 6 read quorum
SSD, scale-out, multi-tenant storage
• Seamless storage scalability
• Up to 64 TB database size
• Only pay for what you use
Log-structured storage
• Many small segments, each with
their own redo logs
• Log pages used to generate data pages
• Eliminates chatter between database and storage
SQL
Transactions
AZ 1 AZ 2 AZ 3
Caching
Amazon S3
self-healing, fault-tolerant
• Losing two copies, or a whole AZ, has no impact on read or write
availability
• Losing three copies has no impact on read availability
• Automatic detection, replication, and repair
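The availability claims follow directly from the quorum sizes quoted on the Aurora storage slide (6 copies, 4 of 6 to write, 3 of 6 to read). The sketch below is a simplified model with an illustrative function name; it only counts surviving copies and does not model the "3 of 4" fallback or automatic repair.

```python
COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3


def availability(lost_copies):
    """Return (can_read, can_write) after losing the given number of copies."""
    alive = COPIES - lost_copies
    return alive >= READ_QUORUM, alive >= WRITE_QUORUM


# Any read quorum overlaps any write quorum, so a read sees the latest write.
assert READ_QUORUM + WRITE_QUORUM > COPIES
# Any two write quorums overlap, so conflicting writes cannot both succeed.
assert 2 * WRITE_QUORUM > COPIES

for lost in range(0, 5):
    can_read, can_write = availability(lost)
    print(f"lost={lost} read={can_read} write={can_write}")
```

With two copies per AZ across three AZs, losing a whole AZ costs two copies, which leaves exactly the four needed for writes.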
(diagram: SQL / Transaction / Caching stack over AZ 1, AZ 2, AZ 3, shown twice: once labeled "Read and write availability", once labeled "Read availability")
instant crash recovery
Traditional databases
• Have to replay logs since
the last checkpoint
• Single-threaded in
MySQL; requires a large
number of disk accesses
Amazon Aurora
• Underlying storage replays
redo records on demand as
part of a disk read
• Parallel, distributed,
asynchronous
Checkpointed Data Redo Log
Crash at T0 requires
a re-application of the
SQL in the redo log since
last checkpoint
T0 T0
Crash at T0 will result in redo
logs being applied to each segment
on demand, in parallel, asynchronously
write performance (console screenshot)
• MySQL Sysbench
• R3.8XL with 32 cores
and 244 GB RAM
• 4 client machines with
1,000 threads each
read performance (console screenshot)
• MySQL Sysbench
• R3.8XL with 32 cores
and 244 GB RAM
• Single client with
1,000 threads
read replica lag (console screenshot)
• Aurora Replica with 7.27 ms replica lag at 13.8 K updates/second
• MySQL 5.6 on the same hardware has ~2 s lag at 2 K updates/second
Aurora – current state
• Sign up for preview access at:
https://aws.amazon.com/rds/aurora/preview
• Now available in US West (Oregon) and EU (Ireland), in
addition to US East (N. Virginia)
• Thousands of customers already in the limited preview
• Unlimited preview: accepting all requests from late May
• Full service launch in the coming months
AWS big data platform
• Choice – platform breadth supports many use cases
• Specialization – optimal application experiences
• Managed Services – eliminate undifferentiated effort
S3
Kinesis
DynamoDB
RDS (Aurora)
AWS Lambda
KCL Apps
EMR Redshift
Machine
Learning
Thank you
Questions?

More Related Content

What's hot

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...Simplilearn
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveCobus Bernard
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentHostedbyConfluent
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPDatabricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Amazon Web Services
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for DummiesRodney Joyce
 

What's hot (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Azure purview
Azure purviewAzure purview
Azure purview
 
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, Confluent
 
Modern Data Platform on AWS
Modern Data Platform on AWSModern Data Platform on AWS
Modern Data Platform on AWS
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 

Similar to The AWS Big Data Platform – Overview

AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 
¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?Software Guru
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewAmazon Web Services
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...Amazon Web Services
 
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017Amazon Web Services
 
Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100Amazon Web Services
 
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsAmazon Web Services
 
Vancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakVancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakAmazon Web Services
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSAmazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAmazon Web Services
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAmazon Web Services Korea
 

Similar to The AWS Big Data Platform – Overview (20)

AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days
 
¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
 
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
 
Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100
 
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
 
Vancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakVancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam Elmalak
 
Aws meetup 20190427
Aws meetup 20190427Aws meetup 20190427
Aws meetup 20190427
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWS
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
The AWS Big Data Platform – Overview

  • 1. AWS big data platform
  • 2. agenda overview: 10:00 AM Registration; 10:30 AM Introduction to Big Data @ AWS; 12:00 PM Lunch + Registration for Technical Sessions; 12:30 PM Data Collection and Storage; 1:45 PM Real-time Event Processing; 3:00 PM Analytics (incl. Machine Learning); 4:30 PM Open Q&A Roundtable
  • 3. global footprint: Over 1 million active customers across 190 countries; 800+ government agencies; 3,000+ educational institutions; 11 regions; 28 availability zones; 52 edge locations. Every day, AWS adds enough new server capacity to support Amazon.com when it was a $7 billion global enterprise.
  • 4. 2014 IaaS Magic Quadrant: “AWS is the overwhelming market share leader, with more than 5X the compute capacity in use than the aggregate total of the other 14 providers.”
  • 5. Enterprise Applications: Virtual Desktop; Sharing & Collaboration. Platform Services: Analytics (Hadoop, Real-time Streaming Data, Data Warehouse, Data Pipelines); App Services (Queuing & Notifications, Workflow, App streaming, Transcoding, Email, Search); Deployment & Management (One-click web app deployment, Dev/ops resource management, Resource Templates); Mobile Services (Identity, Sync, Mobile Analytics, Push Notifications). Administration & Security: Identity Management; Access Control; Usage Auditing; Key Storage; Monitoring and Logs. Core Services: Compute (VMs, Auto-scaling and Load Balancing); Storage (Object, Block and Archival); CDN; Databases (Relational, NoSQL, Caching); Networking (VPC, DX, DNS). Infrastructure: Regions; Availability Zones; Points of Presence
  • 6. broad & deep core services
  • 8. big data pipeline Data Answers Collect Process Analyze Store
  • 9. Collect Process Analyze Store primitive patterns Data Collection and Storage Data Processing Event Processing Data Analysis
  • 10. primitive patterns S3 Kinesis DynamoDB RDS (Aurora) AWS Lambda KCL Apps EMR Redshift Machine Learning Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 11. primitive patterns S3 Kinesis DynamoDB RDS (Aurora) Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 12. data collection and storage. File: media, log files (sets of records). Stream: records (e.g. device stats). Transactional: database reads/writes. Apps, Devices, Logging, Frameworks
  • 13. AWS services – data collection and storage S3 Kinesis DynamoDB RDS (Aurora)
  • 14. benefits of streamlined data collection Increase velocity of data • Upgrade existing applications to log records rather than files – driven by need for greater agility • Build new applications that are designed for streaming data from the outset Example:
  • 17. cost & scale: 500MM tweets/day = ~5,800 tweets/sec. 2k/tweet is ~12MB/sec (~1TB/day). Kinesis pricing: $0.015/hour per shard, $0.028/million PUTs. Kinesis cost is $0.765/hour; Redshift cost is $0.850/hour (for a 2TB node); S3 cost is $1.28/hour (no compression). Total: $2.895/hour
  • 18. benefits of streamlined data collection • Instrument existing applications • Inject code to log activity – “new big data” • Example: WAPO Labs Social Reader (now Trove) Existing Application DynamoDB table(s) GET calls & Queries PUT calls Query(… PutItem(…
  • 19. benefits of streamlined data collection Increase data granularity Customers Devices Data Items Item Size Frequency Challenge: compounding scale Benefit: improved data quality
  • 20. primitive patterns AWS Lambda KCL Apps Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 21. event processing – enabling capabilities S3 Event Notifications Kinesis stream DynamoDB Streams AWS Lambda KCL Apps
  • 22. real-time event processing • Event-driven programming • Trigger activities based on real-time input. Examples: proactively detect hardware errors in device logs; identify fraud from activity logs; monitor performance SLAs; notify when inventory drops below a threshold
  • 23. benefits of event processing • Build / add real-time events: take action between data collection and analytics • Alerts and notifications, performance and security • Automated data enrichment (e.g. aggregations) • De-couple application modules: streamline development and maintenance; increase agility • MVP + iterate on discrete components. Collect | Store | Analyze | Alert
  • 24. Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis primitive patterns EMR Redshift Machine Learning
  • 25. NASDAQ • 5.5B records are loaded to Amazon Redshift every day • Security requirements for client-side encryption • Historical data: HDFS became too expensive; S3 + EMR to the rescue
  • 26. Retail and POS Analytics: process tens of TB in hours vs. 2 weeks; 80-90% reduction in costs
  • 27. big data use cases Internet of Things Digital Advertising Online Gaming Log Analytics Customer Value Scoring Personalization Engine Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 33. big data case study: Kaiten Sushiro
  • 34. Kaiten Sushiro • Kaiten Sushi Chain restaurant • Gathering sensor data into Kinesis
  • 36. 2009 2010 2011 2012 2013 Move to AWS cameras Switch to DynamoDB IoT / connected devices Simple video monitoring & security Fast growth – “suddenly petabytes”
  • 37. EC2 (live streaming) S3 (CVR data) DynamoDB (meta data) CloudFront (CDN) EMR (activity recognition)
  • 38. applying analytics to connected device data VPC Subnet MQTT Broker on EC2 Instance VPC Internet Gateway EMR Kinesis DynamoDB Redshift Lambda SNS S3 Data Pipeline
  • 39. backend analytics architecture for connected device data
  • 40. AWS big data ecosystem S3 Kinesis EMR Redshift Data Pipeline DynamoDB Collect Process & Analyze Visualize
  • 41. AWS Professional Services: Partnering in Your Journey. Technical Specialists: specialty practices for AWS skills transfer, security, infrastructure architecture, application optimization, analytics, big data, and operational integration. Advisory Services: portfolio strategy and planning, cost/benefit modeling, governance, change management and risk management as it relates to implementing the AWS platform. Collaboration: working together with you and APN Premier Partners you already trust to provide you with access to all resources needed to realize breakthrough results. Proven Process: best practices and patterns to help your teams get the foundation right, deploy and migrate workloads, and create a modern IT operating model to support your business.
  • 42. big data partner solutions: solutions vetted by the AWS Partner Competency Program. Data Enablement: move, synchronize, cleanse, and manage data. Data Analysis & Visualization: turn data into actionable insight, enhance decision making. Infrastructure Intelligence: harness data generated from your systems and infrastructure. Advanced Analytics: anticipate future events and behaviors, conduct what-if analysis.
  • 43. big data service offers Service expertise vetted by the AWS Partner Competency Program
  • 44. AWS marketplace: Advanced Analytics; Database and Data Enablement; Business Intelligence. 1-click deployment to launch, in multiple regions around the world. Pay-as-you-go pricing with no long-term contracts required. 2,000+ product listings to browse, test and buy software. Enterprise software store for business users who need simplified procurement.
  • 46. smart applications. e-commerce: recommendations made based on your past purchases. finance: alerts from your bank when they suspect fraudulent transactions. retail: emails when items related to things you typically buy are on sale.
  • 47. Amazon Machine Learning. 1. Build & Train Model: create a datasource object (connect to Redshift, RDS, S3); explore and understand your data; transform and train your model. 2. Evaluate the Model & Optimize: assess model quality; fine-tune the model. 3. Retrieve Predictions: batch (asynchronous, large volume prediction); real-time (synchronous, single-item prediction).
  • 48. Amazon Machine Learning example use cases • Fraud detection • Demand forecasting • Predictive customer support • Click prediction • Content personalization • Document classification
  • 49. Amazon Machine Learning Currently Available in US-East-1
  • 50. Amazon Aurora Amazon’s New Relational Database Engine
  • 51. a service-oriented architecture applied to the database • Moved the logging and storage layer into a multi-tenant, scale-out database-optimized storage service • Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations • Integrated with Amazon S3 for continuous backup with 99.999999999% durability. Diagram labels: Control Plane, Data Plane, Amazon DynamoDB, Amazon SWF, Amazon Route 53, Logging + Storage, SQL, Transactions, Caching, Amazon S3
  • 52. simplify data security • Encryption to secure data at rest: AES-256, hardware accelerated; all blocks on disk and in Amazon S3 are encrypted; key management via AWS KMS • SSL to secure data in transit • Network isolation via Amazon VPC by default • No direct access to nodes • Supports industry standard security and data protection certifications. Diagram labels: Storage, SQL, Transactions, Caching, Amazon S3, Application
  • 53. simplify storage management • Read replicas are available as failover targets—no data loss • Instantly create user snapshots—no performance impact • Continuous, incremental backups to S3 • Automatic storage scaling up to 64 TB—no performance or availability impact • Automatic restriping, mirror repair, hot spot management, encryption
  • 54. Aurora storage. Highly available by default • 6-way replication across 3 AZs • 4 of 6 write quorum (automatic fallback to 3 of 4 if an AZ is unavailable) • 3 of 6 read quorum. SSD, scale-out, multi-tenant storage • Seamless storage scalability • Up to 64 TB database size • Only pay for what you use. Log-structured storage • Many small segments, each with their own redo logs • Log pages used to generate data pages • Eliminates chatter between database and storage. Diagram labels: SQL, Transactions, Caching, AZ 1, AZ 2, AZ 3, Amazon S3
  • 55. self-healing, fault-tolerant • Lose two copies, or an entire AZ, without read or write availability impact • Lose three copies without read availability impact • Automatic detection, replication, and repair. Diagram: two panels (SQL, Transaction, Caching over AZ 1, AZ 2, AZ 3), captioned “Read and write availability” and “Read availability”
  • 56. instant crash recovery. Traditional databases • Have to replay logs since the last checkpoint • Single-threaded in MySQL; requires a large number of disk accesses. Amazon Aurora • Underlying storage replays redo records on demand as part of a disk read • Parallel, distributed, asynchronous. Diagram labels: Checkpointed Data, Redo Log. Traditional: a crash at T0 requires a re-application of the SQL in the redo log since the last checkpoint. Aurora: a crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously
  • 57. write performance (console screenshot) • MySQL Sysbench • R3.8XL with 32 cores and 244 GB RAM • 4 client machines with 1,000 threads each
  • 58. read performance (console screenshot) • MySQL Sysbench • R3.8XL with 32 cores and 244 GB RAM • Single client with 1,000 threads
  • 59. read replica lag (console screenshot) • Aurora Replica with 7.27 ms replica lag at 13.8 K updates/second • MySQL 5.6 on the same hardware has ~2 s lag at 2 K updates/second
  • 60. Aurora – current state • Sign up for preview access at: https://aws.amazon.com/rds/aurora/preview • Now available in US West (Oregon) and EU (Ireland), in addition to US East (N. Virginia) • Thousands of customers already in the limited preview • Unlimited preview: accepting all requests from late May • Full service launch in the coming months
  • 61. AWS big data platform • Choice – platform breadth supports many use cases • Specialization – optimal application experiences • Managed Services – eliminate undifferentiated effort S3 Kinesis DynamoDB RDS (Aurora) AWS Lambda KCL Apps EMR Redshift Machine Learning
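
Slide 22 lists "notify when inventory drops below a threshold" as an event-processing example. A minimal sketch of that idea, written as a Lambda-style handler over a Kinesis event batch, might look like the following. The payload fields (`sku`, `quantity`), the threshold value, and the `make_record` helper are hypothetical and not from the deck; the only structure assumed from AWS is that each record carries its payload base64-encoded under `kinesis.data`.

```python
import base64
import json

# Hypothetical threshold; the deck does not specify one.
LOW_STOCK_THRESHOLD = 10


def handler(event, context=None):
    """Return one alert string per record whose quantity is below the threshold."""
    alerts = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)  # hypothetical schema: {"sku": ..., "quantity": ...}
        if item["quantity"] < LOW_STOCK_THRESHOLD:
            alerts.append(f"Low stock: {item['sku']} has {item['quantity']} left")
    return alerts


def make_record(sku, quantity):
    """Build a fake Kinesis-style record for local testing (illustrative helper)."""
    data = json.dumps({"sku": sku, "quantity": quantity}).encode("utf-8")
    return {"kinesis": {"data": base64.b64encode(data).decode("ascii")}}


if __name__ == "__main__":
    batch = {"Records": [make_record("salmon-nigiri", 3), make_record("tuna-roll", 50)]}
    for alert in handler(batch):
        print(alert)
```

In a deployed pipeline the returned alerts would be published to a notification service rather than returned to the caller; the handler here stays side-effect free so it can be exercised locally.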

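The quorum figures on slides 54 and 55 (6 copies across 3 AZs, a 4-of-6 write quorum, a 3-of-6 read quorum) can be checked with a few lines of arithmetic. This is only an illustrative sketch of the availability rule the slides describe, not Aurora's implementation; the function name is made up, and the "fallback to 3 of 4" behavior mentioned on slide 54 is not modeled.

```python
COPIES = 6          # 6-way replication (slide 54)
WRITE_QUORUM = 4    # 4 of 6 write quorum
READ_QUORUM = 3     # 3 of 6 read quorum
COPIES_PER_AZ = 2   # 6 copies spread across 3 AZs

# Any read quorum overlaps any write quorum, and any two write quorums overlap.
assert READ_QUORUM + WRITE_QUORUM > COPIES
assert 2 * WRITE_QUORUM > COPIES


def availability(lost_copies):
    """Report whether reads and writes can still reach their quorum."""
    healthy = COPIES - lost_copies
    return {"read": healthy >= READ_QUORUM, "write": healthy >= WRITE_QUORUM}


if __name__ == "__main__":
    # Losing a whole AZ means losing COPIES_PER_AZ copies.
    print("one AZ down:      ", availability(COPIES_PER_AZ))
    print("three copies lost:", availability(3))
    print("four copies lost: ", availability(4))
```

Losing two copies (or one AZ) leaves four healthy copies, enough for both quorums; losing three leaves exactly the read quorum, which matches the two panels on slide 55.
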
Editor's Notes

  1. So, I wanted to get started by taking a look at how the AWS business is progressing. We now have over 1 million active customers. TALKING POINTS: We define an “active customer” as a non-Amazon customer with account usage activity within the past month. To support global business, we maintain 11 regions across the US, South America, Europe (Ireland and Germany), Japan, China, Singapore, and Australia. We count hundreds of thousands of customers across 190 countries. This includes over 800 government agencies and over 3,000 educational institutions. Scale and capacity matter: every day, we add enough new server capacity to support Amazon.com when it was a $7B global enterprise.
  2. 500MM tweets/day = 5,800 tweets/second 2.5k/tweet =
  3. 500MM tweets/day = 5,800 tweets/second 2.5k/tweet =
  4. Throughput: 12 MB/s. Cost per hour: 1 shard = $0.015; 1,000,000 requests = $0.028. Total cost: 12 shards = $0.180; hourly requests = $0.585; total = $0.765. $4.65; 6.081293158. $5.25 for a peppermint mocha Frappuccino: you can run this app for an hour and 48 minutes per Frappuccino. Network transfer rate: if you pull 1TB down per day, cost is $0.09/GB = $92.16/day.
  5. The Nasdaq Group has been a user of Amazon Redshift since it was released and they are extremely happy with it. We’ve discussed our usage of that system at re:Invent several times, the most recent of which was FIN401 Seismic Shift: Nasdaq’s Migration to Amazon Redshift. Currently, our system is moving an average of 5.5 billion rows into Amazon Redshift every day (14 billion on a peak day in October of 2014). In addition to our Amazon Redshift data warehouse, we have a large historical data footprint that we would like to access as a single, gigantic data set. Currently, this historical archive is spread across a large number of disparate systems, making it difficult to use. Our aim for a new, unified warehouse platform is two-fold: to increase the accessibility of this historic data set to a growing number of internal groups at Nasdaq and to gain cost efficiencies in the process. For this platform, Hadoop is a clear choice: it supports a number of different SQL and other interfaces for accessing data and has an active and growing ecosystem of tools and projects.
  6. The AWS Partner Network (APN) is the global partner program for AWS. It is focused on helping our thousands of partners build successful AWS-based businesses by providing great technical and GTM support. APN Big Data Competency Partners have demonstrated technical and customer success in building solutions to help customers work with data productively, at any scale. Big data partner solutions are available across key big data use cases and solution areas, including: data enablement, data analysis/visualization, infrastructure intelligence, and advanced analytics.
  7. The APN Competency Program is designed to provide you with qualified APN Partners who have demonstrated technical proficiency and proven success in specialized solutions and vertical areas. Partners who’ve attained an APN Competency offer a variety of validated services and solutions on the AWS Cloud. We have a number of consulting partners that provide specific services and delivery capabilities around big data.
  8. Key talking points: No single BI/big data product solves every scenario and use case; go to https://aws.amazon.com/marketplace/ to find a wide variety of analytical solutions. No need for multiple sales contracts: just choose your type of EC2 instance, with no delays in using the product. Pay as you go, with no long-term commitment. Success story: a data scientist from Philips went into the Marketplace and processed their 37 million records in less than an hour on a Saturday night. The consulting company estimated that the same task would take 3 months of engagement to deliver.
  9. Control plane applies to storage. Use icons. Another box. Collapsing. The relational database reinvented using a service-oriented approach. Make the layers of the database scale out and multi-tenant. Started with storage. Retained drop-in compatibility with MySQL 5.6. Your existing applications should just work.
  10. Kill storage label
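
The cost figures on slide 17 and in note 4 can be reproduced with a short calculation. The sketch below uses only the prices and rates quoted in the deck, plus one assumption that the deck implies but does not state: each Kinesis shard ingests up to 1 MB/s, which is how 12 MB/s maps to 12 shards. Function and variable names are illustrative.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on slide 17 / editor's note 4.
TWEETS_PER_DAY = 500_000_000
BYTES_PER_TWEET = 2_000          # "2k/tweet"
SHARD_PRICE_PER_HOUR = 0.015     # USD per shard-hour
PUT_PRICE_PER_MILLION = 0.028    # USD per million PUTs
REDSHIFT_PER_HOUR = 0.850        # 2TB node
S3_PER_HOUR = 1.28               # no compression

# Assumption (not stated in the deck): one shard ingests up to 1 MB/s.
SHARD_INGEST_BYTES_PER_SEC = 1_000_000


def tweets_per_second():
    """Round to the nearest hundred, as the slide does (~5,800/sec)."""
    return round(TWEETS_PER_DAY / SECONDS_PER_DAY, -2)


def shards_needed():
    """Shards required to absorb the ingest rate in bytes per second."""
    bytes_per_sec = tweets_per_second() * BYTES_PER_TWEET
    return math.ceil(bytes_per_sec / SHARD_INGEST_BYTES_PER_SEC)


def kinesis_hourly_cost():
    """Shard-hours plus PUT charges for one hour of ingest."""
    shard_cost = shards_needed() * SHARD_PRICE_PER_HOUR
    puts_per_hour = tweets_per_second() * 3600
    put_cost = puts_per_hour / 1_000_000 * PUT_PRICE_PER_MILLION
    return shard_cost + put_cost


def total_hourly_cost():
    """Kinesis plus the Redshift and S3 hourly figures from the slide."""
    return kinesis_hourly_cost() + REDSHIFT_PER_HOUR + S3_PER_HOUR


if __name__ == "__main__":
    print(f"tweets/sec: {tweets_per_second():,.0f}")
    print(f"shards:     {shards_needed()}")
    print(f"Kinesis:    ${kinesis_hourly_cost():.3f}/hour")
    print(f"Total:      ${total_hourly_cost():.3f}/hour")
    # Note 4's comparison: minutes of runtime bought by a $5.25 drink.
    print(f"minutes per $5.25: {int(5.25 / total_hourly_cost() * 60)}")
```

Using the unrounded rate (about 5,787 tweets/sec) gives a slightly lower PUT charge than the slide's $0.585, so the rounding step is kept to match the published numbers.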