This document discusses modern data architectures for business insights at scale. It begins by explaining how businesses can gain insights from analyzing customer data and logs. It then discusses the challenges posed by big data in terms of increasing volume, velocity, and variety of data. The document outlines several AWS services that can be used to ingest, store, process, and analyze data at different speeds (batch, real-time, interactive). It provides examples of how companies like Redfin, Nordstrom, and Euclid leverage AWS to gain insights from customer data. The document emphasizes experimenting with available data and AWS services to deliver business outcomes and continuous differentiation.
Delta Lake OSS: Create a reliable and performant Data Lake, by Quentin Ambard (Paris Data Engineers!)
Delta Lake is an open source framework that sits on top of Parquet in your data lake to provide reliability and performance. It was open-sourced by Databricks this year and is gaining traction to become the de facto data lake format.
We'll see all the good Delta Lake can do for your data: ACID transactions, DDL operations, schema enforcement, batch and stream support, and more!
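A minimal PySpark sketch of what those transactional writes look like in practice (assuming the delta-spark package is available; paths and data are illustrative):

```python
# Minimal sketch: writing and reading a Delta table with PySpark.
# Assumes the delta-spark package is installed; the path and rows are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-demo")
    # These two settings register Delta's SQL extensions and catalog.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

events = spark.createDataFrame(
    [(1, "view"), (2, "purchase")], ["user_id", "action"]
)

# Each write is an ACID transaction; schema enforcement rejects mismatched columns.
events.write.format("delta").mode("append").save("/tmp/delta/events")

# Readers always see a consistent snapshot, even while writers append.
spark.read.format("delta").load("/tmp/delta/events").show()
```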
Mario Molina, Software Engineer
CDC systems are usually used to identify changes in data sources and to capture and replicate those changes to other systems. Companies use CDC to sync data across systems, support cloud migrations, or even feed stream processing, among other uses.
In this presentation we'll review CDC patterns, see how to implement them with Apache Kafka, and run a live demo!
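As a taste of the Kafka side, here is a hedged Python sketch of a consumer reading Debezium-style change events; the topic name, broker address, and payload fields are assumptions, not part of the talk:

```python
# Illustrative sketch: consuming Debezium-style CDC change events from a Kafka topic.
# Assumes the confluent-kafka package; broker and topic names are hypothetical.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "cdc-demo",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["dbserver1.inventory.customers"])  # hypothetical CDC topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        change = json.loads(msg.value())
        # Debezium events typically carry "op" (c/u/d) plus "before"/"after" images.
        payload = change.get("payload", change)
        print(payload.get("op"), payload.get("after"))
finally:
    consumer.close()
```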
https://www.meetup.com/Mexico-Kafka/events/277309497/
Eugene Kim gives us a detailed overview of the AWS Cloud and how SAP ERP workloads can be implemented on it. He discusses instance sizing in terms of SAPS, as well as High Availability and Disaster Recovery scenarios. SAP HANA and certified solutions are presented as well.
Protecting Data in AWS Environments with Dell Technologies Data Protection Solutions - Jinhwan Jung (정진환), Director, Dell EMC (Amazon Web Services Korea)
Sponsored session | Protecting data in AWS environments with Dell Technologies data protection solutions
Jinhwan Jung (정진환), Director, Dell EMC
This session covers how to protect the data of key services running in AWS using Dell Technologies data protection solutions, and how to easily build a DR environment in AWS for the major virtualization systems you run on premises. It also looks at solutions for cost-effectively retaining customers' long-term archive data in AWS.
Speaker: Ivan Cheng, Solution Architect, AWS
Join us for a series of introductory and technical sessions on AWS Big Data solutions. Gain a thorough understanding of what Amazon Web Services offers across the big data lifecycle and learn architectural best practices for applying those solutions to your projects.
We will kick off this technical seminar in the morning with an introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures. In the afternoon, we will deep dive into Machine Learning and Streaming Analytics. We will then walk everyone through building your first Big Data application with AWS.
Data Analytics Meetup: Introduction to Azure Data Lake Storage (CCG)
Microsoft Azure Data Lake Storage is designed to enable operational and exploratory analytics through a hyper-scale repository. Journey through Azure Data Lake Storage with Microsoft Data Platform Specialist Audrey Hammonds. In this video she explains the fundamentals of Gen 1 and Gen 2, walks us through how to provision a Data Lake, and gives tips to avoid turning your Data Lake into a swamp.
Learn more about Data Lakes with our blog - Data Lakes: Data Agility is Here Now https://bit.ly/2NUX1H6
How to go from zero to data lakes in days - ADB202 - New York AWS Summit (Amazon Web Services)
AWS provides the most comprehensive, secure, scalable, and cost-effective portfolio of services for building and managing data lakes. Now with AWS Lake Formation, you can build a secure data lake in days. In this session, learn how Lake Formation makes it simple to discover, catalog, clean, and load your data into a new data lake. Discover how you can easily secure access to that data and analyze it with services like Amazon Athena, Amazon Redshift, and Amazon EMR. Hear about Alcon’s data lake journey to the AWS Cloud and the challenges it overcame for a successful and productive data lake implementation.
Ditching the overhead - Moving Apache Kafka workloads into Amazon MSK - ADB30... (Amazon Web Services)
Apache Kafka is a popular stream-processing platform, but it’s no secret that it can be tough to set up, manage, and scale. Amazon Managed Streaming for Kafka (Amazon MSK) can help remove some of that toil for you. In this session, you learn about new Amazon MSK features and capabilities. You also get a glimpse under the hood, giving you a better understanding of how Amazon MSK operationalizes Apache Kafka so you don't have to. We compare and contrast Amazon Kinesis Data Streams and Apache Kafka (with/without MSK) and show how to lift-and-shift your workload into Amazon MSK with minimal downtime.
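For readers wondering what the "lift-and-shift" looks like from the client side, here is a small, assumed-configuration sketch: because Amazon MSK speaks the Kafka protocol, an existing producer usually only needs the MSK bootstrap brokers and TLS settings (the broker string and topic below are placeholders):

```python
# Minimal sketch: pointing an existing Kafka producer at an Amazon MSK cluster.
# The bootstrap string is a placeholder; retrieve the real one from the MSK console/API.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "b-1.my-msk-cluster.abc123.kafka.us-east-1.amazonaws.com:9094",
    "security.protocol": "SSL",  # MSK brokers accept TLS connections
})

producer.produce("clickstream", key="user-42", value='{"page": "/home"}')
producer.flush()
```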
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
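As an illustration of querying across the warehouse and the S3 data lake, here is a hedged Python sketch using the Redshift Data API; the cluster, database, user, and the external "spectrum_lake" schema are assumptions made for the example:

```python
# Hedged sketch: a query that joins a local warehouse table with S3 data exposed
# through an external (Redshift Spectrum) schema, run via the Redshift Data API.
import boto3

redshift_data = boto3.client("redshift-data")

sql = """
    SELECT o.order_id, o.amount, c.segment
    FROM sales.orders AS o                 -- local warehouse table
    JOIN spectrum_lake.customers AS c      -- external table over S3 (open file formats)
      ON o.customer_id = c.customer_id
    LIMIT 100;
"""

resp = redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="analytics",
    DbUser="analyst",
    Sql=sql,
)
print("Statement id:", resp["Id"])
```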
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Amazon Web Services gives you fast access to flexible and low-cost IT resources, so you can rapidly scale and build virtually any big data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing, regardless of the volume, velocity, and variety of your data.
https://aws.amazon.com/webinars/anz-webinar-series/
Build Real-Time Applications with Databricks Streaming (Databricks)
In this presentation, we will study a use case we implemented recently. In this use case we are working with a large, metropolitan fire department. Our company has already created a complete analytics architecture for the department based upon Azure Data Factory, Databricks, Delta Lake, Azure SQL and SQL Server Analysis Services (SSAS). While this architecture works very well for the department, they would like to add a real-time channel to their reporting infrastructure.
This channel should serve up the following information:
• The most up-to-date locations and status of equipment (fire trucks, ambulances, ladders, etc.)
• The current locations and status of firefighters, EMT personnel and other relevant fire department employees
• The current list of active incidents within the city
The above information should be visualized through an automatically updating dashboard. The central component of the dashboard will be a map which automatically updates with the locations and incidents. This view should be as real-time as possible and will be used by the fire chiefs to assist with real-time decision-making on resource and equipment deployments.
In this presentation, we will leverage Databricks, Spark Structured Streaming, Delta Lake and the Azure platform to create this real-time delivery channel.
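A minimal sketch of the streaming leg of such a channel, assuming Spark Structured Streaming writing to a Delta table that the dashboard reads (the paths and the event schema are invented for illustration):

```python
# Minimal sketch: read incoming unit-status events as a stream and keep a Delta table
# continuously up to date for the dashboard. Paths and schema are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("fire-dept-realtime").getOrCreate()

schema = (
    StructType()
    .add("unit_id", StringType())
    .add("status", StringType())
    .add("lat", DoubleType())
    .add("lon", DoubleType())
    .add("event_time", TimestampType())
)

updates = (
    spark.readStream.schema(schema)
    .json("/mnt/landing/unit-status/")      # micro-batches of JSON events
)

query = (
    updates.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/mnt/checkpoints/unit-status")
    .start("/mnt/delta/unit_status")        # the dashboard reads this Delta table
)
```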
A Thorough Comparison of Delta Lake, Iceberg and Hudi (Databricks)
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has emerged. Alongside the Hive Metastore, these table formats try to solve problems that have stood in traditional data lakes for a long time, with declared features like ACID transactions, schema evolution, upserts, time travel, and incremental consumption.
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake (Databricks)
Change Data Capture (CDC) is a typical use case in real-time data warehousing. It tracks the data change log (binlog) of a relational database (OLTP) and replays these change logs in a timely manner to an external store for real-time OLAP, such as Delta or Kudu. To implement a robust CDC streaming pipeline, many factors must be considered, such as how to ensure data accuracy, how to handle OLTP source schema changes, and whether it is easy to build for a variety of databases with little code.
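One common way to apply such a changelog is a MERGE into a Delta table per micro-batch; the sketch below assumes a Delta-enabled SparkSession, and the table path, key column, and "op" flag are illustrative:

```python
# Hedged sketch: applying a micro-batch of CDC rows to a Delta table with MERGE,
# turning an insert/update/delete changelog into an up-to-date table.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # assumes a Delta-enabled session

def apply_cdc_batch(micro_batch_df, batch_id):
    target = DeltaTable.forPath(spark, "/mnt/delta/customers")
    (
        target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedDelete(condition="s.op = 'd'")      # drop rows marked deleted
        .whenMatchedUpdateAll(condition="s.op <> 'd'")  # update existing rows
        .whenNotMatchedInsertAll(condition="s.op <> 'd'")
        .execute()
    )

# Typically wired into a stream with: writeStream.foreachBatch(apply_cdc_batch)
```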
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ... (Databricks)
Uber has real needs to provide faster, fresher data to data consumers and products, running hundreds of thousands of analytical queries every day. Uber engineers will share the design, architecture and use cases of the second generation of 'Hudi', a self-contained Apache Spark library for building large-scale analytical datasets designed to serve such needs and beyond. Hudi (formerly Hoodie) was created to effectively manage petabytes of analytical data on distributed storage, while supporting fast ingestion and queries. In this talk, we will discuss how we leveraged Spark as a general-purpose distributed execution engine to build Hudi, detailing tradeoffs and operational experience. We will also show how to ingest data into Hudi using the Spark Datasource/Streaming APIs and build notebooks/dashboards on top using Spark SQL.
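For orientation, a hedged sketch of a Hudi upsert through the Spark DataSource API; the option keys below are standard Hudi write configs, but the table name, key fields, and path are assumptions:

```python
# Illustrative sketch of a Hudi upsert via the Spark DataSource API.
# Table name, fields, and the S3 path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert-demo").getOrCreate()

trips_df = spark.createDataFrame(
    [("t-001", "2019-10-01 12:00:00", 14.5)], ["trip_id", "event_ts", "fare"]
)

hudi_options = {
    "hoodie.table.name": "trips",
    "hoodie.datasource.write.recordkey.field": "trip_id",
    "hoodie.datasource.write.precombine.field": "event_ts",
    "hoodie.datasource.write.operation": "upsert",
}

(
    trips_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-bucket/hudi/trips/")
)
```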
Making Apache Spark Better with Delta Lake (Databricks)
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unifies the streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
In this talk, we will cover:
* What data quality problems Delta helps address
* How to convert your existing application to Delta Lake
* How the Delta Lake transaction protocol works internally
* The Delta Lake roadmap for the next few releases
* How to get involved!
Introduction to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of Spot EC2 instances to reduce costs, and other Amazon EMR architectural best practices.
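A minimal PySpark sketch of the "S3 instead of HDFS" pattern, reading raw data from S3 and writing curated results back so the EMR cluster itself stays stateless and can run on cheaper Spot capacity (bucket names are placeholders):

```python
# Minimal sketch of the "S3 instead of HDFS" EMR pattern: read raw events from S3,
# aggregate, and write curated results back to S3. Bucket names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("emr-s3-pattern").getOrCreate()

clicks = spark.read.json("s3://my-raw-bucket/clickstream/2017/05/")

daily = (
    clicks.groupBy(F.to_date("timestamp").alias("day"))
    .agg(F.countDistinct("user_id").alias("unique_users"))
)

daily.write.mode("overwrite").parquet("s3://my-curated-bucket/daily-unique-users/")
```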
Apache Iceberg - A Table Format for Huge Analytic Datasets (Alluxio, Inc.)
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Huge Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Building Data Lakes and Analytics on AWS: Patterns and Best Practices - BDA30... (Amazon Web Services)
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
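To make the query side concrete, here is a hedged boto3 sketch that runs an Amazon Athena query against a Glue-cataloged data lake table; the database, table, and results bucket are assumptions:

```python
# Hedged sketch: kick off an Athena query against a data lake table from Python.
# Database, table, and output bucket names are illustrative.
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS hits FROM clickstream GROUP BY page LIMIT 10",
    QueryExecutionContext={"Database": "my_datalake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print("Query execution id:", resp["QueryExecutionId"])
```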
Presentation given at Coolblue B.V. demonstrating Apache Airflow (incubating), what we learned from its underlying design principles, and how an implementation of these principles reduces the amount of ETL effort. Why choose Airflow? Because it makes your engineering life easier and more people can contribute to how data flows through the organization, so that you can spend more time applying your brain to more difficult problems like Machine Learning, Deep Learning and higher-level analysis.
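A minimal Airflow DAG sketch in the spirit of the talk: two Python tasks chained so Airflow handles scheduling and dependencies (the task bodies are placeholders; the import path assumes Airflow 2.x):

```python
# Minimal Airflow DAG sketch: extract then load, scheduled daily.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")  # placeholder

def load():
    print("write data to the warehouse")  # placeholder

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```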
Apache Kafka is the de facto standard for data streaming and processing data in motion. With its significant adoption growth across all industries, I get a very valid question every week: when NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you qualify Kafka out when it is not the right tool for the job?
This session explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
No matter if you think about open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913 (Michael Bohlig)
Presentation on using Amazon CloudSearch with databases. What to use when? How can you use CloudSearch with a database? Tom Hill, Solutions Architect, Amazon CloudSearch
Driving Business Insights with a Modern Data Architecture - AWS Summit SG 2017 (Amazon Web Services)
Your customers probably want a better experience with your brand. Your different business teams want and need better insights in their decision making. Almost certainly, your finance and operations teams require this to happen at a fraction of the cost of traditional on-premises options. Modern data architectures on AWS help many of our best customers realize all of those goals. Your business data contains critical information about customer behaviors, operational decisions, and many factors that have financial impact on your organization. Increasingly, this data sits beyond your transactional systems, and is too big, too fast, and too complex for existing systems to handle. AWS Data and Analytics services are designed from our customers' requirements to ingest, store, analyze, and consume information at record-breaking scale. In this session you will learn how these services work together to deliver business automation, enhance customer engagement and intelligence.
Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology. In this session, we will introduce how to use S3 as a Data Lake to collect device information via AWS IoT, and then generate predictions for your application.
(BDT403) Best Practices for Building Real-time Streaming Applications with Am... (Amazon Web Services)
Amazon Kinesis is a fully managed, cloud-based service for real-time data processing over large, distributed data streams. Customers who use Amazon Kinesis can continuously capture and process real-time data such as website clickstreams, financial transactions, social media feeds, IT logs, location-tracking events, and more. In this session, we first focus on building a scalable, durable streaming data ingest workflow, from data producers like mobile devices, servers, or even a web browser, using the right tool for the right job. Then, we cover code design that minimizes duplicates and achieves exactly-once processing semantics in your elastic stream-processing application, built with the Kinesis Client Library. Attend this session to learn best practices for building a real-time streaming data architecture with Amazon Kinesis, and get answers to technical questions frequently asked by those starting to process streaming events.
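A minimal producer sketch for the ingest side described above, assuming boto3 and a pre-created stream; using the user id as the partition key keeps one user's events ordered within a shard:

```python
# Minimal sketch: put clickstream events onto a Kinesis stream.
# The stream name is a placeholder; the stream must already exist.
import json
import boto3

kinesis = boto3.client("kinesis")

def send_click(user_id, page):
    kinesis.put_record(
        StreamName="clickstream",
        Data=json.dumps({"user_id": user_id, "page": page}).encode("utf-8"),
        PartitionKey=user_id,  # events for one user land on the same shard, in order
    )

send_click("user-42", "/checkout")
```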
The AWS Workshop Series Online is a series of live webinars designed for IT professionals who are looking to leverage the AWS Cloud to build and transform their business, are new to the AWS Cloud or looking to further expand their skills and expertise. In this series, we will cover : 'Modern Data Architectures for Business Insights at Scale'.
An overview of Amazon Kinesis Firehose, Amazon Kinesis Analytics, and Amazon Kinesis Streams so you can quickly get started with real-time, streaming data.
How to use big data to improve an e-commerce (EC) platform is a hot topic. In this session, we will discuss some big data case studies in retail and e-commerce, and introduce how to create a recommendation service with Amazon Machine Learning.
Driving Business Outcomes with a Modern Data Architecture - Level 100 (Amazon Web Services)
Your business data contains critical information about customer behaviors, operational decisions, and many factors that have financial impact on your organisation. Increasingly though, this data is too big, too fast, and too complex for existing systems to handle. AWS Data and Analytics services are designed to ingest, store, analyse, and consume information at record-breaking scale. In this session you will learn how these services work together to deliver business automation, enhance customer engagement and intelligence.
Speaker: Craig Stires, APAC Business Development - Big Data & Analytics, Amazon Web Services
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and... (Amazon Web Services)
Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
A Data Lake allows an organisation to store all of its data, structured and unstructured, in one centralised repository. Since data can be stored as-is, there is no need to convert it to a predefined schema, and you no longer need to know what questions you want to ask of your data beforehand. In this session we will explore the architecture of a Data Lake on AWS and cover topics such as storage, processing and security.
Using AWS to design and build your data architecture has never been easier to gain insights and uncover new opportunities to scale and grow your business. Join this workshop to learn how you can gain insights at scale with the right big data applications.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora, a MySQL-compatible, highly available relational database engine that provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak, Sr. Manager of Software Development
Euronext, the leading European stock exchange with €3.7 trillion in market cap, built a governed data lake on AWS to analyze data from one of the largest databases in Europe, enriched with 1.5 billion new messages every day. Euronext uses Talend and AWS services (Amazon S3, Amazon Redshift and Amazon EMR) for better agility, elasticity, breadth of functionality and cost savings compared to the previous Netezza-based solution, while guaranteeing data governance and regulatory compliance.
Using AWS to design and build your data architecture has never been easier to gain insights and uncover new opportunities to scale and grow your business. Join this workshop to learn how you can gain insights at scale with the right big data applications.
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P... (Amazon Web Services)
If you are crafting a better customer experience, automating your business, or modernizing your systems, you are likely finding that your data and analytics platform is absolutely critical to your success. In this session, we will look at how customers are building on the managed services from Amazon Web Services to meet the needs of the business. Patterns we see gaining popularity include near-real-time engagement with customers over mobile, combining and analyzing unstructured consumer behavior with structured transactional data, and managing spiky data workloads. See how our customers use our managed, elastic, secure, and highly available services to change what is possible.
Craig Stires, Head of Big Data and Analytics, Amazon Web Services, APAC
APAC Principal Solutions Architect, Johnathon Meichtry will run through the highlights of 2015 showcasing the biggest announcements and how customers are using these new features. This session will cover the entire breadth of the AWS platform, and is a chance to get a high level overview of all of the announcements, feature updates and new services that AWS has launched in 2015.
2. Data analysis for a better customer experience
• Your business creates and stores data and logs all the time
• Data points and logs allow you to understand individual customer experience and improve it
• Analysis of logs and trails helps gain insights
4. Big Data: Unconstrained data growth
• 95% of the 1.2 zettabytes of data in the digital universe is unstructured
• 70% of this is user-generated content
• Unstructured data growth is explosive, with estimates of compound annual growth (CAGR) at 62% from 2008 to 2012
Source: IDC
[Chart: data volumes growing from GB and TB to PB, EB, and ZB]
5. The data volume gap: data generated is growing far faster than data available for analysis (1990-2020)
Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012-2016 Forecast and 2011 Vendor Shares
7. Plethora of Tools
Amazon Glacier, Amazon S3, DynamoDB, RDS, EMR, Amazon Redshift, Data Pipeline, Amazon Kinesis, Kinesis-enabled apps, Lambda, ML, SQS, ElastiCache, DynamoDB Streams, Amazon Elasticsearch Service
8. Big Data Challenges
• Is there a reference architecture?
• What tools should I use?
• How?
• Why?
9. Driving Business Outcomes via Data Analytics
• Outcome 1: Modernize and consolidate - insights to enhance business applications and create new digital services
• Outcome 2: Innovate for new revenues - personalization, demand forecasting, risk analysis
• Outcome 3: Real-time engagement - interactive customer experience, event-driven automation, fraud detection
• Outcome 4: Automate for expansive reach - automation of business processes and physical infrastructure
12. Redfin: a full-service residential real estate brokerage
• Redfin manages data on hundreds of millions of properties and millions of customers
• The Hot Homes algorithm automatically calculates the likelihood that a home will sell quickly by analyzing more than 500 attributes of each home
• Has been fully AWS-native since day one
https://aws.amazon.com/solutions/case-studies/redfin/
13. Hot Homes
There's an 80% chance this home will sell in the next 11 days – go tour it soon.
14. Redfin reference architecture (ingest/collect, store, process/analyze, consume/visualize): data about users, properties, and agents is ingested with Amazon Kinesis, stored in an Amazon S3 data lake and Amazon DynamoDB, processed and analyzed with Amazon EMR and Amazon Redshift, and consumed as answers and insights: user profiles, recommendations, Hot Homes, similar homes, agent follow-up, agent scorecards, marketing, A/B testing, real-time data, and BI/reporting.
15. Redfin Manages Data on Hundreds of Millions of Properties Using AWS
"Once we solved the infrastructure problem, we could dream a little bigger. Now we can deliver results without worrying about how to scale."
Yong Huang, Director, Big Data and Analytics
• Zero on-premises infrastructure
• Using Spot pricing for EC2, Redfin saved 90% compared to running On-Demand
• Using AWS, Redfin maintains a small technical team, allowing much simplified server management and enabling the transition to DevOps
• Redfin is able to launch products like Hot Homes that greatly improve the buyer experience, by leveraging the agility and scale of AWS
17. Nordstrom: American upscale fashion retailer
• Nordstrom has 323 stores operating in 38 U.S. states and in Canada; it has the largest store count and geographic footprint of its retail competitors
• A fashion retailer that sells clothing, shoes, cosmetics, and accessories
• Nordstrom is going all in on AWS
https://aws.amazon.com/solutions/case-studies/nordstrom/
18.
19. Nordstrom reference architecture (ingest/collect, store, process/analyze, consume/visualize): events from mobile and desktop users are ingested with Amazon Kinesis, processed by AWS Lambda, stored in Amazon DynamoDB and Amazon S3, analyzed with Amazon Redshift and analytics tools, and consumed by the Online Stylist. Outcomes and insights: personalized recommendations within seconds (down from 15-20 minutes), the expertise of stylists scaled to all shoppers, and costs reduced by a 2X order of magnitude.
20. Nordstrom gives personalized style recommendations in seconds
"Alert me when the internet is down ..."
Keith Homewood, Cloud Product Owner, Nordstrom
• Nordstrom Recommendation is the online version of a stylist: it can analyze and deliver personalized recommendations in seconds
• Going all-in on AWS has resulted in reducing costs by 2X
• Continuous delivery allows Nordstrom to deliver multiple production launches a day in a single application
• It can now create a personalized recommendation in seconds, where it used to take 15-20 minutes of processing
• Nordstrom's Cloud Product Owner finds the reliability and availability of AWS so dependable that as long as the internet is working, Nordstrom Recommendation is working
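As a rough illustration of the Kinesis-to-Lambda-to-DynamoDB leg in the architecture above, here is a hedged Python Lambda handler; the table and field names are invented for the example, not Nordstrom's actual schema:

```python
# Hedged sketch: a Lambda handler that decodes Kinesis records and stores a
# recommendation-ready item in DynamoDB. Table and field names are illustrative.
import base64
import json
import boto3

table = boto3.resource("dynamodb").Table("style-events")

def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded inside the event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        table.put_item(Item={
            "customer_id": payload["customer_id"],
            "event_time": payload["event_time"],
            "item_viewed": payload.get("item_id", "unknown"),
        })
    return {"processed": len(event["Records"])}
```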
22. Euclid: technology that helps brick-and-mortar retailers optimize performance
• Trusted by over 500 global brands in 45 countries worldwide, and counting
• Euclid analyzes customer movement data to correlate traffic with marketing campaigns and to help retailers optimize hours for peak traffic
• Has been fully AWS-native since day one
https://aws.amazon.com/solutions/case-studies/euclid/
23.
24. Euclid reference architecture (ingest/collect, store, process/analyze, consume/visualize): campaign data, WiFi foot-traffic data, and transactions are collected and stored in an Amazon S3 data lake and Amazon RDS for MySQL, processed and analyzed with Amazon EMR, Amazon Redshift, and Amazon EC2 (behind AWS Elastic Beanstalk and Elastic Load Balancing), and consumed through Euclid Analytics and Euclid EventIQ. Answers and insights include walk-bys, new and return visitors, visit duration, engagement rate, bounce rate, storefront potential and conversion, customer segmentation and loyalty assessment, regional and categorical roll-up reporting, and zoning for large-format locations.
25. Euclid processes point-of-sale analytics for 600 global brands in hours
“We were totally amazed at the speed - a simple count of rows that would take 5½ hours using MySQL only took 30 seconds with Amazon Redshift.”
Dexin Wang, Director of Platform Engineering, Euclid
• Processes tens of TB in hours vs. 2 weeks
• 80-90% reduction in costs
• Euclid has a network of traffic-counting sensors in nearly 400 shopping centers, malls, and street locations
• Euclid analyzes 10+ billion events monthly and 300 million shopping sessions yearly
• “We might have to re-compute up to 18 months of customer data. That requires a lot of computational power, which spikes traffic. We need resources that can scale up on demand and scale down when we don’t need it.”
26. Experiment and scale based on your business needs
SHORT LIST BUSINESS CASES: Modernization, Automation
(Pipeline: Data → Ingest/Collect → Store → Process/Analyze → Consume/Visualize → Answers & Insights)
27. Experiment and scale based on your business needs
MATCH AVAILABLE DATA: Metrics and Monitoring, Workflow Logs, ERP, Transactions
(Pipeline: Data → Ingest/Collect → Store → Process/Analyze → Consume/Visualize → Answers & Insights)
28. Experiment and scale based on your business needs
CHOOSE BEST FIT: AWS Import/Export and Amazon Kinesis (ingest/collect), Amazon S3 (store), Amazon EMR and Amazon Redshift (process/analyze), Amazon QuickSight and Amazon SQS (consume/visualize)
(Pipeline: Data → Ingest/Collect → Store → Process/Analyze → Consume/Visualize → Answers & Insights)
29. (Recap) Redfin data pipeline: Data → Ingest/Collect → Store → Process/Analyze → Consume/Visualize → Answers & Insights
Data sources: Users, Properties, Agents
AWS services: Amazon Kinesis, Amazon S3 data lake, Amazon EMR, Amazon Redshift, Amazon DynamoDB
Answers & Insights: User Profile, Recommendation, Hot Homes, Similar Homes, Agent Follow-up, Agent Scorecard, Marketing, A/B Testing, Real Time Data, BI / Reporting, …
30. A platform to build business outcomes from data
Ingest/Collect → Store → Process/Analyze → Consume/Visualize
36. Amazon Kinesis Firehose
• Fully managed streaming service that ingests and loads data into your storage or data warehouse
• Ability to batch, compress, or encrypt streaming data
• Elastic: scales to any throughput (no more sharding)
• Charged only per GB processed ($0.035 per GB)
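As an illustration of how little plumbing Firehose needs, here is a minimal Python (boto3) sketch that pushes JSON events into an assumed delivery stream named "clickstream-to-s3"; the stream name, region, and event shape are illustrative, not from the deck.

import json
import boto3

# Minimal sketch: push JSON events into an existing Firehose delivery stream.
# The stream name "clickstream-to-s3" and the event fields are assumptions.
firehose = boto3.client("firehose", region_name="us-east-1")

def send_event(event: dict) -> None:
    # Firehose batches (and optionally compresses/encrypts) records and delivers
    # them to the configured destination (e.g. S3 or Redshift); no shard management.
    firehose.put_record(
        DeliveryStreamName="clickstream-to-s3",
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )

if __name__ == "__main__":
    send_event({"user_id": 42, "action": "view", "page": "/hot-homes"})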
37. What Stream Storage should I use?
(Columns: Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS)
AWS managed service: Yes | Yes | Yes | No | Yes
Guaranteed ordering: Yes | Yes | Yes | Yes | No
Delivery: exactly-once | at-least-once | exactly-once | at-least-once | at-least-once
Data retention period: 24 hours | 7 days | N/A | Configurable | 14 days
Availability: 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ
Scale / throughput: No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic | No limit / ~ nodes | No limit / automatic
Parallel clients: Yes | Yes | No | Yes | No
Stream MapReduce: Yes | Yes | N/A | Yes | N/A
Record/object size: 400 KB | 1 MB | Amazon Redshift row size | Configurable | 256 KB
Cost: Higher (table cost) | Low | Low | Low (+admin) | Low-medium
(Hot ↔ Warm)
38. COLLECT → STORE
Collect (sources): Mobile apps, Web apps, Data centers (AWS Direct Connect), AWS Import/Export Snowball, Logging (Amazon CloudWatch, AWS CloudTrail), Messaging, Devices / Sensors & IoT platforms (AWS IoT)
Data types: RECORDS (database), DOCUMENTS and FILES (search, file storage), MESSAGES (messaging), STREAMS
Store – stream storage (hot): Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams
Store – message: Amazon SQS
Store – file: Amazon S3
39. Amazon S3
• Highly available object storage
• Designed for 99.999999999% annual
data durability
• Replicated across 3 facilities
• Virtually unlimited scale
• Pay only for what you use, you don’t
need to pre-provision
• Allows event notifications to trigger
further action
• Native support by big data frameworks
Amazon S3
40. Cost Conscious Design
Example: Should I use Amazon S3 or Amazon DynamoDB?
“I’m currently scoping out a project that will greatly increase
my team’s use of Amazon S3. Hoping you could answer
some questions. The current iteration of the design calls for
many small files, perhaps up to a billion during peak. The
total size would be on the order of 1.5 TB per month…”
Request rate (writes/sec): 300
Object size (bytes): 2,048
Total size (GB/month): 1,483
Objects per month: 777,600,000
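For reference, here is the same sizing arithmetic worked in a short Python sketch (assuming a 30-day month and GB counted as 1024^3 bytes, which reproduces the figures above).

# Minimal sketch of the sizing arithmetic above (assumed 30-day month,
# GB computed as 1024**3 bytes, matching the slide's figures).
writes_per_sec = 300
object_size_bytes = 2048

objects_per_month = writes_per_sec * 86_400 * 30                       # 777,600,000 objects
total_gb_per_month = objects_per_month * object_size_bytes / 1024**3   # ~1,483 GB

print(objects_per_month, round(total_gb_per_month))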
43. Amazon Athena
Interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
• No need to move data
• Query S3 directly & right away
• No infrastructure to set up & manage
• Fast results within seconds
• Pay for just the queries you run
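A hedged boto3 sketch of what "query S3 directly with standard SQL" looks like in practice; the database, table, and results bucket are assumed names, not part of the deck.

import time
import boto3

# Minimal sketch: run a standard SQL query against data in S3 with Athena.
# Database "weblogs", table "clickstream", and the results bucket are assumptions.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS views FROM clickstream GROUP BY page LIMIT 10",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])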
44. What about HDFS & Amazon Glacier?
• Use HDFS for very frequently accessed (hot)
data
• Use Amazon S3 Standard for frequently
accessed data
• Use Amazon S3 Standard – IA for infrequently
accessed data
• Use Amazon Glacier for archiving cold data
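One way to apply this tiering guidance is an S3 lifecycle rule; the boto3 sketch below is illustrative, with an assumed bucket, prefix, and 30/90-day thresholds rather than values from the deck.

import boto3

# Minimal sketch: encode the tiering guidance above as an S3 lifecycle rule.
# Bucket name, prefix, and the 30/90-day thresholds are assumptions.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-analytics-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-cold-data",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequently accessed
                    {"Days": 90, "StorageClass": "GLACIER"},      # cold archive
                ],
            }
        ]
    },
)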
45. Cache, database, search (COLLECT → STORE)
Collect (sources): Mobile apps, Web apps, Data centers (AWS Direct Connect), AWS Import/Export Snowball, Logging (Amazon CloudWatch, AWS CloudTrail), Messaging, Devices / Sensors & IoT platforms (AWS IoT)
Data types: RECORDS, DOCUMENTS, FILES, MESSAGES, STREAMS
Store – stream storage (hot): Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams
Store – message: Amazon SQS
Store – file: Amazon S3
Store – search / SQL / NoSQL / cache (warm): Amazon Elasticsearch Service, Amazon RDS, Amazon DynamoDB, Amazon ElastiCache
47. Best Practice - Use the Right Tool for the Job
Data tier / database tier options:
• Search: Amazon Elasticsearch Service
• Cache: Amazon ElastiCache (Redis, Memcached)
• SQL: Amazon Aurora, MySQL, PostgreSQL, Oracle, SQL Server
• NoSQL: Amazon DynamoDB, Cassandra, HBase, MongoDB
52. Amazon EMR
• Amazon EMR is a fully managed
Hadoop cluster
• Transient and long running clusters
• Direct integration into Amazon S3
• Easy to scale and enable burstable
capacity
• Integration with AWS Spot Market
53. Amazon EMR
• Amazon EMR supports all common Hadoop frameworks, such as:
• Spark, Pig, Hive, Hue, Oozie …
• HBase, Presto, Impala …
• Decouples storage from compute
• Allows independent scaling
• Direct integration with DynamoDB and S3
Amazon EMR, Amazon S3, Amazon DynamoDB
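To make the transient-cluster idea concrete, here is a hedged boto3 sketch that launches an EMR cluster, runs one Spark step against data in S3, and terminates itself; the release label, instance types, and S3 paths are assumptions.

import boto3

# Minimal sketch: a transient EMR cluster that runs one Spark step and shuts down.
# Release label, instance types, bucket names, and the job script are assumptions.
emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="nightly-aggregation",
    ReleaseLabel="emr-5.36.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,   # transient: terminate when the step finishes
    },
    Steps=[
        {
            "Name": "aggregate-clickstream",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://my-jobs/aggregate.py",
                         "s3://my-data/raw/", "s3://my-data/agg/"],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)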
54. 1 instance x 100 hours = 100 instances x 1 hour
(and with Spot Pricing not only faster but also cheaper)
55. Amazon Redshift
• Fully managed petabyte-scale data
warehouse
• Scalable amount of cluster nodes
• ODBC/JDBC connector for BI tools
using SQL
• Supports Amazon DynamoDB and
Amazon S3 to load data
• Less than a 10th of the cost of traditional solutions
Amazon Redshift
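A minimal sketch of loading Redshift from S3 and querying it over its PostgreSQL-compatible connection (here via psycopg2, alongside the ODBC/JDBC option mentioned above); the cluster endpoint, credentials, table, and IAM role are assumptions.

import psycopg2  # Redshift accepts PostgreSQL-protocol connections (like ODBC/JDBC)

# Minimal sketch: COPY data from S3 into Redshift, then run a SQL query.
# Endpoint, credentials, table, bucket, and IAM role are all assumptions.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="...",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY page_views
        FROM 's3://my-data/agg/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """)
    cur.execute("SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY 2 DESC LIMIT 10;")
    for row in cur.fetchall():
        print(row)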
56. Intel® Processor Technologies
Intel® AVX – Dramatically increases performance for highly parallel HPC workloads
such as life science engineering, data mining, financial analysis, media processing
Intel® AES-NI – Enhances security with new encryption instructions that reduce the
performance penalty associated with encrypting/decrypting data
Intel® Turbo Boost Technology – Increases computing power with performance that
adapts to spikes in workloads
Intel® Transactional Synchronization Extensions (TSX) – Enables independent transactions to execute concurrently, accelerating throughput
P-state & C-state control – Provides granular tuning of core performance and sleep states to improve overall application performance
57. New X1 Instance - Tons of Memory
• Designed for large-scale, in-memory
applications in the cloud
• Ideal for in-memory databases like SAP
HANA and big data processing apps like
Spark and Presto
• Powered by Intel® Xeon® E7 8880 v3
Haswell processors
• Features up to 2TB of memory and up to
128 vCPUs per instance
• 8X the memory offered by any other Amazon EC2
instance
58. 3. Affordable Petabyte-scale Analytics
AWS helps customers maximize the value of Big Data investments while reducing overall IT costs
• Secure, highly durable storage (Amazon S3): $28.16 / TB / month
• Data archiving (Amazon Glacier): $7.16 / TB / month
• Real-time streaming data load (Amazon Kinesis): $0.035 / GB
• 10-node Spark cluster (Amazon EMR): $0.15 / hr
• Petabyte-scale data warehouse (Amazon Redshift): $0.25 / hr
60. Predictions via Machine Learning
ML gives computers the ability to learn without being explicitly
programmed
Machine learning algorithms:
• Supervised learning ← “teach” program
- Classification ← Is this transaction fraud? (yes / no)
- Regression ← Customer life-time value?
• Unsupervised learning ← Let it learn by itself
- Clustering ← Market segmentation
61. Amazon Machine Learning
• Easy to use, managed machine
learning service built for developers
• Machine learning technology based
on Amazon’s internal systems
• Create models using data stored in
Amazon S3, Amazon RDS or Amazon
Redshift
• Request predictions in batch or in real time
Amazon Machine
Learning
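A hedged sketch of a real-time prediction call against an existing Amazon Machine Learning model using boto3; the model ID, endpoint URL, and feature names are assumptions.

import boto3

# Minimal sketch: request a real-time prediction from an existing
# Amazon Machine Learning model (model ID, endpoint, and the feature
# names in Record are assumptions).
ml = boto3.client("machinelearning", region_name="us-east-1")

result = ml.predict(
    MLModelId="ml-abc123ExampleModel",
    Record={"days_since_last_order": "42", "total_orders": "7", "channel": "mobile"},
    PredictEndpoint="https://realtime.machinelearning.us-east-1.amazonaws.com",
)
print(result["Prediction"])   # predicted label/value plus scores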
62. Machine Learning Algorithms
• Classification: sentiment analysis (Do people like my new product?)
• Linear regression: trend prediction (How much revenue next month?)
• Clustering: recommendation (Other people bought this!)
• Association: market basket analysis (bundled products)
• Neural networks: pattern recognition (speech recognition)
Tools: Amazon Machine Learning, Amazon EMR + Spark MLlib, GPU-optimized EC2 instances
63. Amazon Rekognition
Image recognition and analysis powered by deep learning that lets you search, verify, and organize millions of images
• Easy to use
• Batch analysis
• Real-time analysis
• Continually improving
• Low cost
66. Serverless Rekognition Demo
Serverless website that uses Rekognition to identify
faces and classify pictures
Amazon S3
AWS Lambda
Amazon API
Gateway
Amazon
DynamoDB
Amazon
Rekognition
Mobile
CodeFor.Cloud/image
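A minimal sketch of the Lambda function at the heart of this kind of demo: an S3 upload event triggers Rekognition label detection, and the result is stored in DynamoDB for the website to read; the table name and confidence threshold are assumptions.

import boto3

# Minimal sketch: S3 ObjectCreated event -> Rekognition labels -> DynamoDB item.
# The "image-labels" table and the MinConfidence/MaxLabels values are assumptions.
rekognition = boto3.client("rekognition")
table = boto3.resource("dynamodb").Table("image-labels")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        labels = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=10,
            MinConfidence=80.0,
        )["Labels"]

        table.put_item(Item={
            "image_key": key,
            "labels": [label["Name"] for label in labels],
        })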
67.
68. Amazon Polly
Turn text into lifelike speech using deep learning technologies to synthesize speech that sounds like a human voice
• Unlimited replays
• Returns an MP3 or audio stream
• Lightning-fast response
• Fully managed and low cost
69. Amazon Polly: Text In, Life-like Speech Out
Input text: “The temperature in WA is 75°F”
Spoken output: “The temperature in Washington is 75 degrees Fahrenheit”
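A minimal boto3 sketch of the same flow, text in and MP3 out; the voice and output file name are assumptions.

import boto3

# Minimal sketch: synthesize the example sentence above to an MP3 file.
# Voice choice and file name are assumptions.
polly = boto3.client("polly", region_name="us-east-1")

response = polly.synthesize_speech(
    Text="The temperature in WA is 75°F",   # Polly speaks the expanded form shown above
    OutputFormat="mp3",
    VoiceId="Joanna",
)
with open("forecast.mp3", "wb") as f:
    f.write(response["AudioStream"].read())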
70. Amazon Lex
Conversational interfaces for your applications, powered by the same Natural Language Understanding (NLU) & Automatic Speech Recognition (ASR) models as Alexa
• Integrated development in the AWS console
• Trigger AWS Lambda functions
• Multi-step conversations
• Continually improving ASR & NLU models
• Enterprise connectors
• Fully managed
71. Intents: a particular goal that the user wants to achieve
Utterances: spoken or typed phrases that invoke your intent
Slots: data the user must provide to fulfill the intent
Prompts: questions that ask the user to input data
Fulfillment: the business logic required to fulfill the user’s intent
Example intent: BookHotel (see the sketch below)
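To see how intents, utterances, slots, and prompts fit together at runtime, here is a hedged boto3 sketch that drives an assumed BookHotel bot through a short conversation with the Lex runtime API; the bot name, alias, and slot names are assumptions.

import boto3

# Minimal sketch: a multi-step conversation with an (assumed) BookHotel bot.
# Lex elicits any missing slots via its configured prompts.
lex = boto3.client("lex-runtime", region_name="us-east-1")

def say(text: str) -> dict:
    return lex.post_text(
        botName="BookHotel",
        botAlias="prod",
        userId="demo-user-1",
        inputText=text,
    )

reply = say("Book a hotel in Seattle")         # utterance invokes the BookHotel intent
print(reply["dialogState"], reply["message"])  # e.g. ElicitSlot: "What day do you check in?"

reply = say("next Friday, for two nights")     # fills the remaining slots
print(reply.get("slots"), reply["message"])    # fulfillment runs once all slots are filled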
74. AWS Glue
Easily understand your data sources,
prepare the data, and load it reliably to
data stores and your analytics pipeline
Integrated with:
S3, RDS, Redshift & any JDBC-
compliant data store
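A hedged sketch of what a Glue ETL script can look like (PySpark with the awsglue library): read a table from the Glue Data Catalog, drop a field, and write Parquet back to S3; the database, table, and output path are assumptions.

# Minimal sketch of a Glue ETL job (PySpark). Database "sales", table
# "raw_orders", and the output path are assumptions.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table via the Glue Data Catalog.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales", table_name="raw_orders"
)
trimmed = orders.drop_fields(["internal_notes"])

# Write curated Parquet back to S3 for the analytics pipeline.
glue_context.write_dynamic_frame.from_options(
    frame=trimmed,
    connection_type="s3",
    connection_options={"path": "s3://my-data/curated/orders/"},
    format="parquet",
)
job.commit()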
79. COLLECT → ETL → STORE → PROCESS / ANALYZE → CONSUME
Consume options: analysis and visualization with Amazon QuickSight (business users); notebooks and IDE (data scientists, developers); applications & API (apps & services)
80. Amazon QuickSight
• Fast, cloud-powered BI service that makes it easy to build visualizations, perform ad-hoc analysis, and get insights from data.
• Connectors for files, third party platforms,
AWS services and other partner BI tools
• In-memory calculation engine (SPICE)
to accelerate analysis and visualization
• $9 per user per month
81.
82. Athena & Quicksight Demo
Amazon
S3
Amazon
Athena
Amazon
Quicksight
Analyze past flight performance data stored in S3
Bureau of Transportation Flight Data Statistics
www.transtats.bts.gov
Create visualizations from S3 with Athena & Quicksight
86. Suncorp is moving "all-in" on cloud.
Project Ignite will extract benefits of $170 million
- Group CEO Patrick Snowball
Insurance Policy Insurance Claim Core Banking Life Admin
92. AdRoll: AWS Lambda for log files
Valentino Volonghi
CTO, AdRoll
“Polling is not a scalable strategy to
figure out when new files are added to S3,
especially when you add 17M of them per
month. So we moved Lambda in front of
S3.”
• Cross-platform, cross-device
advertising platform
• Offers retargeting based on
clickstream data
300 TB new data/month
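A minimal sketch of the "Lambda in front of S3" pattern AdRoll describes: an ObjectCreated notification invokes the handler for each new log file, so nothing has to poll the bucket; the log format and downstream handling are placeholders.

import gzip
import boto3

# Minimal sketch: S3 ObjectCreated notifications trigger this handler for each
# new log file, replacing polling. Log format and downstream handling are placeholders.
s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        text = gzip.decompress(body).decode("utf-8") if key.endswith(".gz") else body.decode("utf-8")
        lines = text.splitlines()

        # Placeholder: parse/aggregate the log lines, or forward them downstream.
        print(f"{key}: {len(lines)} log lines")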
93. Rethink how to become a data-driven business
• Business outcomes - start with the insights and actions you
want to drive, then work backwards to a streamlined design
• Experimentation - start small, test many ideas, keep the
good ones and scale those up, paying only for what you
consume
• Agile and timely - deploy data processing infrastructure in minutes, not months. Take advantage of a rich platform of services to respond quickly to changing business needs
Volume – 100 to 150 TB a day
Velocity – 1 million reads and writes per second is becoming the norm
Variety ->
IoT / log data / streaming data
Transactional data
File data
Fixed schema
CSV
Parquet
Avro
Schema-free
JSON
Key-value
Small files, large files,
Hourly server logs: were your systems misbehaving 1hr ago
Weekly / Monthly Bill: what you spent this billing cycle
Daily customer-preferences report from your web site’s click stream: what deal or ad to try next time
Daily fraud reports: was there fraud yesterday
Real-time alerts: what went wrong now
Real-time spending caps: prevent overspending now
Real-time analysis: what to offer the current customer now
Real-time detection: block fraudulent use now
I need to harness big data, fast
I want more happy customers
I want to save/make more money
Is there a reference architecture? What tools should I use? How? Why?
Primary Drivers:
Maximize revenue by delivering consistent and personalized marketing and multichannel shopping experiences and keeping a fresh assortment of merchandise in stock
Streamline supply chain operations by analyzing wholesale, inventory, RFID, and POS retail data in real time, automating data exchange with small suppliers, and leveraging consistent supplier data
Secondary Drivers:
Optimize store operations
Boost performance and increase operational efficiency by archiving inactive data from key retail applications
Empower customer service reps to manage issues effectively and be active in social media
And now if you look at the retail market drivers, no surprise here to find personalisation, and the impact of personalisation down the chain, as the number one priority.
This is the number one priority for all retailers, but again we need to qualify a personalisation exercise. We need to understand the perimeter of personalisation, we need to be strategic on the different engagements we want to propose to our customers, and we need to be specific on the benefits the customer is expecting.
The benefits could be related to basket transformation ratios, churn prevention strategies through tailored landing pages based on a simple analysis of search terms, or wider conversion-rate optimisation exercises.
Concept: Qualify the perimeter and the success criteria. Start small, prove your point and have a clear road map of what good looks like by qualifying difficulty/effort vs return. It’s great to talk about omnichannel but you will go nowhere if you don’t have a clear road map of events and if you cannot demonstrate value.
The second priority is the bottom line – supply chain – logistics – stock management.
A more complex area due to the tools that are currently being used and the legacy aspects of these tools. However, the market is changing: supply chain is becoming a commodity, and most supply chain tools are moving to a SaaS model.
Also, it’s fair to say that there is a close link between the primary drivers: product assortment has an impact on supply chain, and providing a brand experience requires adapting the entire operation from both sales and support angles.
AWS will definitely play a part in this market driver and we are currently helping organisations in a few of these areas, for example NISA developing a mobile OCS, Kelloggs using analytics to optimize trade spend and avoid waste, or Unilever decreasing the time to market of campaigns.
The key here, is to understand the benefits we are bringing to the organisations.
Sifting through data is challenging. Businesses need a solution to store and process it and translate it into knowledge and insights.
Matchmaking millions of users with 100 million properties and thousands of agents.
Users:
Clickstream (View, Search, )
Contacts, Tours, Open Houses, Offers...
Properties:
Property facts & history
Neighborhood & POI
Agents:
Availability
Performance, Survey…
"Redfin Hot Homes gives my clients the ultimate insider information," said Keith Thomas, a Redfin real estate agent in Orange County. "Now we know which homes we need to see today, and which ones can wait until next week."
Talk about the services. AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. AWS Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device. You can also use AWS Lambda to create new back-end services where compute resources are automatically triggered based on custom requests. With AWS Lambda you pay only for the requests served and the compute time required to run your code. Billing is metered in increments of 100 milliseconds, making it cost-effective and easy to scale automatically from a few requests per day to thousands per second.
Amazon DynamoDB
Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale.
Product owner shares journey to all in
https://www.youtube.com/watch?v=TXmkj2a0fRE
Euclid, a fast-growing technology start-up, helps brick-and-mortar retailers optimize marketing, merchandising, and operations performance by measuring foot traffic, store visits, walk-by conversion, bounce rate, visit duration, and customer loyalty. Euclid analyzes customer movement data to correlate traffic with marketing campaigns and to help retailers optimize hours for peak traffic, among other activities. Euclid stores up to 30 GB of uncompressed data per day in Amazon S3.
Dexin Wang, Director of Platform Engineering reports, “Amazon Redshift is very easy to scale with minimal management requirements,” he comments. “It’s also cost effective. We saw a 90 percent cost reduction moving from our previous database system to Amazon Redshift.”
Euclid stores information on Amazon Simple Storage Service (Amazon S3), and processes data in parallel with Amazon Elastic MapReduce (Amazon EMR).
Initially, the company ran its data store on MySQL but moved to Amazon Redshift to improve performance for analytic workloads. “Using Amazon Redshift, our analysts can work with large data sets and run SQL-based queries to our stack quickly,” Dexin Wang, Director of Platform Engineering reports. “We were totally amazed at the speed—a simple count of rows that would take 5 1/2 hours using MySQL only took 30 seconds with Amazon Redshift.” Wang estimates that it only took a few days to port production data over to Amazon Redshift and start running analysis on it. “Amazon Redshift is very easy to scale with minimal management requirements,” he comments. “It’s also cost effective. We saw a 90 percent cost reduction moving from our previous database system to Amazon Redshift.”
The analytics team leverages Amazon EMR and Hadoop to aggregate and analyze data. “Amazon EMR does most of the heavy lifting,” says Leung. “I used Hadoop in my previous work and we had to spend time installing and managing the cluster. We don’t have to do that with AWS. We only use the service when we need it, which is a great cost savings.” Figure 1 below demonstrates Euclid’s environment on AWS.
As the company continues to grow, it takes advantage of Amazon Redshift and Amazon EMR to run complex queries on large and growing data sets with improved performance. “We’ve collected 1 to 30 GB of data per day over the last three years,” notes Leung. “By running on AWS and taking advantage of Amazon Redshift, we can scale to provide the computational power to complete a task on our entire data set, tens of terabytes, in a couple of hours—a task that used to take two weeks. Overall, compared to what we would have to spend to build an infrastructure capable of meeting our peak compute load requirements, we’re saving 80 to 90 percent using AWS.”
Wang adds, “We didn’t want to worry about infrastructure or scaling. We just want to be able to ask questions and get answers. AWS helps us get answers quickly.”
Turn on Euclid Express today and get key insights you never had before:
Walk-Bys
New & Return Visitors
Visit Duration
Engagement Rate
Bounce Rate
Storefront Potential & Conversion
New insights across all your locations:
Identify leaders and laggards across your chain by KPI
Quickly replicate best practices from your top-performing stores
Pinpoint key trends and regional differences
Get a clearer overall picture by integrating existing systems
Comprehensive features. Powerful insights:
Customer segmentation and loyalty assessment
Regional and categorical roll-up reporting
Zoning for large-format locations
Labor optimization and staffing schedules
Automated insights and predictive analysis
Analyze events, resets and promotions with Euclid EventIQ
Industry, segment, and geographic benchmarking
So, as you get started, you'll want to first shortlist the business cases that you are looking to address, and then work backwards from there. Once you've gotten down to the one or two things that you believe you can change with the right insights, then this is the launch point.
I'll share with you a starting point is for some customers, which is a combination of automating part of the business, and modernizing systems in the process. An example would be improving response times in a call center. You might have different response approaches and times, depending on the channel that a customer communicates with you. But, you may want to standardize your first response to a customer to happen within 1 hour, regardless of channel, whether that is call center, complaining on twitter, or leaving feedback in an app store.
You would identify that a starting place will be to modernize the event and data capture systems, and focus on those that are used by your most valued customers. The key action is to automate the capture of events and data. Then, we will eventually look to automate the response that gets triggered to the customer.
With that as a starting point, you would then look back to what sources of data would be usable to automate that response. In this case, it may be using existing metrics and monitoring systems to establish a baseline and see where the current responses are happening and not. The next source could be the actual workflow logs, let's say in your call center. This could give you an indication of keywords to look for, and how to start automating detection and response. Finally, there could be valuable information available within your ERP systems. This could be information like orders, returns, customer feedback, etc. The important thing is that you take a targeted set of data, and for a limited time scope to begin with
Once you have found the specific data that you expect will be able to provide you insights, then you start thinking about lean design. You would ask, "what's the least amount of infrastructure that I need to turn this data into insights?" This is the key to unlocking a new world of solving challenges in an agile way. Start with a small, fully decoupled design. Each part of the system can scale up as you add more users, more data, and more use cases. It's maybe the first time you've had the chance to only pay for what is giving you benefit.
In the scenario of improving response time to customer feedback, we would look at using AWS Import/Export to load data in from the ERP and workflow systems. We may also be using our monitoring systems to detect customer feedback through other channels, and we would use Amazon Kinesis to capture that in real time. Then, all data would be put into S3 for very inexpensive and durable storage and staging. Then, if you remember the patterns we saw earlier, we would use Redshift as a flexible, purpose-built data warehouse. The data that arrives unstructured and needs context added would first be processed in EMR, the fully managed Hadoop service, keeping the data in S3 and not having to lock yourself to a large persistent cluster. That processed data can then be moved straight into Redshift. And, for the users who will create and consume these insights? The key system requirement here is to automate the response. So, the Amazon Simple Queue Service, or SQS, would be used to connect to your existing applications which service your customers (see the sketch below). For measuring performance and customer satisfaction, you could then use our cloud-native business intelligence and data visualization tool, which will be available later this year. Amazon QuickSight will give business users BI capabilities against sources like Redshift for just $9 per month per user.
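A hedged sketch of the SQS hand-off described above: once a piece of customer feedback is flagged, a standardized first-response task is queued for the existing customer-service application to consume; the queue URL and message fields are assumptions.

import json
import boto3

# Minimal sketch: enqueue a standardized "first response within 1 hour" task
# for an existing application to pick up. Queue URL and fields are assumptions.
sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/customer-first-response"

def queue_first_response(customer_id: str, channel: str, summary: str) -> None:
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "customer_id": customer_id,
            "channel": channel,          # call center, twitter, app store, ...
            "summary": summary,
            "sla_minutes": 60,           # standardized first response, regardless of channel
        }),
    )

queue_first_response("C-1042", "twitter", "Complaint about delayed delivery")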
"Redfin Hot Homes gives my clients the ultimate insider information," said Keith Thomas, a Redfin real estate agent in Orange County. "Now we know which homes we need to see today, and which ones can wait until next week."
Users:
Clickstream (View, Search, )
Contacts, Tours, Open Houses, Offers...
Properties:
Property facts & history
Neighborhood & POI
Agents:
Availability
Performance, Survey…
But there's a flaw to this, right? This is just the data. Traditionally, when we look at this, we say "start with all your data, then ... question mark ... question mark ... profit!" That's really hard! This is the most expensive way to go about it, because you're paying all your costs upfront; trying to capture everything to begin with and hoping that there's some results. But what we're really after is that we want things like revenue lift. We want to enter and expand in new markets. We want customer delight and brand advocacy. We want operational excellence. These are the real goals.
So what we'll talk about is how our customers are starting here, they're starting with what they need to get done and finding the shortest path with the least amount of data to get there. This is where we talk about these iterations and innovation cycles. So we'll cover some parts the platform -- what it is that you will use as you start your journey -- what's the smallest amount of "stuff" that you can use to get started.
We have a lot of services, right? AWS has got over seventy services, and if you're using AWS for the first time it might be hard to know exactly where to get started. If you're looking at things like scaling your analytics and beginning a Big Data project, this is where you can drill in and start. From a data warehousing standpoint, I've talked about there being a lot of cost benefits of moving to Redshift, but more importantly you rethink what it means to run a data warehouse. It's no longer about buying a massive appliance. You can start with really small clusters that are tuned for a particular group. So you can build a set of small, specialized data warehouses that are very inexpensive and very scalable. If you're working with unstructured data, which may include touching mobile data for the first time, and you want to run Hadoop, and you've got the skills internally but you're tired of trying to manage your own Hadoop clusters because it's not a great experience on-premises, then moving over to something that's fully managed lets you say "right now I'm running 10 nodes and I want to change and run 50 nodes for one hour." That's clicks of a button. When you're done with that, you just shut it down and stop paying for the Hadoop cluster, because you've decoupled the storage, which lives in S3. This combination of Redshift, plus EMR, plus S3 is a really common combination of services with our customers. If you're also doing real-time streaming, Amazon Kinesis, our fully managed stream processing service, is a fit for capturing and moving the data. This often goes together with DynamoDB, our fully managed NoSQL service, for the real-time serving of data to customers. These are often the key services used for Big Data and analytics initiatives. We also have predictive modeling with our machine learning service, Amazon Machine Learning. Now, a common pattern we see, because these are all interoperable, is that customers use one of these services and then drop the data back down into that very inexpensive storage layer, S3, consume from that, and then push it back down. And we also provide backup services with Glacier. So, this may be a pattern that is also attractive to you.
Types of Data
Database Records
Search Documents
Log Files
Messaging Events
Devices / Sensors / IoT Stream
Huge buffer…
http://calculator.s3.amazonaws.com/index.html#r=IAD&key=calc-BE3BA3E4-1AC5-4E7A-B542-015056D8EDAF
Kinesis -> $52.14 per month
SQS -> $133.42 per month for puts, or $400/month (put, get, delete)
DynamoDB -> $3,809.88 per month (10 TB of storage alone costs $2,500/month)
Cost (100 rps x 35 KB):
Kinesis: $52/month
SQS: $133/month * 2 = $266/month
DynamoDB: ?
Amazon DynamoDB Service (US-East):
Provisioned throughput capacity: $120
Indexed data storage: $2,560.90
DynamoDB Streams: $1.30
Amazon SQS Service (US-East)
Pricing example
Let’s assume that our data producers put 100 records per second in aggregate, and each record is 35KB. In this case, the total data input rate is 3.4MB/sec (100 records/sec*35KB/record). For simplicity, we assume that the throughput and data size of each record are stable and constant throughout the day. Please note that we can dynamically adjust the throughput of our Amazon Kinesis stream at any time.
We first calculate the number of shards needed for our stream to achieve the required throughput. As one shard provides a capacity of 1MB/sec data input and supports 1000 records/sec, four shards provide a capacity of 4MB/sec data input and support 4000 records/sec. So a stream with four shards satisfies our required throughput of 3.4MB/sec at 100 records/sec.
We then calculate our monthly Amazon Kinesis costs using Amazon Kinesis pricing in the US-East Region:
Shard Hour: One shard costs $0.015 per hour, or $0.36 per day ($0.015*24). Our stream has four shards so that it costs $1.44 per day ($0.36*4). For a month with 31 days, our monthly Shard Hour cost is $44.64 ($1.44*31).
PUT Payload Unit (25KB): As our record is 35KB, each record contains two PUT Payload Units. Our data producers put 100 records or 200 PUT Payload Units per second in aggregate. That is 267,840,000 records or 535,680,000 PUT Payload Units per month. As one million PUT Payload Units cost $0.014, our monthly PUT Payload Units cost is $7.499 ($0.014*535.68).
Adding the Shard Hour and PUT Payload Unit costs together, our total Amazon Kinesis costs are $1.68 per day, or $52.14 per month. For $1.68 per day, we have a fully-managed streaming data infrastructure that enables us to continuously ingest 4MB of data per second, or 337GB of data per day in a reliable and elastic manner.
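The same arithmetic as a short Python sketch (prices as quoted above, 31-day month).

# Minimal sketch reproducing the Kinesis pricing arithmetic above
# (US-East prices as quoted, 31-day month).
records_per_sec = 100
record_kb = 35

shards = 4                                      # covers 3.4 MB/sec at 100 records/sec
shard_hour_price = 0.015                        # $ per shard-hour
put_unit_price_per_million = 0.014              # $ per million PUT payload units
put_units_per_record = -(-record_kb // 25)      # 35 KB -> 2 units of 25 KB each

seconds_per_month = 86_400 * 31
shard_cost = shards * 24 * 31 * shard_hour_price                      # $44.64
put_cost = (records_per_sec * put_units_per_record * seconds_per_month
            / 1_000_000 * put_unit_price_per_million)                 # ~$7.50

print(round(shard_cost, 2), round(put_cost, 2), round(shard_cost + put_cost, 2))  # 44.64 7.5 52.14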
2 x 2 Matrix
Structured
Level of query (from none to complex)
Draw down the slide
More : https://aws.amazon.com/blogs/aws/ec2-instance-update-x1-sap-hana-t2-nano-websites/
AWS helps customers maximize the value of Big Data investments while reducing overall IT costs. Amazon S3 provides secure, highly durable storage as low as $28.16 per terabyte. With Amazon Glacier, AWS provides low cost data archive platform that starts at only $7.17 per terabyte. That’s why customers like Netflix, Nasdaq and Pinterest store and process petabytes of data for analytics in S3.
AWS also provides a broad range of analytic options that provide customers with enterprise capabilities and performance without the typical high price and up-front investment of traditional enterprise software:
AWS provides a managed petabyte-scale data warehouse and a super-fast business intelligence and visualization service at 1/10th the cost of traditional software solutions. With Amazon Redshift you can analyze a petabyte of data for only $0.25/hour and then use Amazon QuickSight to explore that data for only $10 per user per month.
For streaming data, you can load a terabyte of streaming data with Amazon Kinesis Firehose for only $0.035 per GB.
You can spin up a 10-node managed Spark cluster to aggregate data with Amazon EMR for only $0.15 per hour.
https://na32.salesforce.com/06938000001bpTh
Will this customer leave us?
Add connector
Directed Acyclic Graphs?
Exactly once processing & DAG? – how do you do this??
https://storm.apache.org/documentation/Rationale.html
http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming
Cost:
Redshift – Moderate
Impala -
Presto – Low
S3A* is an open source connector. It is not in EMR 1.2.1 – using bootstrap you can install 2.2 (we have a bootstrap action)
Query speed
Redshift – Extremely fast SQL queries
Spark, Impala – Extremely fast to fast HiveQL
Hive, Tez – Moderately fast to slow HiveQL
Data Volume?
UDFs?
Manageability?
http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn
https://amplab.cs.berkeley.edu/benchmark/
http://nerds.airbnb.com/redshift-performance-cost/
Applications & API
Analysis and Visualization
Notebooks
IDE
"Over the next two years as we move to our optimised platform, we'll be able to extract ... benefits of $170 million," in addition to benefits already realised from the transformation process begun in 2010, Snowball said.
Suncorp's vision for its "optimised platform" is digitally enabled customer-facing systems sitting atop simplified core administration systems that feed into a data lake that can drive predictive analytics and business intelligence across the group.
"Increasingly our customers want to connect digitally and we're living in a world of both mobility and technological disruption," Snowball said.
"To ensure that we stay ahead of the competition, we've been investing in systems that are digitally enabled to allow our customers and business partners to access us, how and where they want.
"Standing behind our digital frontend we are completing the development of four core administration systems: One policy and one claim system for all our general insurance businesses both here and in New Zealand, a world-class banking system and a new life administration system."
"These core systems will feed our customer, policy and claims data along with HR, finance and management data, into our single, centralised data lake," the group CEO said.
"This will allow us to establish a best in class business intelligence function providing forward-looking, predictive analytics to deliver better solutions and outcomes for our customers.
"All of this will sit in a secure and flexible cloud environment where our lean and agile capabilities will enable us to deliver new services at high speed and lower cost," Snowball said.
When you leave here and go back to your office, hopefully some of the things you've seen today will spark ideas of how you can build systems that will better enable the business. As there is increasingly a recognition that businesses need to be more data-driven to enable automation and to enhance the decision making of the business, now is a good time to really rethink how to go about that. More and more, we see our customers moving on from old, legacy approaches of buying large, expensive data infrastructure, which takes months or years to start delivering results. To be truly business focused, there are three ways they are thinking differently.
First is to start projects with specific business outcomes in mind. Start from the insights and actions you want, then work backwards to a streamlined design.
Second is experimentation. Start with a lean design. Use just enough data to test your ideas, and just enough services to test them. Design the system to scale up capacity as and when you need it. So, if you hit on a great result, you scale that one up, and the ones that didn't work out can just be turned off. Think... win quick and fail cheap.
Finally, speed. Our best customers are changing their markets; they're redefining what service levels and customer experience mean in those markets. Much of this comes from moving quickly. When an opportunity presents itself, and the business wants to move on it, they think in terms of weeks to design and minutes to deploy. This gives a material advantage over businesses that wait six months for approvals to buy an appliance and more storage.
We've had a lot of customers really succeeding here in Southeast Asia. In Singapore, we have some great customers, like Redmart and Grab, and a number of others that are fundamentally changing aspects of our daily lives as consumers. There are people in this room today who are going to be the ones we're talking about this time next year, and I'm really looking forward to sharing your success then. I want to thank you very much, and I hope the rest of the conference is great for you.