Is Lambda Architecture really a new normal for cloud native apps?
:~ whoami:
Antons Kranga
Full-stack developer (~15 years)
Cloud Architect
DevOps evangelist
Innovation Center of Accenture Cloud Platform
Speaker
Marathon runner
Motivation
What is Streaming?
We often want to deploy data models based on new data that arrives continuously from multiple sources
Challenges
Users expect data to appear immediately after it arrives
Fault tolerance
Distributed data consistency
Scalability (how not to lose data when scaling down)
What is “λ”
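[Diagram: new data is sent to both the Speed Layer, which updates realtime views, and the Batch Layer, which holds the master data and runs map-reduce to produce batch views; the Serving Layer answers queries over the batch views and the realtime views.]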
What is “λ” architecture
Batch Layer: master data sets and pre-computed aggregations
• Slow data ingestion – minutes-to-days intervals
• Append-only data sets that eventually supersede the data captured in the speed layer
Speed Layer: high-throughput, near-real-time data ingestion
• Fast data ingestion – seconds interval
• Concurrent information processing
• Retrieval of the most recent information
Serving Layer: provides query capability over the Batch Layer
• Low-latency ad-hoc queries
• May also provide access to speed layer views
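How the Serving Layer combines the two views can be sketched in plain Java. This is an in-memory illustration only, assuming a simple count per key; `ServingLayer`, `onEvent`, `publishBatchView` and `count` are hypothetical names, not an AWS API. Timestamps are arrival times, so once a batch run covers everything up to a cutoff, the speed layer can discard those events. This is how batch results eventually supersede speed-layer data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the three λ layers (not an AWS API)
class ServingLayer {
    /** An event seen by the speed layer; timestamp is its arrival time (epoch millis). */
    static final class Event {
        final String key;
        final long timestamp;

        Event(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }
    }

    // Batch view: counts recomputed from the master data set by the last batch run
    private Map<String, Long> batchView = new HashMap<>();
    // Realtime data: events that arrived after the last batch cutoff
    private final List<Event> speedEvents = new ArrayList<>();

    /** Speed layer: make a new event visible to queries immediately. */
    void onEvent(String key, long timestamp) {
        speedEvents.add(new Event(key, timestamp));
    }

    /** Batch layer finished: swap in the new view and drop speed-layer events it now covers. */
    void publishBatchView(Map<String, Long> view, long coveredUpTo) {
        batchView = new HashMap<>(view);
        speedEvents.removeIf(e -> e.timestamp <= coveredUpTo);
    }

    /** Serving layer: batch result plus realtime increments not yet in the batch view. */
    long count(String key) {
        long realtime = speedEvents.stream().filter(e -> e.key.equals(key)).count();
        return batchView.getOrDefault(key, 0L) + realtime;
    }

    public static void main(String[] args) {
        ServingLayer serving = new ServingLayer();
        serving.onEvent("page-a", 100);
        serving.onEvent("page-a", 200);
        System.out.println(serving.count("page-a")); // 2, both from the speed layer
        // A batch run recomputed "page-a" from master data up to t=150
        serving.publishBatchView(Collections.singletonMap("page-a", 1L), 150);
        System.out.println(serving.count("page-a")); // 2 = 1 (batch) + 1 (speed, t=200)
    }
}
```

The answer stays the same across the batch publish: the event at t=100 moves from the speed layer into the batch view, and the one at t=200 is still served from the speed layer.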
Why go Cloud Native?
Cloud Provider Lock-In
Avoid “Yak shaving”
Rely on managed services
DevOps automation
Lower operating costs
Transparent integration with other “Cloud Native” services
AWS Blueprint for Lambda Architectures
https://d0.awsstatic.com/whitepapers/lambda-architecure-on-for-batch-aws.pdf
Published in July 2015
[Diagram: Amazon Kinesis and an Amazon Kinesis–enabled app form the speed layer; S3 buckets and Amazon EMR form the batch layer; EMR also runs the serving and merging layer.]
Data services from AWS
[Diagram: producers write to Amazon Kinesis, which spans three Availability Zones (az1, az2, az3) in one AWS region; consumers include Lambda, EC2 instances, EMR, S3 storage and Redshift.]
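import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;
import java.nio.ByteBuffer;
AmazonKinesis kinesis = ...
...
PutRecordRequest putRecord = new PutRecordRequest();
putRecord.setStreamName(streamName);
// Required: Kinesis hashes the partition key to pick the shard (partitionKey is a placeholder variable)
putRecord.setPartitionKey(partitionKey);
putRecord.setData(ByteBuffer.wrap(bytes));
// null: no ordering guarantee relative to an earlier put
putRecord.setSequenceNumberForOrdering(null);
...
PutRecordResult result = kinesis.putRecord(putRecord);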
Producer
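Kinesis routes each record by taking the MD5 hash of its partition key as an unsigned 128-bit integer and sending it to the shard whose hash key range contains that value. The sketch below imitates this mapping locally, assuming the shards split the hash space into equal-width ranges; real ranges come from the stream description and change after resharding. `PartitionKeyRouting` and its methods are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical local model of Kinesis partition-key routing (not an AWS API)
class PartitionKeyRouting {
    // The Kinesis hash key space is 0 .. 2^128 - 1
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** MD5 of the partition key, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, md5); // signum 1: interpret the bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            // MD5 ships with the standard JDK providers; treat absence as a configuration error
            throw new IllegalStateException(e);
        }
    }

    /** Shard index, assuming shardCount equal-width hash key ranges (an assumption). */
    static int shardFor(String partitionKey, int shardCount) {
        return hashKey(partitionKey)
                .multiply(BigInteger.valueOf(shardCount))
                .divide(HASH_SPACE)
                .intValue();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"sensor-1", "sensor-2", "sensor-3"}) {
            System.out.println(key + " -> shard " + shardFor(key, 3));
        }
    }
}
```

All records with the same key land on one shard and share its write limit. A high-cardinality key (device id, user id) therefore spreads load better than a constant one.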
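import com.amazonaws.services.kinesis.AmazonKinesisClient;
import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;
import com.amazonaws.services.kinesis.model.GetShardIteratorRequest;
import com.amazonaws.services.kinesis.model.Record;
import java.util.List;
AmazonKinesisClient kinesisClient = ...
// Position the iterator at the oldest record still retained in the shard
GetShardIteratorRequest req = ...
req.setStreamName("my-kinesis");
req.setShardIteratorType("TRIM_HORIZON");
...
String shardIterator = kinesisClient.getShardIterator(req).getShardIterator();
GetRecordsRequest recordsReq = new GetRecordsRequest();
recordsReq.setShardIterator(shardIterator);
GetRecordsResult result = kinesisClient.getRecords(recordsReq);
List<Record> records = result.getRecords();
for (Record record : records) {
    ... = record.getData();
}
Consumer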
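The consumer starts from `TRIM_HORIZON`. The difference between iterator types can be modelled with a toy in-memory shard: an ordered log with increasing sequence numbers, trimmed once records reach the retention period (24 hours by default on Kinesis). `InMemoryShard` and its methods are hypothetical and only mimic the semantics; they are not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of one Kinesis shard (not the AWS SDK)
class InMemoryShard {
    static final class Entry {
        final long sequenceNumber;
        final long arrivalHour;
        final String data;

        Entry(long sequenceNumber, long arrivalHour, String data) {
            this.sequenceNumber = sequenceNumber;
            this.arrivalHour = arrivalHour;
            this.data = data;
        }
    }

    private final List<Entry> log = new ArrayList<>();
    private final long retentionHours;
    private long nextSequence = 1;

    InMemoryShard(long retentionHours) {
        this.retentionHours = retentionHours;
    }

    /** Append a record and return its sequence number. */
    long put(long arrivalHour, String data) {
        log.add(new Entry(nextSequence, arrivalHour, data));
        return nextSequence++;
    }

    /** Drop records that are at least retentionHours old. */
    void trim(long nowHour) {
        log.removeIf(e -> nowHour - e.arrivalHour >= retentionHours);
    }

    /** TRIM_HORIZON: the oldest retained record (or the next one if the shard is empty). */
    long trimHorizon() {
        return log.isEmpty() ? nextSequence : log.get(0).sequenceNumber;
    }

    /** LATEST: just after the newest record, so only data written from now on is read. */
    long latest() {
        return nextSequence;
    }

    /** Read up to limit records starting at a sequence number. */
    List<String> read(long fromSequence, int limit) {
        List<String> out = new ArrayList<>();
        for (Entry e : log) {
            if (e.sequenceNumber >= fromSequence && out.size() < limit) {
                out.add(e.data);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        InMemoryShard shard = new InMemoryShard(24);
        shard.put(0, "a");
        shard.put(10, "b");
        shard.put(30, "c");
        shard.trim(30); // "a" is 30 h old and is trimmed
        System.out.println(shard.read(shard.trimHorizon(), 10)); // [b, c]
        System.out.println(shard.read(shard.latest(), 10));      // []
    }
}
```

A consumer that restarts with `LATEST` silently skips everything written while it was down. `TRIM_HORIZON` replays what is still retained, which matters for the "how not to lose data" challenge.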
Kinesis streams
What: Enables you to build near-real-time data processing applications
Use cases:
• Real-time analytics
• Log file processing
• Reporting
Durability: data streams are replicated across 3 Availability Zones
Kinesis streams
Cost Model:
Per shard hour; each shard supports:
• 5 read transactions per second
• 2 MB of data read per second
• 1,000 records written per second
• 1 MB of data written per second
approx. 12.5 USD/month per shard
Extended data retention:
• Up to 7 days
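Sizing a stream follows from the documented per-shard limits: 1 MB/s and 1,000 records/s for writes, 2 MB/s for reads (shared by all consumers). A rough sketch, assuming the flat ~12.5 USD per shard-month above and ignoring PUT payload charges and regional price differences; `KinesisSizing` and its methods are hypothetical names.

```java
// Hypothetical back-of-the-envelope shard calculator
class KinesisSizing {
    // Documented per-shard limits
    static final double WRITE_MB_PER_SHARD = 1.0;
    static final double WRITE_RECORDS_PER_SHARD = 1000.0;
    static final double READ_MB_PER_SHARD = 2.0;
    // Assumption: flat monthly price per shard, taken from the slide
    static final double USD_PER_SHARD_MONTH = 12.5;

    /** Smallest shard count that satisfies write bandwidth, write rate and total read bandwidth. */
    static int shardsNeeded(double writeMbPerSec, double recordsPerSec, double readMbPerSec) {
        int byWriteMb = (int) Math.ceil(writeMbPerSec / WRITE_MB_PER_SHARD);
        int byRecords = (int) Math.ceil(recordsPerSec / WRITE_RECORDS_PER_SHARD);
        int byReadMb = (int) Math.ceil(readMbPerSec / READ_MB_PER_SHARD);
        return Math.max(1, Math.max(byWriteMb, Math.max(byRecords, byReadMb)));
    }

    static double monthlyCost(int shards) {
        return shards * USD_PER_SHARD_MONTH;
    }

    public static void main(String[] args) {
        // 3.5 MB/s in, 2,500 records/s, two consumers each reading the full 3.5 MB/s
        int shards = shardsNeeded(3.5, 2500, 7.0);
        System.out.println(shards + " shards, ~" + monthlyCost(shards) + " USD/month"); // 4 shards, ~50.0 USD/month
    }
}
```

Here write bandwidth and read bandwidth both require 4 shards, while the record rate alone would need only 3.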
Kinesis streams
Not good when:
• Small-scale throughput (less than 200 KB/s)
• Long-term data storage (more than 24 hours)
Lambda
What: Lambda lets you run functions without managing an actual server
Use cases:
• Real-time stream processing
• Tiny ETL
• In a few cases, can replace EC2
• Processing IaaS events
Runtimes: Java 8, Node.js, Python
Backed by: provides /tmp for ephemeral storage
Durability: no maintenance windows; 3 retries before failure
Lambda
Cost Model:
Per function, charged by requests and compute time:
• Compute measured in GB-seconds
• Duration rounded up in 100 ms steps
• $0.20 per 1 million requests; $0.00001667 per GB-second
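The pricing above can be turned into a quick estimate. This rough sketch assumes every invocation takes the same duration and ignores the free tier. `LambdaCost` and its methods are hypothetical names, and the prices are the ones quoted above, which change over time and by region.

```java
import java.util.Locale;

// Hypothetical Lambda cost estimator using the prices quoted on the slide
class LambdaCost {
    static final double USD_PER_MILLION_REQUESTS = 0.20;
    static final double USD_PER_GB_SECOND = 0.00001667;

    /** Duration billed in 100 ms steps, rounded up. */
    static long billedMillis(long durationMillis) {
        return ((durationMillis + 99) / 100) * 100;
    }

    /** Monthly cost for a uniform workload, ignoring the free tier. */
    static double monthlyCost(long requests, int memoryMb, long durationMillis) {
        double gbSeconds = requests * (memoryMb / 1024.0) * (billedMillis(durationMillis) / 1000.0);
        return requests / 1_000_000.0 * USD_PER_MILLION_REQUESTS + gbSeconds * USD_PER_GB_SECOND;
    }

    public static void main(String[] args) {
        // 3 million invocations a month, 512 MB, 250 ms each (billed as 300 ms)
        System.out.printf(Locale.ROOT, "%.2f USD/month%n", monthlyCost(3_000_000, 512, 250)); // 8.10 USD/month
    }
}
```

With these inputs the estimate is about 8.10 USD/month, already above the t2.nano price quoted on the next slide. Fewer or shorter invocations would bring it below.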
Lambda
Not good when:
• Work takes longer than the 300 sec timeout (cannot be changed)
• Forces developers to think stateless
• Highly dynamic websites
• Competes with t2.nano ($4.75/month)
[Diagram: Lambda functions connected to Kinesis, S3 storage, SNS and other consumers; function code is deployed as myApp.ZIP for the Java 8, Python or Node.js runtime.]
EMR
What: Managed Apache Hadoop service
Use cases:
• MapReduce data processing
• Large ETL jobs
• Data movement
• Log processing and analytics
Backed by: a single EC2 instance or a cluster of EC2 instances
Durability: provided at the storage level by S3
See more:
https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf
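EMR runs Hadoop MapReduce across a cluster. The programming model itself (map, shuffle, reduce) can be shown in plain Java on a single machine. This sketch counts words in log lines. It is not Hadoop's API, and `MiniMapReduce` is a hypothetical name. On EMR the shuffle step moves intermediate pairs between nodes instead of grouping them in one map.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine illustration of the MapReduce model (not the Hadoop API)
class MiniMapReduce {
    /** Map phase: emit (word, 1) for every word in a line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    /** Shuffle phase: group emitted values by key. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    /** Reduce phase: sum the values for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        List<String> logLines = Arrays.asList("GET /index", "GET /about", "POST /index");
        System.out.println(wordCount(logLines)); // {about=1, get=2, index=2, post=1}
    }
}
```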
EMR
Cost Model:
• Charged according to the EC2 instance sizes used
• S3 storage charges apply ($0.03/GB-month)
EMR
Not good when:
• Small to medium data sets
• ACID (atomicity, consistency, isolation, durability) is required
• Competes with database services: DynamoDB, Aurora (RDS)
S3
What: Highly durable, fully managed persistent storage
Use cases:
• Static content websites
• File storage (primarily for reading)
• Archive storage
Backed by: the AWS S3 SLA
Durability: storage 99.999999999%; availability 99.99%
S3
Cost Model: per GB-month
• Standard Storage: $0.03 per GB-month
• Infrequent Access Storage: $0.0125 per GB-month
• Glacier Storage: $0.007 per GB-month
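For the batch layer's master data set in S3, the storage class choice dominates the storage bill. The sketch below covers storage only, using the per GB-month prices above. Request charges, Infrequent Access retrieval fees and Glacier restore fees are deliberately left out. `S3StorageCost` is a hypothetical name, and the 100/300/600 GB split is an arbitrary example.

```java
import java.util.Locale;

// Hypothetical storage-only cost estimate with the slide's prices
class S3StorageCost {
    static final double STANDARD = 0.03;
    static final double INFREQUENT_ACCESS = 0.0125;
    static final double GLACIER = 0.007;

    static double monthlyCost(double standardGb, double iaGb, double glacierGb) {
        return standardGb * STANDARD + iaGb * INFREQUENT_ACCESS + glacierGb * GLACIER;
    }

    public static void main(String[] args) {
        // 1,000 GB kept entirely in Standard vs. split across classes
        System.out.printf(Locale.ROOT, "all Standard: %.2f USD%n", monthlyCost(1000, 0, 0)); // 30.00
        System.out.printf(Locale.ROOT, "tiered:       %.2f USD%n", monthlyCost(100, 300, 600));
    }
}
```

The tiered split works out to 3.00 + 3.75 + 4.20 = 10.95 USD per month, against 30.00 USD per month for keeping all 1,000 GB in Standard.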
S3
Not good when:
• Write latency matters (S3 writes can be slow)
• Frequent Glacier restores are needed (only up to 5% of stored data per month can be restored free of charge)
Redshift
What: Petabyte-scale data warehouse as a managed service
Use cases:
• Data warehouse (OLAP)
• BI and ETL
• Storing large historical data sets
Backed by: automatic data backup provided by AWS
Durability: provided at the storage level by S3
Scaling: start with a single 160 GB node and scale out as needed
Redshift
Cost Model:
• Charged per node size, similar to the EC2 instance-size model
• S3 storage charges apply ($0.03/GB-month)
Redshift
Not good when:
• OLTP (online transaction processing)
• Unstructured data
• Blob storage
[Diagram, built up over three slides: a producer writes to a Kinesis stream with three shards. Speed layer: an EC2 application processes the stream. Batch layer: data is stored in an S3 bucket and processed with MapReduce. Serving layer: views are stored in DynamoDB. Lambda functions, including a “Primer” Lambda, run every hour. A timeline (h0, h1, h2, h3) shows one batch computation per hour, with the speed layer covering the time since the latest batch run. A presentation layer (JS app plus Lambda) reads the views.]
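The hourly timeline implies a split of responsibility between the layers. The batch view answers queries up to the last published cutoff, and the speed layer answers for the time after it. The sketch below assumes a batch run starts at each hour boundary and takes a fixed `jobDuration`, so until it finishes, the previous hour's view is the latest one. `HourlyBatchWindow` and its methods are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: which layer answers for a given event time
class HourlyBatchWindow {
    /** End of the time range covered by the latest published batch view. */
    static Instant batchCutoff(Instant now, Duration jobDuration) {
        Instant hourStart = now.truncatedTo(ChronoUnit.HOURS);
        // The run started at hourStart is still computing, so the previous run's view is current
        return now.isBefore(hourStart.plus(jobDuration))
                ? hourStart.minus(1, ChronoUnit.HOURS)
                : hourStart;
    }

    /** True if the event is not yet in the batch view and must come from the speed layer. */
    static boolean servedBySpeedLayer(Instant eventTime, Instant now, Duration jobDuration) {
        return !eventTime.isBefore(batchCutoff(now, jobDuration));
    }

    public static void main(String[] args) {
        Duration job = Duration.ofMinutes(10);
        System.out.println(batchCutoff(Instant.parse("2016-01-01T03:25:00Z"), job)); // 2016-01-01T03:00:00Z
        System.out.println(batchCutoff(Instant.parse("2016-01-01T03:05:00Z"), job)); // 2016-01-01T02:00:00Z
    }
}
```

The speed layer therefore has to keep up to two hours of data, not one: the hour in progress plus the hour whose batch run has not finished yet.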
Lessons learned
It is better but not simple
Not everything is automated
Questions?
Thank you!

Riga Dev Day: Lambda architecture at AWS
