Serverless architectures and advanced patterns for AWS Lambda
1. AWS Online Tech Talks
The AWS webinar series in Italian
Serverless architectures and advanced patterns for AWS Lambda
Speaker
Alex Casalboni
Technical Evangelist, AWS
Objectives
• Understand application best practices related to factors such as performance, testability, and security
• Explore reference architectures for data analytics, real-time streaming, microservices, and artificial intelligence systems
• Analyze the benefits and architectural details of the main reference patterns
2. About me
• Software Engineer & Web Developer
• Worked in a startup for 4.5 years
• ServerlessDays Organizer
• AWS customer since 2013
3. Agenda
Serverless foundations (quickly, I promise!)
Advanced serverless patterns:
1. Web application / API
2. Stream processing
3. Data lakes
4. Machine learning
7. Tune your function’s resources
Memory is the only control: CPU share and network capacity are allocated to a function in proportion to its memory
Is your code CPU, network or memory-bound? If so, it could be cheaper to choose more memory, because the function finishes sooner
More memory = more CPU, more network
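The trade-off can be sketched with simple arithmetic. This is a rough illustration only: the price per GB-second and the durations are assumed values, not measurements, and the per-request fee is ignored.

```python
# Assumed price per GB-second of Lambda compute; check current pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Compute cost of one invocation: allocated GB x seconds x price."""
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND


# Hypothetical timings (memory in MB -> duration in ms) for a CPU-bound function.
measurements = {128: 11000.0, 512: 2600.0, 1024: 1200.0}

costs = {mb: invocation_cost(mb, ms) for mb, ms in measurements.items()}
cheapest = min(costs, key=costs.get)
print(cheapest, costs)
```

With these assumed timings the largest configuration is both the fastest and the cheapest, which is the case the slide describes.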
8. “AWS Lambda Power Tuning”
Data-driven cost & performance
optimization for AWS Lambda
github.com/alexcasalboni/aws-lambda-power-tuning
Don’t guesstimate!
9. Lambda best practices
Minimize your package size & use only the SDK modules you need
Put your dependencies (e.g. JAR files) in a separate directory
Improve dependency injection with smaller, simpler IoC frameworks that load quickly on startup, like Dagger 2
Leverage smaller and faster libraries like jackson-jr for Java data binding
Use environment variables to modify operational behavior
Secure secrets/tokens/passwords with AWS Systems Manager Parameter Store and AWS Secrets Manager
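A minimal sketch of the last two practices, assuming a Python function. The variable names `LOG_LEVEL` and `DB_PASSWORD_PARAM` and the parameter path are hypothetical; the secret is fetched once and cached for the lifetime of the execution environment.

```python
import os
from functools import lru_cache

# Operational behavior comes from environment variables (names are hypothetical).
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")
SECRET_PARAM_NAME = os.environ.get("DB_PASSWORD_PARAM", "/my-app/db-password")


def fetch_from_parameter_store(name: str) -> str:
    """Read a SecureString parameter; boto3 is imported lazily on first use."""
    import boto3

    ssm = boto3.client("ssm")
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def make_secret_getter(fetch=fetch_from_parameter_store):
    """Return a getter that calls `fetch` at most once per parameter name."""

    @lru_cache(maxsize=None)
    def get_secret(name: str) -> str:
        return fetch(name)

    return get_secret


get_secret = make_secret_getter()


def handler(event, context):
    password = get_secret(SECRET_PARAM_NAME)
    return {"log_level": LOG_LEVEL, "has_secret": bool(password)}
```

Passing the fetch function in makes the caching logic testable without AWS credentials.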
10. AWS Serverless Application Model (SAM)
AWS CloudFormation extension (macro) to simplify serverless apps
New serverless resource types: functions, APIs, and tables
Local testing with SAM CLI
github.com/awslabs/serverless-application-model
11. AWS code services
Source: AWS CodeCommit
Build: AWS CodeBuild
Test: Third-party tooling
Deploy: AWS CodeDeploy
AWS CodePipeline
AWS CodeStar
14. Choose the right API endpoint type
Edge optimized: reduce latency from anywhere on the Internet
Internet -> edge locations (API Gateway-managed CloudFront distribution) -> API Gateway (AWS Region)
16. Choose the right API endpoint type
Regional
Internet -> Route 53 -> API Gateway (AWS us-east-2) -> Lambda -> DynamoDB
Internet -> Route 53 -> API Gateway (AWS us-west-2) -> Lambda -> DynamoDB
DynamoDB Global Tables
23. Streaming with Amazon Kinesis
Collect, process, and analyze video and data streams in real time
Kinesis Data Firehose
Kinesis Data Analytics (SQL)
Kinesis Data Streams
Kinesis Video Streams
24. Streaming data ingestion
Record producers / Kinesis Agent -> Put Records -> Kinesis Firehose: Delivery stream
Kinesis Firehose -> Raw -> AWS Lambda: Transformations & enrichment -> Transformed -> Kinesis Firehose
AWS Lambda -> Lookup -> Amazon DynamoDB: Lookup tables
Kinesis Firehose -> Amazon S3: Source record backup
Kinesis Firehose -> Transformed records ->
• Amazon S3: Buffered files
• Amazon Redshift: Table loads
• Amazon Elasticsearch Service: Domain loads
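The transformation step could look roughly like the handler below. It is a sketch: the in-memory `LOOKUP` dict stands in for the DynamoDB lookup table, and the `device_id` and `location` fields are hypothetical. The record shape (`recordId`, `result`, base64 `data`) is the one Firehose expects back from a transformation function.

```python
import base64
import json

# Stand-in for the DynamoDB lookup table in the diagram (hypothetical data).
LOOKUP = {"sensor-1": "warehouse-a"}


def handler(event, context):
    """Firehose data-transformation handler: decode, enrich, re-encode."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["location"] = LOOKUP.get(payload.get("device_id"), "unknown")
            # Newline-delimit the JSON so buffered files stay easy to parse.
            data = base64.b64encode((json.dumps(payload) + "\n").encode()).decode()
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Hand the original data back, marked as not successfully processed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```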
25. Streaming data ingestion (HTTP)
Browser -> HTTP POST/PUT -> API Gateway -> Kinesis Firehose: Delivery stream
Kinesis Firehose -> Raw -> AWS Lambda: Transformations & enrichment -> Transformed -> Kinesis Firehose
AWS Lambda -> Lookup -> Amazon DynamoDB: Lookup tables
Kinesis Firehose -> Amazon S3: Source record backup
Kinesis Firehose -> Transformed records ->
• Amazon S3: Buffered files
• Amazon Redshift: Table loads
• Amazon Elasticsearch Service: Domain loads
26. Streaming data ingestion (at the edge)
Browser -> HTTP POST/PUT -> CloudFront -> Lambda@Edge -> Kinesis Firehose: Delivery stream
Kinesis Firehose -> Raw -> AWS Lambda: Transformations & enrichment -> Transformed -> Kinesis Firehose
AWS Lambda -> Lookup -> Amazon DynamoDB: Lookup tables
Kinesis Firehose -> Amazon S3: Source record backup
Kinesis Firehose -> Transformed records ->
• Amazon S3: Buffered files
• Amazon Redshift: Table loads
• Amazon Elasticsearch Service: Domain loads
27. Kinesis Best practices
Tune Firehose buffer size and buffer interval
• Larger objects = fewer Lambda invocations & Amazon S3 PUTs
Enable compression to reduce storage costs
Enable Parquet format transformation (columnar)
Enable Source Record Backup for transformations
• Recover from transformation errors
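The effect of buffer tuning on the number of S3 PUTs can be estimated as follows. This is a simplified model with assumed throughput and buffer values: a flush is taken to happen when either the buffer size or the buffer interval is reached, whichever comes first.

```python
def s3_objects_per_hour(throughput_mb_per_s: float,
                        buffer_size_mb: float,
                        buffer_interval_s: float) -> float:
    """Estimate delivered objects per hour under a steady ingest rate."""
    seconds_to_fill = buffer_size_mb / throughput_mb_per_s
    flush_period = min(seconds_to_fill, buffer_interval_s)
    return 3600 / flush_period


# Hypothetical stream ingesting 0.5 MB/s with two buffer configurations.
small = s3_objects_per_hour(0.5, buffer_size_mb=1, buffer_interval_s=60)
large = s3_objects_per_hour(0.5, buffer_size_mb=128, buffer_interval_s=300)
print(small, large)
```

Under these assumptions the larger buffer cuts deliveries from 1800 to about 14 objects per hour, at the price of higher delivery latency.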
28. Kinesis Data Streams and Lambda
# of shards corresponds to concurrent invocations of Lambda function
Batch size sets maximum # of records per invocation (min 1, max 10K)
Streaming source -> Data stream -> Processor function -> Other AWS services
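A processor function receives one batch per invocation. The sketch below assumes JSON payloads with a hypothetical `measurement` field; Kinesis delivers each record's data base64-encoded under `Records[].kinesis.data`.

```python
import base64
import json


def handler(event, context):
    """Process one batch of Kinesis records (at most the configured batch size)."""
    total = 0.0
    for record in event["Records"]:
        # Each payload arrives base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        total += payload["measurement"]  # hypothetical field
    return {"records": len(event["Records"]), "sum": total}
```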
29. Fan-out pattern
Trade strict message ordering for higher throughput & lower latency
Streaming source -> Kinesis Data Streams: Stream -> Lambda: Dispatcher function -> Lambda: Processor function
Increase throughput, reduce processing latency
github.com/aws-samples/aws-lambda-fanout
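A minimal dispatcher sketch, not the implementation in the repository above. The function name `processor-function` is hypothetical. Each record is handed to the processor with an asynchronous invocation, so processors run in parallel and ordering across records is no longer guaranteed.

```python
import base64

PROCESSOR_FUNCTION = "processor-function"  # hypothetical function name


def invoke_async(function_name: str, payload: bytes) -> None:
    """Fire-and-forget Lambda invocation; boto3 is imported lazily."""
    import boto3

    boto3.client("lambda").invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: do not wait for the result
        Payload=payload,
    )


def dispatcher(event, context, invoke=invoke_async):
    """Forward each stream record to the processor without waiting for it."""
    for record in event["Records"]:
        data = base64.b64decode(record["kinesis"]["data"])
        invoke(PROCESSOR_FUNCTION, data)
    return len(event["Records"])
```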
30. Real-time analytics
Record producers
Data stream
Kinesis Data Analytics: Time window aggregation
Kinesis Data Firehose: Error stream
S3: Error records
Lambda: Alert function
DynamoDB
SNS: Notifications
31. Kinesis Data Analytics
Aggregation: 10-minute tumbling window
Source stream -> Kinesis Data Analytics: Time window aggregation -> Destination stream(s)
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
    "device_id",
    STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' MINUTE) AS "window_ts",
    SUM("measurement") AS "sample_sum",
    COUNT(*) AS "sample_count"
FROM "SOURCE_SQL_STREAM_001"
GROUP BY
    "device_id",
    STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' MINUTE);
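For readers less familiar with streaming SQL, the same tumbling-window aggregation can be sketched in plain Python. This is an offline analogue over a list of samples, not how Kinesis Data Analytics runs the query; the sample tuples are made up.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60  # 10-minute tumbling window


def aggregate(samples):
    """samples: iterable of (device_id, epoch_seconds, measurement)."""
    windows = defaultdict(lambda: [0.0, 0])
    for device_id, ts, measurement in samples:
        # Round the timestamp down to the start of its window, like STEP(...).
        window_ts = ts - (ts % WINDOW_SECONDS)
        bucket = windows[(device_id, window_ts)]
        bucket[0] += measurement  # sample_sum
        bucket[1] += 1            # sample_count
    return {key: tuple(value) for key, value in windows.items()}


result = aggregate([("a", 0, 1.0), ("a", 599, 2.0), ("a", 600, 5.0), ("b", 30, 4.0)])
print(result)
```

Each sample lands in exactly one window, which is what distinguishes a tumbling window from a sliding one.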
33. Data lake characteristics
Collect, store, process, consume, and analyze organizational data
Structured, semi-structured, and unstructured data
Decoupled compute and storage
Fast automated ingestion
Schema-on-read
Complementary to data warehouses
34. Serverless data lake
S3
Catalog & search: Elasticsearch, Glue, DynamoDB
API/UI: Cognito, API Gateway
Analytics & processing: Athena, QuickSight, Redshift Spectrum
Lambda
Ingest: Kinesis Streams, Kinesis Firehose, Direct Connect, AWS IoT
Security & auditing: KMS, CloudTrail, IAM, Macie
36. Athena – Just SQL (Presto)
Query duration: 44.66 seconds
Data scanned: 169.53GB
Cost*: $0.85
* $5/TB or $0.005/GB
SELECT gram, year, sum(count)
FROM ngram
WHERE gram = 'just say no'
GROUP BY gram, year
ORDER BY year ASC;
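The cost figure follows directly from the amount of data scanned. A quick check, using the per-GB price stated on the slide:

```python
PRICE_PER_GB = 0.005  # from the slide: $5/TB, i.e. $0.005/GB


def athena_query_cost(gb_scanned: float) -> float:
    """Cost in USD for the amount of data a query scans."""
    return gb_scanned * PRICE_PER_GB


cost = round(athena_query_cost(169.53), 2)
print(cost)
```

Because the price depends only on bytes scanned, anything that reduces scanning (partitioning, columnar formats, compression) reduces cost as well as latency.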
37. Athena best practices
Partition data
s3://my-bucket/my-data/parquet/year=2018/month=11/day=25/
Use columnar formats – Apache Parquet, ORC (Avro is row-oriented)
Compress files with splittable compression (bzip2)
Optimize file sizes
aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena
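The partition layout shown above can be generated when writing data. A small sketch, assuming daily partitions; the helper name `partition_prefix` is made up for illustration.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Build a Hive-style key=value partition prefix for one day."""
    return f"{base.rstrip('/')}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


prefix = partition_prefix("s3://my-bucket/my-data/parquet", date(2018, 11, 25))
print(prefix)
```

Queries that filter on `year`, `month`, or `day` then read only the matching prefixes.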
39. Pywren
Python library developed by University of California, Berkeley
10 TFLOPS of peak compute power (default of 1000 concurrent functions)
Over 80 GB/sec of read and 60 GB/sec of write performance using S3
http://pywren.io
41. The Amazon ML Stack: Broadest & Deepest Set of Capabilities
AI services
• Vision: Rekognition Image, Rekognition Video, Textract
• Speech: Polly, Transcribe
• Language: Translate, Comprehend, Comprehend Medical
• Chatbots: Lex
• Forecasting: Forecast
• Recommendations: Personalize
ML services: Amazon SageMaker (Build, Train, Deploy)
• Pre-built algorithms & notebooks
• Data labeling (Ground Truth)
• One-click model training & tuning
• Optimization (Neo)
• One-click deployment & hosting
• Models without training data (reinforcement learning)
• Algorithms & models (AWS Marketplace)
ML frameworks & infrastructure
• Frameworks
• Interfaces
• Infrastructure: EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference
43. Media analysis solution
S3: Web interface
Cognito
API Gateway: REST APIs
Step Functions: Orchestrate analysis
AWS Elemental MediaConvert: Transcode videos
Amazon Rekognition Video: Detect objects, scenes, faces, & celebrities
Transcribe
Comprehend
S3: Media storage
Elasticsearch: Search index
https://aws.amazon.com/answers/media-entertainment/media-analysis-solution/
44. Amazon Connect
(Serverless contact center)
Real-time and historical analytics
High-quality voice capability
Call recording
Skills-based routing [Automatic Call Distribution (ACD)]
45. Intelligent call center chatbot
Components:
• Customer
• Amazon Connect
• Amazon Lex
• Lambda: Chatbot processing
• DynamoDB: Customer data
• SNS: SMS messaging
Flow:
• Customer calls Connect to reschedule an appointment
• Connect calls Lex chatbot
• Lex chatbot calls Lambda function to get customer preferences and fulfil intents
• Lambda function sends text message confirmation via SNS
• Customer receives appointment confirmation text message
• Lambda function writes updates to DynamoDB
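The Lambda step could be sketched as below, assuming the Lex (V1) event and response format. The slot name `AppointmentDate` is hypothetical, and the DynamoDB write and SNS publish are passed in as plain callables so the sketch stays self-contained.

```python
def close(message: str) -> dict:
    """Build a Lex (V1 format) response that ends the conversation."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        }
    }


def handler(event, context, save=print, notify=print):
    """Fulfil the intent: persist the change, then confirm by text message."""
    slots = event["currentIntent"]["slots"]
    new_date = slots["AppointmentDate"]  # hypothetical slot name
    customer = event["userId"]
    save({"customer": customer, "appointment": new_date})    # e.g. a DynamoDB put
    notify(f"Your appointment is confirmed for {new_date}")  # e.g. an SNS publish
    return close(f"Done, see you on {new_date}.")
```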
46. Call center analytics
Customers, Agents -> Amazon Connect
Call recordings -> S3: Call recordings -> Step Functions (Lambda, Transcribe) -> S3: Call transcripts
S3 Notifications for call transcripts -> Step Functions (Lambda, Comprehend) -> S3: Sentiment, key phrases, entities
Contact trace records (CTRs) -> Kinesis Data Streams -> Kinesis Data Firehose -> S3: CTRs
Athena
QuickSight