2. Quick one about myself
• Background in Consulting, coding for about 20 years now
• Second gig as CTO, third startup
• Wasn’t keen on Dynamo or Node.js
• Always wanted to leverage events
• Tried microservices
6. lab-stream-process
• We receive data every 10 seconds from each IoT device
• Firehose buffers it
• From API Gateway it goes into DynamoDB
• Kappa architecture
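The write path above (API Gateway → Lambda → DynamoDB) could be sketched roughly as below. The reading schema (`device_id`, `ts`, `watts`) and the handler shape are assumptions, and a tiny in-memory stand-in replaces the real boto3 Table so the sketch stays self-contained:

```python
import json

def handler(event, table):
    """Sketch of the API Gateway -> Lambda -> DynamoDB ingest path.
    The reading schema is an assumption, not the talk's actual code."""
    reading = json.loads(event["body"])  # API Gateway proxy event body
    table.put_item(Item={
        "device_id": reading["device_id"],  # partition key (assumed)
        "ts": reading["ts"],                # sort key, epoch seconds (assumed)
        "watts": reading["watts"],
    })
    return {"statusCode": 200, "body": json.dumps({"ok": True})}

class FakeTable:
    """In-memory stand-in for a boto3 DynamoDB Table, to keep the sketch runnable."""
    def __init__(self):
        self.items = []
    def put_item(self, Item):
        self.items.append(Item)

table = FakeTable()
resp = handler({"body": json.dumps({"device_id": "d1", "ts": 1545304210, "watts": 412})}, table)
print(resp["statusCode"], len(table.items))
```

In production the `table` argument would be a `boto3.resource("dynamodb").Table(...)` object with the same `put_item` shape.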
8. lab-stream-process
• Cascading DynamoDB table
• 10s, 5min, 30min, 1h, 1d, 1 month
• A function of the timestamp of an energy reading determines the key in
the next DynamoDB table, and so on
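That cascading key derivation could look like the following; the key format and the bucket-width table are assumptions (and the calendar-month level is left out, since months need real date arithmetic):

```python
# Hypothetical bucket widths in seconds for the cascade; the monthly
# level is omitted because calendar months need real date arithmetic.
BUCKETS = {"10s": 10, "5min": 300, "30min": 1800, "1h": 3600, "1d": 86400}

def next_level_key(device_id: str, ts: int, level: str) -> str:
    """Floor the reading's epoch timestamp to the bucket width of the
    coarser table, yielding the key the aggregate is written under."""
    width = BUCKETS[level]
    return f"{device_id}#{ts // width * width}"

print(next_level_key("device-42", 1545304217, "5min"))
```

Every reading at one level thus deterministically addresses exactly one row at the next level, which is what lets a stream handler fold readings upward without coordination.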
10. Lessons learnt
• If you want you can do immutable with any DB, but I’d probably choose
a truly immutable one right now (Kinesis?)
• If you are doing IoT you should strongly consider aggregating at the edge
• 360 × 24 × 1,000 = 8.64M requests per day…
• 360 × 24 × 1,000,000 = 8.64B requests to API Gateway per day, and as
many writes to DDB
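The back-of-envelope numbers above (one reading every 10 s → 360/hour → 8,640 per device per day) and the edge-aggregation point can be checked in a few lines; `requests_per_day` and its `batch` parameter are hypothetical names:

```python
def requests_per_day(devices: int, interval_s: int = 10, batch: int = 1) -> int:
    """Daily request count: one reading per device every interval_s seconds,
    batched at the edge into one request per `batch` readings."""
    per_device = 24 * 3600 // interval_s  # 8,640 readings/day at 10 s
    return devices * per_device // batch

print(requests_per_day(1_000))                # 8_640_000
print(requests_per_day(1_000_000))            # 8_640_000_000
print(requests_per_day(1_000_000, batch=30))  # edge-aggregated: 288_000_000
```

Batching just 30 readings (5 minutes' worth) per device at the edge cuts the API Gateway and DynamoDB load by 30×, which is the point of the bullet above.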
11. lab-stream-crm
• An index in Elasticsearch for each table in Dynamo
• Plus ‘some’ aggregation
• The Elasticsearch indexes power a bunch of Kibana dashboards and
the internal user management system
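The 1:1 table-to-index mapping could be sketched as a stream handler that turns each INSERT/MODIFY record into an Elasticsearch bulk index action; the document shape, the `id` field, and the `_index`/`_id`/`_source` action format are assumptions based on the standard bulk API, not the talk's actual code:

```python
def to_index_actions(stream_event, index_name):
    """Map each INSERT/MODIFY DynamoDB stream record 1:1 onto an
    Elasticsearch bulk index action (assumed document shape)."""
    actions = []
    for rec in stream_event["Records"]:
        if rec["eventName"] in ("INSERT", "MODIFY"):
            new = rec["dynamodb"]["NewImage"]
            # strip DynamoDB attribute type tags like {"S": "..."}
            doc = {k: list(v.values())[0] for k, v in new.items()}
            actions.append({"_index": index_name, "_id": doc["id"], "_source": doc})
    return actions

event = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"id": {"S": "c-1"}, "email": {"S": "a@b.co"}}}},
    {"eventName": "REMOVE", "dynamodb": {}},
]}
actions = to_index_actions(event, "customers")
print(len(actions), actions[0]["_id"])
```

A real handler would pass `actions` to the Elasticsearch client's bulk helper; REMOVE events would need a matching delete action, which is omitted here.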
14. Lessons learnt
• 1:1 mapping is easy and already quite powerful
• aggregating is hard (esp. mutable objects)
• Candidate #1 for 2019 rewrite
• Use Kinesis / Build a proper event log?
• Do a first round of aggregation in DDB?
• Leave it as it is and use EMR/Athena (sad face)
18. Lessons Learnt
• It would be hard to actually list all the streams, functions and draw a
flow diagram by now!
• Will we end up recursively calling a function updating the same record
and triggering the same event?!
• Needs either more visualisation of what’s going on or more order
(SQS/Kinesis)
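One cheap guard against that recursion is to tag every record the pipeline writes and skip such records when they come back through the stream; the `updated_by` attribute and the function name here are hypothetical, not something the talk describes:

```python
PIPELINE_MARKER = "aggregator-lambda"  # hypothetical name of this function

def should_process(record):
    """Skip stream records whose new image was written by this pipeline
    itself, breaking the update -> stream -> update loop."""
    image = record.get("dynamodb", {}).get("NewImage", {})
    return image.get("updated_by", {}).get("S") != PIPELINE_MARKER

external = {"dynamodb": {"NewImage": {"kwh": {"N": "3"}, "updated_by": {"S": "api"}}}}
internal = {"dynamodb": {"NewImage": {"kwh": {"N": "3"},
                                      "updated_by": {"S": "aggregator-lambda"}}}}
print(should_process(external), should_process(internal))
```

This only filters; it does not visualise the flow, so it complements rather than replaces the SQS/Kinesis ordering option from the bullet above.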
19. lab-cron-jobs
• Not everything can be real time…
• Sort of map-reduce using Dynamo + streams + a cron Lambda
• Mainly to integrate with 3rd parties requiring ‘daily’/‘weekly’ batches
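The "reduce" half of that cron-driven map-reduce could be sketched as a scheduled Lambda folding finer-grained partials into the daily batch a third party expects; the row shape (`device_id`, `watt_hours`) is an assumption:

```python
from collections import defaultdict

def reduce_daily(rows):
    """'Reduce' step of the cron job: fold per-interval partial aggregates
    (one row per device per interval) into one daily total per device."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["device_id"]] += row["watt_hours"]
    return dict(totals)

rows = [
    {"device_id": "d1", "watt_hours": 1.5},
    {"device_id": "d1", "watt_hours": 2.0},
    {"device_id": "d2", "watt_hours": 0.5},
]
print(reduce_daily(rows))
```

In the real setup the rows would come from a DynamoDB query over the day's partition and the result would be shipped to the third party, both of which are elided here.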
21. Lessons Learnt
• This one works quite well, very simple, very observable
• Also helps a lot with de-duping executions (such as a double save on S3)
• Definitely a winner for us, but we need to be careful not to overuse
it
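The de-duping mentioned above is commonly done with a DynamoDB conditional put: the first run claims a job id, duplicates fail the condition and become no-ops. This is a sketch of that pattern, with an in-memory stand-in for the table so it runs on its own; the talk does not show its actual implementation:

```python
class FakeJobTable:
    """In-memory stand-in mimicking a DynamoDB conditional put."""
    class ConditionFailed(Exception):
        pass
    def __init__(self):
        self.seen = set()
    def put_item(self, Item, ConditionExpression=None):
        # mimics ConditionExpression="attribute_not_exists(job_id)"
        if ConditionExpression and Item["job_id"] in self.seen:
            raise self.ConditionFailed()
        self.seen.add(Item["job_id"])

def claim_execution(table, job_id):
    """True if this run claimed the job; False if a duplicate trigger
    (e.g. a double save on S3) already ran it."""
    try:
        table.put_item(Item={"job_id": job_id},
                       ConditionExpression="attribute_not_exists(job_id)")
        return True
    except table.ConditionFailed:
        return False

jobs = FakeJobTable()
print(claim_execution(jobs, "daily-2018-12-20"))  # True: first trigger wins
print(claim_execution(jobs, "daily-2018-12-20"))  # False: duplicate is a no-op
```

With boto3 the equivalent condition failure surfaces as a `ConditionalCheckFailedException` from the real `put_item` call.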
22. lab-sqs-jobs
• Not everything should be decentralised
• Various table source events…
• …Sending sqs messages to a single lambda
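That fan-in could be sketched as a small mapper that turns stream records from the various source tables into `send_message_batch` entries for the single worker queue; the message body shape is an assumption:

```python
import json

def to_sqs_entries(stream_event):
    """Fan-in: turn DynamoDB stream records from different source tables
    into entries for sqs.send_message_batch on the single worker queue."""
    entries = []
    for i, rec in enumerate(stream_event["Records"]):
        # ARN shape: arn:...:table/<name>/stream/<label>
        table = rec["eventSourceARN"].split("/")[1]
        entries.append({
            "Id": str(i),
            "MessageBody": json.dumps({
                "source_table": table,
                "event": rec["eventName"],
                "keys": rec["dynamodb"]["Keys"],
            }),
        })
    return entries

event = {"Records": [{
    "eventName": "MODIFY",
    "eventSourceARN": "arn:aws:dynamodb:eu-west-1:111:table/orders/stream/2018",
    "dynamodb": {"Keys": {"id": {"S": "o-9"}}},
}]}
entries = to_sqs_entries(event)
print(entries[0]["Id"], json.loads(entries[0]["MessageBody"])["source_table"])
```

Carrying the source table name in the message lets the single consuming Lambda dispatch per table without one function per stream.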
25. Lessons Learnt
• SQS doesn’t show you in-flight messages (sigh)
• We did avoid writing shared libraries for 1.5 years but now it’s time
• Usecase for https://bitsrc.io/ ?
29. Notes from AWS docs…
• No more than 2 processes at most should be reading from the same
Streams shard at the same time. Having more than 2 readers per
shard may result in throttling.
• All data in DynamoDB Streams is subject to a 24-hour lifetime
31. Further considerations
• Too decentralised: hard to keep track of
• Too centralised: the next monolith? Yes if you are processing, not if
you are event sourcing
• Streams make ‘complex’ architectures easy peasy (CQRS, Event
Sourcing, Kappa Architecture, etc)