4. RADU VUNVULEA
iQuest, @RaduVunvulea, VUNVULEARADU.BLOGSPOT.COM
MVP, MCTS, MCP, ENTHUSIASTIC
AZURE, DOTNET, WCF, WPF, JAVASCRIPT, WEB, MOBILE, LEAN AND AGILE
BANK, HOME AUTOMATION, ENTERPRISE, AUTOMOTIVE, PHARMA, E-COMMERCE
5. 20.000 AWS VMs used to simulate the load
250.000 RPS (normal load)
100.000 devices registered and active in 15 minutes
400.000 files of 5 MB each uploaded in 30 minutes
102M commands sent to devices in 5 hours
101.7M commands processed by devices in 5 hours
9M “I’m alive” events every 5 minutes (80 GB/h)
Load Test Output
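To put the figures above on a common scale, the following sketch converts them to average per-second rates. It assumes decimal units (1 GB = 1000 MB) and that the 80 GB/h figure refers to the “I’m alive” events only; neither assumption is stated on the slide.

```python
# Average rates implied by the load test figures.
# Assumptions: decimal units, and 80 GB/h covers only the "I'm alive" events.

files, file_mb, upload_minutes = 400_000, 5, 30
upload_mb_per_s = files * file_mb / (upload_minutes * 60)

devices, registration_minutes = 100_000, 15
registrations_per_s = devices / (registration_minutes * 60)

commands_sent, commands_processed, command_hours = 102_000_000, 101_700_000, 5
commands_sent_per_s = commands_sent / (command_hours * 3600)
processed_ratio = commands_processed / commands_sent

heartbeats, heartbeat_window_s = 9_000_000, 5 * 60
heartbeats_per_s = heartbeats / heartbeat_window_s

# Average heartbeat size if 80 GB/h is spread over all heartbeats in one hour.
heartbeats_per_hour = heartbeats * 12
bytes_per_heartbeat = 80 * 1000**3 / heartbeats_per_hour

print(f"upload:        {upload_mb_per_s:,.0f} MB/s")
print(f"registrations: {registrations_per_s:,.0f} devices/s")
print(f"commands sent: {commands_sent_per_s:,.0f} per second")
print(f"processed:     {processed_ratio:.1%} of commands sent")
print(f"heartbeats:    {heartbeats_per_s:,.0f} per second")
print(f"heartbeat size: about {bytes_per_heartbeat:,.0f} bytes")
```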
16. • Input:
– Processing an event takes too long (>1 s)
– Number of events per second: >13.000 events/s
• Worker Roles
– ~100 events in parallel per instance
– 13.000/100 = 130 instances (theoretically)
– Max. number of partitions on Event Hub is 32
Problem
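The arithmetic behind this problem can be sketched as follows. The sketch assumes that processing takes exactly 1 s per event (the slide only gives a lower bound) and that each partition is read by at most one Worker Role instance at a time, so instances beyond the partition count cannot take on extra work.

```python
import math

events_per_s = 13_000
parallel_per_instance = 100   # events processed in parallel on one instance
max_partitions = 32           # partition limit quoted on the slide
seconds_per_event = 1.0       # assumption: ">1 s" taken as exactly 1 s

# Instances needed to keep up with the incoming rate.
instances_needed = math.ceil(events_per_s * seconds_per_event / parallel_per_instance)

# Assumption: one active reader per partition, so extra instances would sit idle.
usable_instances = min(instances_needed, max_partitions)

# Best-case throughput with the usable instances, and what is left unprocessed.
max_throughput = usable_instances * parallel_per_instance / seconds_per_event
shortfall = events_per_s - max_throughput

print(instances_needed, usable_instances, max_throughput, shortfall)
```

Under these assumptions 130 instances would be needed but only 32 can be used, which leaves most of the incoming events unprocessed.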
17. • Two types of events
– 95% were only Heartbeats
– 5% were other types of events
Investigation
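Applied to the 13.000 events/s mentioned earlier, the split looks like the sketch below. The `type` field and the `split_by_type` helper are hypothetical names used only for illustration; the slides do not say how events were actually classified.

```python
events_per_s = 13_000
heartbeat_share_pct = 95

# Integer arithmetic keeps the result exact.
heartbeats_per_s = events_per_s * heartbeat_share_pct // 100
other_per_s = events_per_s - heartbeats_per_s


def split_by_type(events):
    """Separate heartbeats from all other events.

    'type' and the value 'heartbeat' are hypothetical field names.
    """
    heartbeats = [e for e in events if e["type"] == "heartbeat"]
    others = [e for e in events if e["type"] != "heartbeat"]
    return heartbeats, others


sample = [{"type": "heartbeat"}, {"type": "alarm"}, {"type": "heartbeat"}]
hb, rest = split_by_type(sample)
print(heartbeats_per_s, other_per_s, len(hb), len(rest))
```

Only the small remainder of non-heartbeat events needs the expensive processing path, which is what makes the split worth investigating.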
19. • Input:
– How to process 13.000 events/s
– We don’t need real time processing
– All the input data is stored in Azure Storage
Problem
21. • Input:
– Processing an event takes too long (>100 ms)
– CPU level is high
– There are times when the system freezes
Problem
22. • Input:
– Processing an event takes too long (>100 ms)
– CPU level is high
– There are times when the system freezes
– ~1000 events/s on each Worker Role instance
• Even with batch processing
Problem
23. • Bottleneck is the logger
• Writing logs is very expensive
• It is impossible to have high throughput and a
very detailed logging level active at the same
time
Investigation
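One common way to reduce logging cost on a hot path is to raise the log level and move the actual writing to a background thread. The sketch below shows this with Python's standard `logging.handlers.QueueHandler` and `QueueListener`; it is an illustration of the general technique, not the fix used in this project. The logger name, the `ListHandler` stand-in sink and `process_event` are hypothetical.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Stand-in for a slow sink (file, network); it only collects messages."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


slow_sink = ListHandler()
log_queue = queue.Queue()

# The listener drains the queue on its own thread and calls the slow sink there.
listener = logging.handlers.QueueListener(log_queue, slow_sink)

logger = logging.getLogger("event_worker")  # hypothetical logger name
logger.setLevel(logging.WARNING)            # debug/info calls return almost immediately
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))


def process_event(event_id):
    logger.debug("processing %s", event_id)  # filtered out: below WARNING
    if event_id % 1000 == 0:
        logger.warning("slow event %s", event_id)  # only enqueued on the hot path


listener.start()
for i in range(1, 3001):
    process_event(i)
listener.stop()  # waits until everything queued so far has been handled

print(slow_sink.records)
```

The worker thread only pays for putting a record on an in-memory queue; the expensive write happens elsewhere, and the verbose levels are skipped entirely.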
26. Postulate:
– Once an event is consumed from Event Hub, it is
not removed
We can reset the cursor as many times as we want
We can analyze and process the same events over
and over again
Event Hub
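The postulate can be illustrated with a small in-memory model: reading only moves the consumer's cursor and never deletes anything from the log. `EventLog` and `Consumer` are hypothetical classes written for this illustration and are not the Event Hub SDK; in the real service, events also expire once the configured retention period has passed.

```python
class EventLog:
    """In-memory stand-in for one partition: reading never deletes events."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read(self, offset, max_count):
        return self._events[offset:offset + max_count]


class Consumer:
    """Keeps its own cursor (offset) into the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_count=10):
        batch = self.log.read(self.offset, max_count)
        self.offset += len(batch)  # only the cursor moves
        return batch

    def reset(self, offset=0):
        self.offset = offset       # rewind to replay earlier events


log = EventLog()
for name in ("e1", "e2", "e3"):
    log.append(name)

consumer = Consumer(log)
first_pass = consumer.poll()
drained = consumer.poll()   # nothing new after the cursor
consumer.reset()
second_pass = consumer.poll()

print(first_pass, drained, second_pass)
```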
30. • Input:
– Under load we started to see a lot of
throttling exceptions (“quota exceeded exception”)
Problem
33. • Throughput Units (TU) are defined per
namespace and are shared between all the Event Hubs
in that namespace
Event Hub and Namespaces
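A rough sketch of why sharing matters: the ingress limit applies to the sum of all Event Hubs in the namespace, so hubs that each look fine on their own can still trigger throttling together. The per-TU ingress limits used here (1 MB/s or 1000 events/s) are the documented values for Event Hubs; the hub names, loads and TU count are made up for illustration.

```python
INGRESS_EVENTS_PER_TU = 1_000  # events per second, per Throughput Unit
INGRESS_MB_PER_TU = 1          # MB per second, per Throughput Unit


def namespace_is_throttled(throughput_units, hubs):
    """Return True when the combined ingress of all hubs exceeds the namespace quota.

    hubs maps a hub name to (events per second, MB per second).
    """
    total_events = sum(events for events, _ in hubs.values())
    total_mb = sum(mb for _, mb in hubs.values())
    return (total_events > throughput_units * INGRESS_EVENTS_PER_TU
            or total_mb > throughput_units * INGRESS_MB_PER_TU)


# Hypothetical namespace with 10 TUs and two Event Hubs.
hubs = {"telemetry": (8_000, 6.0), "commands": (3_000, 1.5)}

each_alone = {name: namespace_is_throttled(10, {name: load}) for name, load in hubs.items()}
together = namespace_is_throttled(10, hubs)

print(each_alone, together)
```

In this made-up case neither hub exceeds 10 TUs alone, but 11.000 events/s combined is over the 10.000 events/s the namespace allows.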
34. • Input:
– The maximum size of an event is 256 KB
– The unit of measure (for billing) is 64 KB
– How should we handle events with a payload bigger
than 256 KB or 64 KB?
Problem
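A minimal sketch of the two size limits: how many 64 KB units an event counts as, and one possible way (chunking) to send a payload that does not fit in a single event. Chunking is only one option and is not confirmed by the slides as the approach taken; `billable_units` and `split_payload` are hypothetical helper names.

```python
import math

MAX_EVENT_BYTES = 256 * 1024    # maximum event size from the slide
BILLING_UNIT_BYTES = 64 * 1024  # unit of measure from the slide


def billable_units(payload_bytes):
    """Number of 64 KB units an event of this size counts as (at least one)."""
    return max(1, math.ceil(payload_bytes / BILLING_UNIT_BYTES))


def split_payload(payload, chunk_bytes=MAX_EVENT_BYTES):
    """Cut a payload into chunks that each fit in one event.

    Real events also carry headers and properties, so a production version
    would use a chunk size somewhat below the hard limit.
    """
    return [payload[i:i + chunk_bytes] for i in range(0, len(payload), chunk_bytes)]


payload = b"x" * (600 * 1024)   # 600 KB, too big for a single event
chunks = split_payload(payload)

print(billable_units(64 * 1024), billable_units(64 * 1024 + 1), len(chunks))
```

The receiver would need a sequence number and a total count on each chunk to reassemble the payload; those fields are omitted here.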
36. • Input:
– Things can go wrong
– Azure Event Hub or an Azure Datacenter (Region) can
go down for a short period of time, or we can even
lose the connection (a problem on our side)
– How can we define and create a failover mechanism
to cover these cases?
Problem
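One simple shape such a failover mechanism could take is sketched below: try the primary endpoint a few times, then fall back to a secondary one. This is a generic pattern, not the design chosen in the talk. `send_with_failover` and the sender callables are hypothetical; a real implementation would wrap the actual Event Hub clients for two regions and add back-off between attempts.

```python
def send_with_failover(event, senders, attempts_per_sender=2):
    """Try each (name, send) pair in order; return the name that accepted the event."""
    last_error = None
    for name, send in senders:
        for _ in range(attempts_per_sender):
            try:
                send(event)
                return name
            except ConnectionError as exc:
                last_error = exc  # remember the failure and keep trying
    raise RuntimeError("all endpoints failed") from last_error


# Hypothetical senders: the primary region is down, the secondary works.
delivered = []


def primary_send(event):
    raise ConnectionError("primary region unavailable")


def secondary_send(event):
    delivered.append(event)


used = send_with_failover(
    {"id": 1},
    [("primary", primary_send), ("secondary", secondary_send)],
)
print(used, delivered)
```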
44. • Input:
– Internal and external architecture reviews revealed that
Service Bus Topics/Queues are not recommended for this use
case
Problem
46. • Input:
– Redis Cache is extremely fast (>120.000 reads/s)
– … but when you also have a lot of writes,
it is not as fast as you expect
– Latency for read operations went up (>2s)
Problem
48. • Input:
– Even if we scale a Web App to multiple instances, under a load of
more than 5.000 requests per second the system will not behave as you
expect
• You can even discover that your Web App was suspended and the only
thing you can do is make a new deployment and delete the existing
one
Problem
51. • Grouping resources together and defining the
quality attributes of that scaling unit
• When the limits are hit, another scaling unit is
added, without adding more resources to the
current one(s)
Scaling Unit
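The idea can be sketched as a capacity calculation: a scaling unit has fixed, known limits, and growth is handled by adding whole units rather than enlarging an existing one. The `ScalingUnit` type, its two limits and the numbers below are hypothetical; a real unit would list the limits that were actually measured for its group of resources.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingUnit:
    """Fixed capacity of one scaling unit (hypothetical quality attributes)."""
    max_devices: int
    max_events_per_s: int


def units_needed(unit, devices, events_per_s):
    """Whole scaling units required so that no single limit is exceeded."""
    return max(
        1,
        math.ceil(devices / unit.max_devices),
        math.ceil(events_per_s / unit.max_events_per_s),
    )


unit = ScalingUnit(max_devices=25_000, max_events_per_s=3_200)  # made-up limits

current = units_needed(unit, devices=100_000, events_per_s=13_000)
after_growth = units_needed(unit, devices=130_000, events_per_s=13_000)

print(current, after_growth)
```

The tightest limit decides the unit count: here event throughput drives the first result and device count the second.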