Self-adaptive container monitoring with
performance-aware load-shedding policies
NECST Group Conference 2017 @ Oracle Labs
07/06/2017
Rolando Brondolin
rolando.brondolin@polimi.it
DEIB, Politecnico di Milano
Cloud trends
• 2017	State	of	the	cloud	[1]:	
– 79%	of	workloads	run	in	cloud	(41%	public,	38%	private)	
– Operations	focusing	on:		
• moving	more	workloads	to	cloud	
• existing	cloud	usage	optimization	(cost	reduction)
• Nowadays	Docker	is	becoming	the	de-facto	standard	for	Cloud	deployments	
– lightweight	abstraction	on	system	resources	
– fast	deployment,	management	and	maintenance	
– large	deployments	and	automatic	orchestration
[1]	Cloud	Computing	Trends:	2017	State	of	the	Cloud	Survey,	Kim	Weins,	Rightscale
Infrastructure monitoring (1)
• Container	complexity	demands	strong	monitoring	capabilities	
– Systematic	approach	for	monitoring	and	troubleshooting	
– Tradeoff	on	data	granularity	and	resource	consumption
[Figure: example metrics of a monitored system: #requests/s, heap size, CPU usage, Q(t), λ(t), μ(t), #store/s, #load/s]
High visibility on system state, non-negligible cost
VS
Little information on system state, cheap monitoring
Infrastructure monitoring (2)
Three monitoring approaches compared:
• High data granularity, code instrumentation, low metrics rate
• Good data granularity, code instrumentation, high metrics rate
• High data granularity, no instrumentation, high metrics rate
Sysdig Cloud monitoring
http://www.sysdig.org
• Infrastructure	for	container	monitoring
• Collects	aggregated	metrics	and	shows	system	state:	
– “Drill-down”	from	cluster	to	single	application	metrics	
– Dynamic	network	topology	
– Alerting	and	anomaly	detection	
• Monitoring	agent	deployed	on	each	machine	in	the	cluster	
– Traces	system	calls	in	a	“streaming	fashion”	
– Aggregates	data	for	Threads,	FDs,	applications,	containers	and	hosts
Problem definition
• The Sysdig Cloud agent can be modelled as a server with a finite queue
• characterized by its arrival rate λ(t) and its service rate μ(t)
• Subject to overloading conditions
Cause: events arrive at a very high frequency
Effect: queues grow indefinitely
Issues:
• High usage of system resources
• Uncontrolled loss of events
• Output quality degradation
[Figure: streaming system with input flow Λ at arrival rate λ(t), queue Q, server S with service rate μ(t), and output flow Φ at rate φ(t)]
A server S, fed by a queue Q, is overloaded when the arrival rate λ(t) is greater than the service rate μ(t). The stability condition that avoids overloading is:
$$\mu(t) \ge \lambda(t) \quad (2.1)$$
A system experiencing overloading should discard part of the input so that the service rate matches the arrival rate λ(t).
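The overload scenario above can be illustrated with a minimal sketch, assuming a discrete-time model in which the queue evolves as Q(t) = Q(t-1) + λ(t) - μ(t) and anything beyond the queue capacity is lost. The function name, the per-step rates and the capacity value are hypothetical and are not taken from the Sysdig agent.

```python
def simulate_queue(arrivals, service_rate, capacity):
    """Step a finite queue: Q(t) = Q(t-1) + lambda(t) - mu(t).

    Illustrative model only. Returns the queue length after each step
    and the total number of events lost because the queue was full.
    """
    queue = 0
    lost = 0
    history = []
    for arrived in arrivals:
        # The server cannot process more than the work that is available.
        served = min(queue + arrived, service_rate)
        queue = queue + arrived - served
        if queue > capacity:
            # Finite queue: the overflow is dropped without any control.
            lost += queue - capacity
            queue = capacity
        history.append(queue)
    return history, lost


# Arrival rate below the service rate: the queue stays empty.
stable, lost_stable = simulate_queue([80] * 5, service_rate=100, capacity=50)
# Arrival rate above the service rate: the queue saturates and events are lost.
overloaded, lost_overloaded = simulate_queue([150] * 5, service_rate=100, capacity=50)
```

In the overloaded run the queue fills up after the first step and every later step loses the excess events, which is the uncontrolled loss that a load shedder is meant to replace with a controlled one.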
Proposed solution: FFWD
Fast Forward With Degradation (FFWD) is a framework that tackles load peaks in streaming applications via load-shedding techniques
A general approach that still leverages domain-specific details
Components:
• Load Manager (*when*)
• Policy wrapper and shedding plan
• LS Filter (*where*)
• Aggregated metrics correction
Goals:
• Mitigate high usage of system resources
• Avoid uncontrolled loss of events
• Minimize output quality degradation
Load Manager
• The Load Manager computes the throughput μ(t) that ensures stability such that:
$$\lambda(t) \le \mu(t) \quad (4.7)$$
• The service rate is the sum of the estimated system capacity μc and the number of events to drop, i.e. the dropping rate μd of the load shedder:
$$\mu(t) = \mu_c(t-1) + \mu_d(t) \quad (4.8)$$
Utilization-based Load Manager
The system can be modeled by means of queuing theory: the application is a single server node fed by a queue, which provides the input jobs at a variable arrival rate λ(t); the application is able to serve jobs at a service rate μ(t). The system measures λ(t) and μ(t) in events per second.
The simplest way to model the system behavior is Little's law (1), which states that the number of jobs inside a system is equal to the input arrival rate times the system response time:
$$N(t) = \lambda(t) \cdot R(t) \quad (1)$$
$$Q(t) = Q(t-1) + \lambda(t) - \mu(t) \quad (2)$$
The system can be characterized by its utilization and its queue size
S (CPU utilization as the sum of arrived events and residual events):
$$U(t) = \frac{\lambda(t)}{\mu_{max}} + \frac{Q(t)}{\mu_{max}} \quad (3)$$
$$Q(t) = \mu_{max} \cdot U(t) - \lambda(t) \quad (4)$$
Control error (current utilization minus target utilization):
$$e(t) = U(t) - \bar{U}$$
Requested throughput (arrival rate plus max theoretical throughput times control error):
$$\mu(t+1) = \lambda(t) + \mu_{max} \cdot e(t)$$
The requested throughput is the sum of two contributions: as the error e(t) goes to zero, the stability condition (4.7) is met; the second term ensures a fast actuation in case of a significant deviation from the system equilibrium.
The requested throughput is used by the load shedding policies to derive the LS probabilities
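The control law on this slide, e(t) = U(t) - Ū and μ(t+1) = λ(t) + μmax · e(t), together with the split μ(t) = μc(t-1) + μd(t), can be sketched as follows. This is a minimal illustration under stated assumptions, not the FFWD implementation: the class name, method names and the clamping of the drop rate at zero are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class UtilizationLoadManager:
    """Illustrative utilization-based controller (names are hypothetical)."""

    target_utilization: float  # set point, written as U-bar on the slide
    mu_max: float              # max theoretical throughput, in events/s

    def requested_throughput(self, arrival_rate: float, utilization: float) -> float:
        # Control error: e(t) = U(t) - U_bar
        error = utilization - self.target_utilization
        # Requested throughput: mu(t+1) = lambda(t) + mu_max * e(t)
        return arrival_rate + self.mu_max * error

    def drop_rate(self, requested: float, capacity: float) -> float:
        # mu(t) = mu_c(t-1) + mu_d(t)  =>  mu_d(t) = mu(t) - mu_c(t-1)
        # Assumption: a negative value means nothing has to be dropped.
        return max(0.0, requested - capacity)


manager = UtilizationLoadManager(target_utilization=0.5, mu_max=1000.0)
requested = manager.requested_throughput(arrival_rate=500.0, utilization=0.75)
to_drop = manager.drop_rate(requested, capacity=600.0)
```

When the measured utilization is above the set point the error is positive, so the requested throughput exceeds the arrival rate and the difference from the measured capacity becomes the number of events per second the shedder has to discard.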
Policy wrapper and policies
• The policy wrapper provides access to statistics of processes, the requested throughput μ(t+1) and the system capacity μc(t)
Fair policy
• Assign to each process the "same" number of events
• Save metrics of small processes, still accurate results on big ones
Priority-based policy
• Assign a static priority to each process
• Compute a weighted priority to partition the system capacity
• Assign a partition to each process and compute the probabilities
Baseline policy
• Compute one LS probability for all processes (with μ(t+1) and …
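The three policies can be sketched roughly as below. The slide gives no formulas, so every formula here is an assumption made for illustration: the baseline drops with probability 1 - μc / μ(t+1), the fair policy uses a max-min fair share of the capacity, and the priority policy splits the capacity in proportion to the static priorities. All function and parameter names are hypothetical.

```python
def baseline_policy(rates, requested, capacity):
    """One drop probability for every process (assumed: 1 - capacity / requested)."""
    p = max(0.0, 1.0 - capacity / requested) if requested > 0 else 0.0
    return {name: p for name in rates}


def fair_policy(rates, capacity):
    """Max-min fair share: each process may keep up to the same number of events."""
    keep = {}
    remaining = capacity
    pending = sorted(rates.items(), key=lambda item: item[1])
    while pending:
        share = remaining / len(pending)
        name, rate = pending.pop(0)
        keep[name] = min(rate, share)  # small processes keep everything
        remaining -= keep[name]
    return {
        name: (1.0 - keep[name] / rate) if rate > 0 else 0.0
        for name, rate in rates.items()
    }


def priority_policy(rates, priorities, capacity):
    """Partition the capacity in proportion to static priorities."""
    total_priority = sum(priorities[name] for name in rates)
    plan = {}
    for name, rate in rates.items():
        partition = capacity * priorities[name] / total_priority
        plan[name] = max(0.0, 1.0 - partition / rate) if rate > 0 else 0.0
    return plan


# Hypothetical per-process event rates (events/s) and capacity.
rates = {"small": 100.0, "mid": 400.0, "big": 1500.0}
baseline = baseline_policy(rates, requested=2000.0, capacity=900.0)
fair = fair_policy(rates, capacity=900.0)
priority = priority_policy(rates, {"small": 1, "mid": 1, "big": 2}, capacity=900.0)
```

In this example the fair policy leaves the two smaller processes untouched and sheds only from the largest one, while the priority policy trades some accuracy on the mid-sized process for a larger partition for the high-priority one. Unused partition space is not redistributed in this sketch.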
Load Shedding Filter
• The Load Shedding Filter applies the probabilities computed by the policies to the input stream
• For each event:
– Look up the load shedding probability for the event's input class
– If no data is found, the event can be dropped
– Otherwise, apply the load shedding probability computed by the policy
• The dropped events are reported to the application for metrics correction
[Figure: pipeline with Event Capture, Load Shedding Filter, Shedding Plan (drop probability), event buffers and Metrics; events leave the filter as "ok" or "ko"]
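The per-event logic of the filter can be sketched as follows, assuming the shedding plan is a plain mapping from input class to drop probability. The class name, the `accept` method and the use of a per-class counter for the dropped events are hypothetical; the slide only states that drops are reported for metrics correction.

```python
import random
from collections import Counter


class LoadSheddingFilter:
    """Illustrative probabilistic filter driven by a shedding plan."""

    def __init__(self, shedding_plan, rng=None):
        self.plan = shedding_plan  # input class -> drop probability
        self.rng = rng if rng is not None else random.Random()
        self.dropped = Counter()   # per-class drop counts, for metrics correction

    def accept(self, event_class):
        """Return True if the event goes on ("ok"), False if it is dropped ("ko")."""
        probability = self.plan.get(event_class)
        # No entry in the plan: the event can be dropped, as stated on the slide.
        if probability is None or self.rng.random() < probability:
            self.dropped[event_class] += 1
            return False
        return True


ls_filter = LoadSheddingFilter({"nginx": 0.0, "fio": 1.0})
kept_nginx = ls_filter.accept("nginx")      # probability 0.0: always kept
kept_fio = ls_filter.accept("fio")          # probability 1.0: always dropped
kept_unknown = ls_filter.accept("unknown")  # not in the plan: dropped
```

Because the decision is taken event by event, the drops are spread across the stream rather than concentrated in batches, and the per-class counters give the application what it needs to rescale the aggregated metrics.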
Experimental setup
• We evaluated FFWD within Sysdig with 2 goals:
– System stability
– Output quality
• Results compared with the reference filtering system of Sysdig
• Evaluation setup
– 2x Xeon E5-2650 v3, 20 cores (40 w/HT) @ 2.3 GHz
– 128 GB DDR4 RAM
– Tests selected from the Phoronix Test Suite
Homogeneous benchmarks
test ID   name         priority   # evts/s
A         nginx        3          800K
B         postmark     4          1.2M
C         fio          4          1.3M
D         simplefile   2          1.5M
E         apache       2          1.9M
Heterogeneous benchmarks
test ID   instances                          # evts/s
F         3x nginx, 1x fio                   1.3M
G         1x nginx, 1x simplefile            1.3M
H         1x apache, 2x postmark, 1x fio     1.8M
Syscall-intensive benchmarks from the Phoronix Test Suite
System stability
• We evaluated the Load Manager with all the tests (A to H)
• With 3 different set points (Ut 1.0%, 1.1%, 1.2% w.r.t. system capacity)
• Measuring the CPU load of the sysdig agent with:
– reference implementation
– FFWD with fair and priority policy
• We compared the actual CPU load with the QoS requirement (Ut)
• Error measured with MAPE (lower is better), obtained by running each benchmark 20 times
• 3.51x average MAPE improvement, average MAPE below 5%
MAPE with Ut = 1.1%
Test   reference   fair     priority
A      7.12%       1.78%    3.78%
B      34.06%      4.37%    4.46%
C      28.03%      2.27%    2.24%
D      11.52%      1.41%    1.54%
E      26.02%      8.51%    8.99%
F      22.67%      8.11%    3.74%
G      16.42%      3.37%    2.73%
H      19.92%      8.41%    8.01%
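For reference, the Mean Absolute Percentage Error used throughout the evaluation can be computed as in the sketch below. The function name and the sample values are hypothetical; they are not measurements from the experiments.

```python
def mape(exact, approx):
    """Mean Absolute Percentage Error, in percent.

    `exact` holds the reference values (for example the set point Ut or the
    metrics computed without shedding) and `approx` the observed ones.
    """
    pairs = list(zip(exact, approx))
    if not pairs:
        raise ValueError("mape needs at least one pair of values")
    return 100.0 * sum(abs((e - a) / e) for e, a in pairs) / len(pairs)


# Hypothetical values: two samples are off by 10%, one is exact.
error = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
```

For the stability test the reference value is the same set point for every sample, while for the output-quality tests each approximated metric is compared with its exact counterpart.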
Output quality - heterogeneous
• We	tried	to	mix	the	homogeneous	tests	
• simulate	co-located	environment	
• add	OS	scheduling	uncertainty	and	noise	
• QoS	requirement	Ut	1.1%	
• MAPE	(lower	is	better)	between	exact	and	approximated	metrics	
• Compare	metrics	from	reference,	FFWD	fair,	FFWD	priority	
• Three	tests	with	different	syscall	mix:	
• Network	based	mid-throughput:	1x	Fio,	3x	Nginx,	1.3M	evt/s	
• Mixed	mid-throughput:	1x	Simplefile,	1x	Nginx,	1.3M	evt/s	
• Mixed	high-throughput:	1x	Apache,	1x	Fio,	2x	Postmark,	1.8M	evt/s
1x Fio, 3x Nginx, 1.3M evt/s
[Figure: MAPE (%, log scale, lower is better) per process (fio, nginx-1, nginx-2, nginx-3) for kernel-drop (reference), fair and priority; panels: volume metrics (byte r/w: volume-file, volume-net) and latency metrics (latency-file, latency-net)]
1x Apache, 1x Fio, 2x Postmark, 1.8M evt/s
[Figure: MAPE (%, log scale, lower is better) per process (apache, fio, postmark-1, postmark-2) for kernel-drop (reference), fair and priority; panels: volume metrics (byte r/w: volume-file, volume-net) and latency metrics (latency-file, latency-net)]
Test	H,	mixed	workloads:	1x	apache,	1x	fio,	2x	postmark,	1.8M	evt/s
• Fair	policy	outperforms	reference	in	almost	all	cases	
• the	LS	Filter	works	at	the	single	event	level	
• reference	drops	events	in	batches	
• Priority	policy	improves	the	Fair	policy	results	in	most	cases	
• the	prioritized	processes	are	privileged	
• other	processes	treated	as	“best-effort”
Conclusion
• We	saw	the	main	challenges	of	Load	Shedding	for	container	monitoring	
– Low	overhead	monitoring	
– High	quality	and	granularity	of	metrics	
• Fast	Forward	With	Degradation	(FFWD)	
– Heuristic	controller	for	bounded	CPU	usage	
– Pluggable	policies	for	domain-specific	load	shedding		
– Accurate	computation	of	output	metrics	
– Load	Shedding	Filter	for	fast	drop	of	events
Questions?
Rolando Brondolin, rolando.brondolin@polimi.it
DEIB, Politecnico di Milano
NGC VIII 2017 @ SF
FFWD: Latency-aware event stream processing via domain-specific load-shedding policies. R. Brondolin, M. Ferroni, M. D.
Santambrogio. In Proceedings of 14th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2016)
BACKUP	SLIDES
Output quality - homogeneous
• QoS	requirement	Ut	1.1%,	standard	set-point	for	the	agent	
• MAPE	(lower	is	better)	between	exact	and	approximated	metrics	
• Output	metrics	on	latency	and	volume	for	file	and	network	operations	
• Similar	or	better	results	of	FFWD	fair	policy	w.r.t	reference	
• FFWD remains accurate even though it drops more events
• Predictable	and	repetitive	behavior	of	nginx,	fio	and	apache	
[Figure: MAPE (%, log scale, lower is better) of latency-file, latency-net, volume-file and volume-net for kernel-drop (reference) and fair; one panel per benchmark: nginx (800K evt/s), fio (1.3M evt/s), simplefile (1.5M evt/s), postmark (1.2M evt/s), apache (1.9M evt/s)]
1x simplefile, 1x nginx, 1.3M evt/s
[Figure: MAPE (%, log scale, lower is better) per process (simplefile, nginx) for kernel-drop (reference), fair and priority; panels: latency metrics (latency-file, latency-net) and volume metrics (byte r/w: volume-file, volume-net)]
Response time Load Manager
• The system can be characterized by its response time and the jobs in the system
• S: Little's Law, jobs in the system
• Control error: based on the old response time and the target response time
• Requested throughput: based on the arrival rate and the control error
• The requested throughput is used by the load shedding policies to derive the LS probabilities
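As a small worked example of Little's law, which this Load Manager builds on, the relation N = λ · R can be evaluated directly. This sketch covers only Little's law and not the controller itself; the function names are hypothetical, and the values of 40 evt/s and 5 s are used purely for illustration.

```python
def jobs_in_system(arrival_rate, response_time):
    """Little's law: N = lambda * R (jobs = events/s * seconds)."""
    return arrival_rate * response_time


def response_time(jobs, arrival_rate):
    """Little's law solved for the response time: R = N / lambda."""
    return jobs / arrival_rate


# 40 events/s with a 5 s response time means 200 jobs in the system.
jobs = jobs_in_system(arrival_rate=40.0, response_time=5.0)
```

Read the other way round, a bound on the response time translates into a bound on the number of jobs the system may hold at a given arrival rate, which is what makes response time usable as a control target.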
Case studies
Sentiment analysis [1]
• Goal: perform real-time analysis on tweets
• Constraint: Latency
• Based on: Stanford NLP toolkit
• Output: aggregated sentiment score for each keyword and hashtag
• FFWD keeps the response time bounded
• policies on tweet keyword and #hashtag
System monitoring [2]
• Goal: Distributed monitoring of systems and applications w/syscalls
• Constraint: CPU utilization
• Based on: Sysdig monitoring agent
• Output: aggregated performance metrics for applications, containers, hosts
• FFWD ensures low CPU overhead
• policies based on processes in the system
[1] http://nlp.stanford.edu [2] http://www.sysdig.org
Real-time sentiment analysis
• Real-time sentiment analysis makes it possible to:
– Track the sentiment of a topic over time
– Correlate real world events and related sentiment, e.g.
• Toyota crisis (2010) [1]
• 2012 US Presidential Election Cycle [2]
– Track the online evolution of companies' reputation, derive social profiling and enable enhanced social marketing strategies
[1] Bifet Figuerol, Albert Carles, et al. "Detecting sentiment change in Twitter streaming data." Journal of Machine Learning Research:
Workshop and Conference Proceedings Series. 2011.
[2] Wang, Hao, et al. "A system for real-time twitter sentiment analysis of 2012 us presidential election cycle." Proceedings of the ACL
2012 System Demonstrations.
Sentiment analysis: case study
• Simple Twitter streaming sentiment analyzer with Stanford NLP
• System components:
– Event producer
– RabbitMQ queue
– Event consumer
• Consumer components:
– Event Capture
– Sentiment Analyzer
– Sentiment Aggregator
• Real-time queue consumption, aggregated metrics emission each second
(keywords and hashtag sentiment)
FFWD: Sentiment analysis
• FFWD adds four components:
– Load shedding filter at the beginning of the pipeline
– Shedding plan used by the filter
– Domain-specific policy wrapper
– Application controller manager to detect load peaks
[Figure: FFWD pipeline for sentiment analysis. Components: Producer, Load Shedding Filter, Event Capture, Sentiment Analyzer, Sentiment Aggregator, Policy Wrapper, Load Manager. Data structure: Shedding Plan. Queues: real-time queue, batch queue. Flows: input tweets, λ(t), R(t), Rt, μ(t+1), stream stats, updated plan, drop probability, ok and ko events, ko count, output metrics]
Sentiment - experimental setup
• Separate tests to understand FFWD behavior:
– System stability
– Output quality
• Dataset: 900K tweets from the 35th week of the Premier League
• Performed tests:
– Controller: synthetic and real tweets at various λ(t)
– Policy: real tweets at various λ(t)
• Evaluation setup
– Intel Core i7-3770, 4 cores @ 3.4 GHz + HT, 8 MB LLC
– 8 GB RAM @ 1600 MHz
System stability
λ(t) estimation:
case A: λ(t) = λ(t-1)
case B: λ(t) = avg(λ(t))
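The two λ(t) estimators listed on this slide can be sketched as below. The slide does not say over which window the average in case B is taken, so the optional `window` parameter is an assumption, and both function names are hypothetical.

```python
def estimate_last(history):
    """Case A: predict the next arrival rate as the last observed one."""
    return history[-1]


def estimate_average(history, window=None):
    """Case B: predict the next arrival rate as an average of past rates.

    Assumption: with window=None the whole history is averaged, otherwise
    only the most recent `window` observations are used.
    """
    recent = history if window is None else history[-window:]
    return sum(recent) / len(recent)


# Hypothetical observed arrival rates (events/s) with a sudden peak.
observed = [100.0, 100.0, 400.0]
case_a = estimate_last(observed)
case_b = estimate_average(observed)
case_b_windowed = estimate_average(observed, window=2)
```

The last-value estimator reacts immediately to a load peak, while the average smooths it out; the choice therefore trades reactivity for robustness to short bursts.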
Load Manager showcase (1)
• Load Manager demo (Rt = 5s):
– λ(t) increased after 60s and 240s
– response time:
[Figure: "Controller performance", response time (s) over time (s) from 0 to 300 s; series: R and QoS = 5s]
Load Manager showcase (2)
• Load Manager demo (Rt = 5s):
– λ(t) increased after 60s and 240s
– throughput:
[Figure: "Actuation", #Events over time (s) from 0 to 300 s; series: lambda, dropped, computed, mu]
Output Quality
• Real tweets, μc(t) ≃ 40 evt/s
• Evaluated policies:
– Baseline
– Fair
– Priority
• R = 5s, λ(t) = 100 evt/s, 200 evt/s, 400 evt/s
• Error metric: Mean Absolute Percentage Error (MAPE %) (lower is better)
[Figure: MAPE (%) per group (A, B, C, D) for baseline_error, fair_error and priority_error; one panel each for λ(t) = 100 evt/s, 200 evt/s and 400 evt/s]

More Related Content

What's hot

Morales, Randulph: Spatio-temporal kriging in estimating local methane source...
Morales, Randulph: Spatio-temporal kriging in estimating local methane source...Morales, Randulph: Spatio-temporal kriging in estimating local methane source...
Morales, Randulph: Spatio-temporal kriging in estimating local methane source...
Integrated Carbon Observation System (ICOS)
 
D0341015020
D0341015020D0341015020
D0341015020
inventionjournals
 
IRJET- Different Data Mining Techniques for Weather Prediction
IRJET-  	  Different Data Mining Techniques for Weather PredictionIRJET-  	  Different Data Mining Techniques for Weather Prediction
IRJET- Different Data Mining Techniques for Weather Prediction
IRJET Journal
 
B04402016018
B04402016018B04402016018
B04402016018
ijceronline
 
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
Matteo Ferroni
 
Bounded ant colony algorithm for task Allocation on a network of homogeneous ...
Bounded ant colony algorithm for task Allocation on a network of homogeneous ...Bounded ant colony algorithm for task Allocation on a network of homogeneous ...
Bounded ant colony algorithm for task Allocation on a network of homogeneous ...ijcsit
 
An Effective PSO-inspired Algorithm for Workflow Scheduling
An Effective PSO-inspired Algorithm for Workflow Scheduling An Effective PSO-inspired Algorithm for Workflow Scheduling
An Effective PSO-inspired Algorithm for Workflow Scheduling
IJECEIAES
 
Chpt7
Chpt7Chpt7
Pdcs2010 balman-presentation
Pdcs2010 balman-presentationPdcs2010 balman-presentation
Pdcs2010 balman-presentation
balmanme
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networks
Harshavardhan Achrekar
 
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...
IJCNCJournal
 
Amdahl`s law -Processor performance
Amdahl`s law -Processor performanceAmdahl`s law -Processor performance
Amdahl`s law -Processor performance
COMSATS Institute of Information Technology
 
Airspace configuration using_air_traffic_complexity_metrics
Airspace configuration using_air_traffic_complexity_metricsAirspace configuration using_air_traffic_complexity_metrics
Airspace configuration using_air_traffic_complexity_metrics
xiaofeng007
 
The Queue M/M/1 with Additional Servers for a Longer Queue
The Queue M/M/1 with Additional Servers for a Longer QueueThe Queue M/M/1 with Additional Servers for a Longer Queue
The Queue M/M/1 with Additional Servers for a Longer Queue
IJMER
 
ICC paper
ICC paperICC paper
ICC paperQi Chen
 
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural NetworksImproving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Aditya N Deshmukh
 
Aggarwal Draft
Aggarwal DraftAggarwal Draft
Aggarwal Draft
Deanna Kosaraju
 

What's hot (20)

Morales, Randulph: Spatio-temporal kriging in estimating local methane source...
Morales, Randulph: Spatio-temporal kriging in estimating local methane source...Morales, Randulph: Spatio-temporal kriging in estimating local methane source...
Morales, Randulph: Spatio-temporal kriging in estimating local methane source...
 
D0341015020
D0341015020D0341015020
D0341015020
 
IRJET- Different Data Mining Techniques for Weather Prediction
IRJET-  	  Different Data Mining Techniques for Weather PredictionIRJET-  	  Different Data Mining Techniques for Weather Prediction
IRJET- Different Data Mining Techniques for Weather Prediction
 
B04402016018
B04402016018B04402016018
B04402016018
 
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
 
L09
L09L09
L09
 
Bounded ant colony algorithm for task Allocation on a network of homogeneous ...
Bounded ant colony algorithm for task Allocation on a network of homogeneous ...Bounded ant colony algorithm for task Allocation on a network of homogeneous ...
Bounded ant colony algorithm for task Allocation on a network of homogeneous ...
 
An Effective PSO-inspired Algorithm for Workflow Scheduling
An Effective PSO-inspired Algorithm for Workflow Scheduling An Effective PSO-inspired Algorithm for Workflow Scheduling
An Effective PSO-inspired Algorithm for Workflow Scheduling
 
Chpt7
Chpt7Chpt7
Chpt7
 
Pdcs2010 balman-presentation
Pdcs2010 balman-presentationPdcs2010 balman-presentation
Pdcs2010 balman-presentation
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networks
 
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...
 
Amdahl`s law -Processor performance
Amdahl`s law -Processor performanceAmdahl`s law -Processor performance
Amdahl`s law -Processor performance
 
Airspace configuration using_air_traffic_complexity_metrics
Airspace configuration using_air_traffic_complexity_metricsAirspace configuration using_air_traffic_complexity_metrics
Airspace configuration using_air_traffic_complexity_metrics
 
The Queue M/M/1 with Additional Servers for a Longer Queue
The Queue M/M/1 with Additional Servers for a Longer QueueThe Queue M/M/1 with Additional Servers for a Longer Queue
The Queue M/M/1 with Additional Servers for a Longer Queue
 
assignment_3
assignment_3assignment_3
assignment_3
 
ICC paper
ICC paperICC paper
ICC paper
 
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural NetworksImproving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
 
Poster_submitted_final
Poster_submitted_finalPoster_submitted_final
Poster_submitted_final
 
Aggarwal Draft
Aggarwal DraftAggarwal Draft
Aggarwal Draft
 

Self-adaptive container monitoring with performance-aware Load-Shedding policies

  • 1. Self-adaptive container monitoring with performance-aware load-shedding policies. NECST Group Conference 2017 @ Oracle Labs, 07/06/2017. Rolando Brondolin (rolando.brondolin@polimi.it), DEIB, Politecnico di Milano
  • 2. Cloud trends
      • 2017 State of the Cloud [1]:
        – 79% of workloads run in the cloud (41% public, 38% private)
        – Operations focus on:
          • moving more workloads to the cloud
          • optimizing existing cloud usage (cost reduction)
      • Docker is becoming the de facto standard for cloud deployments
        – lightweight abstraction over system resources
        – fast deployment, management and maintenance
        – large deployments and automatic orchestration
      [1] Cloud Computing Trends: 2017 State of the Cloud Survey, Kim Weins, RightScale
  • 4. Infrastructure monitoring (1)
      • Container complexity demands strong monitoring capabilities
        – Systematic approach to monitoring and troubleshooting
        – Trade-off between data granularity and resource consumption
      [Figure: metrics #requests/s, heap size, CPU usage, Q(t), λ(t), μ(t), #store/s, #load/s; little information on system state with cheap monitoring VS high visibility on system state at a non-negligible cost]
  • 5. Infrastructure monitoring (2)
      • Container complexity demands strong monitoring capabilities
        – Systematic approach to monitoring and troubleshooting
        – Trade-off between data granularity and resource consumption
      Three monitoring approaches compared:
        Data granularity   High                  Good                  High
        Instrumentation    Code instrumentation  Code instrumentation  No instrumentation
        Metrics rate       Low                   High                  High
  • 6. Sysdig Cloud monitoring (http://www.sysdig.org)
      • Infrastructure for container monitoring
      • Collects aggregated metrics and shows system state:
        – “Drill-down” from cluster to single-application metrics
        – Dynamic network topology
        – Alerting and anomaly detection
      • Monitoring agent deployed on each machine in the cluster
        – Traces system calls in a “streaming fashion”
        – Aggregates data for threads, FDs, applications, containers and hosts
  • 7. Problem definition
      • The Sysdig Cloud agent can be modelled as a server S with a finite queue Q
      • It is characterized by its arrival rate λ(t) and its service rate μ(t)
      • It is subject to overloading conditions
      Cause: events arrive at a very high frequency
      Effect: queues grow indefinitely
      Issues: high usage of system resources, uncontrolled loss of events, output quality degradation
      [Figure: streaming system with queue Q, processing element S and streaming output flow; input flow Λ with rate λ(t), output flow Φ with rate φ(t), service rate μ(t)]
      A server S, fed by a queue Q, is overloaded when the arrival rate λ(t) is greater than the service rate μ(t). A system experiencing overloading should discard part of the input so that the service rate μ(t) matches the arrival rate λ(t). Stability condition (2.1): $\mu(t) \ge \lambda(t)$
      The formalization is twofold: the interest lies not only in control, but also in maximizing the accuracy of the estimated metrics.
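The overload scenario of slide 7 can be illustrated with a minimal Python sketch. It assumes a discrete one-second time step, a constant service rate and a hypothetical `capacity` bound on the queue; `simulate_queue` and its parameters are illustrative names, not part of the Sysdig agent.

```python
def simulate_queue(arrivals, service_rate, capacity):
    """Simulate a single server fed by a bounded queue, one step per second.

    arrivals: events arriving in each step, i.e. lambda(t)
    service_rate: events the server can process per step, i.e. mu(t) (held constant)
    capacity: maximum queue length (hypothetical parameter)
    Returns (queue_lengths, lost_events).
    """
    queue = 0
    lost = 0
    history = []
    for arrived in arrivals:
        backlog = queue + arrived
        served = min(backlog, service_rate)
        backlog -= served
        # Whatever does not fit in the bounded queue is lost without control.
        overflow = max(0, backlog - capacity)
        lost += overflow
        queue = backlog - overflow
        history.append(queue)
    return history, lost


stable = simulate_queue([80] * 5, service_rate=100, capacity=50)
overloaded = simulate_queue([150] * 5, service_rate=100, capacity=50)
```

With λ(t) below μ(t) the queue stays empty and nothing is lost; with λ(t) = 150 and μ(t) = 100 the queue saturates at 50 after the first step and 200 events are lost over five steps, which is the uncontrolled loss that load shedding is meant to replace with a controlled one.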
  • 9. Load Manager (utilization-based) • The system can be modeled by means of queuing theory: the application is a single server node fed by a queue, which provides the input jobs at a variable arrival rate λ(t); the application serves jobs at a service rate μ(t). Both λ(t) and μ(t) are measured in events per second. Little's law, N(t) = λ(t) · R(t), states that the number of jobs inside a system is equal to the input arrival rate times the system response time.
    • The system can be characterized by its utilization and its queue size:
        S: Q(t) = Q(t−1) + λ(t) − μ(t)
           U(t) = λ(t)/μmax + Q(t)/μmax, equivalently Q(t) = μmax · U(t) − λ(t)
        (U(t): CPU utilization; λ(t): arrived events; Q(t): residual events)
    • Control error: e(t) = U(t) − Ū (U(t): current utilization; Ū: target utilization)
    • Requested throughput: μ(t+1) = λ(t) + μmax · e(t) (λ(t): arrival rate; μmax: max theoretical throughput; e(t): control error)
    • The Load Manager computes the throughput μ(t) that ensures stability. The actuation μ(t+1) is a sum of two contributions: as the error e(t) goes to zero, the stability condition between λ(t) and μ(t) is met; the feedback term ensures a fast actuation in case of a significant deviation from the system equilibrium.
    • During the lifetime of the system, the arrival rate λ(t) can vary unpredictably and can be greater than the system capacity μc(t). Given the control action μ(t) (i.e., the throughput of the system) and the system capacity, μd(t) is the dropping rate of the LS. The current system capacity is estimated as the number of events analyzed in the last time period, so the service rate is the sum of the estimated capacity and the number of events to drop to achieve the required stability: μ(t) = μc(t−1) + μd(t)
    • The requested throughput is used by the load shedding policies to derive the LS probabilities
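The control step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the FFWD implementation: the names `LoadManagerState` and `load_manager_step` are hypothetical, and clamping the queue at zero is an added assumption that the slide does not state.

```python
from dataclasses import dataclass


@dataclass
class LoadManagerState:
    """Residual events Q(t-1) carried from one control period to the next."""
    queue: float = 0.0


def load_manager_step(state, arrival_rate, throughput, mu_max, target_utilization):
    """One control period of a utilization-based load manager (sketch).

    Q(t)    = Q(t-1) + lambda(t) - mu(t)
    U(t)    = lambda(t)/mu_max + Q(t)/mu_max
    e(t)    = U(t) - U_target
    mu(t+1) = lambda(t) + mu_max * e(t)
    """
    # Assumption: the queue cannot become negative, so it is clamped at zero.
    queue = max(0.0, state.queue + arrival_rate - throughput)
    utilization = (arrival_rate + queue) / mu_max
    error = utilization - target_utilization
    requested_throughput = arrival_rate + mu_max * error
    state.queue = queue
    return utilization, error, requested_throughput


if __name__ == "__main__":
    state = LoadManagerState()
    # Hypothetical numbers: 2000 evt/s theoretical maximum, 50% target utilization.
    for lam, mu in [(1000.0, 1000.0), (1500.0, 1000.0)]:
        print(load_manager_step(state, lam, mu, mu_max=2000.0, target_utilization=0.5))
```

When the error is zero the requested throughput equals the arrival rate; when the utilization overshoots the target, the feedback term raises the requested throughput, which the policies then translate into drop probabilities.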
  • 10. Policy wrapper and policies • The policy wrapper provides access to the statistics of processes, the requested throughput μ(t+1) and the system capacity μc(t)
    • Fair policy: assign to each process the “same” number of events; save the metrics of small processes while keeping accurate results on big ones
    • Priority-based policy: assign a static priority to each process; compute a weighted priority to partition the system capacity; assign a partition to each process and compute the probabilities
    • Baseline policy: compute one LS probability for all processes (with μ(t+1) and
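The three policies can be illustrated with a short Python sketch. The slide does not give the formulas, so everything below is an assumption made for illustration: the baseline drops the fraction of the requested throughput that exceeds the capacity, the fair policy uses a max-min fair (water-filling) split of the capacity, and the priority policy splits the capacity proportionally to the static priorities. All function names are hypothetical.

```python
def baseline_policy(requested_throughput, capacity):
    """One drop probability for every process (assumed: 1 - mu_c / mu(t+1))."""
    if requested_throughput <= 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - capacity / requested_throughput))


def fair_policy(arrivals, capacity):
    """Per-process drop probabilities from a max-min fair share of the capacity.

    arrivals maps a process id to its event rate. Small processes that fit
    in their share keep all their events; the leftover goes to bigger ones.
    """
    remaining = capacity
    kept = {}
    ordered = sorted(arrivals.items(), key=lambda item: item[1])
    for index, (pid, rate) in enumerate(ordered):
        share = remaining / (len(ordered) - index)
        kept[pid] = min(rate, share)
        remaining -= kept[pid]
    return {
        pid: (1.0 - kept[pid] / rate) if rate > 0 else 0.0
        for pid, rate in arrivals.items()
    }


def priority_policy(arrivals, priorities, capacity):
    """Per-process drop probabilities from a priority-weighted capacity split.

    Assumes every process has a positive priority.
    """
    total_priority = sum(priorities[pid] for pid in arrivals)
    probabilities = {}
    for pid, rate in arrivals.items():
        partition = capacity * priorities[pid] / total_priority
        probabilities[pid] = 0.0 if rate <= partition else 1.0 - partition / rate
    return probabilities


if __name__ == "__main__":
    rates = {"small": 10.0, "big": 190.0}  # hypothetical event rates
    print(baseline_policy(200.0, 100.0))
    print(fair_policy(rates, 100.0))
    print(priority_policy(rates, {"small": 1, "big": 3}, 100.0))
```

In this sketch the fair split keeps every event of the small process and sheds only from the big one, which matches the slide's intent of saving the metrics of small processes.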
  • 11. Load Shedding Filter • The Load Shedding Filter applies the probabilities computed by the policies to the input stream • For each event: • Look up the load shedding probability for the event's input class • If no data is found, the event can be dropped • Otherwise, apply the load shedding probability computed by the policy • The dropped events are reported to the application for metrics correction • [Diagram: event buffers feed the Load Shedding Filter, which reads the drop probability from the Shedding Plan; accepted events (ok) go to Event Capture, dropped events (ko) are accounted in the Metrics]
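The per-event logic of the filter can be sketched as follows. This is a minimal illustration under assumptions: the class name `LoadSheddingFilter`, the dictionary-based shedding plan and the per-class drop counters are hypothetical and not taken from the FFWD code.

```python
import random


class LoadSheddingFilter:
    """Applies per-class drop probabilities to an event stream (sketch)."""

    def __init__(self, shedding_plan, rng=None):
        # shedding_plan maps an input class to its drop probability in [0, 1].
        self.shedding_plan = shedding_plan
        self.rng = rng if rng is not None else random.Random()
        # Dropped events per class, reported back for metrics correction.
        self.dropped = {}

    def accept(self, event_class):
        """Return True if the event should be processed, False if dropped."""
        probability = self.shedding_plan.get(event_class)
        # No entry in the plan: the event is dropped, as on the slide.
        if probability is None or self.rng.random() < probability:
            self.dropped[event_class] = self.dropped.get(event_class, 0) + 1
            return False
        return True


if __name__ == "__main__":
    plan = {"nginx": 0.0, "fio": 0.5}  # hypothetical shedding plan
    ls_filter = LoadSheddingFilter(plan, rng=random.Random(42))
    kept = [cls for cls in ["nginx", "fio", "fio", "other"] if ls_filter.accept(cls)]
    print(kept, ls_filter.dropped)
```

Deciding per event, rather than per batch, is what lets the filter keep a sample of every class; the drop counters are what the application would need to rescale its aggregated metrics.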
  • 12. Experimental setup • We evaluated FFWD within Sysdig with 2 goals: • System stability (slide 13) • Output quality (slides 14–17) • Results compared with the reference filtering system of Sysdig • Evaluation setup • 2x Xeon E5-2650 v3, 20 cores (40 w/HT) @ 2.3 GHz • 128 GB DDR4 RAM • Tests selected from the Phoronix test suite (syscall-intensive benchmarks)
    Homogeneous benchmarks
    test ID   name         priority   # evts/s
    A         nginx        3          800K
    B         postmark     4          1.2M
    C         fio          4          1.3M
    D         simplefile   2          1.5M
    E         apache       2          1.9M
    Heterogeneous benchmarks
    test ID   instances                          # evts/s
    F         3x nginx, 1x fio                   1.3M
    G         1x nginx, 1x simplefile            1.3M
    H         1x apache, 2x postmark, 1x fio     1.8M
  • 13. System stability • We evaluated the Load Manager with all the tests (A, B, C, D, E, F, G, H) • With 3 different set points (Ut = 1.0%, 1.1%, 1.2% w.r.t. system capacity) • Measuring the CPU load of the sysdig agent with: • reference implementation • FFWD with fair and priority policy • We compared the actual CPU load with the QoS requirement (Ut) • Error measured with MAPE (lower is better), obtained by running each benchmark 20 times • 3.51x average MAPE improvement, average MAPE below 5%
    MAPE at Ut = 1.1%
    Test   reference   fair    priority
    A      7.12%       1.78%   3.78%
    B      34.06%      4.37%   4.46%
    C      28.03%      2.27%   2.24%
    D      11.52%      1.41%   1.54%
    E      26.02%      8.51%   8.99%
    F      22.67%      8.11%   3.74%
    G      16.42%      3.37%   2.73%
    H      19.92%      8.41%   8.01%
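For reference, the error metric used on this slide, the Mean Absolute Percentage Error, can be computed as in the sketch below. The function name `mape` and the sample values are illustrative assumptions; they are not measurements from the evaluation.

```python
def mape(expected, measured):
    """Mean Absolute Percentage Error, in percent (lower is better).

    expected and measured are equal-length sequences; expected values
    must be non-zero because each error is relative to them.
    """
    if len(expected) != len(measured) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs((m - e) / e) for e, m in zip(expected, measured))
    return 100.0 * total / len(expected)


if __name__ == "__main__":
    # Hypothetical CPU-load samples compared against a set point Ut = 1.1%.
    set_point = [1.1, 1.1, 1.1]
    cpu_load = [1.0, 1.2, 1.1]
    print(f"MAPE = {mape(set_point, cpu_load):.2f}%")
```

For the stability test the expected values would be the set point Ut and the measured values the CPU load of the agent; for the output-quality tests they would be the exact and the approximated metrics.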
  • 14. Output quality - heterogeneous • We mixed the homogeneous tests to: • simulate a co-located environment • add OS scheduling uncertainty and noise • QoS requirement Ut = 1.1% • MAPE (lower is better) between exact and approximated metrics • Compare metrics from reference, FFWD fair, FFWD priority • Three tests with different syscall mixes: • Network-based mid-throughput: 1x Fio, 3x Nginx, 1.3M evt/s • Mixed mid-throughput: 1x Simplefile, 1x Nginx, 1.3M evt/s • Mixed high-throughput: 1x Apache, 1x Fio, 2x Postmark, 1.8M evt/s
  • 15. 1x Fio, 3x Nginx, 1.3M evt/s • [Figure: two panels, latency metrics and volume metrics (byte r/w); y-axis MAPE (%) on a log scale from 0.1 to 100000, lower is better; x-axis latency-file / latency-net and volume-file / volume-net for fio, nginx-1, nginx-2 and nginx-3; series: kernel-drop, fair, priority; annotation: reference]
  • 16. 1x Apache, 1x Fio, 2x Postmark, 1.8M evt/s • [Figure: two panels, volume metrics (byte r/w) and latency metrics; y-axis MAPE (%) on a log scale from 0.1 to 100000, lower is better; x-axis volume-file / volume-net and latency-file / latency-net for apache, fio, postmark-1 and postmark-2; series: kernel-drop, fair, priority; annotation: reference]
  • 17. 1x Apache, 1x Fio, 2x Postmark, 1.8M evt/s • Test H, mixed workloads: 1x apache, 1x fio, 2x postmark, 1.8M evt/s • Fair policy outperforms reference in almost all cases • the LS Filter works at the single event level • reference drops events in batches • Priority policy improves the Fair policy results in most cases • the prioritized processes are privileged • other processes are treated as “best-effort” • [Figure: two panels, volume metrics (byte r/w) and latency metrics; y-axis MAPE (%) on a log scale from 0.1 to 100000, lower is better; x-axis file and net metrics for apache, fio, postmark-1 and postmark-2; series: kernel-drop, fair, priority; annotation: reference]
  • 18. Conclusion • We saw the main challenges of Load Shedding for container monitoring – Low-overhead monitoring – High quality and granularity of metrics • Fast Forward With Degradation (FFWD) – Heuristic controller for bounded CPU usage – Pluggable policies for domain-specific load shedding – Accurate computation of output metrics – Load Shedding Filter for fast drop of events
  • 19. Questions? Rolando Brondolin, rolando.brondolin@polimi.it DEIB, Politecnico di Milano NGC VIII 2017 @ SF • FFWD: Latency-aware event stream processing via domain-specific load-shedding policies. R. Brondolin, M. Ferroni, M. D. Santambrogio. In Proceedings of the 14th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2016)
  • 22. Output quality - homogeneous • QoS requirement Ut = 1.1%, standard set-point for the agent • MAPE (lower is better) between exact and approximated metrics • Output metrics on latency and volume for file and network operations • FFWD fair policy gives similar or better results w.r.t. reference • FFWD is accurate even when it drops more events • Predictable and repetitive behavior of nginx, fio and apache • [Figure: five panels, nginx 800K evt/s, postmark 1.2M evt/s, fio 1.3M evt/s, simplefile 1.5M evt/s, apache 1.9M evt/s; y-axis MAPE (%) on a log scale from 1 to 1000; x-axis latency-file, latency-net, volume-file, volume-net; series: kernel-drop, fair; annotation: reference]
  • 23. 1x simplefile, 1x nginx, 1.3M evt/s • [Figure: two panels, latency metrics and volume metrics (byte r/w); y-axis MAPE (%) on a log scale from 1 to 1000, lower is better; x-axis file and net metrics for simplefile and nginx; series: kernel-drop, fair, priority; annotation: reference]
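  • 24. Response time Load Manager • The system can be characterized by its response time and the jobs in the system • S: Little's Law, N(t) = λ(t) · R(t), together with the equation for the jobs in the system [equation image not captured] • Control error: [equation image not captured] • Requested throughput: [equation image not captured] • The requested throughput is used by the load shedding policies to derive the LS probabilities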
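As a worked example of Little's law, N(t) = λ(t) · R(t), the sketch below converts between the number of jobs in the system and the response time. The helper names and the numbers are illustrative assumptions; the sketch does not reproduce the controller equations of this slide, which are not available in the transcript.

```python
def jobs_in_system(arrival_rate, response_time):
    """Little's law: N = lambda * R (jobs = events/s * seconds)."""
    return arrival_rate * response_time


def response_time(jobs, arrival_rate):
    """Little's law solved for R; arrival_rate must be positive."""
    if arrival_rate <= 0:
        raise ValueError("arrival rate must be positive")
    return jobs / arrival_rate


if __name__ == "__main__":
    # Hypothetical operating point: 100 evt/s arriving, 5 s response-time target.
    allowed_jobs = jobs_in_system(100.0, 5.0)
    print(f"jobs allowed in the system: {allowed_jobs}")
    print(f"response time with that backlog: {response_time(allowed_jobs, 100.0)} s")
```

Read this way, a response-time target bounds the backlog the system can hold at a given arrival rate; anything beyond that bound has to be shed.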
  • 27. Case studies • Sentiment analysis [1] • Goal: perform real-time analysis on tweets • System monitoring [2] • Goal: distributed monitoring of systems and applications w/ syscalls • Constraint: CPU utilization • Based on: Sysdig monitoring agent • Output: aggregated performance metrics for applications, containers, hosts • FFWD ensures low CPU overhead • policies based on processes in the system • [1] http://nlp.stanford.edu [2] http://www.sysdig.org
  • 28. Case studies • Sentiment analysis [1] • Goal: perform real-time analysis on tweets • Constraint: latency • Based on: Stanford NLP toolkit • Output: aggregated sentiment score for each keyword and hashtag • FFWD keeps the response time bounded • policies based on tweet keyword and #hashtag • System monitoring [2] • Goal: distributed monitoring of systems • [1] http://nlp.stanford.edu [2] http://www.sysdig.org
  • 29. Real-time sentiment analysis • Real-time sentiment analysis makes it possible to: – Track the sentiment of a topic over time – Correlate real-world events and related sentiment, e.g. • Toyota crisis (2010) [1] • 2012 US Presidential Election Cycle [2] – Track the online evolution of companies' reputation, derive social profiling and enable enhanced social marketing strategies • [1] Bifet Figuerol, Albert Carles, et al. "Detecting sentiment change in Twitter streaming data." Journal of Machine Learning Research: Workshop and Conference Proceedings Series. 2011. [2] Wang, Hao, et al. "A system for real-time Twitter sentiment analysis of 2012 US presidential election cycle." Proceedings of the ACL 2012 System Demonstrations.
  • 30. Sentiment analysis: case study • Simple Twitter streaming sentiment analyzer with Stanford NLP • System components: – Event producer – RabbitMQ queue – Event consumer • Consumer components: – Event Capture – Sentiment Analyzer – Sentiment Aggregator • Real-time queue consumption, aggregated metrics emitted every second (keyword and hashtag sentiment)
  • 31. FFWD: Sentiment analysis • FFWD adds four components: – Load shedding filter at the beginning of the pipeline – Shedding plan used by the filter – Domain-specific policy wrapper – Application controller manager to detect load peaks • [Diagram: Producer, Load Shedding Filter, Event Capture, Sentiment Analyzer and Sentiment Aggregator in a pipeline, with the Policy Wrapper, Load Manager and Shedding Plan; queues: real-time queue, batch queue; labels: input tweets, λ(t), R(t), Rt, μ(t+1), stream stats, updated plan, drop probability, ok, ko, ko count, account metrics, analyze event, output metrics; legend: component, data structure, internal information flow, external information flow, queue]
  • 32. Sentiment - experimental setup • Separate tests to understand FFWD behavior: – System stability – Output quality • Dataset: 900K tweets from the 35th week of the Premier League • Performed tests: – Controller: synthetic and real tweets at various λ(t) – Policy: real tweets at various λ(t) • Evaluation setup – Intel Core i7 3770, 4 cores @ 3.4 GHz + HT, 8 MB LLC – 8 GB RAM @ 1600 MHz
  • 34. Load Manager showcase (1) • Load Manager demo (Rt = 5s): – λ(t) increased after 60s and 240s – response time: • [Figure "Controller performance": response time (s), 0 to 7, versus time (s), 0 to 300; series: QoS = 5s, R]
  • 35. Load Manager showcase (2) • Load Manager demo (Rt = 5s): – λ(t) increased after 60s and 240s – throughput: • [Figure "Actuation": # events, 0 to 500, versus time (s), 0 to 300; series: lambda, dropped, computed, mu]
  • 36. Output Quality • Real tweets, μc(t) ≃ 40 evt/s • Evaluated policies: • Baseline • Fair • Priority • R = 5s, λ(t) = 100 evt/s, 200 evt/s, 400 evt/s • Error metric: Mean Absolute Percentage Error (MAPE %) (lower is better) • [Figure: three panels, λ(t) = 100 evt/s, λ(t) = 200 evt/s, λ(t) = 400 evt/s; y-axis MAPE (%), 0 to 50; x-axis groups A, B, C, D; series: baseline_error, fair_error, priority_error]