MIST: Towards Large-Scale
IoT Stream Processing
이계원, 엄태건
Joint work with 전병곤, 조성우, 김경태, 이정길, 이산하
Many IoT devices
heartbeat
location
temperature
humidity
…
Continuous Data Streams
Various Places
Icons made by Freepik, Icon Pond, Roundicons from www.flaticon.com is licensed by CC 3.0 BY
* : IoT stream query
Temperature data stream
Q1: Adjust the air conditioner's cooling temperature
Q2: Adjust the fan speed of the electric fan
● Long-running
● Small data streams
● Large numbers
● Various types
Our Scope
IoT Stream Processing System
* : IoT stream query
Focus of this work:
How to efficiently handle billions of IoT stream queries in a cluster of machines?
Current Stream Processing Systems
● Optimized for handling a small number of big stream queries
MIST
User & Application
Building Manager (User)
Building Management Application
(Android, iOS, Web, ...)
MIST
“I want to monitor a room!”
“OK… I will submit the necessary query for you using the MIST API”
“I will give you notifications
when something happens!”
MIST Architecture
[Figure: users' apps/clients submit queries (DAG, CEP, …) to the MIST Master, which sits in front of a cluster of machines running MIST Processing Engines]
1. A query is submitted to the MIST Master
2. The MIST Master assigns the query to a MIST Processing Engine
3. Many IoT stream queries are processed in a cluster of machines
MIST Front-end
MIST Query API
● MIST provides a query API for application developers
○ Implemented in Java 8
○ Supports UDFs (User-Defined Functions) in the form of Java lambda functions
■ Ex) Map, Filter, …
■ Provides a more flexible programming model than SQL
MIST Query API
● MIST supports two types of query APIs
○ Dataflow Model
■ Supports low-level query construction using UDFs
○ Complex Event Processing (CEP)
■ Supports high-level pattern detection
MIST Dataflow Query Example
● Simple Noise Sensing Query
[Figure: noise sensors inside the building publish to an MQTT broker (MQTT Publish); MIST subscribes to the broker (MQTT Subscribe); notifications (Noti) reach the building manager via MQTT Pub/Sub]
How to Define and Submit an IoT Stream Query?
● Configure the input stream source
● Define the operations that transform the input events
● Configure the output sink
● Submit the query to the MIST Master
MIST Dataflow Query Example
public static void main(final String[] args) {
  // Named mqttSourceConf so that it matches the variable used by mqttStream() on the next slide.
  final SourceConfiguration mqttSourceConf =
      MQTTSourceConfiguration.newBuilder()
          .setTopic("snu/building302/room420/noisesensor")
          .setBroker("tcp://mqtt_broker_address:1883")
          .build();
  ...
Configure MQTT source
MIST Dataflow Query Example
  final MISTQueryBuilder queryBuilder =
      new MISTQueryBuilder("room_noise_sensing");
  final ContinuousStream<Integer> sensedData =
      queryBuilder.mqttStream(mqttSourceConf)
          // Decode the MQTT payload bytes into a string.
          .map(mqttMessage -> new String(mqttMessage.getPayload()))
          // Parse the string into an integer noise value.
          .map(stringData -> Integer.parseInt(stringData));
Set application name
Get data from MQTT source
map() transforms the incoming MQTT message into an integer value
MIST Dataflow Query Example
  sensedData
      .filter(value -> value < 200)
      .map(value -> new MqttMessage("Noisy".getBytes()))
      .mqttOutput("tcp://mqtt_broker_address:1883",
          "snu/building302/room420/monitor");
  final MISTQuery query = queryBuilder.build();
Notify if the room is noisy
Send the notification via MQTT
Build the query
MIST Dataflow Query Example
  final MISTExecutionEnvironment executionEnvironment =
      new MISTDefaultExecutionEnvironmentImpl("mist_master_address", mistPort);
  final QueryControlResult result =
      executionEnvironment.submit(query, jarPath);
  System.out.println(result);
}
Submit the query to the MIST Master
Demo: Noise Sensing Query
MIST CEP Query
● Complex Event Processing enables higher-level pattern detection on stream data
● A CEP query consists of
○ An event pattern that meets a qualification
○ An action
● MIST transforms high-level CEP queries into DAGs before running them
MIST CEP Query Example
● Find a sequence of heart rates that is
○ Higher than the normal upper heart rate limit designated by a doctor
○ Showing an ascending pattern
○ Within the most recent 5 minutes
● Notify through MQTT when the abnormal pattern is found
MIST CEP Query Example
final MISTCepQuery<CepHRClass> cepQuery =
    new MISTCepQuery.builder<CepHRClass>("bpm_monitor")
        .input(mqttInput)
        .setEventSequence(eventD, eventP)
        .setQualifier( … )
        .within(300000)  // 5 minutes, assuming the window is given in milliseconds
        .setAction(mqttNotify)
        .build();
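To make the pattern concrete, here is a minimal plain-Java sketch of the kind of check such a CEP query expresses. It does not use the MIST CEP API. The `Reading` type, the `minRunLength` parameter, and the method name are all hypothetical, and the sketch assumes the readings arrive sorted by timestamp.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a plain-Java version of the heart-rate qualification, not the MIST CEP API. */
final class AscendingHeartRateSketch {
  static final long WINDOW_MS = 300_000L; // 5 minutes

  /** Hypothetical event type: a timestamp in milliseconds and a heart rate in bpm. */
  static final class Reading {
    final long timestampMs;
    final int bpm;

    Reading(long timestampMs, int bpm) {
      this.timestampMs = timestampMs;
      this.bpm = bpm;
    }
  }

  /**
   * Returns true if some run of at least minRunLength consecutive readings is
   * entirely above upperLimit, strictly ascending, and spans at most WINDOW_MS.
   */
  static boolean hasAbnormalPattern(List<Reading> readings, int upperLimit, int minRunLength) {
    int runStart = 0; // index of the first reading in the current candidate run
    for (int i = 0; i < readings.size(); i++) {
      Reading current = readings.get(i);
      if (current.bpm <= upperLimit) {
        runStart = i + 1; // a normal reading ends the run
        continue;
      }
      if (i > runStart && current.bpm <= readings.get(i - 1).bpm) {
        runStart = i; // not ascending: start a new run at this reading
      }
      while (current.timestampMs - readings.get(runStart).timestampMs > WINDOW_MS) {
        runStart++; // drop readings that fall outside the 5-minute window
      }
      if (i - runStart + 1 >= minRunLength) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Reading> readings = Arrays.asList(
        new Reading(0L, 90),
        new Reading(60_000L, 125),
        new Reading(120_000L, 130),
        new Reading(180_000L, 140));
    // Upper limit 120 bpm and a run of 3 readings are example values.
    System.out.println(hasAbnormalPattern(readings, 120, 3));
  }
}
```

In a CEP query, the same conditions would be split between the event sequence, the qualifier, and the `within` window rather than written as one loop.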
Demo: CEP Abnormal Heart Rate Detection
MIST Back-end
MIST Architecture Revisited
[Figure: users' apps/clients submit queries (DAG, CEP, …) to the MIST Master, which sits in front of a cluster of machines running MIST Processing Engines]
MIST in a single machine
How to process many IoT queries in a
single machine?
How to process many IoT queries in a cluster of machines? (In progress)
MIST in a Single Machine
Design Principle:
Reuse system resources as much as possible!
1. Code sharing
2. Exploit the locality of code references
3. Query merging
1. Code sharing
Query 1 (Room A): src → map → sink
Query 2 (Room B): src → map → sink
Both map operators run the same User-Defined Function:
(temp) -> {
  if (temp > threshold) {
    return "action:speed:fast";
  } ...
}
Without code sharing, each query keeps its own copy of the compiled code.
Bad!
1. Code sharing
[Figure: Query1 and Query2 submit the same User-Defined Function and both reference a single copy of its compiled code]
Code sharing
⇒ Reduce the working set size of code references
Great!
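The idea can be sketched in plain Java as a registry that hands every query the same function object when the UDF is identical. This is a sketch under assumptions, not the MIST implementation: `SharedUdfRegistry`, the `udfKey` string (for example, a hash of the submitted code), the threshold of 30, and the "action:speed:slow" branch are all made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical registry: queries that submit the same UDF share one function object. */
final class SharedUdfRegistry {
  // Key: an identifier of the UDF's code, assumed here to be supplied by the caller.
  private final Map<String, Function<Integer, String>> udfs = new ConcurrentHashMap<>();

  /** Returns the UDF already registered under udfKey, or registers the one from loader. */
  Function<Integer, String> getOrRegister(String udfKey,
                                          Supplier<Function<Integer, String>> loader) {
    return udfs.computeIfAbsent(udfKey, key -> loader.get());
  }

  /** Number of distinct UDFs held, regardless of how many queries use them. */
  int distinctUdfCount() {
    return udfs.size();
  }

  public static void main(String[] args) {
    final int threshold = 30; // example value
    SharedUdfRegistry registry = new SharedUdfRegistry();

    // Query 1 (Room A) and Query 2 (Room B) submit the same UDF under the same key.
    Function<Integer, String> roomA = registry.getOrRegister("fan-speed-udf",
        () -> temp -> temp > threshold ? "action:speed:fast" : "action:speed:slow");
    Function<Integer, String> roomB = registry.getOrRegister("fan-speed-udf",
        () -> temp -> temp > threshold ? "action:speed:fast" : "action:speed:slow");

    System.out.println(roomA == roomB);              // both queries hold the same object
    System.out.println(registry.distinctUdfCount()); // one UDF for two queries
    System.out.println(roomA.apply(35));
  }
}
```

With a registry like this, the number of distinct code objects grows with the number of distinct UDFs rather than with the number of queries.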
2. Exploit the locality of code references
[Figure: an event queue holds events e1 … e9, one for each of Query1 … Query9. The nine queries use three UDFs (UDF1, UDF2, UDF3), and the instruction cache holds only two of them (size = 2), so UDF1, UDF2, and UDF3 keep replacing each other in the cache.]
Bad!
Frequent cache misses!
[Figure: the same event queue and instruction cache (size = 2), with events that use the same UDF processed together; the cache holds UDF1 and UDF2.]
Cache misses 9 ⇒ 3!
Great!
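The drop from 9 misses to 3 can be reproduced with a toy model. The sketch below assumes the cache behaves like an LRU cache over whole UDFs and that query i uses UDF ((i - 1) mod 3) + 1, which matches the grouping shown in the figures. A real instruction cache is far more fine-grained, so this only illustrates the effect of processing order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

/** Toy model: counts misses in an LRU "instruction cache" that holds a fixed number of UDFs. */
final class UdfCacheSimulation {

  static int countMisses(List<Integer> udfAccesses, int cacheSize) {
    // Iteration order of the set runs from least to most recently used.
    LinkedHashSet<Integer> cache = new LinkedHashSet<>();
    int misses = 0;
    for (Integer udf : udfAccesses) {
      if (cache.remove(udf)) {
        cache.add(udf); // hit: move to the most recently used position
      } else {
        misses++;
        if (cache.size() == cacheSize) {
          Integer leastRecentlyUsed = cache.iterator().next();
          cache.remove(leastRecentlyUsed); // evict to make room
        }
        cache.add(udf);
      }
    }
    return misses;
  }

  public static void main(String[] args) {
    // Events e1..e9 in arrival order use UDF 1, 2, 3, 1, 2, 3, 1, 2, 3.
    List<Integer> arrivalOrder = new ArrayList<>();
    for (int query = 1; query <= 9; query++) {
      arrivalOrder.add((query - 1) % 3 + 1);
    }
    // Processing events of the same UDF together gives 1, 1, 1, 2, 2, 2, 3, 3, 3.
    List<Integer> groupedOrder = new ArrayList<>(arrivalOrder);
    Collections.sort(groupedOrder);

    System.out.println(countMisses(arrivalOrder, 2)); // prints 9
    System.out.println(countMisses(groupedOrder, 2)); // prints 3
  }
}
```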
How to realize this event processing mechanism?
Exploit the locality of code references:
Group-Aware Execution Model
● Fixed Number of Threads
● Query Grouping
● Group Assignment
● Group Reassignment
Exploit the locality of code references:
Fixed number of threads (1/4)
[Figure: a fixed pool of threads, Thread 1 and Thread 2]
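Exploit the locality of code references:
Query Grouping (2/4)
[Figure: queries that use the same UDF form a group. UDF1 group: Query1, Query4, Query7 (events e1, e4, e7). UDF2 group: Query2, Query5, Query8 (events e2, e5, e8). The groups run on Thread 1 and Thread 2.]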
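Grouping by UDF can be sketched in a few lines of Java. The `Query` descriptor and its string identifiers are hypothetical; the slides do not say how MIST identifies a UDF internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Sketch: put queries that run the same UDF into the same group. */
final class QueryGroupingSketch {

  /** Hypothetical query descriptor: a query id plus the id of the UDF it runs. */
  static final class Query {
    final String queryId;
    final String udfId;

    Query(String queryId, String udfId) {
      this.queryId = queryId;
      this.udfId = udfId;
    }
  }

  /** Maps each UDF id to the ids of the queries that use it, in submission order. */
  static Map<String, List<String>> groupByUdf(List<Query> queries) {
    return queries.stream().collect(Collectors.groupingBy(
        (Query q) -> q.udfId,
        TreeMap::new,
        Collectors.mapping((Query q) -> q.queryId, Collectors.toList())));
  }

  public static void main(String[] args) {
    List<Query> queries = new ArrayList<>();
    for (int i = 1; i <= 9; i++) {
      // Query i uses UDF ((i - 1) % 3) + 1, as in the figure.
      queries.add(new Query("Query" + i, "UDF" + ((i - 1) % 3 + 1)));
    }
    System.out.println(groupByUdf(queries));
  }
}
```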
Exploit the locality of code references:
Group Assignment (3/4)
[Figure: three groups (UDF1: Query1, Query4, Query7; UDF2: Query2, Query5, Query8; UDF3: Query3, Query6, Query9) are assigned to Thread 1 and Thread 2. A new group for UDF4 arrives: which thread should it be assigned to?]
λ: Event arrival rate
μ: Event processing rate
Load = λ / μ
[Figure: the three groups have loads 0.3, 0.2, and 0.2. Thread 1 has a total load of 0.5 and Thread 2 has a total load of 0.2. The new UDF4 group is assigned to a thread based on these loads.]
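A minimal sketch of load-based assignment follows. It assumes the new group goes to the thread with the smallest total load, which is one plausible reading of the figure rather than a confirmed MIST policy. The arrival and processing rates of the new group are example numbers.

```java
/** Sketch: assign a new group to the least-loaded thread (assumed policy). */
final class GroupAssignmentSketch {

  /** Load = lambda / mu: event arrival rate divided by event processing rate. */
  static double load(double arrivalRate, double processingRate) {
    return arrivalRate / processingRate;
  }

  /** Returns the index of the thread with the smallest total load (lowest index on ties). */
  static int leastLoadedThread(double[] threadLoads) {
    int best = 0;
    for (int i = 1; i < threadLoads.length; i++) {
      if (threadLoads[i] < threadLoads[best]) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Thread loads from the figure: Thread 1 = 0.5, Thread 2 = 0.2.
    double[] threadLoads = {0.5, 0.2};

    // Hypothetical rates for the new UDF4 group: 10 events/s arrive, 100 events/s can be processed.
    double newGroupLoad = load(10.0, 100.0);

    int target = leastLoadedThread(threadLoads);
    threadLoads[target] += newGroupLoad;
    System.out.println("Assign the UDF4 group to Thread " + (target + 1));
  }
}
```

With the loads from the figure, this picks Thread 2.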
Exploit the locality of code references:
Group Reassignment (4/4)
[Figure: initially the groups have loads 0.3, 0.2, and 0.2; Thread 1 has load 0.5 and Thread 2 has load 0.2.]
Load >= 0.9 ~ Overloaded
Load < 0.7 ~ Underloaded
[Figure: one group's load rises from 0.3 to 0.7, so Thread 1 reaches load 0.9 and is overloaded, while Thread 2 stays at load 0.2.]
[Figure: after a group with load 0.2 is reassigned from Thread 1 to Thread 2, Thread 1 has load 0.7 and Thread 2 has load 0.4.]
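The reassignment step can be sketched as follows, using the thresholds from the slide. The policy of moving the lightest group to the least-loaded underloaded thread is an assumption chosen to reproduce the figure. Loads are stored as integer percentages because adding 0.7 and 0.2 as doubles does not give exactly 0.9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: move groups away from overloaded threads (assumed policy, loads in percent). */
final class GroupReassignmentSketch {
  static final int OVERLOADED_PERCENT = 90;  // Load >= 0.9 ~ Overloaded
  static final int UNDERLOADED_PERCENT = 70; // Load < 0.7 ~ Underloaded

  static int totalLoad(List<Integer> groupLoads) {
    int sum = 0;
    for (int load : groupLoads) {
      sum += load;
    }
    return sum;
  }

  /** Each inner list holds the loads of the groups currently assigned to one thread. */
  static void rebalance(List<List<Integer>> threads) {
    for (List<Integer> source : threads) {
      while (totalLoad(source) >= OVERLOADED_PERCENT && source.size() > 1) {
        Integer lightest = Collections.min(source);
        List<Integer> target = null;
        for (List<Integer> candidate : threads) {
          boolean canAccept = candidate != source
              && totalLoad(candidate) < UNDERLOADED_PERCENT
              && totalLoad(candidate) + lightest < OVERLOADED_PERCENT;
          if (canAccept && (target == null || totalLoad(candidate) < totalLoad(target))) {
            target = candidate;
          }
        }
        if (target == null) {
          break; // no thread can take the group without becoming overloaded
        }
        source.remove(lightest); // remove(Object): drops one group with that load
        target.add(lightest);
      }
    }
  }

  public static void main(String[] args) {
    // From the figure: Thread 1 holds groups with loads 0.7 and 0.2, Thread 2 holds one with 0.2.
    List<List<Integer>> threads = new ArrayList<>();
    threads.add(new ArrayList<>(Arrays.asList(70, 20)));
    threads.add(new ArrayList<>(Arrays.asList(20)));

    rebalance(threads);
    System.out.println(totalLoad(threads.get(0))); // prints 70
    System.out.println(totalLoad(threads.get(1))); // prints 40
  }
}
```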
Query Merging
Query 1: src → map (same UDF) → sink
Query 2: src → map (same UDF) → sink
Running the two queries separately: Bad!
● The queries process the same data stream
● The queries have the same operations
Merge the two queries! A single src → map feeds the sinks of both Query 1 and Query 2.
Great!
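A minimal sketch of the merge, assuming two queries can be merged when they read the same source and apply the same UDF. The class names, the string merge key, and the counter are hypothetical and exist only to show that the shared map stage runs once per event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Sketch: queries with the same source and the same UDF share one src -> map stage. */
final class QueryMergingSketch {

  /** One shared src -> map stage that feeds the sinks of every merged query. */
  static final class MergedPipeline {
    final Function<Integer, String> udf;
    final List<Consumer<String>> sinks = new ArrayList<>();
    int udfInvocations = 0;

    MergedPipeline(Function<Integer, String> udf) {
      this.udf = udf;
    }

    void onEvent(int value) {
      String result = udf.apply(value); // the map operator runs once per event
      udfInvocations++;
      for (Consumer<String> sink : sinks) {
        sink.accept(result); // every merged query still gets its own output
      }
    }
  }

  private final Map<String, MergedPipeline> pipelines = new HashMap<>();

  /** The merge key is (source, UDF id); both identifiers are assumptions of this sketch. */
  MergedPipeline submit(String source, String udfId,
                        Function<Integer, String> udf, Consumer<String> sink) {
    MergedPipeline pipeline =
        pipelines.computeIfAbsent(source + "|" + udfId, key -> new MergedPipeline(udf));
    pipeline.sinks.add(sink);
    return pipeline;
  }

  public static void main(String[] args) {
    QueryMergingSketch engine = new QueryMergingSketch();
    List<String> sink1 = new ArrayList<>();
    List<String> sink2 = new ArrayList<>();
    Function<Integer, String> udf = temp -> temp > 30 ? "fast" : "slow";

    MergedPipeline query1 = engine.submit("temperature", "fan-udf", udf, sink1::add);
    MergedPipeline query2 = engine.submit("temperature", "fan-udf", udf, sink2::add);

    query1.onEvent(35);
    System.out.println(query1 == query2);      // both queries share one pipeline
    System.out.println(query1.udfInvocations); // the UDF ran once for the event
    System.out.println(sink1 + " " + sink2);
  }
}
```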
Single Machine Evaluation
Evaluation Environment
● Environment: 28-core NUMA machine (35 MB cache, 8x 16 GB RDIMM)
● Data transfer protocol: MQTT (a lightweight messaging protocol for IoT)
● Metric: maximum number of queries
● Baselines: Flink & Thread-Per-Query (TPQ)
● Number of queries per code: 100
[Figure: Data Stream Generator → MQTT Broker (EMQ) → MIST (or Flink, TPQ)]
Performance Comparison
The number of queries that can be processed with < 10 ms latency
[Figure: bar chart; the speedup labels recovered from it are 13.6x, 375x, 3.18x, and 87.5x]
MIST in a Cluster of Machines
(Ongoing work)
Ongoing work
● Distributed Masters
○ Prevent a bottleneck, no single point of failure
● Load balancing among nodes
○ Query allocation, dynamic query migration
● Fault tolerance
○ Checkpointing, upstream backup
Summary
● MIST processes a large number of IoT stream queries efficiently
● Techniques for scaling up stream processing
○ Code sharing
○ Exploiting the locality of code references
○ Query merging
● On a single machine, MIST handles 375x more queries than Apache Flink and 13.6x more than TPQ
We will release MIST as an open-source project soon! We look forward to contributions from many developers!
Contact: mist@spl.snu.ac.kr
Software Platform Lab Site: http://spl.snu.ac.kr