Edge Computing emerges as a stable and efficient solution for IoT data processing and analytics. With big data distributed engines to be deployed on edge infrastructures, users seek solutions to evaluate the performance of their analytics queries. In this paper, we introduce SparkEdgeEmu, an interactive framework designed for researchers and practitioners who need to inspect the performance of Spark analytic jobs without the edge topology setup burden. SparkEdgeEmu provides: (i) parameterizable template-based use cases for edge infrastructures, (ii) real-time emulated environments serving ready-to-use Spark clusters, (iii) a unified and interactive programming interface for the framework's execution and query submission, and (vi)~utilization metrics from the underlying emulated topology as well as performance and quantitative metrics from the deployed queries. We evaluate the usability of our framework in a smart city use case and extract useful performance hints for the Apache Spark code execution.
SparkEdgeEmu: An Emulation Framework for Edge-enabled Apache Spark Deployments
1. 29th International European Conference on Parallel and Distributed Computing (Euro-Par2023)
Moysis Symeonides∗, Demetris Trihinas†, George Pallis∗, Marios D. Dikaiakos∗
∗Department of Computer Science
University of Cyprus
{msymeo03, pallis, mdd}@ucy.ac.cy
†Department of Computer Science
University of Nicosia
trihinas.d@unic.ac.cy
An Emulation Framework for
Edge-enabled Apache Spark Deployments
2. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
The Internet of Things
2
● Transforming the physical world into an information system.
● The predictions put the number of connected IoT devices up to 29.4
billion by 2030 and data generated from them to be 73.1 ZB by 2025.
Analytic Insights
cloud
● It only seems “natural” that IoT services offload analytic jobs to the cloud for processing.
https://www.idc.com/getdoc.jsp?containerId=prAP46737220, 2020
https://www.statista.com/statistics/1183457/iot-connected-devices-worldwide/, 2023
Data Analyst
3. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
Apache Spark and Internet of Things
3
Apache Spark democratized the big data processing by hiding the complexity related to
machine communication, resource management, task scheduling, fault tolerance, etc…
● But… data analysis usually come with near real-time requirements and moving data
centrally for processing penalizes analytics timeliness.
cloud
Analytic Insights
Data Analyst
4. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
Bringing Apache Spark to the Edge
4
Analytic Insights
cloud
The “Edge”
Deploying Apache Spark to the Edge and performing data processing in place or in
vicinity of data generating devices offers:
● Efficient processing
● Less network pressure
● Cost reduction
Data Analyst
5. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
ARM 8 cores @2.2GHz
32GB RAM 512-core GPU
$695
Jetson Nano
Challenges of Edge-enabled Spark Deployments
5
5
ARM 4 cores@1.5GHz
4GB RAM
$54
Plethora of Devices and
Heterogeneity
ARM 4 cores @1.4GHz
4GB RAM + GPU No wifi
$69
AGX Xavier
Raspberry Pi 4
Scalable Geo-distributed
Deployments
Infrastructure and
System Monitoring
11. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
Rausch, T., Lachner, C., Frangoudis, P. A., Raith, P., & Dustdar, S. Ether: Synthesizing Plausible Infrastructure Configurations for Evaluating Edge Computing Systems. In 3rd USENIX Workshop on
Hot Topics in Edge Computing (HotEdge 2020). USENIX Association
11
SparkEdgeEmu Overview
14. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 14
SparkEdgeEmu Overview
M. Symeonides, Z. Georgiou, D. Trihinas, G. Pallis and M. D. Dikaiakos, "Fogify: A Fog Computing Emulation Framework," IEEE/ACM Symposium on Edge Computing (SEC2020)
18. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 18
Modeling Abstractions
● Infrastructure components, that include:
○ Devices types that describe workers capabilities
including processor, memory, and disk.
○ Connections types that provide link characteristics
for uplink and downlink QoS, including data rate,
latency, error rate, etc.
● Use case
○ defines a scenario.
○ specifies use case parameters, like number of
devices, regions, etc., and the infrastructure
components.
19. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 19
Modeling Abstractions
● Infrastructure components, that include:
○ Devices types that describe workers capabilities
including processor, memory, and disk.
○ Connections types that provide link characteristics
for uplink and downlink QoS, including data rate,
latency, error rate, etc.
● Use case
○ defines a scenario.
○ specifies use case parameters, like number of
devices, regions, etc., and the infrastructure
components.
20. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 20
Modeling Abstractions
● Infrastructure components, that include:
○ Devices types that describe workers capabilities
including processor, memory, and disk.
○ Connections types that provide link characteristics
for uplink and downlink QoS, including data rate,
latency, error rate, etc.
● Use case
○ defines a scenario.
○ specifies use case parameters, like number of
devices, regions, etc., and the infrastructure
components.
21. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 21
Modeling Abstractions
● Infrastructure components, that include:
○ Devices types that describe workers capabilities
including processor, memory, and disk.
○ Connections types that provide link characteristics
for uplink and downlink QoS, including data rate,
latency, error rate, etc.
● Use case
○ defines a scenario.
○ specifies use case parameters, like number of
devices, regions, etc., and the infrastructure
components.
22. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 22
Interactive Programming Interface
SparkEdgeEmu offers a programming
interface that allows users to:
● load the modeling abstractions
● deploy the description and create an
emulated Edge-Spark testbed
● connect on Spark and submit queries
● retrieve metrics from infrastructure and
Spark cluster
23. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 23
Interactive Programming Interface
SparkEdgeEmu offers a programming
interface that allows users to:
● load the modeling abstractions
● deploy the description and create an
emulated Edge-Spark testbed
● connect on Spark and submit queries
● retrieve metrics from infrastructure and
Spark cluster
24. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 24
Interactive Programming Interface
SparkEdgeEmu offers a programming
interface that allows users to:
● load the modeling abstractions
● deploy the description and create an
emulated Edge-Spark testbed
● connect on Spark and submit queries
● retrieve metrics from infrastructure and
Spark cluster
25. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 25
Interactive Programming Interface
SparkEdgeEmu offers a programming
interface that allows users to:
● load the modeling abstractions
● deploy the description and create an
emulated Edge-Spark testbed
● connect on Spark and submit queries
● retrieve metrics from infrastructure and
Spark cluster
26. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 26
Interactive Programming Interface
SparkEdgeEmu offers a programming
interface that allows users to:
● load the modeling abstractions
● deploy the description and create an
emulated Edge-Spark testbed
● connect on Spark and submit queries
● retrieve metrics from infrastructure and
Spark cluster
27. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 27
Interactive Programming Interface
SparkEdgeEmu offers a programming
interface that allows users to:
● load the modeling abstractions
● deploy the description and create an
emulated Edge-Spark testbed
● connect on Spark and submit queries
● retrieve metrics from infrastructure and
Spark cluster
28. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 28
Infrastructure Use Case Generator
Ether is a framework for synthesizing plausible edge
infrastructure use cases configurations, grounded on
empirical data, like:
● Smart Cities Scenarios
● Industrial IoT deployment
● Mobile Edge clouds and vehicular networks, etc.
Ether creates an in-memory network with all information about the use case annotated with
resource and network QoS properties.
Rausch, T., Lachner, C., Frangoudis, P. A., Raith, P., & Dustdar, S. Ether: Synthesizing Plausible Infrastructure Configurations for Evaluating Edge Computing Systems. In 3rd USENIX Workshop on
Hot Topics in Edge Computing (HotEdge 2020). USENIX Association
29. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 29
Use Case to Emulation Model
SparkEdgeEmu generates the
execution configuration files by
traversing and translating the
network’s structure.
…
worker-nuc-0
image: emulator/spark-worker:3.0.0
environment:
- SPARK_WORKER_CORES=4
- SPARK_WORKER_MEMORY=8G
- SPARK_MASTER_HOST=spark-master
…
volumes:
- /data:{data_folder}
…
…
- from_node: nuc-0
to_node: cloudlet-vm-0
network: edge_net
properties:
bandwidth: 125Mbps
latency:
delay: 24.45ms
…
Typically, an Edge emulation framework forces users to define every emulated node, pairwise
connections, Spark configurations, etc…
A topology with 20 nodes generates a
YAML configuration file with more
than 100 configuration blocks!!!
…
- name: nuc
capabilities:
processor:
clock_speed: 2400
cores: 4
memory: 8G
disk: …
…
- label: nuc-0
networks:
- edge_net
node: nuc
replicas: 1
service: worker-nuc-0
…
…
- from_node: cloudlet-vm-0
to_node: nuc-0
network: edge_net
properties:
bandwidth: 125Mbps
latency:
delay: 24.45ms
…
30. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 30
Emulation & Deployment
M. Symeonides, Z. Georgiou, D. Trihinas, G. Pallis and M. D. Dikaiakos, "Fogify: A Fog Computing Emulation Framework," IEEE/ACM Symposium on Edge Computing (SEC2020)
SparkEdgeEmu utilizes Fogify as a scalable
underlying emulator since it provides:
● Resource Heterogeneity
● Network Link Heterogeneity
● Any-scale Experimentation
● Monitoring Capabilities
31. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 31
Emulation & Deployment
M. Symeonides, Z. Georgiou, D. Trihinas, G. Pallis and M. D. Dikaiakos, "Fogify: A Fog Computing Emulation Framework," IEEE/ACM Symposium on Edge Computing (SEC2020)
SparkEdgeEmu utilizes Fogify as a scalable
underlying emulator since it provides:
● Resource Heterogeneity
● Network Link Heterogeneity
● Any-scale Experimentation
● Monitoring Capabilities
32. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 32
Emulation & Deployment
M. Symeonides, Z. Georgiou, D. Trihinas, G. Pallis and M. D. Dikaiakos, "Fogify: A Fog Computing Emulation Framework," IEEE/ACM Symposium on Edge Computing (SEC2020)
SparkEdgeEmu utilizes Fogify as a scalable
underlying emulator since it provides:
● Resource Heterogeneity
● Network Link Heterogeneity
● Any-scale Experimentation
● Monitoring Capabilities
33. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 33
Emulation & Deployment
M. Symeonides, Z. Georgiou, D. Trihinas, G. Pallis and M. D. Dikaiakos, "Fogify: A Fog Computing Emulation Framework," IEEE/ACM Symposium on Edge Computing (SEC2020)
SparkEdgeEmu utilizes Fogify as a scalable
underlying emulator since it provides:
● Resource Heterogeneity
● Network Link Heterogeneity
● Any-scale Experimentation
● Monitoring Capabilities
34. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 34
Emulation & Deployment
SparkEdgeEmu utilizes Fogify as a scalable
underlying emulator since it provides:
● Resource Heterogeneity
● Network Link Heterogeneity
● Any-scale Experimentation
● Monitoring Capabilities
M. Symeonides, Z. Georgiou, D. Trihinas, G. Pallis and M. D. Dikaiakos, "Fogify: A Fog Computing Emulation Framework," IEEE/ACM Symposium on Edge Computing (SEC2020)
36. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 36
Smart City Use Case
We utilize the framework’s templates to
emulate a city-scale Apache Spark cluster with
22 heterogeneous Edge nodes deployed in three
regions.
The emulated topology includes 22 nodes:
● One cloudlet server, 8 cores@2.4GHz and 8GB RAM
● 8 raspberry Pi3b, 4 cores@1.2GHz and 1GB RAM
● 6 raspberry Pi4, 4 cores @1.8GHz and 4GB RAM
● 7 Jetson-Nano, 4 cores @1.43 GHz and 4GB RAM
We use the real-world dataset* of NYC For-Hire Vehicle (”FHV”) trip routes in the first half of 2022 from New York city.
*https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
37. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023 37
Performance Evaluation of Queries
We perform three queries on top of the deployed Spark
cluster, namely:
● Average payment of each drop-off location
● Number of trips per company
● The overall summarized tips
It seems that execution time follows Q1>Q2>Q3, while the
refetching of data harms the performance of the queries.
38. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
CPU utilization & Task placement
38
Apache Spark tends to assign more
tasks to more powerful nodes, while
these nodes usually are underutilized in
Edge topologies…
39. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
Memory Usage
39
Apache Spark occupied less memory on
edge memory-constrained devices, even
if it assigns to them a similar number of
tasks as the other Edge nodes.
40. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
Network Utilization
40
The health-check and the task-assigning
messages contribute a non-negligible
extra overhead in network traffic.
42. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
Conclusion
● We introduced SparkEdgeEmu an open-source emulation framework for Edge-enabled
Apache Spark Deployments, that…
○ features a powerful model specification for Edge use case deployments
○ offers a ease-to-use interactive programming interface
○ generates representative and realistic large-scale deployments
○ provides monitoring capabilities for both emulated infrastructure and Spark cluster
● We provided a detailed description of the implementation aspects, such as modeling,
deployment realization and Spark Edge cluster emulation.
● We demonstrate the use of SparkEdgeEmu utilizing a representative use case scenario
highlighting how it influences the performance of the queries, and their effects on
underlying resources.
42
45. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
Testbeds & Emulators
45
To get the most accurate results in testing, you need a physical deployment, however…
● to build a physical testbed is costly, time-consuming, and for minor changes requires extra money
and re-configurations
Edge and Fog Emulators can help by providing scalable testbeds with resource and network
heterogeneity[1][2][3] but they do not…
● provide randomized large-scale Edge deployments, with users manually needing to configure and
design every single emulated node
● provide specialized deployments for Big Data Engines (e.g., Apache Spark), and consequently
● offer fine-grained monitoring metrics from the Big Data Engines
[1] Fogbed: A rapid-prototyping emulation environment for fog computing, A. Coutinho et al. IEEE International Conference on Communications (ICC), 2018
[2] Emuedge: A hybrid emulator for reproducible and realistic edge computing experiments, Y. Zeng IEEE International Conference on Fog Computing (ICFC), 2019
[3] MockFog: Emulating Fog Computing Infrastructure in the Cloud, J. Hasenburg et al. IEEE International Conference on Fog Computing (ICFC), 2019
47. Moysis Symeonides
msymeo03@ucy.ac.cy
Laboratory for Internet
Computing
Euro-Par 2023
Emulation Accuracy
The emulated computing resources has only a small performance
degradation for workloads approaching 100% CPU usage
Fogify achieves near to real-world network link capabilities, with
only outliers not captured and a slight overhead in low-latency
connections
The emulation results closely follow the real measurements
with a 5%-8% deviation of the overall experiment time.