Data Processing Challenges Presented by IoT Data
in Distributed Computing
School of Computing
National College of Ireland
Abstract—The Internet of Things (IoT) growth is charted to
accelerate over the next few years. IoT devices and sensors are
heterogeneous and unfettered, and are
capable of emitting erratic and unrelenting amounts of data. The
challenges of processing this data are similar
to those of earlier Big Data and Cloud computing, but the scale is
magnified. Streaming masses of disparate data to cloud hubs may
well stifle and overwhelm infrastructure and service capability.
IoT devices excel at generating data but possess limited resources
for processing it. We conduct a literature review
of how distributed computing may assist in
overcoming some of the challenges of processing IoT data.
I. INTRODUCTION
As the internet continues to evolve, each cycle of growth
appears to consume yet another group of entities capable of
generating more data than the previous set . One such
group of entities is the Internet of Things (IoT). There is
an abundance of descriptions of what the IoT is and,
without an accepted definition, clarification falls to the setting in
which the IoT is used ,. For our
purposes, we adopt an IoT description from  who suggest
that IoT objects share an internet-connected relationship which
enables them to converse through the transmission of data
concerning the context of their local environment. In this
paper we review a sample of literature which covers some
of the challenges associated with processing IoT data and
how distributed computing may relieve certain pinch points.
Distributed computing (referring to compute and storage
services) offered through cloud implementations goes some
way towards closing the gaps encountered in processing IoT data .
The literature refers to the estimated extent of IoT growth
as supplied by industrial and vendor research ,. For
example, a current IoT growth prediction is estimated to be
in the order of 30 billion IoT connected devices by 2020 with
a data footprint of 180 zettabytes being emitted annually by
II. IOT DATA PROCESSING CHALLENGES
The IoT is still a relatively new area and may appear
redolent of the advance of cloud computing. The sheer increase
in IoT deployment and uptake creates distinct data processing
challenges. The figures for data emissions may warrant
closer attention from those implementing IoT and their end users
than the figures promoted by vendors and commercial
research . The IoT spans a vast area, much of which is
beyond our scope (e.g. data security, privacy and
connectivity concerns, while relevant to IoT, are
excluded). Our focus is the data processing challenges identified in this
literature review and summarised below.
A. Volume of Data Generated
Each new deployment of an IoT device adds to the amount
of data emitted ,. The approach to systems architecture,
as  observe, will need to adapt to the demands of processing
IoT data. The IoT data multiplier effect will impact systems
which are not sized for the data profiles they expect to serve
(including built-in burst capability to cater for future data
volume requirements); such systems risk becoming a point
of failure should ingestion gateways become flooded with data.
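The sizing concern above can be illustrated with a minimal sketch of a bounded ingestion buffer. The class name `IngestionBuffer` and the parameters `steady_capacity` and `burst_headroom` are hypothetical and are not drawn from the reviewed literature; the sketch only shows how a gateway with a fixed limit sheds data once a burst exceeds its headroom.

```python
from collections import deque


class IngestionBuffer:
    """Hypothetical gateway buffer sized as steady capacity plus burst headroom."""

    def __init__(self, steady_capacity, burst_headroom):
        # Total number of messages the gateway can hold before shedding load.
        self.limit = steady_capacity + burst_headroom
        self.queue = deque()
        self.dropped = 0

    def offer(self, message):
        """Accept a message if there is room; otherwise count it as dropped."""
        if len(self.queue) >= self.limit:
            self.dropped += 1
            return False
        self.queue.append(message)
        return True

    def drain(self, n):
        """Remove and return up to n messages for downstream processing."""
        out = []
        while self.queue and len(out) < n:
            out.append(self.queue.popleft())
        return out
```

A buffer created with `IngestionBuffer(3, 2)` accepts five messages and rejects further ones until `drain` frees space; the `dropped` counter gives a crude indication that the gateway is undersized for the data profile it serves.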
B. Uneven Frequency of Emission
Devices may be event driven, always on or a mixture of
both, hence the data generated is not necessarily uniform,
as , recognise;  further elaborate on possible
approaches to managing the irregular profile of IoT data. Another
issue is cost, where infrastructure sits in place
awaiting data from event-driven devices which are infrequently
triggered or, as  suggest, where devices emit redundant data
to the cloud when it is superfluous to requirements.
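One way to picture the suppression of redundant emissions is a deadband filter, sketched below. This is an illustrative assumption rather than a method taken from the cited works: the function name `suppress_redundant` and the `deadband` threshold are hypothetical, and a reading is forwarded only when it differs from the last forwarded value by more than the threshold.

```python
def suppress_redundant(readings, deadband):
    """Yield only readings that moved more than `deadband` from the last one sent."""
    last_sent = None
    for value in readings:
        # The first reading is always sent; later ones only on a significant change.
        if last_sent is None or abs(value - last_sent) > deadband:
            last_sent = value
            yield value


# Example: a temperature stream with small fluctuations around two levels.
sent = list(suppress_redundant([20.0, 20.1, 20.05, 21.0, 21.2, 19.0], 0.5))
```

In this example only three of the six readings would leave the device, which reduces both transmission and the cloud capacity held in reserve for them.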
C. Speed of Arrival
Devices producing real time or near real time streams, as
,, concur, present an additional processing challenge.
Poor throughput and processing blockages will preclude timely
reporting of sensitive information emitted from event driven
sensors. Information which arrives too late may be of little
A plethora of IoT devices producing differing output, as
,,, suggest, will add to the data processing
burden. Connecting to a cloud infrastructure may relieve
some of the connectivity concerns arising from IoT device heterogeneity
through the concept of the Cloud of Things ,.
However, customised pre-processing may be required to
contend with the variety of flavours and formats of data
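The kind of customised pre-processing referred to here might look like the following sketch, which maps two differing payload formats onto one record layout. The formats, the field names (`device`, `id`, `temp_c`, `temp_f`) and the function `normalise` are all hypothetical and chosen purely for illustration.

```python
import json


def normalise(raw):
    """Map a JSON payload or a 'key=value;key=value' payload onto one layout."""
    text = raw.strip()
    if text.startswith("{"):
        fields = json.loads(text)
    else:
        # Assumed legacy format: semicolon-separated key=value pairs.
        fields = dict(pair.split("=", 1) for pair in text.split(";") if pair)
    device = str(fields.get("device", fields.get("id")))
    if "temp_c" in fields:
        celsius = float(fields["temp_c"])
    else:
        # Assumed alternative unit: convert Fahrenheit to Celsius.
        celsius = (float(fields["temp_f"]) - 32.0) * 5.0 / 9.0
    return {"device": device, "celsius": round(celsius, 2)}
```

Each additional device family would need its own branch or adapter, which is the burden that heterogeneity places on the ingestion side.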
E. Data Quality
Data quality may be eroded for several reasons which, as 
mention, include missing values, duplication,
unknown meaning and sparse data. The level of impact will
vary depending on the domain. Missing data may be due
to erratic connectivity, device malfunction or failures ,.
Devices may also generate data that is rated as poor because
it is diluted with superfluous values which are not part of
the end user requirements .
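A minimal sketch of screening for two of the quality problems named above, missing values and duplication, is given below. The record layout (`device`, `timestamp`, `value`) and the function `assess_quality` are assumptions made for illustration, and the sketch does not address unknown meaning or sparsity.

```python
def assess_quality(records, required=("device", "timestamp", "value")):
    """Split records into clean ones and a count of missing/duplicate rejects."""
    seen = set()
    clean = []
    report = {"missing": 0, "duplicate": 0}
    for rec in records:
        # A record with any required field absent or None is counted as missing.
        if any(rec.get(key) is None for key in required):
            report["missing"] += 1
            continue
        # Two records from the same device at the same time count as duplicates.
        identity = (rec["device"], rec["timestamp"])
        if identity in seen:
            report["duplicate"] += 1
            continue
        seen.add(identity)
        clean.append(rec)
    return clean, report
```

How a rejected record should be treated (discarded, imputed or flagged) depends on the domain, as the impact of poor quality varies.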
F. Data Locality
Devices generally are incapable of processing their own
data, hence data is transmitted to the cloud for processing
,,. It is optimal to retain data locality, that is, for
compute to be proximal to the data. For IoT this will require
moving compute services towards the network edges.
Without the ability to identify, isolate and pre-process the
data, the value it may contain could escape unnoticed .
In a recent review of IoT literature,  suggest that
it tends to provide a broad synopsis of the areas of concern
without addressing the key item impacting IoT, namely how
the handling of data is accomplished. We find literature that
focuses on IoT architecture  or IoT middleware ; however,
in general the data element is recognised, but perhaps not with
the rigour desired.
III. DISTRIBUTED PROCESSING OF IOT DATA
The Cloud of Things (CoT), mentioned earlier, is the
confluence of IoT and the Cloud ,,,. IoT
implementations gain an advantage from shared cloud services,
one of which, according to  and related
to our focus, is the provision of a substantial processing
resource. However, based on the challenges mentioned above,
dependence on a central cloud resource is unlikely to
meet the demands placed on it by a disparate,
heterogeneous IoT population. What follows are some of the
distributed computing approaches which we see in our review of
A. Fog Computing
Fog computing, as , concur, is the placement of
cloud services at, primarily but not only, the outer reaches
of the network, touching many disparate devices. Bringing
compute closer to the device through the Fog may alleviate
many of the IoT data challenges . Proximal Fog endpoints,
as  suggest, enable data locality, initial inspection,
selective filtering of data and processing of high priority real
time data at the edge. Lower priority data (and, where necessary,
Fog-processed data) is handled on a store and forward basis prior
to transmission to the central cloud facilities for downstream
processing. Bringing multiple Fog endpoints into play across
a wide array of IoT deployments, with selective partitioning
of data by priority, leads to data only appearing where it is
needed. It is not apparent how an extensive Fog cloud would
be implemented.  consider that the Fog lacks the resources
required to conduct lengthy or convoluted data processing, and
advocate the Lambda architectural design  that provides
processing and machine learning capabilities from device data.
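The partitioning of data by priority at a Fog endpoint, with store and forward for the remainder, can be sketched as follows. This is an illustrative assumption only: the class `FogNode`, the `priority` field and the `batch_size` parameter are hypothetical, and the `forwarded` list stands in for a real transmission to the central cloud.

```python
class FogNode:
    """Hypothetical Fog endpoint: handle high priority locally, batch the rest."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.alerts = []     # high priority readings processed at the edge
        self.pending = []    # low priority readings stored for later forwarding
        self.forwarded = []  # batches sent upstream (stand-in for a cloud call)

    def receive(self, reading):
        """Route a reading by its priority."""
        if reading["priority"] == "high":
            self.alerts.append(reading)
        else:
            self.pending.append(reading)
            if len(self.pending) >= self.batch_size:
                self.flush()

    def flush(self):
        """Forward any stored readings upstream as one batch."""
        if self.pending:
            self.forwarded.append(self.pending)
            self.pending = []
```

With `batch_size=2`, one high priority reading followed by three low priority readings leaves one entry in `alerts`, one forwarded batch of two and one reading still pending, so only the data needed centrally travels there.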
The provision of storage and processing working in
unison at these Fog edges, as  recognise, creates the setting
for parallelisation through federated mini clouds configured
to suit local IoT needs, with data coalescing to central
cloud facilities. However, the dispersal of services to the
edge gives rise to issues of command and control of the
diaspora. The eradication of such issues is one of the alluring
factors for the move to the cloud in the first place .
B. Contextualising Data
Adding context to IoT data enables a better understanding
of the data, a need which, as , suggest, has been highlighted
by the expansion of the IoT. Such context could
be added by early, simple machine learning in the Fog,
provided the resources are in place. Conducting a cluster
analysis of the IoT data in real time, as  suggest, would
be one such method of bringing meaning to data, and one which could
be distributed across compute. The use of MapReduce on IoT
real time streams is not suitable, as  point out; they suggest
that a new design pattern which provides parallel processing
of IoT real time streams is long overdue.
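As a rough illustration of clustering a stream in real time, the sketch below uses plain sequential k-means, in which each arriving reading nudges its nearest centroid towards itself. It is not the adaptive method of the cited work; the class `OnlineKMeans1D`, its scalar input and its caller-supplied initial centroids are assumptions made to keep the example small.

```python
class OnlineKMeans1D:
    """Sequential k-means over scalar readings, updated one reading at a time."""

    def __init__(self, initial_centroids):
        self.centroids = list(initial_centroids)
        # Number of readings assigned to each centroid so far.
        self.counts = [0] * len(self.centroids)

    def update(self, x):
        """Assign x to its nearest centroid, move that centroid, return its index."""
        idx = min(range(len(self.centroids)),
                  key=lambda i: abs(x - self.centroids[i]))
        self.counts[idx] += 1
        # Running mean: the centroid becomes the average of its assigned readings.
        self.centroids[idx] += (x - self.centroids[idx]) / self.counts[idx]
        return idx


model = OnlineKMeans1D([0.0, 10.0])
labels = [model.update(x) for x in [1.0, 9.0, 2.0, 11.0]]
```

Because each update touches a single centroid and needs no stored history, a model of this kind could in principle run on a resource-constrained Fog node, with the cluster label serving as a simple form of context.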
IV. CONCLUSION
The IoT presents many challenges which are typical of
those witnessed during the surge of Big Data, although
the IoT data footprint appears to be larger by orders of magnitude. We reviewed
the literature associated with the challenges of IoT data
processing and what distributed computing might contribute
to the alleviation of such issues. The current
approach of a centralised cloud may not be capable of fully
keeping in step with the demands of IoT data processing.
A Fog computing implementation could be designed
to overcome many of the challenges mentioned, with data locality
being a main attraction since it assists the introduction of
parallelisation. Obtaining a degree of meaning about the data
through its contextualisation could enable better management
of data volume through partitioning. Both these areas present
opportunities for further research, in particular the devolution
of machine learning of real time IoT data processing in the
REFERENCES
 D. Puschmann, P. Barnaghi, and R. Tafazolli, “Adaptive clustering for
dynamic iot data streams,” IEEE Internet of Things Journal, vol. 4, no. 1,
pp. 64–74, 2017.
 L. Atzori, A. Iera, and G. Morabito, “A Survey of the Internet of
Things,” Proceedings of the 1st International Conference on E-Business
Intelligence (ICEBI2010), vol. 54, pp. 358–366, 2010.
 M. A. Razzaque, M. Milojevic-Jevric, A. Palade, and S. Cla, “Middleware
for internet of things: A survey,” IEEE Internet of Things Journal,
vol. 3, no. 1, pp. 70–95, 2016.
 M. Díaz, C. Martín, and B. Rubio, “State-of-the-art, challenges, and open
issues in the integration of internet of things and cloud computing,”
Journal of Network and Computer Applications, vol. 67, pp. 99–117,
 A. Botta, W. De Donato, V. Persico, and A. Pescapé, “Integration of
Cloud computing and Internet of Things: A survey,” Future Generation
Computer Systems, vol. 56, pp. 684–700, 2016.
 Y. Qin, Q. Z. Sheng, N. J. G. Falkner, S. Dustdar, H. Wang, and A. V.
Vasilakos, “When things matter: A survey on data-centric internet of
things,” Journal of Network and Computer Applications, vol. 64, pp.
 T. Chun-Wei, L. Chin-Feng, C. Ming-Chao, and Y. Laurence, “Data
Mining for Internet of Things,” IEEE Communications Surveys &
Tutorials, vol. 16, no. 1, pp. 77–97, 2014.
 IBM, “Enabling IoT Platforms to Deliver Business Outcomes.” [Online].
 A. Sheth, “Internet of Things to Smart IoT Through Semantic, Cognitive,
and Perceptual Computing,” IEEE Intelligent Systems, vol. 31, no. 2, pp.
 C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Context
Aware Computing for The Internet of Things,” IEEE Communications
Surveys & Tutorials, vol. 16, no. 1, pp. 414–454, 2014.
 M. Aazam and E. N. Huh, “Fog computing and smart gateway based
communication for cloud of things,” Proceedings - 2014 International
Conference on Future Internet of Things and Cloud, FiCloud 2014, pp.
 H. Cai, B. Xu, L. Jiang, and A. V. Vasilakos, “IoT-based Big Data
Storage Systems in Cloud Computing: Perspectives and Challenges,”
IEEE Internet of Things Journal, vol. PP, no. 99, p. 1, 2016.
 L. Jiang, L. D. Xu, H. Cai, Z. Jiang, F. Bu, and B. Xu, “An IoT-
Oriented Data Storage Framework in Cloud Computing Platform,” IEEE
Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1443–1451,
 F. Chen, P. Deng, J. Wan, D. Zhang, A. V. Vasilakos, and X. Rong, “Data
mining for the internet of things: Literature review and challenges,”
International Journal of Distributed Sensor Networks, vol. 11, no. 8, p.
 S. Li, L. D. Xu, and S. Zhao, “The internet of things: a survey,”
Information Systems Frontiers, vol. 17, no. 2, pp. 243–259, 2015.
 F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog Computing and
Its Role in the Internet of Things,” Proceedings of the first edition of
the MCC workshop on Mobile cloud computing, pp. 13–16, 2012.
 N. Marz and J. Warren, Big data: principles and best practices of
scalable realtime data systems. London;Greenwich, Conn;: Manning,
 X. Masip-Bruin, E. Marn-Tordera, G. Tashakor, A. Jukan, and G. J. Ren,
“Foggy clouds and cloudy fogs: a real need for coordinated management
of fog-to-cloud computing systems,” IEEE Wireless Communications,
vol. 23, no. 5, pp. 120–128, October 2016.