This presentation discusses fog computing and big data. It introduces the 5 V's of big data (volume, velocity, variety, veracity, value) and outlines a framework for managing big data that includes data preprocessing, clustering, feature extraction, classification, data mining, and visualization. It contrasts datasets, which are fixed, with data streams, which have continuous high velocity. Bio-inspired algorithms are presented as a way to process big data. Fog/edge computing is discussed as a solution to issues with processing big data solely in the cloud. A key challenge of fog computing is ensuring data quality given the 5 V's, and a proposed solution is a quality-of-use framework that considers the speed, size, and type of data.
This document discusses implementing hybrid recommender systems using web-based methods. It begins by introducing three basic recommendation approaches: demographic, content-based, and collaborative. It notes the disadvantages of each approach. The document then proposes that a hybrid approach can overcome the disadvantages by combining recommendation methods. It presents two consensus-based hybrid recommendation methods and provides examples of their implementation in different web-based systems.
The Big Data Importance – Tools and their Usage (IRJET Journal)
This document discusses big data, tools for analyzing big data, and opportunities that big data analytics provides. It begins by defining big data and its key characteristics of volume, variety and velocity. It then discusses tools for storing, managing and processing big data like Hadoop, MapReduce and HDFS. Finally, it outlines how big data analytics can be applied across different domains to enable new insights and informed decision making through analyzing large datasets.
This document summarizes a survey on data mining. It discusses how data mining helps extract useful business information from large databases and build predictive models. Commonly used data mining techniques are discussed, including artificial neural networks, decision trees, genetic algorithms, and nearest neighbor methods. An ideal data mining architecture is proposed that fully integrates data mining tools with a data warehouse and OLAP server. Examples of profitable data mining applications are provided in industries such as pharmaceuticals, credit cards, transportation, and consumer goods. The document concludes that while data mining is still developing, it has wide applications across domains to leverage knowledge in data warehouses and improve customer relationships.
Mining Social Media Data for Understanding Drugs Usage (IRJET Journal)
This document discusses mining social media data to understand drug usage. It proposes using big data techniques like Hadoop and MapReduce to extract and analyze data from social networks about drug abuse. The methodology involves collecting data from platforms using crawlers, storing it in Hadoop, filtering it, then applying complex analysis using cloud computing. Prior work on extracting health information from social media and multi-scale community detection in networks is reviewed. The challenges of privacy preservation and scalability when anonymizing big healthcare datasets are also discussed.
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy.
This document provides an introduction to big data analytics. It discusses what big data is, key concepts and terminology, the characteristics of big data including the five Vs, different types of data, and case study background. It also covers big data drivers like marketplace dynamics, business architecture, and information and communications technology. The slides include information on data analytics categories, business intelligence, key performance indicators, and how big data relates to business layers and the feedback loop.
IRJET - Advances in Data Mining: Healthcare Applications (IRJET Journal)
This document provides an overview of data mining and its applications in healthcare. It discusses how data mining can be used to extract useful information and patterns from large healthcare datasets. Some key applications mentioned include predicting hospital admissions and length of stay, improving diagnosis and treatment effectiveness, detecting healthcare abuse and fraud, and enhancing customer relationship management. The document also reviews several recent studies that have applied techniques like logistic regression, decision trees, naive Bayes classification, and neural networks to solve problems in areas such as predicting emergency department admissions, analyzing traumatic brain injury data, detecting heart failure, and diagnosing heart disease and cancer.
Data mining involves extracting patterns from large data sets. It is used to uncover hidden information and relationships within data repositories like databases, text files, social networks, and computer simulations. The patterns discovered can be used by organizations to make better business decisions. Some common applications of data mining include credit card fraud detection, customer segmentation for marketing, and scientific research. The process involves data preparation, algorithm selection, model building, and interpretation. While useful, data mining also raises privacy, security, and ethical concerns if misused.
This document discusses two case studies of organizations that partnered with Synoptek to improve their IT services and operations. The first case study was of a women's healthcare organization that wanted better infrastructure availability, performance, and security. With Synoptek's help, they reduced costs by 20%, improved IT performance, security, and availability. The second case study was of a community college that wanted to transform programs, optimize operations, and engage students. Synoptek helped them deliver on these goals through managed IT services and support.
The document discusses the steps involved in the data science life cycle (DSLC). It describes the main steps as business understanding, data acquisition and understanding, modeling, deployment, and customer acceptance. It provides details on several of these steps, including business understanding, data acquisition and understanding, data modeling, and initial data exploration. The goal is to clearly outline the typical process and considerations for a data science project from defining the problem to exploring the available data.
This document provides an introduction to data warehousing. It defines a data warehouse as a subject-oriented, integrated, time-invariant, and non-volatile collection of data from multiple sources designed to support analysis and decision making. Data warehouses centralize data for analysis, allow analysis of broad business data over time, and are a core component of business intelligence. They improve decision making, increase productivity and efficiency, and provide competitive advantages for organizations. While data warehouses provide benefits, they also face challenges related to scalability, speed, and security.
Operations Research and ICT: A Keynote Address (Elvis Muyanja)
By Prof. Venansius Baryamureeba, PhD
Uganda Technology And Management University (UTAMU)
www.utamu.ac.ug/barya ; barya@utamu.ac.ug
12th Operations Research Society for Eastern Africa (ORSEA) Conference, October 20-21, 2016, Hosted at the Faculty of Computing and Management Science Building, Makerere University Business School (MUBS), Kampala Uganda
Most of the time, when you hear about Artificial Intelligence (AI), people talk about new algorithms or even the computation power needed to train them. But Data is one of the most important factors in AI.
The document discusses the Internet of Things (IoT) and the data lifecycle in an IoT system. It describes how in an IoT system, things (devices) collect data and transfer it over a network. The data then goes through various steps including collection, aggregation, preprocessing, storage/updating, and archiving. It is stored and organized so it can be efficiently accessed, analyzed, and built upon to gain insights.
Presentation given to the BCS Data Management Specialist Group on 10th April 2018.
Data quality “tags” are a means of informing decision makers about the quality of the data they use within information systems. Unfortunately, these tags have not been successfully adopted because of the expense of maintaining them. This presentation will demonstrate an alternative approach that achieves improved decision making without the costly overheads.
Automating Data Science over a Human Genomics Knowledge Base (Vaticle)
# Automating Data Science over a Human Genomics Knowledge Base
Radouane Oudrhiri, the CTO of Eagle Genomics, will talk about how Eagle Genomics is building a platform for automating data science over a human genomics knowledge base. Rad will dive into the architecture of Eagle Genomics and also discuss how Grakn serves as the knowledge base foundation of the system. Rad will also give a brief history of databases, semantic expressiveness, and how Grakn fits into the big picture.
# Radouane Oudrhiri, CTO, Eagle Genomics
Radouane has extensive experience in leading world-class software and data-intensive system developments across industries from Telecom to Healthcare, Nuclear, Automotive, and Financials. Radouane is a Lean/Six Sigma Master Black Belt with a speciality in high-tech, IT, and software engineering, and he is recognised as a leader and early adopter of Lean/Six Sigma and DFSS for IT and software. He is a fellow of the Royal Statistical Society (RSS) and a member of the ISO Technical Committee (TC69: Applications of Statistical Methods), where he is co-author of the Lean & Six Sigma standard (ISO 18404) as well as the new standard under development (Design for Six Sigma). He is also part of the newly formed international Group on Big Data (nominated by BSI as the UK representative/expert). Radouane has also been Chair of the working group on Measurement Systems for Automated Processes/Systems within the ISPE (International Society for Pharmaceutical Engineering).
Making the Move to an Enterprise Clinical Trial Management System (Perficient)
The document discusses making the move to an enterprise clinical trial management system (CTMS) for organizations of any size. It outlines key indicators that a CTMS is needed, such as rapid growth, increased trial complexity, and a desire for real-time data integration. An internal analysis of current processes and identification of stakeholders and requirements is recommended. Selection considerations include system performance, customization options, and integration capabilities. The conclusion emphasizes analyzing needs, obtaining funding approval, and choosing a system and implementation partner carefully.
This document discusses different types of digital data including structured, unstructured, and semi-structured data. It provides examples and characteristics of each type of data. Structured data is organized in rows and columns, like in a database. Unstructured data lacks a predefined structure or organization, like text documents, images, and videos. Semi-structured data has some structure but not a rigid schema, like XML files. The majority of organizational data is unstructured. Big data is also discussed, which is high-volume, high-velocity, and high-variety data that requires new technologies to capture, store, manage and analyze.
Survey of the Euro Currency Fluctuation by Using Data Mining (ijcsit)
Data mining or Knowledge Discovery in Databases (KDD) is a new field in information technology that emerged from progress in the creation and maintenance of large databases, combining statistical and artificial intelligence methods with database management. Data mining is used to recognize hidden patterns and provide relevant information for decision making on complex problems where conventional methods are inefficient or too slow. Data mining can be used as a powerful tool to predict future trends and behaviors, and this prediction allows making proactive, knowledge-driven decisions in businesses. Since the automated prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools, it can answer business questions that are traditionally time consuming to resolve. This great advantage makes it of interest to government, industry, and commerce. In this paper we use this tool to investigate Euro currency fluctuation. For this investigation, we use three different algorithms: K*, IBK, and MLP, and we extract Euro currency volatility by using the same criteria for all algorithms. The dataset used has 21,084 records and is collected from daily price fluctuations in the Euro currency in the period of 10/2006 to 04/2010.
This document summarizes a literature review paper on big data analytics. It begins by defining big data as large datasets that are difficult to handle with traditional tools due to their size, variety, and velocity. It then discusses how big data analytics applies advanced analytics techniques to big data to extract valuable insights. The paper reviews literature on big data analytics tools and methods for storage, management, and analysis of big data. It also discusses opportunities that big data analytics provides for decision making in various domains.
This document discusses big data, including its characteristics of volume, velocity, and variety. It outlines issues related to big data such as storage and processing challenges due to the massive size of datasets. Privacy, security, and access are also concerns. Advantages include better understanding of customers, business optimization, improved science and healthcare. Effectively addressing the technical and analytical challenges will help realize big data's value.
Electronic health records and business analytics: a cloud-based approach (IAEME Publication)
This document discusses using business analytics and cloud computing to analyze electronic health records (EHRs). It proposes using pattern recognition algorithms within an intelligent agent on the cloud to better utilize resources and optimize the time needed to analyze EHR requests. The rest of the document outlines related work involving EHR and cloud environments, business scopes and trends related to EHR investments, and a proposed architectural model.
Full Paper: Analytics: Key to go from generating big data to deriving busines... (Piyush Malik)
This document discusses how analytics can help organizations derive business value from big data. It describes how statistical analysis, machine learning, optimization and text mining can extract meaningful insights from social media, online commerce, telecommunications, smart utility meters, and improve security. While tools exist to analyze big data, challenges remain around data security, privacy, and developing skilled talent. The paper aims to illustrate how existing algorithms can generate value from different industry use cases.
Big Data Presentation - Data Center Dynamics Sydney 2014 (Dez Blanchfield)
The document discusses the rise of big data and its impact on data centers. It defines what big data is and what it is not, providing examples of big data sources and uses. It also explores how the concept of a data center is evolving, as they must adapt to support new big data workloads. Traditional data center designs are no longer sufficient and distributed, modular, and software-defined approaches are needed to efficiently manage large and growing volumes of data.
Big data emerged in the early 2000s and was first adopted by online companies like Google, eBay, and Facebook. It refers to data that exceeds the processing capacity of traditional databases due to its large size, speed of creation, and unstructured nature. The key attributes of big data are volume, variety, velocity and complexity. It comes from a variety of sources like sensors, social media, web logs, and photos. Analyzing big data can provide competitive advantages through insights from hidden patterns. While big data offers opportunities, organizations must ensure they have the right skills, manage costs, and address privacy issues.
This document discusses challenges and solutions related to big data implementation. Some key challenges mentioned include reluctance to invest in big data strategies, integrating traditional and big data, and finding professionals with both big data and domain skills. The document recommends starting small with proofs of concept and taking an iterative approach to derive early benefits from big data before making larger investments. It also stresses the importance of having an enterprise-wide data strategy and acquiring various skills needed for big data projects.
High Performance Data Analytics and a Java Grande Run Time (Geoffrey Fox)
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development.
However the same is not so true for data intensive even though commercially clouds devote many more resources to data analytics than supercomputers devote to simulations.
Here we use a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures.
We propose a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks.
Our analysis builds on the Apache software stack that is well used in modern cloud computing.
We give some examples including clustering, deep-learning and multi-dimensional scaling.
One suggestion from this work is the value of a high performance Java (Grande) runtime that supports both simulations and big data.
This document defines big data and discusses its key characteristics and applications. It begins by defining big data as large volumes of structured, semi-structured, and unstructured data that is difficult to process using traditional methods. It then outlines the 5 Vs of big data: volume, velocity, variety, veracity, and variability. The document also discusses Hadoop as an open-source framework for distributed storage and processing of big data, and lists several applications of big data across various industries. Finally, it discusses both the risks and benefits of working with big data.
Every day we create roughly 2.5 quintillion bytes of data; 90% of the world's collected data has been generated in the last 2 years alone. In this slide deck, learn all about big data in a simple and easy way.
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ... (Geoffrey Fox)
Keynote at Sixth International Workshop on Cloud Data Management CloudDB 2014 Chicago March 31 2014.
Abstract: We introduce the NIST collection of 51 use cases and describe their scope over industry, government and research areas. We look at their structure from several points of view or facets covering problem architecture, analytics kernels, micro-system usage such as flops/bytes, application class (GIS, expectation maximization) and very importantly data source.
We then propose that in many cases it is wise to combine the well known commodity best practice (often Apache) Big Data Stack (with ~120 software subsystems) with high performance computing technologies.
We describe this and give early results based on clustering running with different paradigms.
We identify key layers where HPC Apache integration is particularly important: File systems, Cluster resource management, File and object data management, Inter process and thread communication, Analytics libraries, Workflow and Monitoring.
See
[1] A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures, Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha and Geoffrey Fox, accepted in IEEE BigData 2014, available at: http://arxiv.org/abs/1403.1528
[2] High Performance High Functionality Big Data Software Stack, G Fox, J Qiu and S Jha, in Big Data and Extreme-scale Computing (BDEC), 2014. Fukuoka, Japan. http://grids.ucs.indiana.edu/ptliupages/publications/HPCandApacheBigDataFinal.pdf
Data science involves extracting knowledge and insights from structured, semi-structured, and unstructured data using scientific processes. It encompasses more than just data analysis. The data value chain describes the process of acquiring data and transforming it into useful information and insights. It involves data acquisition, analysis, curation, storage, and usage. There are three main types of data: structured data that follows a predefined model like databases, semi-structured data with some organization like JSON, and unstructured data like text without a clear model. Metadata provides additional context about data to help with analysis. Big data is characterized by its large volume, velocity, and variety that makes it difficult to process with traditional tools.
Big Data Analytics and Hadoop is presented. Key points include:
- Big data is large and complex data that is difficult to process using traditional methods. Domains that produce large datasets include meteorology, physics simulations, and internet search.
- The four V's of big data are volume, velocity, variety, and veracity. Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of computers. Its core components are HDFS for storage and MapReduce for processing.
- Apache Hadoop has gained popularity for big data analytics due to its ability to process large amounts of data in parallel using commodity hardware, its scalability, and automatic failover. A Hadoop ecosystem of
This document provides an introduction and overview of big data technologies. It begins with defining big data and its key characteristics of volume, variety and velocity. It discusses how data has exploded in recent years and examples of large scale data sources. It then covers popular big data tools and technologies like Hadoop and MapReduce. The document discusses how to get started with big data and learning related skills. Finally, it provides examples of big data projects and discusses the objectives and benefits of working with big data.
If Big Data is data that exceeds the processing capacity of conventional systems, thereby necessitating alternative processing measures, we are looking at an essentially technological challenge that IT managers are best equipped to address.
The DCC is currently working with 18 HEIs to support and develop their capabilities in the management of research data and, whilst the aforementioned challenge is not usually core to their expressed concerns, are there particular issues of curation inherent to Big Data that might force a different perspective?
We have some understanding of Big Data from our contacts in the Astronomy and High Energy Physics domains, and the scale and speed of development in Genomics data generation is well known, but the inability to provide sufficient processing capacity is not one of their more frequent complaints.
That’s not to say that Big Science and its Big Data are free of challenges in data curation; only that they are shared with their lesser cousins, where one might say that the real challenge is less one of size than diversity and complexity.
This brief presentation explores those aspects of data curation that go beyond the challenges of processing power but which may lend a broader perspective to the technology selection process.
This document provides an overview of handling and processing big data. It begins with defining big data and its key characteristics of volume, velocity, and variety. It then discusses several ways to effectively handle big data, such as outlining goals, securing data, keeping data protected, ensuring data is interlinked, and adapting to new changes. Metadata is also important for big data handling and processing. The document outlines the different types of metadata and closes by discussing technologies commonly used for big data processing like Hadoop, MapReduce, and Hive.
Big data analytics (BDA) involves examining large, diverse datasets to uncover hidden patterns, correlations, trends, and insights. BDA helps organizations gain a competitive advantage by extracting insights from data to make faster, more informed decisions. It supports a 360-degree view of customers by analyzing both structured and unstructured data sources like clickstream data. Businesses can leverage techniques like machine learning, predictive analytics, and natural language processing on existing and new data sources. BDA requires close collaboration between IT, business users, and data scientists to process and analyze large datasets beyond typical storage and processing capabilities.
This document provides an introduction to big data, including definitions and key characteristics. It discusses how big data is defined as extremely large and complex datasets that cannot be managed by traditional systems due to issues of volume, velocity, and variety. It outlines three key characteristics of big data: volume (scale), variety (complexity), and velocity (speed). Examples are given of different types and sources of big data. The document also introduces cloud computing and how it relates to big data management and processing. Finally, it provides an overview of topics to be covered, including frameworks, modeling, warehousing, ETL, and specific analytic techniques.
The Shifting Landscape of Data Integration (DATAVERSITY)
This document discusses the shifting landscape of data integration. It begins with an introduction by William McKnight, who is described as the "#1 Global Influencer in Data Warehousing". The document then discusses how challenges in data integration are shifting from dealing with volume, velocity and variety to dealing with dynamic, distributed and diverse data in the cloud. It also discusses IDC's view that this shift is occurring from the traditional 3Vs to the 3Ds. The rest of the document discusses Matillion, a vendor that provides a modern solution for cloud data integration challenges.
This document discusses applying big data. It begins by defining common big data buzzwords like the 3V's of volume, velocity and variety. It then discusses agile development approaches and data modeling. Several use cases for big data are presented, including customer analytics, security, and operations analysis. Metrics for measuring ROI are discussed, though they are difficult to predict. The document emphasizes that formulating the right questions is important when moving forward with big data initiatives.
This document discusses web data extraction and analysis using Hadoop. It begins by explaining that web data extraction involves collecting data from websites using tools like web scrapers or crawlers. Next, it describes that the data extracted is often large in volume and requires processing tools like Hadoop for analysis. The document then provides details about using MapReduce on Hadoop to analyze web data in a parallel and distributed manner by breaking the analysis into mapping and reducing phases.
This document discusses how organizations can use big data and operational analytics to transform IT operations. It outlines how taking a data-driven approach that combines machine data and wire data can provide real-time visibility across networks, applications, databases and other systems. This approach overcomes limitations of using individual monitoring tools by silo. The document also covers key considerations for implementing IT big data solutions such as data gravity, improving the signal-to-noise ratio, and understanding when data needs to be accessed in real-time. It provides an example of how healthcare company McKesson used network traffic analysis to improve Citrix application performance and reduce IT costs.
This document provides information about Dr. Sunil Bhutada, including his educational background and professional experience. It then outlines the syllabus for a course on data warehousing and data mining, including an introduction to key concepts and textbooks. Finally, it shares slides on additional topics related to data warehousing, data mining, and business intelligence.
FAIRDOM data management support for ERACoBioTech Proposals (FAIRDOM)
This document provides information about a webinar from the FAIRDOM Consortium on data management for ERACoBioTech full proposals. It includes:
- Details on how to budget for and include a data management plan in proposals
- A checklist for developing a data management plan covering topics like the types and volumes of data, data sharing and reuse, and making data FAIR
- An overview of the FAIRDOM services and software platform that can help with project data management and stewardship
UNIT I Streaming Data & Architectures.pptx (Rahul Borate)
The document provides an introduction and overview of streaming data. It discusses sources of streaming data such as operational monitoring, web analytics, online advertising, social media, and mobile/IoT data. It explains that streaming data is different from other data types in that it is always flowing in, loosely structured, and can have high-cardinality dimensions. Real-time architectures for streaming data need to have high availability, low latency, and horizontal scalability.
This document discusses data science, big data, and big data architecture. It begins by defining data science and describing what data scientists do, including extracting insights from both structured and unstructured data using techniques like statistics, programming, and data analysis. It then outlines the cycle of big data management and functional requirements. The document goes on to describe key aspects of big data architecture, including interfaces, redundant physical infrastructure, security, operational data sources, performance considerations, and organizing data services and tools. It provides examples of MapReduce, Hadoop, and BigTable - technologies that enabled processing and analyzing massive amounts of data.
The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” of Big Data
1. The Paradigm of Fog Computing
with Bio-inspired Search Methods
and the “5Vs” of Big Data
Presenters:
Richard Millham, Israel Edem
Agbehadji, and Samuel Ofori Frimpong
Durban University of Technology, South Africa
2. Outline
• Introduction
• Growth of Big Data
• The 5Vs of Big Data
• Framework to Manage Big Data
• Data Streaming vs Datasets
• Edge/Fog Computing paradigm
• Challenges of Fog computing and Potential
Solutions
• Conclusion
3. Introduction
• This presentation seeks to briefly present some of the issues of
big data:
• What characteristics constitute big data?
• What methods and phases are needed to process big data?
• Datasets vs data streaming? What is the difference?
• What is the role and domain of bio-inspired algorithms?
• What are the drivers for the fog/edge computing architecture?
4. Big data
• Like many concepts, there is no consensus on what constitutes big data
• Many will say Big data is a voluminous amount of varied data available at
high rate, but it possesses other characteristics as well (5 Vs)
• By itself, big data yields neither meaning nor value; it is important to understand
the unique features of the data, which may inform the analysis
• Any framework for analysing big data must address the big data characteristics,
namely velocity, variety, veracity, volume, and value
• Sources of big data are numerous but have evolved with our changing
society
• IOT and smart entities
• Enterprise systems
• Social media
5. The growth of IOT, along with the subsequent growth of IOT data, is one of
the main contributors to the growth of Big Data and the need for methods to
manage it.
6. Smart Cities and IOT Sensors/Data Analytics
Smart City IOT/Data Analytics
• Smart cities enable their citizens to enjoy a
wide range of new services:
• The health sector can monitor the quality of
service delivery
• Government gains better insights for
better social intervention programs for
citizens
• Companies engage with customers to understand
customers' perception of products
• These services are enabled through the use of
IOT sensors to monitor the environment and
data analytics to make sense of the
collected data
7. The 5-Vs of Big Data
8. Big Data Framework
• To manage big data, a framework consisting of a set of steps and phases is needed. Although some of these phases may overlap and the steps may vary, the framework is as follows (a pre-processing sketch follows the list):
• Data Pre-Processing
• Data Cleansing
• Acquire data from a multitude of heterogeneous devices: social media, IoT sensors, mobile phones, enterprise system transactions, GPS devices, etc.
• Estimate missing values, if needed
• Remove redundant values
• Reformat heterogeneous data into a more uniform
format(s)
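As a rough illustration of these pre-processing steps, here is a minimal pandas sketch (the column names and sensor readings are hypothetical) that reformats timestamps into a uniform type, removes a redundant row, and estimates a missing value:

```python
import pandas as pd

# Hypothetical readings merged from heterogeneous sources (IoT sensors, logs, etc.)
df = pd.DataFrame({
    "device_id": ["s1", "s1", "s2", "s2", "s2"],
    "temp_c":    [21.5, None, 22.1, 22.1, 60.0],
    "ts":        ["2021-01-01 10:00", "2021-01-01 10:05",
                  "2021-01-01 10:00", "2021-01-01 10:00",
                  "2021-01-01 10:05"],
})

df["ts"] = pd.to_datetime(df["ts"])                       # reformat into a uniform type
df = df.drop_duplicates()                                 # remove redundant values
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())   # estimate missing values
print(df)
```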
9. Big Data Framework (cont)
Data Cleansing (Data Reduction)
[Figure: data scattered in 3-D space]
• One of the most important steps
in data cleansing is data
reduction (reducing the amount
of data to be processed by later
stages). This can be
accomplished by:
• Removing outliers (noise)
• Removing redundant data
• Removing non-interesting data
(with little value)
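One simple way to remove outliers (noise) is a z-score filter. The sketch below is a minimal NumPy illustration with made-up sensor readings, not a method prescribed in the presentation:

```python
import numpy as np

def reduce_noise(values, z_thresh):
    """Drop points more than z_thresh standard deviations from the mean."""
    v = np.asarray(values, dtype=float)
    z = np.abs((v - v.mean()) / v.std())
    return v[z < z_thresh]

readings = [21.5, 22.0, 21.8, 22.1, 95.0]    # 95.0 is sensor noise
# With such a tiny sample, a modest threshold is used for illustration.
print(reduce_noise(readings, z_thresh=1.5))  # -> [21.5 22.  21.8 22.1]
```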
10. Big Data Framework (cont)
• After data cleansing is complete, the next step is data clustering, the combining of similar items into groups for easier processing of the data in later stages
• Clustering methods include:
• K-Nearest Neighbour
• Density-based scan (DBSCAN), which discovers clusters of different shapes (sketched below)
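A minimal scikit-learn sketch of density-based clustering (DBSCAN) on toy 2-D points; the eps and min_samples values are illustrative only:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus one isolated noise point
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
              [8.0, 8.1], [8.2, 7.9], [7.9, 8.0],
              [4.5, 0.0]])

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(labels)   # e.g. [0 0 0 1 1 1 -1]; -1 marks noise
```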
11. Big Data Framework (cont)
Feature Extraction and Classification
• The next step after data clustering
is feature extraction and
classification where important
features are extracted from the
data and classified (labeled). This
reduces the amount of resources
used to describe a group of data
• Many tools may be used, including:
• Autoencoder (to learn efficient codings of unlabeled data; a sketch follows)
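A minimal Keras sketch of an autoencoder used for feature extraction; the layer sizes and the 30-feature random data are hypothetical. The encoder half yields a compressed representation that can feed later classification:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

X = np.random.rand(1000, 30).astype("float32")      # hypothetical 30-feature records

inp = tf.keras.Input(shape=(30,))
code = layers.Dense(8, activation="relu")(inp)       # compressed representation
out = layers.Dense(30, activation="sigmoid")(code)   # reconstruction

autoencoder = Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)  # learn to reconstruct input

encoder = Model(inp, code)
features = encoder.predict(X)   # 8-dimensional features for downstream classification
print(features.shape)           # (1000, 8)
```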
12. Big Data Framework (cont)
• Data Mining Phase
• This phase involves finding relationships
among groups of data identified during the
previous phase
• These relationships include correlations
(dependencies among variables) and
association rules (if-then rules) among others
• Methods include Apriori and PageRank, among others
• Many data mining tools exist, using a variety
of methods, including:
• Orange
• Weka
• Apache Mahout
• RapidMiner
• KNIME (integrates various components for machine learning and data mining)
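As an illustration, association rules can be mined with the Apriori implementation in the mlxtend library (an assumption here, not a tool named in the presentation); the one-hot basket data is made up:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot basket data: each row is a transaction
baskets = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 1, 0]],
    columns=["hamburgers", "rolls", "cola"],
).astype(bool)

frequent = apriori(baskets, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```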
13. Big Data Framework (cont)
• Visualisation/Business Intelligence Phase
• In this phase, the data relationships and classes identified in previous stages may be visualized in the form of pie charts, bar charts, line graphs, etc. and/or incorporated into business rules within the organization.
• Some examples (with a plotting sketch below):
• A line graph may show the increase/decrease in sales of particular products based on the particular features offered. Hence, businesses may be able to determine the most popular features for each price range
• Business rules may capture associations between different itemsets. For example, a store might find a strong association between the sale of hamburgers and rolls.
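A minimal matplotlib sketch of the line-graph example; the sales figures are invented:

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales_basic = [120, 115, 118, 110, 105]     # hypothetical sales of the basic model
sales_premium = [40, 55, 62, 75, 90]        # hypothetical sales of the premium model

plt.plot(months, sales_basic, marker="o", label="Basic model")
plt.plot(months, sales_premium, marker="s", label="Premium model")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.title("Sales by product feature set")
plt.legend()
plt.show()
```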
14. Datasets vs Data Streams
• Datasets may exhibit high volume, veracity, value, and variety but are fixed in terms of velocity. In other words, a dataset may contain these four Vs of big data and be built from high-velocity data arriving during its formation; however, once the dataset is formed, it is stable. Consequently, many different methods and tools may be used to analyse it
• Data streams, on the other hand, share the characteristics of datasets but also exhibit continuous high velocity, with often-changing varieties, values, and veracities of data. Analysis of this data, due to these characteristics, is problematic and requires huge computational resources (e.g., a supercomputer)
15. Datasets vs Data Streams (cont)
• As this solution is not usually practical, different methods must be used to manage data streams, including (sketched below):
• Fixed or random sampling of the stream (e.g., 1 in 50 frames) to get a snapshot of the current data
• Sliding windows to contain these samples and to ensure that they remain current as the stream changes
• Other methods adapted to data streams in order to handle the high velocity and still produce satisfactory results
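A minimal sketch of fixed-rate sampling combined with a sliding window, using Python's collections.deque; the rates and the simulated stream are illustrative:

```python
import random
from collections import deque

WINDOW = 100          # keep only the most recent 100 sampled items
SAMPLE_EVERY = 50     # fixed sampling: 1 in 50 stream elements

window = deque(maxlen=WINDOW)   # old samples fall out automatically

def on_element(i, element):
    """Called for every element of the (unbounded) stream."""
    if i % SAMPLE_EVERY == 0:   # fixed-rate sample of the stream
        window.append(element)

# Simulated stream of sensor readings
for i in range(10_000):
    on_element(i, random.gauss(20.0, 2.0))

print(len(window), sum(window) / len(window))   # snapshot statistics of current data
```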
16. Big Data Analytics
• The following diagram groups some of the methods mentioned in this presentation under the term Big Data Analytics:
• Batch (dataset) vs stream processing
• Machine learning and advanced
learning (feature extraction,
classification, and business rules)
• Data mining
• Stochastic (probability) models for
preprocessing of noise, feature
extraction, classification, etc
• Edge computing and cloud computing
18. Bio-inspired Computation
• Bio-inspired computation models the natural behavior of animals
(optimized over a very long time period) to achieve some set goal
• Numerous bio-inspired algorithms exist (200+), each with its own advantages and disadvantages
• One basic premise of these algorithms is exploration vs. exploitation (sketched after this list):
• exploration: search different regions of the solution space to find a global solution
• exploitation: search a small region around the present solution in order to improve its quality with a small perturbation
• Bio-inspired algorithms have been used in many application domains such as route optimization, recommender systems, and renewable energy
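The exploration/exploitation premise can be sketched with a generic stochastic search skeleton; this is an illustrative toy, not any particular named bio-inspired algorithm:

```python
import random

def sphere(x):
    """Toy objective: minimise the sum of squares (optimum at the origin)."""
    return sum(v * v for v in x)

def search(dim=3, iters=2000, explore_prob=0.2):
    best = [random.uniform(-5, 5) for _ in range(dim)]
    best_f = sphere(best)
    for _ in range(iters):
        if random.random() < explore_prob:
            # Exploration: jump to a new region of the solution space
            cand = [random.uniform(-5, 5) for _ in range(dim)]
        else:
            # Exploitation: small perturbation around the current best
            cand = [v + random.gauss(0, 0.1) for v in best]
        f = sphere(cand)
        if f < best_f:
            best, best_f = cand, f
    return best, best_f

print(search())   # best solution found and its objective value
```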
21. Why is Edge/Fog Computing Needed?
Problems with Cloud Computing – the Need for a New Paradigm
• As illustrated in the diagram, big data (huge amounts of data from many types of devices) flows at high speed to the cloud, where it is processed using the data framework described earlier
• The network soon becomes overloaded because many early phases (preprocessing and data reduction) are performed only in the cloud [bottleneck]
22. Fog Computing Paradigm
• The focus is on devices connected to the edge of networks.
• Fog computing, or edge computing, operates on the concept that instead of having devices work through a centralized location (the cloud server), fog systems operate at the ends of the network (Naha et al. 2018).
• An advantage of fog computing is that it avoids delay by processing raw data collected from edge networks locally rather than sending it all directly to the cloud for processing (a minimal sketch follows)
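A minimal sketch of the idea: a fog node summarises raw readings locally and forwards only compact summaries; send_to_cloud is a hypothetical stand-in for a real uplink call:

```python
import statistics

def send_to_cloud(summary):
    print("uplink:", summary)   # stand-in for a real network call

def fog_node(raw_readings, batch=60):
    """Summarise raw sensor data locally; forward only compact summaries."""
    for start in range(0, len(raw_readings), batch):
        chunk = raw_readings[start:start + batch]
        summary = {
            "n": len(chunk),
            "mean": statistics.fmean(chunk),
            "max": max(chunk),
        }
        send_to_cloud(summary)

# Simulated raw edge data: 180 readings become just 3 uplink messages
fog_node([20.0 + 0.1 * i for i in range(180)])
```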
25. Fog Computing Applications
• Smart city monitoring
• Energy-efficient models
• Fog computing in health monitoring
26. The Quality Challenge of Fog Computing and the 5Vs, and a Solution
• There are many issues in fog computing with big data, but a key challenge is the issue of data quality.
• Solution: a Fog Computing and “5Vs” Quality-of-Use (QoU) framework.
• This framework has an analytical model that considers the speed, size, and type of data from IoT devices and then determines the quality and importance of the data to store on the cloud platform (an illustrative scoring sketch follows)
• The framework has two components, namely IoT (data) and fog computing
• The IoT (data) component is the location of the sensors and Internet-enabled devices, which capture large volumes of data, at speed and in different types
• The data generated are processed and analyzed by the fog computing component to produce quality data that is useful
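The presentation does not give the analytical model itself, so the following is only an illustrative stand-in: a toy QoU score that weights speed, size, and type of data, where all weights, normalisation constants, and thresholds are invented:

```python
def qou_score(speed_hz, size_kb, dtype, weights=(0.4, 0.3, 0.3)):
    """Hypothetical quality-of-use score combining speed, size and type of data."""
    w_speed, w_size, w_type = weights
    speed_term = min(speed_hz / 100.0, 1.0)   # normalise arrival rate
    size_term = min(size_kb / 1024.0, 1.0)    # normalise payload size
    type_term = {"video": 1.0, "image": 0.7, "text": 0.4}.get(dtype, 0.2)
    return w_speed * speed_term + w_size * size_term + w_type * type_term

reading = {"speed_hz": 50, "size_kb": 256, "dtype": "image"}
score = qou_score(**reading)
if score > 0.3:   # only sufficiently "useful" data reaches the cloud
    print("store on cloud, QoU =", round(score, 2))
```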
27. More Challenges in Fog Computing and IoT
• The challenges include:
• energy consumption
• data distribution
• heterogeneity of edge devices
• dynamicity of the fog network, etc.
• These challenges lead to the search for new methods to address them
• One promising approach is the use of bio-inspired algorithms (a subset of evolutionary algorithms) to manage different aspects of these problems
28. Fog Computing and Evolutionary Algorithm Models
• Evolutionary Algorithm for Energy Efficient Model.
• Bio-Inspired Algorithm for Scheduling of Service Requests
to Virtual Machine (VMs).
• Bio-Inspired Algorithms and Fog Computing for Intelligent
Computing in Logistic Data Center.
• Ensemble of Swarm Algorithm for Fire-and-Rescue
Operations.
• Evolutionary Computation and Epidemic Models for Data
Availability in Fog Computing.
• Bio-Inspired Optimization for Job Scheduling in Fog
Computing.
29. Conclusion
• This presentation is a brief overview of big data along with many of its aspects
• Increasing technological and societal changes make big data ever more prevalent
• With the increasing prevalence of big data comes a demand to manage this data (particularly data streams) through new methods and new architectures (edge/fog computing)
• Promising methods have emerged in the field of bio-inspired algorithms, which have been applied to a variety of domains, including the challenges of these new architectures
Editor's Notes
K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions).
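A minimal scikit-learn illustration of this, on toy data:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [6, 6], [7, 6]]   # stored cases
y = ["A", "A", "B", "B"]               # their known classes

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[2, 1]]))   # -> ['A'] (nearest stored cases are mostly 'A')
```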
Auto-encoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning).
PageRank is a link analysis algorithm and it assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web.
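A minimal power-iteration sketch of PageRank on a toy three-page web; the damping factor 0.85 is the conventional choice:

```python
import numpy as np

# Column-stochastic link matrix for a tiny 3-page web:
# page 0 links to pages 1 and 2; page 1 links to page 2; page 2 links to page 0.
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

d, n = 0.85, 3
rank = np.full(n, 1.0 / n)
for _ in range(100):                  # power iteration
    rank = (1 - d) / n + d * (M @ rank)
print(rank / rank.sum())              # higher weight = more "important" page
```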