SlideShare a Scribd company logo
1 of 8
Download to read offline
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016)
204
A Novel Framework for Big Data Processing in a Data-Driven
Society
Otuonye Anthony I.1
, Onyechere Patricia O.2
, Njoku Obilor A.3
1
Department of Information Technology,
Federal University of Technology Owerri, Nigeria.
2
Department of Management Technology,
Federal University of Technology Owerri, Nigeria.
3
Department of Computer Science,
Federal University of Technology Owerri, Nigeria.
Corresponding Author: Otuonye Anthony I
_________________________________________________________________________________________
Abstract
Big data has continued to attract huge attention from the academia, as well as the industry and government
agencies. The attraction has been due to a quick rise in the generation of large-scale and varied data, resulting
from emerging increase in the use of sensor networks, social networks, and some semantic web applications.
Spontaneous increase in large-scale data poses a critical challenge for today‟s Data Scientists and Social
Managers. This paper introduces a big data processing framework from the data mining perspective that will
fully harness the potential benefits of the big data revolution and enhance effective management and processing
of large-scale data. We have proposed a three-tier data mining structure for big data storage, processing and
analysis from a single platform and provide social sensing feedback for a better understanding of our society.
Big Data concern large-volume, complex, and growing data sets with multiple, autonomous sources. Our model
is both data-driven and demand-driven from various information sources, mining and analysis. The study
became necessary due to the growing need to assist governments and business agencies to take advantage of the
big data technology for the desired turn-around in their socio-economic activities. We have adopted the HACE
theorem in the model design which characterizes the unique features of the big data revolution: large,
heterogeneous, complex, and evolving. We also adopted the Hadoop‟s Map Reduce optimization strategies
based on the MapReduce parallel processing framework for big data mining. There is need to revise most of our
traditional data mining techniques and deploy the suggested distributed versions of big date models available to
ensure profitable data analysis in our contemporary data-driven society.
__________________________________________________________________________________________
Keywords: Big Data, HACE Theorem, Data Mining, Social Network, Hadoop
INTRODUCTION
Big data defines large data sets that are complex,
voluminous, heterogeneous, emanating from a variety
of data sources, and in most cases, with rapidly
changing relationships among the sets (Otuonye, A.
et al, 2018). It is a term that describes the large
collections of information – both structured and
unstructured – that floods our society on a daily basis.
According to Otuonye, A. et al, (2018), big data is
defined as the large and ever-increasing volumes of
data created by humans, machines, software
applications, and tools. The data is generated from a
large number of sources including social media,
internet-enabled devices (such as smart phones and
tablets), machine, video and voice recordings, and so
on. According to Jenn, C. (2014), “… 90% of the
data in the world today was created in the last two
years”.
The reason for the explosion in information is not far-
fetched: currently, we have over 7 billion people in
the world today (Worldometers, 2014) and out of this
number , over 2 billion are connected to the internet.
Research has also shown that over 5 billion people
are using mobile devices across the globe. Due to the
fact of these technological advancements, people and
machine now generate huge amounts of data on a
daily basis through the increased use of such devices.
The use of remote sensors is another source of big
data. They continuously produce much heterogeneous
data that are both structured or unstructured in nature.
These are big data which cannot generally be
categorized using the regular relational database. big
data is also captured and processed at a very high
speed to improve businesses and decision making
processes.
There is therefore need for commensurate
advancements in data storage and data mining
technologies to ensure proper preservation and
analysis of big data. Hence, this paper introduces a
big data processing framework from the data mining
perspective to fully harness the potential benefits of
the big data revolution and enhance effective
management and processing of large-scale data. We
are proposing a three-tier data mining structure for
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211
© Scholarlink Research Institute Journals, 2019 (ISSN: 2141-7016)
jeteas.scholarlinkresearch.com
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016)
205
big data storage, processing and analysis from a
single platform and provide social sensing feedback
for a better understanding of our society.
Big data consists of billions and trillions of records
which might be in terabytes or petabytes (1024
terabytes) or exabytes (1024 petabytes). According to
Rajneeshikumar, P. & Uday, A. (2016), the amount
of data produced in our society today can only be
estimated in the order of zettabytes, and it is
estimated that this quantity grows at the rate of about
40 percent every year. For example, it was reported
that the presidential debate alone between former
President Barack Obama and Governor Mitt Romney
on 4 October 2012, triggered more than 10 million
tweets in 2 hours (Mervis J., 2012), exposing public
interests in healthcare among other things. A good
number of IT scholars are of the opinion that the
traditional restraints of the relational database system
can be overcome by effective deployment of the big
data technology, which have opened opportunities for
storage and processing of all types of data coming
from diverse sources.
Apart from the inability of the traditional system to
manage disparate data, it is equally very difficult to
integrate and move data across organizations which is
traditionally constrained by data storage platforms
such as relational databases technology and batch
files, which has limited ability to process very large
volumes of data, data with complex structure (or with
no structure at all), or data generated at very high
speeds.
We therefore propose a big data processing
framework from the data mining perspective. We will
deploy the HACE theorem which carefully
incorporates all the features of Big Data. The
challenges of big data are broad in terms of data
access, storage, searching, sharing, and transfer, and
these will be considered in this paper.
Aim and Objectives of Study
This paper proposes a Big Data Processing
framework that adopts the HACE theorem for
profitable implementation. The study will seek to
achieve the following specific objectives:
i. Fully explain the characteristics of the Big
Data using the HACE theorem.
ii. Propose a new framework for Big Data
processing and analysis from the data
mining perspective.
iii. Propose a three-tier data mining architecture
that will provide accurate and relevant social
sensing feedback.
iv. Make relevant suggestions and
recommendations for profitable data
analysis in our data-driven society.
LITERATURE REVIEW
Critical Analysis on Existing Big Data Processing
Models
Various authors have given a critical look into the
various models of big data processing available
today. A very important analysis was given by
Rajneeshikumar P. & Uday A. (2016), in their paper
titled “Data Mining and Information Security in Big
Data Using HACE Theorem”. They equally
presented a model of big data from the data mining
perspective but focused generally on security and
privacy issues in big data mining using the AES
algorithm. The table 1 presents the achievements and
scholarly views of other researchers in the area of big
data technology.
Table 1. Comparative study of various big data processing models
S/n Technique used Description Drawback
1. AES Algorithm
[11]
Used AES Algorithm to model big data security systems
with a focus on data mining and privacy issues.
System could not provide accurate and relevant
social sensing feedback in real-time.
2. HACE theorem
[14]
Uses distributed parallel computing with the help of the
Apache Hadoop. Built the following:
a. Big Data Mining Platform
b. Big Data Semantics and Application Knowledge
3. Big Data Mining Algorithm
System not much secured
3. Parallelization
strategy
[2]
Used SVM algorithm, NNLS algorithm, LASSO
algorithm to convert the problems into matrix-vector
Multiplication
Suitable for medium
scale data only, and no form of security was
provided,
4. Parallel Algorithms
for Mining Large-
scale Rich-media
Data. [13]
Used Spectral, Clustering, FP-Growth, and Supports
Vector Machines.
suitable for single source
knowledge discovery
methods, but not suitable for multisource
knowledge discovery.
5. Combined Data
Mining [10]
Multiple data sets, multiple features, multiple methods on
demand, Pair pattern, and Cluster pattern.
Not suitable for handling
large data.
6. Decision Tree
Learning
[15]
Converts original sample data sets into a group of unreal
data sets, from where the original samples cannot be
reconstructed without the entire group of unreal data sets.
Centralized, Storage
Complexity, and Privacy
Loss.
7. Naïve
Bayesian theory
[9]
Adding noise to classifier‟s parameters. Classification accuracy, but centralized.
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016)
206
The Five ‘Vs’ of Big Data
The big data concept began to gather momentum
from 2003 when the „Vs‟ of big data was proposed to
give foundational structure to the phenomenon which
gave rise to its current form and definition.
According to the foundational definition in [10], big
data concept has the following five mainstreams:
volume, velocity, value, variety, and veracity.
Volume: Information is gathered from different
sources including social media, cell phones, machine-
to-machine (M2M) sensors, credit cards, business
transactions, photographs, and videos recordings. A
vast amount of data is generated each day from these
channels, which have become so large that storing
and analyzing them would pose a major challenge,
especially using our traditional database technology.
According to Longbing C. (2012), facebook alone
generates about 12 billion messages a day, and over
400 million new pictures are uploaded every twelve
hours. Users‟ comments alone on issues of social
importance are in millions. Opinions of product users
generated by pressing the “Likes” button are in their
trillions. Collection and analysis of such information
have now become an engineering challenge.
Velocity: This refers to the speed at which new data
is being generated from various sources such as e-
mails, twitter messages, video clips, and social media
updates. Such information now comes in torrents
from all over the world on a daily basis. The
streaming data need to be processed and analyzed at
the same speed and in a timely manner for it to be of
value to business organizations and the general
society. Results of data analysis should equally be
transmitted instantaneously to various users. Credit
card transactions, for instance, need to be checked in
seconds for fraudulent activities. Trading systems
need to analyze social media networks in seconds to
obtain data for proper decisions to buy or sell shares.
Variety: Variety refers to different types of data and
the varied formats in which data are presented. Using
the traditional database systems, information is stored
as structured data mostly in numeric data format. But
in today‟s society, we receive information mostly as
unstructured text documents, email, video, audio,
financial transactions, and so on. The society no
longer makes use of only structured data arranged in
columns of names, phone numbers, and addresses
that fits nicely into relational database tables.
According to [10], more than 80% of today‟s data is
unstructured. Big data technology now provides new
and innovative ways that permit simultaneous
gathering and storage of both structured and
unstructured data (Longbing C. (2012).
Value: Data is only useful if value can be extracted
from it. By value, we refer to the worth of the data.
Business owners should not only embark on data
gathering and analysis, but understand the costs and
benefits of collecting and analyzing such data. The
benefits to be derived from such information should
exceed the cost of data gathering and analysis for it to
be taken as valuable. Big data initiative also creates
an understanding of costs and benefits.
Veracity: Veracity refers to the trustworthiness of
the data. That is, how accurate is the data that have
been gathered from the various sources? Big data
initiative tries to verify the reliability and authenticity
of data such as abbreviations and typos from twitter
posts and some internet contents. Big data
technology can make comparisons that bring out the
correct and qualitative data sets. There are new
approaches that link, match, cleanse and transform
data coming from various systems.
Types of Big Data
Fundamentally, there are only two types of big data
which are usually generated from social media sites,
streaming data from IT systems and connected
devices, and data from government and open data
sources. The two types of big data are structured and
unstructured data.
Structured data: Structured data are data in the form
of words and numbers that are easily categorized in
tabular format for easy storage and retrieval using
relational database systems. Such data are usually
generated from sources such as global positioning
system (GPS) devices, smart phones and network
sensors embedded in electronic devices.
Unstructured data: Unstructured data are data items
that not easily analyzed using traditional systems
because of their inability to be maintained in a tabular
format. It contains more complex data in the form of
photos and other multimedia information, consumer
comments on products and services, customer
reviews of commercial websites, and so on
(Rajneeshikumar P. & Uday A. (2016).
Sources of Big Data
Figure 1. Sources of Big data
Log data Social
media
Transacti
ons
Events
Images Audio Video E-mails
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016)
207
An Interpretation of the HACE Theorem
The HACE theorem is a widely accepted theorem
that fully describes the big data characteristics and
features. Basically, Big Data has the characteristics of
being heterogeneous, large volume, autonomous
sources with distributed and decentralized control,
and a complex and evolving relationships among
data. These characteristics pose enormous challenge
in determining useful knowledge and information
from the big data.
The native myth that tells the story of blind men
trying to size up an elephant was adopted in HACE
theorem. The big elephant in that story represents the
Big Data in our context. Each blind man is to draw
useful conclusion regarding the elephant (of course
which will depend on the part of the animal he
touched). Since the knowledge extracted from the
experiment with the blind men will be according to
the part of the information he collected, it is expected
that the blind men will each conclude independently
and differently that the elephant “feels” like a rope, a
stone, a stick, a wall, a hose, and so on.
To make the problem even more complex, assume
that:
i. The size of the elephant is not static, but
increasing at a very fast rate, with a constantly
changing posture.
ii. The information sources for each blind man is
different, and possibly inaccurate and unreliable
which give him varying knowledge about what
the elephant looks like (example, one blind man
may share his own inaccurate view of the elephant
with his friend), and this information sharing will
definitely make changes in the thinking of each
blind man.
Big data gathering and analysis is equivalent to the
scenario illustrated above. It will involve merging or
integrating heterogeneous information from different
sources (just like the blind men) to arrive at the best
possible and accurate knowledge regarding the
information domain. This will certainly not be as
easy as enquiring from each blind man about the
elephant or drawing one single picture of the elephant
from a joint opinion of the blind men. The difficulty
stems from the fact that each data source may express
a different language, and may even have
confidentiality concerns about the message they
measured based on their country‟s information
exchange procedure.
HACE theorem therefore presents the key
characteristics of Big Data to include:
Huge With Heterogeneous Data Sources
Big data is heterogeneous because different data
collectors make use of their own big data protocols
and schema for knowledge recording. Therefore, the
nature of information gathered even from the same
sources will vary based on the application and
procedure of collection. This will end up in
diversities of knowledge representation.
Autonomous Sources and Decentralized Access
Control to Data
With big data, each data source is distinct and
autonomous with a distributed and decentralized
control. Therefore in big data, there is no centralized
control in the generation and collection of
information. This setting is similar to the World Wide
Web (WWW) where the function of each web server
does not depend on the others.
Complex Data Relationships and Evolving
Knowledge Associations
Generally, analyzing information using centralized
information systems aims at discovering certain
features that best represent every observation. That is,
each object is treated as an independent entity
without considering any other social connection with
other objects within the domain or outside.
Meanwhile, relationships and correlation are the most
important factors of the human society. In our
dynamic society, individuals must be represented
alongside their social ties and connections which also
evolve depending on certain temporal, spatial, and
other factors. Example, the relationship between
two or more facebook friends represents a complex
relationship because new friends are added every day.
To maintain the relationship among these friends will
therefore pose a huge challenge for developers. Other
examples of complex data types are time-series
information, maps, videos, and images.
RESEARCH METHODOLOGY
In this study we will adopt the Data mining technique
and the HACE theorem which has been proved to
characterize the unique features of the big data.
Implementing the HACE theorem using recent
advances in data mining technologies will provide a
unique framework of big data processing for a
profitable data analysis in business and the society at
large.
Data Mining Technique
The Hadoop‟s Map Reduce technique will be used
for big data mining, while the k-means and Naïve
Bayes algorithm will be used for clustering and
dataset classification. We shall consider the
suggestions of Kalaivani K. (2015) that observed the
need to revisit most of the data mining techniques in
use today and proposed distributed versions of the
various data mining methods available due to the new
challenges of big data.
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016)
208
Implementation of the HACE Theorem in Data
Mining
HACE theorem models the detailed characteristics of
the Big Data which include: Huge With
Heterogeneous and Diverse Data Sources that
represents diversities of knowledge representation,
Autonomous Sources and Decentralized Control
similar to the World Wide Web (WWW) where the
function of each web server does not depend on the
other servers, and Complex Data Relationships and
Evolving Knowledge Associations, which therefore
suggests that each object is treated as an independent
entity with consideration on their social connection
with other objects within the same domain or outside.
A popular open source implementation of the HACE
theorem is Apache Hadoop, which is equally
recommended in this research paper. Hadoop has the
capacity to link a number of relevant, disparate
datasets for analysis in order to reveal new patterns,
trends and insights, which is the most important value
of big data.
Critical Challenges of Big Data Processing
Systems
Big data systems face a number of challenges. The
amount of data generated is already very large and
increasing daily. The speed of data generation and
growth is also increasing, which is partly driven by
the proliferation of internet connected devices. The
variety of data being generated is also on the
increase, and current technology, architecture,
management and methods of analysis are now unable
to cope with the flood.
Privacy Concerns for Data Security in Various
Countries
Most Governments around the world are committed
to protecting the privacy rights of their citizens.
Many countries including Australia have passed the
Privacy Act, which sets clear boundaries for usage of
personal information. Government agencies, when
collecting or managing citizens‟ data, are subject to a
range of legislative controls, and must comply with a
number of acts and regulations such as the Freedom
of Information Act (in the case of Nigeria), the
Archives Act, the Telecommunications Act, the
Electronic Transactions Act, and the Intelligence
Services Act. These legislative instruments are
designed to maintain public confidence in the
government as a secure repository and steward of
citizen information. The use of big data by
government agencies will therefore add an additional
layer of complexity in the management of
information security risks.
Information Sharing
Every economy thrives by information, and no
society can survive without access to relevant
information. Government agencies must strive to
provide access to information whilst still adhering to
privacy laws. Apart from making data available, data
must also be accurate, complete and timely for it to
support complex analysis and decision making.
Qualitative data will save costs, enhance business
intelligence, and improve productivity. The current
trend towards open data is highly appreciated since
its focus is on making information available to the
public, but in managing big data, government must
look for ways to standardize data access across her
agencies in such a way that collaboration will only be
to the extent made possible by privacy laws.
Technological Initiatives
Government agencies can only manage the new
requirements of big data efficiently through the
adoption of new technologies. If big data analytics is
carried out upon current ICT systems, the benefits of
data archiving, analysis and use will be lost. The
emergence of big data and the potential to undertake
complex analysis of very large data sets is,
essentially, a consequence of recent advances in
technology.
MODEL FORMULATION AND DISCUSSIONS
Proposing a New Big Data Mining Framework
The following is a High Level Model which shows a
conceptual framework for big data mining. It will
follow a three-tier architecture as shown in figure 2.
Figure 2. A Three-tier architecture for big data mining
(Source: Otuonye A., et al, 2018).
Big data
Tier 1
Big data mining Platform
Using parallel programming
models such as Hadoop‟s
MapReduce technique
Tier 2
Big data semantics and domain
knowledge for the various
Applications
Integrate various domain
applications, information sharing
and privacy mechanisms
Tier 3
Big data mining Algorithm
To adjust parameters to feedback
based on global knowledge and local
sharing
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016)
209
Interpretation of Major Elements in the
Framework
Tier 1:
Tier 1 accessing big datasets and performs arithmetic
operations on them. Big data cannot practically be
stored in a single location. Storage in diverse
locations will also increase. Therefore, effective
computing platform will have to be in place to take
up the distributed large scale datasets and perform
arithmetic and logical operations on them. In order to
achieve such common operations in a distributed
computing environment, parallel computing
architecture must be employed. The major challenge
at Tier 1 is that a single personal computer cannot
possibly handle the big data mining because of the
large quantity of data involved. To overcome this
challenge at Tier 1, the concept of data distribution
has to be used.
For processing of big data in a distributed
environment, we propose the adoption of such
parallel programming models like the Hadoop‟s Map
Reduce technique (Che, D. et al, 2013).
Tier 2:
Tier 2 processes the semantics of big data from the
various Big Data applications. Such information will
be of benefit to the data mining process taking place
at Tier 1 and to the data mining algorithms at Tier 3
by adding certain technical barriers and checks and
balances and data privacy mechanisms to the process.
Addition of technical barriers is necessary because
information sharing and data privacy mechanisms
between data producers and data consumers can be
different for various domain applications (EYGM
LIMITED, (2014).
Tier 3:
Algorithm Designs take place at Tier 3. Big data
mining algorithms will help in tackling the
difficulties raised by the Big Data volumes,
complexity, dynamic data characteristics and
distributed data. The algorithm at Tier 3 will contain
three iterative stages: The first iteration is a pre-
processing of all uncertain, sparse, heterogeneous,
and multisource data. The second is the mining of
dynamic and complex data after the pre-processing
operation. Thirdly the global knowledge received by
local learning matched with all relevant information
is feedback into the pre-processing stage, while the
model and parameters are adjusted according to the
feedback.
General Design Architecture
The general design architecture of the proposed data
mining system is shown in figure 3, which shows the
various activities involved in the mining process.
Figure 3. Proposed General Design Architecture for Big Data Mining
Server
MapReduce program
Admin
Big data
User Interface
Hadoop system
K-means
Algorithm
Naïve Bayes
Algorithm
Clustering and
classification results
Store / Fetch
Interrelated data
Load data
Map()
Reduce()
Output
Data
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016)
210
Information Flow
The flowchart of the proposed system is presented in figure 4.
DISCUSSIONS
Figure 4.2 shows the proposed big data processing
framework for profitable big data processing and
analysis. The model shows all the phases of the big
data mining concept. The major elements of the
system include: Big data sources, Admin, User
interface, the Hadoop‟s MapReduce system, and the
Hadoop‟s K-means and Naïve Bayes algorithm.
Admin
The Admin will be responsible for querying the
system based on request. He will interact directly
with the graphical user interface to supply his needs,
while the query will be processed by the Hadoop
System.
Hadoop’s MapReduce Module
The MapReduce program is a programming model
used to separate datasets and subsequently sent as
input into independent subsets. It contains the Map()
procedure which performs filtering and sorting on
datasets. It also contains the Reduce() procedure that
carries out a summary operation to produce output.
K-means and Naïve Bayes Algorithm
After the MapReduce operation, the output is
transferred to the K-means or Naive Bayes algorithm
to carry out clustering and classification of the
datasets through an iterative procedure. The result is
stored in separate data stores for easy access.
CONCLUSION AND RECOMMENDATIONS
Conclusion
In this paper, we have made a new Big Data
Processing Framework from the data mining
perspective, which deployed the HACE theorem for a
full description of the Big Data characteristics. The
challenges of big data are broad in terms of data
access, storage, searching, sharing, and transfer, and
these have been considered in this paper.
The paper also proposed a three-tier big data mining
architecture for a more profitable storage and analysis
of data.
Recommendations
The benefits of the big data revolution are enormous
and have the capacity to enhance economic activities
and advance government‟s efforts and agencies
around the world. There are available big data
Available
Not available
Login
User
Authentication
Perform query
Record
Availability
Set constraints
Mining
Load data
Fetch record
Security and
privacy checks
Store
Logout
No
Yes
Available
Not available
Figure 4.. Flowchart of the proposed system
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016)
211
technologies with affordable, open source, and
distributed platforms (such as the Hadoop), and
relatively easy to deploy. This is highly
recommended in this paper.
There is need for full implementation and
deployment of relevant advancements in data storage
and data mining technologies to ensure proper
preservation and analysis of big data. There is need
to harness the potential benefits of the big data
revolution and enhance effective management and
processing of large-scale data.
REFERENCES
Chernyak, L. (2011). Big Data - the new theory and
practice. Open the system. DBMS. – M, Open
Systems, № 10. ISSN 1028-7493.
Che, D., Safran, M. & Peng, Z. (2013). From Big
Data to Big Data Mining: challenges, issues,
and opportunities, Database Systems for
Advanced Applications, pp. 1–15, Springer,
Berlin, Germany.
Gillick, D., Faria, A. & DeNero J. (2006).
MapReduce: Distributed Computing for
Machine Learning, Berkley, Dec.
Schadt, E. (2012). The Changing Privacy Landscape
in the Era of Big Data, Molecular Systems
,vol. 8, article 612.
EYGM Limited, (2014). Big data: changing the way
businesses compete and operate, Retrieved
from (www.ey.com/GRCinsignts).
IBM What Is Big Data: Bring Big Data to the
Enterprise, Retrieved from http://www-
01.ibm.com/software/data/bigdata/, IBM,
2012.
Mervis, J. (2012). Agencies rally to tackle big data,
Science, vol. 336, no. 6077, p. 22.
Jenn C. (2014). The V's of Big Data: Velocity,
Volume, Value, Variety, and Veracity, March
11, Retrieved from
(https://www.xsnet.com/blog/bid/205405/ ).
Kalaivani.K. & Amutha, M. (2015). Analysis of Big
Data with Data mining using HACE theorem,
Journal of Recent Research in Engineering and
Technology ISSN (Online): 2349 –2252, ISSN
(Print):2349 –2260 Volume 2 Issue 4.
Lo, B. & Velastin, S. (2009). Parallel Algorithms for
Mining Large-Scale Rich-Media Data, Proc.
17th ACM Intl Conf. Multimedia, (MM09,),
pp. 917-918,2009.
Longbing, C. (2012). Combined Mining: Analyzing
Object and Pattern Relations for Discovering
Actionable Complex Patterns, Sponsored by
Australian Research Council discovery Grants.
Otuonye, A., Udunwa, A. & Nwokonkwo, O. (2018).
A Model Design of Big Data Processing
System Using HACE Theorem, International
Journal of Electronics Communication and
Computer Engineering,vol.9, issue 1.
Otuonye, A., Nwokonkwo, O. & Eke, F. (2018).
Overcoming Big Data Mining Challenges for
Revolutionary Breakthroughs in Commerce
and Industry, International Journal of
Electronics Communication and Computer
Engineering, vol. 9, issue 1.
Rajneeshkumar, P. & Uday, A. (2016). Data Mining
and Information Security in Big Data Using
HACE Theorem, International Research
Journal of Engineering and Technology. ,
volume 03, Issue 06..
Rajneeshkumar, P. & Uday, A. (2016). Survey on
data mining and Information security in Big
Data using HACE theorem, International
Engineering Research Journal (IERJ), vol. 1,
issue 11 , 2016, ISSN 2395 -1621
Velastin, S. & Lo, B. (2009). Parallel Algorithms for
Mining Large- Scale Rich-Media Data, Proc.
17th ACM Intl Conf. Multimedia, (MM
09,),pp.917-918,2009.
Wu, X., Zhu, X., Wu, G., & Ding, W. (2014). Data
Mining with Big Data, IEEE Trans. On
Knowledge and data engineering, vol. 26, no.
1, pp. 97-107.
Weber-Jahnke, J. & Fong, P. (2012). Privacy
preserving decision tree learning using
unrealized data sets, IEEE Trans. Knowl. Data
Eng., vol. 24, no. 2, pp.353_364.
Worldometers (2014). Real time world statistics,
Retrieved from
2zhttp://www.worldometers.info/world-
population/.

More Related Content

What's hot

Hadoop and Big Data Readiness in Africa: A Case of Tanzania
Hadoop and Big Data Readiness in Africa: A Case of TanzaniaHadoop and Big Data Readiness in Africa: A Case of Tanzania
Hadoop and Big Data Readiness in Africa: A Case of Tanzaniaijsrd.com
 
Big Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesKaran Deep Singh
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesKathirvel Ayyaswamy
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor networkparry prabhu
 
A comprehensive survey on data mining
A comprehensive survey on data miningA comprehensive survey on data mining
A comprehensive survey on data miningeSAT Publishing House
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big datakk1718
 
Big data ppt
Big data pptBig data ppt
Big data pptYash Raj
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applicationsPadma Metta
 
6.a survey on big data challenges in the context of predictive
6.a survey on big data challenges in the context of predictive6.a survey on big data challenges in the context of predictive
6.a survey on big data challenges in the context of predictiveEditorJST
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...IJSRD
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data MiningIOSR Journals
 
hariri2019.pdf
hariri2019.pdfhariri2019.pdf
hariri2019.pdfAkuhuruf
 
Data Mining Algorithm and New HRDSD Theory for Big Data
Data Mining Algorithm and New HRDSD Theory for Big DataData Mining Algorithm and New HRDSD Theory for Big Data
Data Mining Algorithm and New HRDSD Theory for Big DataKamleshKumar394
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science suresh sood
 
challenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing frameworkchallenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing frameworkKamleshKumar394
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...ijcseit
 

What's hot (20)

Data mining on big data
Data mining on big dataData mining on big data
Data mining on big data
 
Hadoop and Big Data Readiness in Africa: A Case of Tanzania
Hadoop and Big Data Readiness in Africa: A Case of TanzaniaHadoop and Big Data Readiness in Africa: A Case of Tanzania
Hadoop and Big Data Readiness in Africa: A Case of Tanzania
 
Big Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and Issues
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor network
 
A comprehensive survey on data mining
A comprehensive survey on data miningA comprehensive survey on data mining
A comprehensive survey on data mining
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Bigdata and Hadoop with applications
Bigdata and Hadoop with applicationsBigdata and Hadoop with applications
Bigdata and Hadoop with applications
 
6.a survey on big data challenges in the context of predictive
6.a survey on big data challenges in the context of predictive6.a survey on big data challenges in the context of predictive
6.a survey on big data challenges in the context of predictive
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 
hariri2019.pdf
hariri2019.pdfhariri2019.pdf
hariri2019.pdf
 
Data Mining Algorithm and New HRDSD Theory for Big Data
Data Mining Algorithm and New HRDSD Theory for Big DataData Mining Algorithm and New HRDSD Theory for Big Data
Data Mining Algorithm and New HRDSD Theory for Big Data
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science
 
Big data road map
Big data road mapBig data road map
Big data road map
 
Big data
Big dataBig data
Big data
 
challenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing frameworkchallenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing framework
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 

Similar to A Novel Framework for Big Data Processing in a Data-driven Society

ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MININGISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MININGcscpconf
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data miningPolash Halder
 
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...AnthonyOtuonye
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal1
 
Research in Big Data - An Overview
Research in Big Data - An OverviewResearch in Big Data - An Overview
Research in Big Data - An Overviewieijjournal
 
A SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSA SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSijistjournal
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformIRJET Journal
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIIJCSEA Journal
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIIJCSEA Journal
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIIJCSEA Journal
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Katie Whipkey
 
A Survey on Big Data Mining Challenges
A Survey on Big Data Mining ChallengesA Survey on Big Data Mining Challenges
A Survey on Big Data Mining ChallengesEditor IJMTER
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdfAkuhuruf
 

Similar to A Novel Framework for Big Data Processing in a Data-driven Society (20)

ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MININGISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
 
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
 
Big data survey
Big data surveyBig data survey
Big data survey
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
 
Research in Big Data - An Overview
Research in Big Data - An OverviewResearch in Big Data - An Overview
Research in Big Data - An Overview
 
[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...
[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...
[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...
 
A SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSA SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICS
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AI
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AI
 
DEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AIDEALING CRISIS MANAGEMENT USING AI
DEALING CRISIS MANAGEMENT USING AI
 
Big data Paper
Big data PaperBig data Paper
Big data Paper
 
Conclusion
ConclusionConclusion
Conclusion
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
 
A Survey on Big Data Mining Challenges
A Survey on Big Data Mining ChallengesA Survey on Big Data Mining Challenges
A Survey on Big Data Mining Challenges
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdf
 
Datamining with big data
 Datamining with big data  Datamining with big data
Datamining with big data
 

More from AnthonyOtuonye

Modeling Embedded Software for Mobile Devices to Actualize Nigeria's Vision
Modeling Embedded Software for Mobile Devices to Actualize Nigeria's Vision Modeling Embedded Software for Mobile Devices to Actualize Nigeria's Vision
Modeling Embedded Software for Mobile Devices to Actualize Nigeria's Vision AnthonyOtuonye
 
Improving university education in nigeria through mobile academic directory
Improving university education in nigeria through mobile academic directoryImproving university education in nigeria through mobile academic directory
Improving university education in nigeria through mobile academic directoryAnthonyOtuonye
 
Framework for mobile application development using J2ME and UML
Framework for mobile application development using J2ME and UMLFramework for mobile application development using J2ME and UML
Framework for mobile application development using J2ME and UMLAnthonyOtuonye
 
Mitigating corruption in public trust through e-governance
Mitigating corruption in public trust through e-governanceMitigating corruption in public trust through e-governance
Mitigating corruption in public trust through e-governanceAnthonyOtuonye
 
E-gov Model for Imo State Nigeria
E-gov Model for Imo State NigeriaE-gov Model for Imo State Nigeria
E-gov Model for Imo State NigeriaAnthonyOtuonye
 
Deploying content management system to enhance state governance
Deploying content management system to enhance state governanceDeploying content management system to enhance state governance
Deploying content management system to enhance state governanceAnthonyOtuonye
 
Bringing E gov Reforms in Africa through the Content Management System
Bringing E gov Reforms in Africa through the Content Management SystemBringing E gov Reforms in Africa through the Content Management System
Bringing E gov Reforms in Africa through the Content Management SystemAnthonyOtuonye
 
Improving quality of assessment for university academic programmes
Improving quality of assessment for university academic programmesImproving quality of assessment for university academic programmes
Improving quality of assessment for university academic programmesAnthonyOtuonye
 
Design of Improved Programme Assessment Model for Nigerian Universities
Design of Improved Programme Assessment Model for Nigerian UniversitiesDesign of Improved Programme Assessment Model for Nigerian Universities
Design of Improved Programme Assessment Model for Nigerian UniversitiesAnthonyOtuonye
 
Emprical Study on Quality of Academic Programme Evaluation Framework: A step ...
Emprical Study on Quality of Academic Programme Evaluation Framework: A step ...Emprical Study on Quality of Academic Programme Evaluation Framework: A step ...
Emprical Study on Quality of Academic Programme Evaluation Framework: A step ...AnthonyOtuonye
 
Impact of Software Development Outcoursing on Growth of the IC Sector in Deve...
Impact of Software Development Outcoursing on Growth of the IC Sector in Deve...Impact of Software Development Outcoursing on Growth of the IC Sector in Deve...
Impact of Software Development Outcoursing on Growth of the IC Sector in Deve...AnthonyOtuonye
 
Remedy to economic recesion in nigeria: an ict-driven model for sustainable e...
Remedy to economic recesion in nigeria: an ict-driven model for sustainable e...Remedy to economic recesion in nigeria: an ict-driven model for sustainable e...
Remedy to economic recesion in nigeria: an ict-driven model for sustainable e...AnthonyOtuonye
 
Using ICT Policy Framework as Panacea for Economic Recession and Instability ...
Using ICT Policy Framework as Panacea for Economic Recession and Instability ...Using ICT Policy Framework as Panacea for Economic Recession and Instability ...
Using ICT Policy Framework as Panacea for Economic Recession and Instability ...AnthonyOtuonye
 
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...AnthonyOtuonye
 
Need to Implement ICT-based Business Policies for Sustainable Economic Growth...
Need to Implement ICT-based Business Policies for Sustainable Economic Growth...Need to Implement ICT-based Business Policies for Sustainable Economic Growth...
Need to Implement ICT-based Business Policies for Sustainable Economic Growth...AnthonyOtuonye
 
A multi-factor Authentication System to Mitigate Student Impersonation in Ter...
A multi-factor Authentication System to Mitigate Student Impersonation in Ter...A multi-factor Authentication System to Mitigate Student Impersonation in Ter...
A multi-factor Authentication System to Mitigate Student Impersonation in Ter...AnthonyOtuonye
 
Development of Electronic Bank Deposi and Withdrawal System Using Quick Respo...
Development of Electronic Bank Deposi and Withdrawal System Using Quick Respo...Development of Electronic Bank Deposi and Withdrawal System Using Quick Respo...
Development of Electronic Bank Deposi and Withdrawal System Using Quick Respo...AnthonyOtuonye
 
Ehanced Business Marketing For Small Scale Enterprises Via the Quick Response...
Ehanced Business Marketing For Small Scale Enterprises Via the Quick Response...Ehanced Business Marketing For Small Scale Enterprises Via the Quick Response...
Ehanced Business Marketing For Small Scale Enterprises Via the Quick Response...AnthonyOtuonye
 

More from AnthonyOtuonye (18)

Modeling Embedded Software for Mobile Devices to Actualize Nigeria's Vision
Modeling Embedded Software for Mobile Devices to Actualize Nigeria's Vision Modeling Embedded Software for Mobile Devices to Actualize Nigeria's Vision
Modeling Embedded Software for Mobile Devices to Actualize Nigeria's Vision
 
Improving university education in nigeria through mobile academic directory
Improving university education in nigeria through mobile academic directoryImproving university education in nigeria through mobile academic directory
Improving university education in nigeria through mobile academic directory
 
Framework for mobile application development using J2ME and UML
Framework for mobile application development using J2ME and UMLFramework for mobile application development using J2ME and UML
Framework for mobile application development using J2ME and UML
 
Mitigating corruption in public trust through e-governance
Mitigating corruption in public trust through e-governanceMitigating corruption in public trust through e-governance
Mitigating corruption in public trust through e-governance
 
E-gov Model for Imo State Nigeria
E-gov Model for Imo State NigeriaE-gov Model for Imo State Nigeria
E-gov Model for Imo State Nigeria
 
Deploying content management system to enhance state governance
Deploying content management system to enhance state governanceDeploying content management system to enhance state governance
Deploying content management system to enhance state governance
 
Bringing E gov Reforms in Africa through the Content Management System
Bringing E gov Reforms in Africa through the Content Management SystemBringing E gov Reforms in Africa through the Content Management System
Bringing E gov Reforms in Africa through the Content Management System
 
Improving quality of assessment for university academic programmes
Improving quality of assessment for university academic programmesImproving quality of assessment for university academic programmes
Improving quality of assessment for university academic programmes
 
Design of Improved Programme Assessment Model for Nigerian Universities
Design of Improved Programme Assessment Model for Nigerian UniversitiesDesign of Improved Programme Assessment Model for Nigerian Universities
Design of Improved Programme Assessment Model for Nigerian Universities
 
Emprical Study on Quality of Academic Programme Evaluation Framework: A step ...
Emprical Study on Quality of Academic Programme Evaluation Framework: A step ...Emprical Study on Quality of Academic Programme Evaluation Framework: A step ...
Emprical Study on Quality of Academic Programme Evaluation Framework: A step ...
 
Impact of Software Development Outcoursing on Growth of the IC Sector in Deve...
Impact of Software Development Outcoursing on Growth of the IC Sector in Deve...Impact of Software Development Outcoursing on Growth of the IC Sector in Deve...
Impact of Software Development Outcoursing on Growth of the IC Sector in Deve...
 
Remedy to economic recesion in nigeria: an ict-driven model for sustainable e...
Remedy to economic recesion in nigeria: an ict-driven model for sustainable e...Remedy to economic recesion in nigeria: an ict-driven model for sustainable e...
Remedy to economic recesion in nigeria: an ict-driven model for sustainable e...
 
Using ICT Policy Framework as Panacea for Economic Recession and Instability ...
Using ICT Policy Framework as Panacea for Economic Recession and Instability ...Using ICT Policy Framework as Panacea for Economic Recession and Instability ...
Using ICT Policy Framework as Panacea for Economic Recession and Instability ...
 
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...An Analysis of Big Data Computing for Efficiency of Business Operations Among...
An Analysis of Big Data Computing for Efficiency of Business Operations Among...
 
Need to Implement ICT-based Business Policies for Sustainable Economic Growth...
Need to Implement ICT-based Business Policies for Sustainable Economic Growth...Need to Implement ICT-based Business Policies for Sustainable Economic Growth...
Need to Implement ICT-based Business Policies for Sustainable Economic Growth...
 
A multi-factor Authentication System to Mitigate Student Impersonation in Ter...
A multi-factor Authentication System to Mitigate Student Impersonation in Ter...A multi-factor Authentication System to Mitigate Student Impersonation in Ter...
A multi-factor Authentication System to Mitigate Student Impersonation in Ter...
 
Development of Electronic Bank Deposi and Withdrawal System Using Quick Respo...
Development of Electronic Bank Deposi and Withdrawal System Using Quick Respo...Development of Electronic Bank Deposi and Withdrawal System Using Quick Respo...
Development of Electronic Bank Deposi and Withdrawal System Using Quick Respo...
 
Ehanced Business Marketing For Small Scale Enterprises Via the Quick Response...
Ehanced Business Marketing For Small Scale Enterprises Via the Quick Response...Ehanced Business Marketing For Small Scale Enterprises Via the Quick Response...
Ehanced Business Marketing For Small Scale Enterprises Via the Quick Response...
 

Recently uploaded

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 

Recently uploaded (20)

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 

A Novel Framework for Big Data Processing in a Data-driven Society

  • 1. Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016) 204 A Novel Framework for Big Data Processing in a Data-Driven Society Otuonye Anthony I.1 , Onyechere Patricia O.2 , Njoku Obilor A.3 1 Department of Information Technology, Federal University of Technology Owerri, Nigeria. 2 Department of Management Technology, Federal University of Technology Owerri, Nigeria. 3 Department of Computer Science, Federal University of Technology Owerri, Nigeria. Corresponding Author: Otuonye Anthony I _________________________________________________________________________________________ Abstract Big data has continued to attract huge attention from the academia, as well as the industry and government agencies. The attraction has been due to a quick rise in the generation of large-scale and varied data, resulting from emerging increase in the use of sensor networks, social networks, and some semantic web applications. Spontaneous increase in large-scale data poses a critical challenge for today‟s Data Scientists and Social Managers. This paper introduces a big data processing framework from the data mining perspective that will fully harness the potential benefits of the big data revolution and enhance effective management and processing of large-scale data. We have proposed a three-tier data mining structure for big data storage, processing and analysis from a single platform and provide social sensing feedback for a better understanding of our society. Big Data concern large-volume, complex, and growing data sets with multiple, autonomous sources. Our model is both data-driven and demand-driven from various information sources, mining and analysis. The study became necessary due to the growing need to assist governments and business agencies to take advantage of the big data technology for the desired turn-around in their socio-economic activities. We have adopted the HACE theorem in the model design which characterizes the unique features of the big data revolution: large, heterogeneous, complex, and evolving. We also adopted the Hadoop‟s Map Reduce optimization strategies based on the MapReduce parallel processing framework for big data mining. There is need to revise most of our traditional data mining techniques and deploy the suggested distributed versions of big date models available to ensure profitable data analysis in our contemporary data-driven society. __________________________________________________________________________________________ Keywords: Big Data, HACE Theorem, Data Mining, Social Network, Hadoop INTRODUCTION Big data defines large data sets that are complex, voluminous, heterogeneous, emanating from a variety of data sources, and in most cases, with rapidly changing relationships among the sets (Otuonye, A. et al, 2018). It is a term that describes the large collections of information – both structured and unstructured – that floods our society on a daily basis. According to Otuonye, A. et al, (2018), big data is defined as the large and ever-increasing volumes of data created by humans, machines, software applications, and tools. The data is generated from a large number of sources including social media, internet-enabled devices (such as smart phones and tablets), machine, video and voice recordings, and so on. According to Jenn, C. (2014), “… 90% of the data in the world today was created in the last two years”. The reason for the explosion in information is not far- fetched: currently, we have over 7 billion people in the world today (Worldometers, 2014) and out of this number , over 2 billion are connected to the internet. Research has also shown that over 5 billion people are using mobile devices across the globe. Due to the fact of these technological advancements, people and machine now generate huge amounts of data on a daily basis through the increased use of such devices. The use of remote sensors is another source of big data. They continuously produce much heterogeneous data that are both structured or unstructured in nature. These are big data which cannot generally be categorized using the regular relational database. big data is also captured and processed at a very high speed to improve businesses and decision making processes. There is therefore need for commensurate advancements in data storage and data mining technologies to ensure proper preservation and analysis of big data. Hence, this paper introduces a big data processing framework from the data mining perspective to fully harness the potential benefits of the big data revolution and enhance effective management and processing of large-scale data. We are proposing a three-tier data mining structure for Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 © Scholarlink Research Institute Journals, 2019 (ISSN: 2141-7016) jeteas.scholarlinkresearch.com
  • 2. Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016) 205 big data storage, processing and analysis from a single platform and provide social sensing feedback for a better understanding of our society. Big data consists of billions and trillions of records which might be in terabytes or petabytes (1024 terabytes) or exabytes (1024 petabytes). According to Rajneeshikumar, P. & Uday, A. (2016), the amount of data produced in our society today can only be estimated in the order of zettabytes, and it is estimated that this quantity grows at the rate of about 40 percent every year. For example, it was reported that the presidential debate alone between former President Barack Obama and Governor Mitt Romney on 4 October 2012, triggered more than 10 million tweets in 2 hours (Mervis J., 2012), exposing public interests in healthcare among other things. A good number of IT scholars are of the opinion that the traditional restraints of the relational database system can be overcome by effective deployment of the big data technology, which have opened opportunities for storage and processing of all types of data coming from diverse sources. Apart from the inability of the traditional system to manage disparate data, it is equally very difficult to integrate and move data across organizations which is traditionally constrained by data storage platforms such as relational databases technology and batch files, which has limited ability to process very large volumes of data, data with complex structure (or with no structure at all), or data generated at very high speeds. We therefore propose a big data processing framework from the data mining perspective. We will deploy the HACE theorem which carefully incorporates all the features of Big Data. The challenges of big data are broad in terms of data access, storage, searching, sharing, and transfer, and these will be considered in this paper. Aim and Objectives of Study This paper proposes a Big Data Processing framework that adopts the HACE theorem for profitable implementation. The study will seek to achieve the following specific objectives: i. Fully explain the characteristics of the Big Data using the HACE theorem. ii. Propose a new framework for Big Data processing and analysis from the data mining perspective. iii. Propose a three-tier data mining architecture that will provide accurate and relevant social sensing feedback. iv. Make relevant suggestions and recommendations for profitable data analysis in our data-driven society. LITERATURE REVIEW Critical Analysis on Existing Big Data Processing Models Various authors have given a critical look into the various models of big data processing available today. A very important analysis was given by Rajneeshikumar P. & Uday A. (2016), in their paper titled “Data Mining and Information Security in Big Data Using HACE Theorem”. They equally presented a model of big data from the data mining perspective but focused generally on security and privacy issues in big data mining using the AES algorithm. The table 1 presents the achievements and scholarly views of other researchers in the area of big data technology. Table 1. Comparative study of various big data processing models S/n Technique used Description Drawback 1. AES Algorithm [11] Used AES Algorithm to model big data security systems with a focus on data mining and privacy issues. System could not provide accurate and relevant social sensing feedback in real-time. 2. HACE theorem [14] Uses distributed parallel computing with the help of the Apache Hadoop. Built the following: a. Big Data Mining Platform b. Big Data Semantics and Application Knowledge 3. Big Data Mining Algorithm System not much secured 3. Parallelization strategy [2] Used SVM algorithm, NNLS algorithm, LASSO algorithm to convert the problems into matrix-vector Multiplication Suitable for medium scale data only, and no form of security was provided, 4. Parallel Algorithms for Mining Large- scale Rich-media Data. [13] Used Spectral, Clustering, FP-Growth, and Supports Vector Machines. suitable for single source knowledge discovery methods, but not suitable for multisource knowledge discovery. 5. Combined Data Mining [10] Multiple data sets, multiple features, multiple methods on demand, Pair pattern, and Cluster pattern. Not suitable for handling large data. 6. Decision Tree Learning [15] Converts original sample data sets into a group of unreal data sets, from where the original samples cannot be reconstructed without the entire group of unreal data sets. Centralized, Storage Complexity, and Privacy Loss. 7. Naïve Bayesian theory [9] Adding noise to classifier‟s parameters. Classification accuracy, but centralized.
  • 3. Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016) 206 The Five ‘Vs’ of Big Data The big data concept began to gather momentum from 2003 when the „Vs‟ of big data was proposed to give foundational structure to the phenomenon which gave rise to its current form and definition. According to the foundational definition in [10], big data concept has the following five mainstreams: volume, velocity, value, variety, and veracity. Volume: Information is gathered from different sources including social media, cell phones, machine- to-machine (M2M) sensors, credit cards, business transactions, photographs, and videos recordings. A vast amount of data is generated each day from these channels, which have become so large that storing and analyzing them would pose a major challenge, especially using our traditional database technology. According to Longbing C. (2012), facebook alone generates about 12 billion messages a day, and over 400 million new pictures are uploaded every twelve hours. Users‟ comments alone on issues of social importance are in millions. Opinions of product users generated by pressing the “Likes” button are in their trillions. Collection and analysis of such information have now become an engineering challenge. Velocity: This refers to the speed at which new data is being generated from various sources such as e- mails, twitter messages, video clips, and social media updates. Such information now comes in torrents from all over the world on a daily basis. The streaming data need to be processed and analyzed at the same speed and in a timely manner for it to be of value to business organizations and the general society. Results of data analysis should equally be transmitted instantaneously to various users. Credit card transactions, for instance, need to be checked in seconds for fraudulent activities. Trading systems need to analyze social media networks in seconds to obtain data for proper decisions to buy or sell shares. Variety: Variety refers to different types of data and the varied formats in which data are presented. Using the traditional database systems, information is stored as structured data mostly in numeric data format. But in today‟s society, we receive information mostly as unstructured text documents, email, video, audio, financial transactions, and so on. The society no longer makes use of only structured data arranged in columns of names, phone numbers, and addresses that fits nicely into relational database tables. According to [10], more than 80% of today‟s data is unstructured. Big data technology now provides new and innovative ways that permit simultaneous gathering and storage of both structured and unstructured data (Longbing C. (2012). Value: Data is only useful if value can be extracted from it. By value, we refer to the worth of the data. Business owners should not only embark on data gathering and analysis, but understand the costs and benefits of collecting and analyzing such data. The benefits to be derived from such information should exceed the cost of data gathering and analysis for it to be taken as valuable. Big data initiative also creates an understanding of costs and benefits. Veracity: Veracity refers to the trustworthiness of the data. That is, how accurate is the data that have been gathered from the various sources? Big data initiative tries to verify the reliability and authenticity of data such as abbreviations and typos from twitter posts and some internet contents. Big data technology can make comparisons that bring out the correct and qualitative data sets. There are new approaches that link, match, cleanse and transform data coming from various systems. Types of Big Data Fundamentally, there are only two types of big data which are usually generated from social media sites, streaming data from IT systems and connected devices, and data from government and open data sources. The two types of big data are structured and unstructured data. Structured data: Structured data are data in the form of words and numbers that are easily categorized in tabular format for easy storage and retrieval using relational database systems. Such data are usually generated from sources such as global positioning system (GPS) devices, smart phones and network sensors embedded in electronic devices. Unstructured data: Unstructured data are data items that not easily analyzed using traditional systems because of their inability to be maintained in a tabular format. It contains more complex data in the form of photos and other multimedia information, consumer comments on products and services, customer reviews of commercial websites, and so on (Rajneeshikumar P. & Uday A. (2016). Sources of Big Data Figure 1. Sources of Big data Log data Social media Transacti ons Events Images Audio Video E-mails
  • 4. Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016) 207 An Interpretation of the HACE Theorem The HACE theorem is a widely accepted theorem that fully describes the big data characteristics and features. Basically, Big Data has the characteristics of being heterogeneous, large volume, autonomous sources with distributed and decentralized control, and a complex and evolving relationships among data. These characteristics pose enormous challenge in determining useful knowledge and information from the big data. The native myth that tells the story of blind men trying to size up an elephant was adopted in HACE theorem. The big elephant in that story represents the Big Data in our context. Each blind man is to draw useful conclusion regarding the elephant (of course which will depend on the part of the animal he touched). Since the knowledge extracted from the experiment with the blind men will be according to the part of the information he collected, it is expected that the blind men will each conclude independently and differently that the elephant “feels” like a rope, a stone, a stick, a wall, a hose, and so on. To make the problem even more complex, assume that: i. The size of the elephant is not static, but increasing at a very fast rate, with a constantly changing posture. ii. The information sources for each blind man is different, and possibly inaccurate and unreliable which give him varying knowledge about what the elephant looks like (example, one blind man may share his own inaccurate view of the elephant with his friend), and this information sharing will definitely make changes in the thinking of each blind man. Big data gathering and analysis is equivalent to the scenario illustrated above. It will involve merging or integrating heterogeneous information from different sources (just like the blind men) to arrive at the best possible and accurate knowledge regarding the information domain. This will certainly not be as easy as enquiring from each blind man about the elephant or drawing one single picture of the elephant from a joint opinion of the blind men. The difficulty stems from the fact that each data source may express a different language, and may even have confidentiality concerns about the message they measured based on their country‟s information exchange procedure. HACE theorem therefore presents the key characteristics of Big Data to include: Huge With Heterogeneous Data Sources Big data is heterogeneous because different data collectors make use of their own big data protocols and schema for knowledge recording. Therefore, the nature of information gathered even from the same sources will vary based on the application and procedure of collection. This will end up in diversities of knowledge representation. Autonomous Sources and Decentralized Access Control to Data With big data, each data source is distinct and autonomous with a distributed and decentralized control. Therefore in big data, there is no centralized control in the generation and collection of information. This setting is similar to the World Wide Web (WWW) where the function of each web server does not depend on the others. Complex Data Relationships and Evolving Knowledge Associations Generally, analyzing information using centralized information systems aims at discovering certain features that best represent every observation. That is, each object is treated as an independent entity without considering any other social connection with other objects within the domain or outside. Meanwhile, relationships and correlation are the most important factors of the human society. In our dynamic society, individuals must be represented alongside their social ties and connections which also evolve depending on certain temporal, spatial, and other factors. Example, the relationship between two or more facebook friends represents a complex relationship because new friends are added every day. To maintain the relationship among these friends will therefore pose a huge challenge for developers. Other examples of complex data types are time-series information, maps, videos, and images. RESEARCH METHODOLOGY In this study we will adopt the Data mining technique and the HACE theorem which has been proved to characterize the unique features of the big data. Implementing the HACE theorem using recent advances in data mining technologies will provide a unique framework of big data processing for a profitable data analysis in business and the society at large. Data Mining Technique The Hadoop‟s Map Reduce technique will be used for big data mining, while the k-means and Naïve Bayes algorithm will be used for clustering and dataset classification. We shall consider the suggestions of Kalaivani K. (2015) that observed the need to revisit most of the data mining techniques in use today and proposed distributed versions of the various data mining methods available due to the new challenges of big data.
  • 5. Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016) 208 Implementation of the HACE Theorem in Data Mining HACE theorem models the detailed characteristics of the Big Data which include: Huge With Heterogeneous and Diverse Data Sources that represents diversities of knowledge representation, Autonomous Sources and Decentralized Control similar to the World Wide Web (WWW) where the function of each web server does not depend on the other servers, and Complex Data Relationships and Evolving Knowledge Associations, which therefore suggests that each object is treated as an independent entity with consideration on their social connection with other objects within the same domain or outside. A popular open source implementation of the HACE theorem is Apache Hadoop, which is equally recommended in this research paper. Hadoop has the capacity to link a number of relevant, disparate datasets for analysis in order to reveal new patterns, trends and insights, which is the most important value of big data. Critical Challenges of Big Data Processing Systems Big data systems face a number of challenges. The amount of data generated is already very large and increasing daily. The speed of data generation and growth is also increasing, which is partly driven by the proliferation of internet connected devices. The variety of data being generated is also on the increase, and current technology, architecture, management and methods of analysis are now unable to cope with the flood. Privacy Concerns for Data Security in Various Countries Most Governments around the world are committed to protecting the privacy rights of their citizens. Many countries including Australia have passed the Privacy Act, which sets clear boundaries for usage of personal information. Government agencies, when collecting or managing citizens‟ data, are subject to a range of legislative controls, and must comply with a number of acts and regulations such as the Freedom of Information Act (in the case of Nigeria), the Archives Act, the Telecommunications Act, the Electronic Transactions Act, and the Intelligence Services Act. These legislative instruments are designed to maintain public confidence in the government as a secure repository and steward of citizen information. The use of big data by government agencies will therefore add an additional layer of complexity in the management of information security risks. Information Sharing Every economy thrives by information, and no society can survive without access to relevant information. Government agencies must strive to provide access to information whilst still adhering to privacy laws. Apart from making data available, data must also be accurate, complete and timely for it to support complex analysis and decision making. Qualitative data will save costs, enhance business intelligence, and improve productivity. The current trend towards open data is highly appreciated since its focus is on making information available to the public, but in managing big data, government must look for ways to standardize data access across her agencies in such a way that collaboration will only be to the extent made possible by privacy laws. Technological Initiatives Government agencies can only manage the new requirements of big data efficiently through the adoption of new technologies. If big data analytics is carried out upon current ICT systems, the benefits of data archiving, analysis and use will be lost. The emergence of big data and the potential to undertake complex analysis of very large data sets is, essentially, a consequence of recent advances in technology. MODEL FORMULATION AND DISCUSSIONS Proposing a New Big Data Mining Framework The following is a High Level Model which shows a conceptual framework for big data mining. It will follow a three-tier architecture as shown in figure 2. Figure 2. A Three-tier architecture for big data mining (Source: Otuonye A., et al, 2018). Big data Tier 1 Big data mining Platform Using parallel programming models such as Hadoop‟s MapReduce technique Tier 2 Big data semantics and domain knowledge for the various Applications Integrate various domain applications, information sharing and privacy mechanisms Tier 3 Big data mining Algorithm To adjust parameters to feedback based on global knowledge and local sharing
  • 6. Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016) 209 Interpretation of Major Elements in the Framework Tier 1: Tier 1 accessing big datasets and performs arithmetic operations on them. Big data cannot practically be stored in a single location. Storage in diverse locations will also increase. Therefore, effective computing platform will have to be in place to take up the distributed large scale datasets and perform arithmetic and logical operations on them. In order to achieve such common operations in a distributed computing environment, parallel computing architecture must be employed. The major challenge at Tier 1 is that a single personal computer cannot possibly handle the big data mining because of the large quantity of data involved. To overcome this challenge at Tier 1, the concept of data distribution has to be used. For processing of big data in a distributed environment, we propose the adoption of such parallel programming models like the Hadoop‟s Map Reduce technique (Che, D. et al, 2013). Tier 2: Tier 2 processes the semantics of big data from the various Big Data applications. Such information will be of benefit to the data mining process taking place at Tier 1 and to the data mining algorithms at Tier 3 by adding certain technical barriers and checks and balances and data privacy mechanisms to the process. Addition of technical barriers is necessary because information sharing and data privacy mechanisms between data producers and data consumers can be different for various domain applications (EYGM LIMITED, (2014). Tier 3: Algorithm Designs take place at Tier 3. Big data mining algorithms will help in tackling the difficulties raised by the Big Data volumes, complexity, dynamic data characteristics and distributed data. The algorithm at Tier 3 will contain three iterative stages: The first iteration is a pre- processing of all uncertain, sparse, heterogeneous, and multisource data. The second is the mining of dynamic and complex data after the pre-processing operation. Thirdly the global knowledge received by local learning matched with all relevant information is feedback into the pre-processing stage, while the model and parameters are adjusted according to the feedback. General Design Architecture The general design architecture of the proposed data mining system is shown in figure 3, which shows the various activities involved in the mining process. Figure 3. Proposed General Design Architecture for Big Data Mining Server MapReduce program Admin Big data User Interface Hadoop system K-means Algorithm Naïve Bayes Algorithm Clustering and classification results Store / Fetch Interrelated data Load data Map() Reduce() Output Data
  • 7. Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016) 210 Information Flow The flowchart of the proposed system is presented in figure 4. DISCUSSIONS Figure 4.2 shows the proposed big data processing framework for profitable big data processing and analysis. The model shows all the phases of the big data mining concept. The major elements of the system include: Big data sources, Admin, User interface, the Hadoop‟s MapReduce system, and the Hadoop‟s K-means and Naïve Bayes algorithm. Admin The Admin will be responsible for querying the system based on request. He will interact directly with the graphical user interface to supply his needs, while the query will be processed by the Hadoop System. Hadoop’s MapReduce Module The MapReduce program is a programming model used to separate datasets and subsequently sent as input into independent subsets. It contains the Map() procedure which performs filtering and sorting on datasets. It also contains the Reduce() procedure that carries out a summary operation to produce output. K-means and Naïve Bayes Algorithm After the MapReduce operation, the output is transferred to the K-means or Naive Bayes algorithm to carry out clustering and classification of the datasets through an iterative procedure. The result is stored in separate data stores for easy access. CONCLUSION AND RECOMMENDATIONS Conclusion In this paper, we have made a new Big Data Processing Framework from the data mining perspective, which deployed the HACE theorem for a full description of the Big Data characteristics. The challenges of big data are broad in terms of data access, storage, searching, sharing, and transfer, and these have been considered in this paper. The paper also proposed a three-tier big data mining architecture for a more profitable storage and analysis of data. Recommendations The benefits of the big data revolution are enormous and have the capacity to enhance economic activities and advance government‟s efforts and agencies around the world. There are available big data Available Not available Login User Authentication Perform query Record Availability Set constraints Mining Load data Fetch record Security and privacy checks Store Logout No Yes Available Not available Figure 4.. Flowchart of the proposed system
  • 8. Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 10(5):204-211 (ISSN: 2141-7016) 211 technologies with affordable, open source, and distributed platforms (such as the Hadoop), and relatively easy to deploy. This is highly recommended in this paper. There is need for full implementation and deployment of relevant advancements in data storage and data mining technologies to ensure proper preservation and analysis of big data. There is need to harness the potential benefits of the big data revolution and enhance effective management and processing of large-scale data. REFERENCES Chernyak, L. (2011). Big Data - the new theory and practice. Open the system. DBMS. – M, Open Systems, № 10. ISSN 1028-7493. Che, D., Safran, M. & Peng, Z. (2013). From Big Data to Big Data Mining: challenges, issues, and opportunities, Database Systems for Advanced Applications, pp. 1–15, Springer, Berlin, Germany. Gillick, D., Faria, A. & DeNero J. (2006). MapReduce: Distributed Computing for Machine Learning, Berkley, Dec. Schadt, E. (2012). The Changing Privacy Landscape in the Era of Big Data, Molecular Systems ,vol. 8, article 612. EYGM Limited, (2014). Big data: changing the way businesses compete and operate, Retrieved from (www.ey.com/GRCinsignts). IBM What Is Big Data: Bring Big Data to the Enterprise, Retrieved from http://www- 01.ibm.com/software/data/bigdata/, IBM, 2012. Mervis, J. (2012). Agencies rally to tackle big data, Science, vol. 336, no. 6077, p. 22. Jenn C. (2014). The V's of Big Data: Velocity, Volume, Value, Variety, and Veracity, March 11, Retrieved from (https://www.xsnet.com/blog/bid/205405/ ). Kalaivani.K. & Amutha, M. (2015). Analysis of Big Data with Data mining using HACE theorem, Journal of Recent Research in Engineering and Technology ISSN (Online): 2349 –2252, ISSN (Print):2349 –2260 Volume 2 Issue 4. Lo, B. & Velastin, S. (2009). Parallel Algorithms for Mining Large-Scale Rich-Media Data, Proc. 17th ACM Intl Conf. Multimedia, (MM09,), pp. 917-918,2009. Longbing, C. (2012). Combined Mining: Analyzing Object and Pattern Relations for Discovering Actionable Complex Patterns, Sponsored by Australian Research Council discovery Grants. Otuonye, A., Udunwa, A. & Nwokonkwo, O. (2018). A Model Design of Big Data Processing System Using HACE Theorem, International Journal of Electronics Communication and Computer Engineering,vol.9, issue 1. Otuonye, A., Nwokonkwo, O. & Eke, F. (2018). Overcoming Big Data Mining Challenges for Revolutionary Breakthroughs in Commerce and Industry, International Journal of Electronics Communication and Computer Engineering, vol. 9, issue 1. Rajneeshkumar, P. & Uday, A. (2016). Data Mining and Information Security in Big Data Using HACE Theorem, International Research Journal of Engineering and Technology. , volume 03, Issue 06.. Rajneeshkumar, P. & Uday, A. (2016). Survey on data mining and Information security in Big Data using HACE theorem, International Engineering Research Journal (IERJ), vol. 1, issue 11 , 2016, ISSN 2395 -1621 Velastin, S. & Lo, B. (2009). Parallel Algorithms for Mining Large- Scale Rich-Media Data, Proc. 17th ACM Intl Conf. Multimedia, (MM 09,),pp.917-918,2009. Wu, X., Zhu, X., Wu, G., & Ding, W. (2014). Data Mining with Big Data, IEEE Trans. On Knowledge and data engineering, vol. 26, no. 1, pp. 97-107. Weber-Jahnke, J. & Fong, P. (2012). Privacy preserving decision tree learning using unrealized data sets, IEEE Trans. Knowl. Data Eng., vol. 24, no. 2, pp.353_364. Worldometers (2014). Real time world statistics, Retrieved from 2zhttp://www.worldometers.info/world- population/.