International Journal of Engineering, Business and Management (IJEBM)
ISSN: 2456-7817
[Vol-7, Issue-4, Jul-Aug, 2023]
Issue DOI: https://dx.doi.org/10.22161/ijebm.7.4
Article DOI: https://dx.doi.org/10.22161/ijebm.7.4.10
Understand the Idea of Big Data and in Present Scenario
Chirag Kumar Dilipbhai Patel1, Dr. Prasadu Peddi2
1Research Scholar, Sunrise University, Alwar, Rajasthan, India
2Professor, Sunrise University, Alwar, Rajasthan, India
Received: 07 Jul 2023; Received in revised form: 06 Aug 2023; Accepted: 15 Aug 2023; Available online: 25 Aug 2023
©2023 The Author(s). Published by AI Publications. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/)
Abstract: Big data analytics and deep learning are two of data science's most promising areas of convergence. The importance of big data has grown recently as many organizations, both public and commercial, have been amassing large amounts of domain-specific data that can yield useful information on problems such as national intelligence, cyber security, fraud detection, marketing, and health informatics. For big data analytics, where data is often unstructured and unlabeled, deep learning's ability to analyze and learn from large amounts of data on its own is a crucial feature. In this review, we look at how deep learning can be used to address some of the most pressing problems in big data analytics, including extracting complex patterns from massive data sets, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks.
Keywords: Big Data, Present scenario, Google Trends, Artificial Intelligence.
I. INTRODUCTION
Large datasets are rapidly becoming the norm, and gathering, analyzing, and benefiting from this information is now happening even at the individual level. Big data is at work whether we are analyzing billions of Google search queries to predict flu outbreaks, monitoring wireless networks to spot signs of terrorist activity, or sifting through countless airfare data points to determine the best time to buy plane tickets. Its proponents claim it can address almost any problem, including crime, public health, technological advancement, and even grammatical reform, by combining the power of modern processing with the massive data of the digital age.
According to Google Trends, the rise in the number of searches that include the term "big data" began in 2011 and peaked this summer. Although the term "big data" may seem to belong solely to the field of data science, it actually plays an important role in health services research, including emergency medicine. A data set the size of a DVD would have been considered substantial just a decade ago; today it is routine. In addition, the way data arrives shapes what can be done with it: broadband Internet speeds in excess of 100 Mbps are already commonplace in modern life, just as 56K modems once were. Big data also has a more progressive and practical significance. If a researcher's data represents the population as a whole, as suggested by this definition, rather than just a subset of it, then they may benefit from using big data. On the other hand, putting large data to use is impractical when legacy processing architectures are not robust enough to handle it.
The Internet is the primary driver behind the deluge of new information that has emerged in recent decades: too big, too fast, and too unstructured to fit into the schemas of the relational databases we have built. It is like a bottomless pit of data into which we dump everything, with a steady stream of additions to come, and every day the data keeps growing. Storage is now measured in exabytes, zettabytes, or even yottabytes, rather than the gigabytes, terabytes, or petabytes that were notable in the past. When organizations use big data solutions, they can dive deep into data sets and extract insights that were previously unavailable to them. Big data is an idiom that often goes ill-defined, much as "cloud computing" may refer to either a specific technology or a broad category. Putting big data to work necessitates a shift away from the rigidity of traditional data storage methods and toward a more fluid, adaptable, and open model.
II. LITERATURE REVIEW
Stefan Strauß (2018) Incredibly exciting progress is being made in the field of artificial intelligence (AI), particularly in the subfield known as machine learning (ML). New methods of deep learning hold great promise for advancing the field of artificial intelligence as it pertains to the enhancement of human potential. But what exactly are the wider societal repercussions of this breakthrough, and to what degree are older notions of AI still applicable? These questions are discussed in this article, which also serves as a useful reminder of the fundamental concepts of AI and big data. The roles, societal consequences, and security risks associated with deep learning and automated systems are put under serious scrutiny. The paper argues that the growing importance of AI in the public sphere poses real threats of a deep automation bias reinforced by inadequate AI quality, insufficient awareness of shared risks, and weak algorithmic accountability, up to gradually worsening conflicts in decision making between machines and people. Overcoming ideological delusions about AI and reviving a culture of reliable, genuine innovation in creation and application are required to lessen these risks and forestall the emergence of an "intelligentia obscura". This includes the need for a more in-depth discussion of the potential for strengthening societal governance tactics and automation policy so that AI development respects cultural and individual well-being.
Vargas et al. (2017) Machine learning (ML) research into deep learning is on the rise. Deep learning stacks several artificial neural network frameworks across its many layers, and applies nonlinear transformations that yield higher-level representations in each unit when learning from large datasets. Recent advances in large-scale learning models across disciplines have endowed AI with important new capabilities. This research provides an analysis of such initiatives as well as related scholarly undertakings. The evaluation systematically presents one roadmap, as well as the main applications of deep learning algorithms. Furthermore, as contrasted with the more common algorithms used in routine jobs, the benefits and popularity of the deep learning approach, with its hierarchy of layers and nonlinear operations, are clearly shown. The review also presents the original concept and the ever-growing benefits and widespread recognition of deep learning.
Zhong et al. (2016) showcased healthcare applications for big data, and showed how big data may be integrated into daily life to enable the study of interactions between healthcare and sickness. Thus, big data analytics have a large impact on the healthcare industry, helping to lower operating costs and improve people's quality of life.
Genge et al. (2017) have proposed an anomalous behavior detection approach for smart grid environments using massive data. To classify assets, the proposed system employs a cyber-attack impact assessment method. The recommended big-data-based abnormal behavior detection approach also makes use of Gaussian clustering to classify the assets, reduce the number of monitored components, and provide an efficient anomaly detection function. The authors built the underlying data-based anomalous conduct detection technique on the IEEE 14-bus power network model. All three types of attacks considered (line breaker, bus error, and reliability attacks) are neutralized by the proposed design. Job-specific tradeoffs between generalizability, efficiency, and performance are inevitable.
Hordri et al. (2016) Over the last several years, deep learning has grown more popular. As a result of deep learning's ability to discover structure in large amounts of unlabeled data and to improve analysis, it has been applied in a wide variety of settings. This article gives a review of deep learning and its applications over the years, with the hope of providing useful references for other researchers looking to incorporate deep learning into their own work. Seven different areas where deep learning has been put to use are discussed, including automatic speech recognition, image recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, and bioinformatics. In each case, the authors discuss the study's findings and highlight the places where further research is needed.
III. CORES OF BIG DATA
Data Volume- A huge amount of information is generated every single second, minute, and hour. A single minute on the internet sees the creation of 571 new websites and the transfer of 625,000 GB of data from one end to the other, in the form of emails, photos, blog entries, and more. If we were to copy all the information currently stored on Earth onto DVDs and stack them on top of each other, the resulting stack would be so tall that a person could climb it to the moon and back again.
Data Velocity- The pace at which data is being produced is making it difficult for many organizations to keep up. They need to construct their infrastructure so that it can effectively handle the resulting flood of data (a small sketch of one such approach follows this list).
Data Value- There is a significant communication gap between business executives and IT professionals. In most cases, the primary concern of company executives is to grow their business and profit; IT staff, on the other hand, are concerned with methods for managing and arranging data.
Data Complexity- The greatest challenge in handling big data with relational databases is that doing so requires parallel applications executing on a huge number of servers, and data researchers must coordinate and transform data across systems drawing on a wide variety of sources.
Data Veracity- The term "veracity" refers to the accuracy and reliability of information. In most cases, you cannot simply trust the information you find online. If a man creates a Facebook profile in which he presents himself as a woman, no one may ever question whether the profile is genuine.
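Here is the small sketch promised under Data Velocity: a minimal, illustrative way (in Python, with an invented stream of readings and an invented window size, neither of which comes from this paper) to keep up with fast-arriving data by summarizing it in bounded windows rather than storing every record.

```python
# Minimal sketch: handling high-velocity data in fixed-size windows.
# The stream, window size, and summary statistic are illustrative
# assumptions, not part of the paper.
from collections import deque
from statistics import mean

WINDOW_SIZE = 5  # process the stream in small, bounded batches

def summarize_stream(stream):
    """Consume an unbounded stream one window at a time."""
    window = deque(maxlen=WINDOW_SIZE)
    for value in stream:
        window.append(value)
        if len(window) == WINDOW_SIZE:
            # Emit a compact summary instead of storing every record.
            yield mean(window)

readings = [12, 15, 11, 14, 13, 90, 12, 13, 14, 15]  # e.g. sensor values
for avg in summarize_stream(readings):
    print(f"window average: {avg:.1f}")
```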
IV. BIG DATA CHARACTERISTICS
The study of large datasets has received much media attention recently, and for good reason. To take part in this shift, you will need to learn more about big data analysis. This innovation is a motivating facet of the rising field of big data analysis: companies can finally access and analyze the mountains of data they have been amassing but have not been able to effectively manage. That may mean visualizing massive amounts of disparate data, or it may mean analysis streaming at you in real time. It is a step forward and an advancement.
There exist two views of big data:
• Decision-oriented
• Action-oriented
Decision-oriented evaluation is strikingly similar to conventional business intelligence: explore both specific data sets and summaries of larger data sources, and use what you learn to inform your decision-making processes. Of course, the outcomes of this evaluation might lead to a shift in the activities pursued or the underlying structures used, but the underlying goal is to encourage more deliberate choice.
Action-oriented assessments are used in situations where a prompt response is necessary, such as when a new situation arises or when certain sorts of data are observed. There is a spectacular opportunity for early adopters to use big data for their own benefit through research and by generating reactive or proactive changes in behavior. The key to extracting value quickly and effectively may lie in exploring big data with evaluation algorithms. Building these specialized projects from scratch, or with the help of existing frameworks and modules, is the most efficient way to complete this process.
Look beyond the "three Vs" of variety, velocity, and volume to see how big data analysis differs from more traditional forms of research. First, it can be automated: one of the most significant shifts in assessment is the availability of large data sets that can be easily imported into a program for analysis. Big data analysis may still pose challenges, such as raw data that requires expert oversight before any kind of analysis can be done at all. Second, it can easily be driven by the data itself. While many data scientists adopt a hypothesis-driven approach to data analysis (develop a hypothesis and collect data to see whether that hypothesis is correct), you can also let the data drive the assessment, particularly if you have gathered a lot of it. For instance, you may apply a machine learning algorithm to perform such a study without any prior hypothesis.
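As an illustration of such hypothesis-free analysis, the sketch below applies an off-the-shelf clustering algorithm to unlabeled synthetic data. The data, cluster count, and library choice are illustrative assumptions rather than anything prescribed by this paper.

```python
# Minimal sketch: letting the data drive the analysis with an
# unsupervised algorithm instead of a prior hypothesis.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)
# Unlabeled observations, e.g. two customer-behavior measurements.
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(100, 2)),
])

# No hypothesis is supplied; the algorithm groups the data on its own.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(model.cluster_centers_)  # patterns discovered, not assumed
```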
Numerous attributes can be used. In the past you may have dealt with a modest list of properties or traits of a data asset. Now you have the option of dealing with terabytes of data, which might include countless values and kinds of information. At the moment, everything is magnified.
An iterative model fits without much of a stretch. More computing power undoubtedly means you can keep working on your models until you get them the way you want them, and this point deserves attention. Suppose you are formulating a strategy in which you are looking for signs of direct purchaser activity. You may start by extracting some representative sample data, or at least by acquainting yourself with the actual location of the data itself. To verify an assumption, you might build a model. Building the model itself may not require much computing power, but you will need plenty of processing capacity to run through all the iterations needed to get the model right. Common computational approaches, such as natural language processing or neural networks, which rapidly construct the learning-based model, may also be required.
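A minimal sketch of this iterative style of work, assuming scikit-learn and a synthetic dataset (both assumptions for illustration): each pass through the loop is one iteration over a candidate model setting, kept only if it scores better than the best so far.

```python
# Minimal sketch: iterating on a model until it behaves the way you
# want. The data, model family, and stopping rule are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

best_score, best_c = 0.0, None
for c in (0.01, 0.1, 1.0, 10.0, 100.0):  # one iteration per setting
    score = cross_val_score(LogisticRegression(C=c, max_iter=1000),
                            X, y, cv=5).mean()
    if score > best_score:
        best_score, best_c = score, c

print(f"best C={best_c}, accuracy={best_score:.3f}")
```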
Getting the computing power you need may be a breeze with a cloud-based Infrastructure as a Service offering. You can rapidly execute a large number of models to ingest massive data sets and promptly evaluate them using Infrastructure as a Service (IaaS) setups such as Amazon Web Services (AWS).
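The following sketch imitates that pattern locally: many candidate model configurations evaluated concurrently, as one might across rented cloud machines. The scoring function and parameter grid are invented for illustration, not a real cloud API.

```python
# Minimal sketch: evaluating many candidate models concurrently, as one
# might on rented cloud infrastructure. Run locally here with processes.
from concurrent.futures import ProcessPoolExecutor

def evaluate(params):
    """Stand-in for training and scoring one model configuration."""
    depth, rate = params
    return params, 1.0 / (1.0 + abs(depth * rate - 3.0))  # dummy score

if __name__ == "__main__":
    grid = [(d, r) for d in (2, 4, 8) for r in (0.1, 0.5, 1.0)]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(evaluate, grid))
    best = max(results, key=lambda item: item[1])
    print("best configuration:", best)
```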
V. PHASES INVOLVED IN BIG DATA
Big data processing involves five distinct phases:
Data Acquisition and Recording- There is no question that big data stems from humble beginnings; it is not constructed from nothing. Petabytes of data are constantly produced by the many scientific experiments now being conducted all over the world. Much of the information is useless and must be eliminated. The most pressing problem is, without a doubt, defining filters that discard the useless detail without losing important data. Suppose, for the sake of argument, that one sensor's readings diverge widely from the others; this could simply be the result of the sensor being damaged, but how can we be sure it is not an artifact that deserves special consideration? Research into data reduction that can intelligently process this raw data down to a size that its users can manage, while not missing the needle in the haystack, is desperately needed. The problem that follows is directly related to the need to record correct metadata describing the kind of data obtained, the method used to measure and record it, and so on. To correctly interpret the results of an experiment, some background information regarding potential confounding variables and measurements may be necessary, so collecting this metadata along with the observational data is crucial.
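A minimal sketch of acquisition-time screening, assuming a single sensor's numeric readings (the values and threshold are invented): a divergent value is flagged for review rather than silently discarded, so a possible "needle" is not lost.

```python
# Minimal sketch: screening sensor readings at acquisition time.
# Flagging a divergent reading, rather than dropping it, preserves
# possible needles in the haystack. Thresholds are illustrative.
from statistics import mean, stdev

readings = [20.1, 19.8, 20.3, 20.0, 45.7, 19.9]  # one sensor per value

mu, sigma = mean(readings), stdev(readings)
for i, value in enumerate(readings):
    if sigma and abs(value - mu) > 2 * sigma:
        # Keep the raw value, but record metadata about the anomaly.
        print(f"reading {i}: {value} flagged for review (mean={mu:.1f})")
```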
Information Extraction and Cleaning- The gathered information rarely arrives in a systematic, analysis-ready format. The electronic health data of an emergency facility, for example, combines the transcribed interpretations of various physicians, the structured data from measurements and sensors, and image data such as x-rays. It is impossible to get useful insights from data left in that state. For this sort of data, an extraction technique is required that pulls the necessary information out of the underlying sources and presents it in a structured form suitable for evaluation. The stakes could not be higher for getting this step right. This information may likewise combine video clips and pictures, and the processes involved in extracting from them are quite amenable to automation.
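The sketch below illustrates the extraction step on an invented record format (not a real health-record standard): structured fields are pulled out of semi-structured source strings so they can be analyzed, and unusable records are dropped.

```python
# Minimal sketch: pulling structured fields out of free-form source
# records. The record format and fields are illustrative assumptions.
import re

raw_records = [
    "id:101 | temp:38.2C | note:transcribed by Dr. A",
    "id:102 | temp:N/A   | note:sensor offline",
]

pattern = re.compile(r"id:(\d+)\s*\|\s*temp:([\d.]+)C")

cleaned = []
for record in raw_records:
    match = pattern.search(record)
    if match:  # keep only records with a usable temperature
        cleaned.append({"id": int(match.group(1)),
                        "temp_c": float(match.group(2))})

print(cleaned)  # [{'id': 101, 'temp_c': 38.2}]
```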
Data Integration, Aggregation, and Representation- It is not enough to just gather information and dump it into a database. If we store massive data sets as isolated silos, it will be very challenging for users to get the right information when it is needed. Even with a significant amount of information available, difficulties persist because of differences in underlying detail and in data record structure. Finding, viewing, understanding, and citing data are just the beginning of what is involved in working with it. For a truly optimal large-scale review, every step of this process should be automated. An appropriate database design is fundamental, and a range of options for storing information exists. Some structures will be far superior to others for certain purposes, albeit at the expense of others. One can conclude that data store design is a creative endeavor best left to trained professionals.
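As a small illustration of the integration step, the following sketch (using pandas, with invented field names and values) reconciles two sources that describe the same devices under different schemas and joins them into one representation.

```python
# Minimal sketch: integrating two differently structured sources into
# one queryable representation. Field names and data are illustrative.
import pandas as pd

# Source A: readings keyed by device id.
readings = pd.DataFrame({"device": [1, 2], "value": [0.7, 0.9]})
# Source B: device metadata with a different naming convention.
devices = pd.DataFrame({"device_id": [1, 2], "site": ["north", "south"]})

# Reconcile the schemas, then join into a single representation.
merged = readings.merge(devices.rename(columns={"device_id": "device"}),
                        on="device")
print(merged)
```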
Query Processing, Data Modeling, and Analysis- Techniques are needed here for querying as well as mining. Big data is often noisy, dynamic, and heterogeneous; that much is clear. Even so, noisy big data can be more valuable than small explicit samples, since the general insights gleaned from vast numbers of examples will usually be more precise. In order to mine effectively, you will need information that is both clean and readily accessible. Infrastructure for in-depth research and mining interfaces must be in place. Having a system in place for accurate mining algorithms that run at scale is also crucial.
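A minimal sketch of the kind of aggregate query a mining interface must support, using pandas on invented sales records (both the library choice and the data are assumptions for illustration):

```python
# Minimal sketch: a declarative-style aggregate query over cleaned
# data. The data and grouping are illustrative assumptions.
import pandas as pd

sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "amount": [120.0, 80.0, 150.0, 95.0],
})

# Aggregate query: total and average sales per region.
summary = sales.groupby("region")["amount"].agg(["sum", "mean"])
print(summary)
```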
Interpretation- If users cannot grasp the results of the analysis, there is no point in performing big data analysis at all. The investigation's findings are presented to the decision-maker, who should find them interpretable. This involves examining all of the assumptions made and retracing the analysis. There are several potential sources of error, since the system produces results and defects might stem from faulty input information. No sane user will cede control to a computer system blindly; instead, they will attempt to verify the accuracy of the computer-generated findings. Making all of this simple for the user is a huge challenge for a computer system, given the depth of big data.
VI. APPLICATIONS OF BIG DATA
Government
Using big data to complete government projects is beneficial and may lead to savings as well as increased creativity and output. Researching the data requires cooperation across several departments, but the end result is worth the effort.
Manufacturing
Big data is most useful when it is put to use improving processes like planning and manufacturing. The infrastructure provided by big data makes it possible to isolate variables such as uneven component performance and availability while making progress toward near-zero downtime and greater transparency. This requires a large amount of data and highly developed prediction tools for systematically turning data into useful information.
Healthcare
Improved healthcare is enabled by big data analytics through the following channels: customized medicine and prescriptive analytics; clinical risk intervention and predictive analytics; reduction of waste and care variability; automated internal and external reporting of patient data; and standardized health-related terms and patient registries.
Sports
Big data may help with training program development and competition analysis. It is also possible to forecast who will win a match and how well each player will perform; information gathered during the season can thus aid in estimating players' worth and earnings. Formula One races generate gigabytes of data thanks to the vehicles' extensive arrays of sensors, which cover everything from tire pressure to fuel-burn efficiency. The data is subsequently sent to the appropriate people over fiber optic lines, which transmit information at the speed of light. Engineers and data analysts then work together to determine the best course of action based on the available information. In addition, teams use big data to try to anticipate the moment at which they will win the race, based on simulations using data collected over time.
The power of big data comes from a variety of sources. Traditional information systems are just one source of big data; other sources include social media, the cloud, software, community influencers, the public internet, networking technologies, legacy documents, business applications, weather data, and sensor data. A few of these sources are described below.
A. Transactional data
Combining transactional data with statistical tools like regression analysis and decision trees may help define a model to predict outcomes like sales forecasts and the success rate of new product launches. The model can learn from past data and make accurate predictions, and statistical software like SAS or SPSS makes it simple to build such models. The term "Transaction Processing System" is used to describe a system whose primary function is to record and process data involving a series of independent events. Capturing information and refining data for operational decisions are a Transaction Processing System's major functions. Transactions may be processed in two distinct ways: real-time processing, in which data are processed as they are produced, and batch processing, in which the data are processed together as a single unit. Both methods may help a company make better decisions about its daily operations.
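To make this concrete, here is a minimal sketch of the decision-tree idea described above, using scikit-learn; the features (unit price, promotion flag) and figures are invented for illustration, not drawn from any real transaction log.

```python
# Minimal sketch: learning from past transactional data to predict an
# outcome. Features and figures are illustrative assumptions.
from sklearn.tree import DecisionTreeRegressor

# Historical transactions: [unit_price, promotion_ran] -> units sold
X = [[9.99, 0], [9.99, 1], [14.99, 0], [14.99, 1], [19.99, 0]]
y = [120, 180, 90, 150, 60]

model = DecisionTreeRegressor(max_depth=2).fit(X, y)
forecast = model.predict([[12.99, 1]])  # a planned promotional price
print(f"forecast units sold: {forecast[0]:.0f}")
```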
B. Social media data
The explosion in popularity of social media over the last several years has resulted in a global data collection effort, with events reported as they actually occur. Internet users are happy to share their thoughts about recent movies, TV shows, and services within minutes using platforms like Facebook, WhatsApp, and Twitter. This is a unique chance for marketers to amass promotional information. With social media, consumers may research a product's reviews, additional services, and customer complaints before making a purchase decision. Customers' feelings shared via social media help firms make better decisions. In addition to gathering market knowledge, businesses may utilize social media analytics to gather competitive intelligence about the products and services offered by rival companies. In turn, this promotes the development of novel approaches to boosting the performance of businesses operating online.
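As a toy illustration of mining sentiment from such posts, the sketch below scores text with a simple word list. Production systems use trained models; the lexicon and example posts here are invented.

```python
# Minimal sketch: scoring customer sentiment in social posts with a
# small word list. Lexicon and posts are illustrative assumptions.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "slow", "broken"}

def sentiment(post: str) -> int:
    """Positive word count minus negative word count."""
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Love the new phone, battery is great",
    "Support was slow and the app is broken",
]
for post in posts:
    print(sentiment(post), "|", post)
```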
C. Internet Applications
As the internet has developed, a greater number of people are simultaneously using it and creating enormous quantities of click streams, web searches, and other online activity. Large numbers of people log on to and use various online services every day, including e-commerce platforms (like Amazon, Flipkart, Alibaba, eBay, Paytm, bookmyshow.com etc.), search engines (like Google, Yahoo, Bing etc.), and online banking apps. Their searches and purchases generate click trails and records that may be analyzed for insights.
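A minimal sketch of deriving insight from such click records, with an invented log format (user, page): count which pages users visit most.

```python
# Minimal sketch: summarizing raw clickstream records into an insight
# (most-viewed pages). The log format is an illustrative assumption.
from collections import Counter

clickstream = [
    ("user1", "/home"), ("user1", "/product/42"), ("user2", "/home"),
    ("user2", "/search?q=phone"), ("user1", "/checkout"),
    ("user3", "/home"), ("user3", "/product/42"),
]

page_views = Counter(page for _, page in clickstream)
for page, views in page_views.most_common(3):
    print(f"{page}: {views} views")
```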
D. Data from electronic instruments
Electronic devices such as smartphones, RFID tags, GPS sensors, networked meters, scanners, and cameras are additional sources of massive amounts of data, and all contribute to the generation of very large data sets.
VII. CONCLUSION
Big data concerns the practical problems and approaches of application domains that gather and store large amounts of raw data for domain-specific evaluation. Modern data-intensive systems, together with outsourced computational and data-collection resources, have made significant contributions to the development of big data science. Companies focused on technological advancement, such as Amazon, Microsoft, Yahoo!, and Google, have amassed and stored data measured in the exabyte range or more. Social media platforms with large user bases, such as Facebook, YouTube, and Twitter, generate copious amounts of data on a regular basis. Big data has become a standard topic of data science inquiry, since many companies have invested in Big Data Analytics assets to handle their information, simulations, data examination, experimentation and monitoring, as well as commercial demands.
REFERENCES
[1] Strauß, S. (2018), "From Big Data to Deep Learning: A Leap Towards Strong AI or 'Intelligentia Obscura'?", Big Data Cogn. Comput. 2, 16; doi:10.3390/bdcc2030016.
[2] Vargas, R., Mosavi, A. and Ruiz, R. (2017), "Deep learning: a review", Advances in Intelligent Systems and Computing, 5.
[3] Genge, B., Haller, P. and Kiss, I. (2017), "Big data processing to detect abnormal behavior in smart grids", in Smart Grid Inspired Future Technologies: First International Conference, SmartGIFT 2016, Liverpool, UK, May 19-20, 2016, Revised Selected Papers, Springer, pp. 214–221.
[4] Hordri, N., Yuhaniz, S. and Shamsuddin, S.M. (2016), "Deep Learning and Its Applications: A Review".
[5] García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J.M. and Herrera, F. (2016), "Big data preprocessing: methods and prospects", Big Data Analytics Journal, p. 9.
[6] Zhong, R.Y., Newman, S.T., Huang, G.Q. and Lan, S. (2016), "Big Data for supply chain management in the service and manufacturing sectors: Challenges, opportunities, and future perspectives", Computers & Industrial Engineering journal, pp. 572–591.
[7] Dittrich, J. and Quiané-Ruiz, J.-A. (2012), "Efficient big data processing in Hadoop MapReduce", Proceedings of the VLDB Endowment 5(12), 2014–2015.
[8] Chen, Y., Alspaugh, S. and Katz, R. (2012), "Interactive analytical processing in big data systems: A cross-industry study of MapReduce workloads", Proceedings of the VLDB Endowment 5(12), 1802–1813.
[9] Humphreys, G., Houston, M., Ng, R., Frank, R., Ahern, S., Kirchner, P.D. and Klosowski, J.T. (2002), "Chromium: a stream-processing framework for interactive rendering on clusters", ACM Transactions on Graphics (TOG) 21(3), 693–702.
[10] Boja, C., Pocovnicu, A. and Batagan, L. (2012), "Distributed parallel architecture for 'big data'", Informatica Economica 16(2), 116.
[11] Maier, M., Serebrenik, A. and Vanderfeesten, I. (2013), "Towards a big data reference architecture", University of Eindhoven.
[12] Lv, Y., Duan, Y., Kang, W., Li, Z. and Wang, F.-Y. (2015), "Traffic flow prediction with big data: a deep learning approach", IEEE Transactions on Intelligent Transportation Systems 16(2), 865–873.
[13] Mestyán, M., Yasseri, T. and Kertész, J. (2013), "Early prediction of movie box office success based on Wikipedia activity big data", PLoS ONE 8(8), e71226.
[14] Zhang, Y., Chen, M., Mao, S., Hu, L. and Leung, V. (2014), "CAP: Community activity prediction based on big data analysis", IEEE Network 28(4), 52–57.
[15] Kim, Y. and Shim, K. (2012), "Parallel top-k similarity join algorithms using MapReduce", in Data Engineering (ICDE), 2012 IEEE 28th International Conference on, IEEE, pp. 510–521.