Machine Learning On Big Data: Opportunities And Challenges- Future Research Direction For Phd Scholars - Phdassistance

MACHINE LEARNING ON BIG DATA: OPPORTUNITIES AND CHALLENGES - FUTURE RESEARCH DIRECTION FOR PHD SCHOLARS
An academic presentation by
Dr. Nancy Agnes, Head, Technical Operations, Phdassistance Group
www.phdassistance.com
Email: info@phdassistance.com

OUTLINE: TODAY'S DISCUSSION
In-Brief
Introduction
Machine learning
Big data
Data preprocessing opportunities and challenges
Evaluation opportunities and challenges
Future research
Conclusion

In-Brief
Machine Learning (ML) is increasingly used in a wide variety of applications. It has risen to
prominence in recent years, owing in part to the emergence of big data. In the era of big data,
ML algorithms have never been more promising. Big data allows machine learning algorithms to
discover finer-grained patterns and make more timely and precise predictions than ever before;
however, it also poses significant challenges to machine learning, such as model scalability and
distributed computing.

INTRODUCTION
ML techniques have had enormous societal impacts in various fields such as computer vision,
speech recognition, natural language understanding, neuroscience, health, and the Internet
of Things.
The arrival of the big data era has spurred broad interest in machine learning. Big data has
never held more promise for ML algorithms, nor posed greater challenges to them, in gaining
new insights into a variety of business applications and human behaviours.

On the one hand, big data provides ML algorithms with unparalleled amounts of data from which
to derive underlying patterns and build predictive models; on the other hand, conventional ML
algorithms face critical challenges, such as scalability, that must be overcome to fully unlock
the value of big data.
With the ever-expanding world of big data, ML must evolve in order to turn big data into
actionable intelligence.

ML aims to answer the question of how to build computer systems that improve automatically
through experience.
An ML problem is the problem of learning from experience with respect to some class of tasks
and a performance measure.
ML techniques allow users to uncover underlying structure in large datasets and make
predictions from them.

ML thrives on powerful computing environments, efficient learning techniques (algorithms), and
rich and/or large data.
As a result, ML has great potential and is an essential part of big data analytics.

Fig. 1. A framework of machine learning on big data (MLBiD)

MACHINE LEARNING
Data pre-processing, learning, and evaluation are the common stages of machine learning.
Through data cleaning, extraction, transformation, and fusion, data pre-processing transforms
raw data into the "right form" so that it can be used as input to the subsequent learning steps.
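The cleaning, transformation, and fusion steps above can be illustrated with a minimal Python sketch. It assumes two small hypothetical in-memory sources keyed by an `id` field, uses min-max scaling as an example transformation, and omits feature extraction; none of these choices come from the MLBiD framework itself.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

def clean(records: List[Record], required: List[str]) -> List[Record]:
    """Data cleaning: drop records that are missing any required field."""
    return [r for r in records if all(r.get(f) is not None for f in required)]

def min_max_scale(records: List[Record], field: str) -> List[Record]:
    """Transformation: rescale one numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

def fuse(left: List[Record], right: List[Record], key: str = "id") -> List[Record]:
    """Fusion: inner-join two sources on a shared key."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Two hypothetical raw sources describing the same entities.
source_a = [{"id": 1, "age": 30.0}, {"id": 2, "age": None}, {"id": 3, "age": 50.0}]
source_b = [{"id": 1, "income": 40.0}, {"id": 3, "income": 80.0}]

features = fuse(min_max_scale(clean(source_a, ["age"]), "age"), source_b)
print(features)
# [{'id': 1, 'age': 0.0, 'income': 40.0}, {'id': 3, 'age': 1.0, 'income': 80.0}]
```

Each step returns new records rather than mutating its input, so the stages can be reordered or swapped independently.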

Using the pre-processed input data, the learning step selects learning algorithms and tunes
model parameters to produce the desired outputs.
Some learning methods, especially representation learning, can also be used for data
pre-processing.
Finally, the trained models are evaluated to measure how well they perform.
Machine learning can be characterised by the nature of the learning feedback, the target of
the learning task, and the timing of data availability.
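A minimal sketch of the learning and evaluation steps, assuming synthetic data and ridge regression as the (illustrative) learning algorithm: the regularisation strength `lam` is tuned on a validation split, and the final model is scored on a held-out test split that played no part in tuning. The split sizes and candidate values are arbitrary assumptions.

```python
import numpy as np

def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def mse(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    return float(np.mean((X @ w - y) ** 2))

# Hypothetical synthetic data with known weights and Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=300)

# Split once into training, validation, and test sets (60/20/20).
X_tr, y_tr = X[:180], y[:180]
X_va, y_va = X[180:240], y[180:240]
X_te, y_te = X[240:], y[240:]

# Learning step: tune the regularisation strength on the validation set.
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(candidates, key=lambda lam: mse(X_va, y_va, fit_ridge(X_tr, y_tr, lam)))

# Evaluation step: refit on train + validation, then score on unseen test data.
w = fit_ridge(np.vstack([X_tr, X_va]), np.concatenate([y_tr, y_va]), best_lam)
test_mse = mse(X_te, y_te, w)
print(best_lam, round(test_mse, 3))
```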

Based on the nature of the feedback available to a learning system, ML can be divided into
three major categories: supervised learning, unsupervised learning, and reinforcement learning.
Depending on whether the goal is to learn particular tasks from the input features or to learn
the features themselves, ML can be divided into task learning and representation learning,
respectively.
Based on the timing of data availability, ML can also be divided into batch learning, where the
whole dataset is available at training time, and online learning, where data arrive sequentially.
A given ML algorithm can therefore be classified along several of these dimensions at once.
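The timing of data availability, one of the characteristics listed above, separates batch learning, which fits a model to the whole dataset at once, from online learning, which updates the model as each example arrives. The sketch below contrasts the two on a hypothetical noise-free stream y = 2x + 1; the stochastic-gradient update, learning rate, and stream length are illustrative assumptions.

```python
import random

import numpy as np

def batch_fit(xs, ys):
    """Batch learning: solve least squares using the whole dataset at once."""
    A = np.column_stack([xs, np.ones(len(xs))])
    (w, b), *_ = np.linalg.lstsq(A, np.asarray(ys), rcond=None)
    return float(w), float(b)

class OnlineLinearRegressor:
    """Online learning: update (w, b) after each example with one SGD step."""

    def __init__(self, lr: float = 0.1):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def update(self, x: float, y: float) -> None:
        err = (self.w * x + self.b) - y  # prediction error on this example only
        self.w -= self.lr * err * x      # gradient of 0.5 * err**2 w.r.t. w
        self.b -= self.lr * err          # gradient of 0.5 * err**2 w.r.t. b

# Hypothetical stream of examples from y = 2x + 1.
rng = random.Random(0)
xs = [rng.random() for _ in range(5000)]
ys = [2.0 * x + 1.0 for x in xs]

model = OnlineLinearRegressor()
for x, y in zip(xs, ys):  # each example is seen once and then discarded
    model.update(x, y)

print(batch_fit(xs, ys), (round(model.w, 3), round(model.b, 3)))
```

Both fits should recover w ≈ 2 and b ≈ 1, but the online learner never needs more than one example in memory, which is what makes this style attractive for high-velocity data.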

Fig. 2. A multi-dimensional taxonomy of machine learning

BIG DATA
Volume, velocity, variety, veracity, and value are the five dimensions of big data.
Starting from the bottom, we organise these five dimensions into a stack of three layers: the
big data layer, the data layer, and the value layer.
The data layer is fundamental to big data, while the value layer characterises the impact of big
data on real-world applications.

The lower layers depend more on technological advances, while the higher layers focus more on
applications that harness the strategic power of big data.
Established machine learning paradigms and algorithms must be adapted to realise the potential
of big data analytics and to process big data efficiently.
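One common way to adapt a learning algorithm so that it processes big data efficiently is data parallelism. When the loss is a sum over examples, each partition can compute its partial gradient locally and only these small vectors need to be combined. The sketch below simulates this on one machine with threads and a least-squares loss; both are assumptions for illustration, and a real deployment would distribute the partitions across a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def partial_gradient(part):
    """Gradient of the summed squared error 0.5 * ||X w - y||^2 on one partition."""
    X, y, w = part
    return X.T @ (X @ w - y)

def distributed_gradient(X, y, w, n_parts=4):
    # Split rows into partitions, as a cluster would shard the data across workers.
    parts = [(Xp, yp, w) for Xp, yp in zip(np.array_split(X, n_parts), np.array_split(y, n_parts))]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        grads = list(pool.map(partial_gradient, parts))
    # Reduce step: partial gradients simply add up because the loss is a sum over rows.
    return np.sum(grads, axis=0)

# Hypothetical noise-free regression data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -1.0, 2.0])
w = np.zeros(3)

for _ in range(100):  # plain gradient descent driven by the distributed gradient
    w -= 0.001 * distributed_gradient(X, y, w)
print(np.round(w, 3))
```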
In this section, we identify key opportunities and challenges and discuss them individually for
each of the three phases of machine learning: preprocessing, learning, and evaluation.

DATA PREPROCESSING OPPORTUNITIES AND CHALLENGES
DATA REDUNDANCY
Duplication occurs when two or more data samples represent the same object.
Data duplication or inconsistency can have a significant impact on machine learning.
Although a variety of duplicate-detection techniques have been developed over the last 20
years, traditional methods such as pairwise similarity comparison are no longer feasible for
big data.
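Blocking is one widely used way to avoid exhaustive pairwise comparison: records are grouped by a cheap key and only records within the same block are compared. The sketch below is a generic illustration, not a method from this presentation; the blocking key (first three characters of a normalised name), the `difflib` similarity measure, and the 0.85 threshold are all hypothetical choices.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def normalise(name: str) -> str:
    return " ".join(name.lower().split())

def find_duplicates(records, threshold=0.85):
    """Return index pairs of likely duplicates, comparing only within blocks."""
    blocks = defaultdict(list)
    for i, name in enumerate(records):
        blocks[normalise(name)[:3]].append(i)  # hypothetical blocking key
    pairs, comparisons = [], 0
    for members in blocks.values():
        for i, j in combinations(members, 2):
            comparisons += 1
            score = SequenceMatcher(None, normalise(records[i]), normalise(records[j])).ratio()
            if score >= threshold:
                pairs.append((i, j))
    return pairs, comparisons

names = ["John Smith", "john  smith", "Jon Smyth", "Alice Brown", "Alicia Brown", "Bob Lee"]
pairs, comparisons = find_duplicates(names)
print(pairs, comparisons)  # [(0, 1), (3, 4)] 2
```

Only 2 of the 15 possible pairs are compared. The trade-off is recall: "Jon Smyth" lands in a different block from "John Smith" and is never compared with it.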

Furthermore, the conventional assumption that duplicate pairs are rarer than non-duplicate pairs
no longer holds.
In this regard, for similarity search over time series, Dynamic Time Warping (DTW) with suitable
optimisations can be much faster than current Euclidean distance search algorithms.
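Published speed-ups for DTW search come largely from cheap lower bounds that let most candidates be discarded before the full DTW computation. The sketch below pairs a banded DTW with the LB_Keogh lower bound in a simple nearest-neighbour search; the window size and toy series are assumptions, and the code is far simpler than the optimised search methods behind those results.

```python
import math

def dtw(a, b, window):
    """Squared-error DTW restricted to a Sakoe-Chiba band of half-width `window`."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def lb_keogh(query, candidate, window):
    """LB_Keogh: a cheap lower bound on dtw(query, candidate, window) for equal-length series."""
    total = 0.0
    for i, c in enumerate(candidate):
        segment = query[max(0, i - window): i + window + 1]  # envelope around query[i]
        upper, lower = max(segment), min(segment)
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (c - lower) ** 2
    return total

def nearest_neighbour(query, candidates, window):
    """1-NN search under DTW that skips candidates whose lower bound is already too large."""
    best_idx, best_dist, full_dtw_calls = None, math.inf, 0
    for idx, cand in enumerate(candidates):
        if lb_keogh(query, cand, window) >= best_dist:
            continue  # pruned without running the full DTW computation
        full_dtw_calls += 1
        d = dtw(query, cand, window)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist, full_dtw_calls

query = [0, 1, 2, 3, 2, 1, 0, 0]
candidates = [
    [5] * 8,                   # far from the query
    [0, 0, 1, 2, 3, 2, 1, 0],  # the query shifted by one step
    [-4] * 8,                  # far from the query
    [9, 8, 9, 8, 9, 8, 9, 8],  # far from the query
]
print(nearest_neighbour(query, candidates, window=2))  # (1, 0.0, 2)
```

Only two of the four candidates reach the full DTW computation; the other two are rejected by the lower bound alone.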
DATA HETEROGENEITY
Big data typically contains multi-view data drawn from different repositories, in different
formats, and from different population samples, and is therefore highly heterogeneous.

These multi-view heterogeneous data differ in their value to a given learning task. As a result,
concatenating all of the features and treating them as equally relevant is unlikely to yield
optimal learning outcomes.
Big data offers the opportunity to learn from different views simultaneously and then combine
the multiple results by learning how relevant each feature view is to the task.
Such an approach should be robust to data outliers and should address optimisation and
convergence problems.
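A simple way to learn the relevance of each feature view, rather than treating all features as equally relevant, is stacking: fit one model per view, then learn combination weights from each view's predictions on held-out data. The sketch below uses least squares at both levels on two hypothetical synthetic views, one informative and one pure noise; it illustrates the idea only and does not address the robustness and convergence requirements mentioned above.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 600
signal = rng.normal(size=n)
y = 3.0 * signal + rng.normal(scale=0.1, size=n)

# Two hypothetical views of the same samples: one informative, one unrelated noise.
views = {
    "informative": np.column_stack([signal + rng.normal(scale=0.1, size=n), rng.normal(size=n)]),
    "noise": rng.normal(size=(n, 4)),
}
train, hold = slice(0, 400), slice(400, n)

def fit_lstsq(X, t):
    """Fit least squares with an intercept and return a prediction function."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef

# Level 1: learn from each view separately (these fits could run in parallel).
view_models = {name: fit_lstsq(X[train], y[train]) for name, X in views.items()}

# Level 2: learn how much each view's predictions should count, on held-out data.
P_hold = np.column_stack([view_models[name](X[hold]) for name, X in views.items()])
weights, *_ = np.linalg.lstsq(P_hold, y[hold], rcond=None)
print(dict(zip(views, np.round(weights, 2))))
```

The learned weight for the informative view should be close to 1 and the weight for the noise view close to 0.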

DATA DISCRETIZATION
Some learning algorithms can only handle discrete attributes, so continuous attributes must first
be discretized into a finite set of intervals.
However, most current discretization methods become ineffective when dealing with large amounts
of data.
To address this, traditional discretization approaches have been parallelized on big data
platforms; for example, a distributed version of the entropy-minimization discretizer based on
the Minimum Description Length Principle (MDLP) improves both efficiency and accuracy.
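For reference, a single-machine sketch of the entropy-minimization discretizer with the MDLP stopping rule, following the formulation commonly attributed to Fayyad and Irani, is shown below. It recursively picks the cut point that minimizes class entropy and keeps the cut only if the information gain exceeds the MDLP threshold. The distributed variant mentioned above parallelizes this search across a cluster, which this sketch does not attempt; the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdlp_cuts(values, labels):
    """Recursive entropy-minimization discretization with the MDLP stopping rule."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    cuts = []
    _split(xs, ys, cuts)
    return sorted(cuts)

def _split(xs, ys, cuts):
    n = len(ys)
    ent_s = entropy(ys)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # a cut can only fall between distinct values
        e = (i / n) * entropy(ys[:i]) + ((n - i) / n) * entropy(ys[i:])
        if best is None or e < best[0]:
            best = (e, i)
    if best is None:
        return
    e, i = best
    left, right = ys[:i], ys[i:]
    gain = ent_s - e
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    if gain <= (math.log2(n - 1) + delta) / n:
        return  # MDLP rejects the cut: stop recursing on this interval
    cuts.append((xs[i - 1] + xs[i]) / 2)
    _split(xs[:i], left, cuts)
    _split(xs[i:], right, cuts)

values = list(range(30))
labels = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
print(mdlp_cuts(values, labels))  # [9.5, 19.5]
```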

DATA LABELLING
Active learning can be used as an optimization technique for labelling tasks in crowd-sourced
databases, reducing the number of questions posed to the crowd and allowing crowd-sourced
applications to scale.
On the other hand, designing active learning algorithms for crowd-sourced datasets presents a
number of practical challenges, including generality, scalability, and usability.
Another problem is that a crowd-sourced dataset cannot cover all user-specific contexts, so the
resulting models often perform worse than those obtained through user-centric training.
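A classic toy illustration of how active learning reduces the number of questions posed to the crowd: for a one-dimensional threshold classifier, always querying the middle of the region where the label is still uncertain halves the set of consistent thresholds with each answer, turning labelling into binary search. The `ask_crowd` oracle below is a hypothetical stand-in for a crowd-sourcing task and, unlike a real crowd, always answers correctly.

```python
def active_learn_threshold(points, ask_crowd):
    """Label sorted 1-D points with a threshold classifier, querying as few points as possible.

    Assumes labels are monotone: every negative point lies below every positive point.
    """
    lo, hi = -1, len(points)  # index of the last known negative / first known positive
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2  # middle of the region whose labels are still uncertain
        queries += 1
        if ask_crowd(points[mid]):
            hi = mid
        else:
            lo = mid
    labels = [i >= hi for i in range(len(points))]  # all points from index hi on are positive
    return labels, queries

points = list(range(1000))
true_threshold = 637  # hypothetical ground truth, unknown to the learner
crowd_calls = []

def ask_crowd(x):
    """Hypothetical oracle standing in for one crowd-sourced labelling question."""
    crowd_calls.append(x)
    return x >= true_threshold

labels, queries = active_learn_threshold(points, ask_crowd)
print(queries, sum(labels))
```

At most ten queries are needed to label all 1,000 points, instead of one query per point.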

IMBALANCED DATA
The problem of imbalanced data has traditionally been tackled with stratified random sampling.
However, when sub-sample generation and error-metric measurement must be iterated, the process
can be very time-consuming.
Furthermore, conventional sampling methods cannot efficiently support sampling over a
user-specified subset of the data, including value-based sampling.
Big data therefore calls for parallel data sampling.
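One way to make stratified sampling parallel is random-key (bottom-k) sampling: every record receives an independent uniform random key, and the k records with the smallest keys in each stratum form a uniform sample without replacement. Because the k smallest keys of a union are always among the k smallest keys of each part, workers can sample their partitions independently and the results merge exactly. The sketch below simulates four workers with threads on hypothetical imbalanced data; the per-stratum quota `k` and the seeds are assumptions.

```python
import heapq
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def sample_partition(partition, k, seed):
    """Worker step: give each record a random key and keep the k smallest keys per stratum."""
    rng = random.Random(seed)
    keyed = defaultdict(list)
    for stratum, record in partition:
        keyed[stratum].append((rng.random(), record))
    return {s: heapq.nsmallest(k, items) for s, items in keyed.items()}

def merge_samples(partial_results, k):
    """Merge step: the k smallest keys overall are among each worker's k smallest."""
    merged = defaultdict(list)
    for result in partial_results:
        for stratum, items in result.items():
            merged[stratum].extend(items)
    return {s: [rec for _, rec in heapq.nsmallest(k, items)] for s, items in merged.items()}

# Hypothetical imbalanced data: 5% of records belong to the minority stratum "pos".
data = [("pos" if i % 20 == 0 else "neg", i) for i in range(10_000)]
partitions = [data[p * 2_500:(p + 1) * 2_500] for p in range(4)]

k = 100  # records drawn per stratum, giving a balanced sample
with ThreadPoolExecutor(max_workers=4) as pool:
    partial = list(pool.map(sample_partition, partitions, [k] * 4, range(4)))
sample = merge_samples(partial, k)
print({s: len(recs) for s, recs in sample.items()})
```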

FUTURE RESEARCH
This paper provides a summary of the opportunities and challenges of machine learning on big
data.
Big data opens up new possibilities, inspiring novel ML technologies that can solve many related
technical problems and create real-world impact, while also posing multiple challenges for
conventional ML in terms of scalability, adaptability, and usability.

These opportunities and challenges can serve as a basis for evaluating current research in this
field.
Following the components of the MLBiD framework, we also highlight some open research issues in
ML on big data, as shown in Table.

CONCLUSION
In conclusion, machine learning is needed to address the challenges posed by big data and to
discover the hidden patterns, information, and insights in big data, turning its potential into
real value for business decision-making and scientific exploration.
The combination of machine learning and big data points to a bright future at a new frontier.

Contact Us
UNITED KINGDOM
+44-1143520021
INDIA
+91-4448137070
EMAIL
info@phdassistance.com
