SlideShare a Scribd company logo
MACHINE LEARNING WITH
BIG DATA
CHALLENGES & APPROACHES
BY K.DAVID
DMCA-40
Introduction
The Big Data revolution promises to transform how we live, work, and think by
enabling process optimization, empowering insight discovery and improving decision
making.
Today, the amount of data is exploding at an unprecedented rate as a result of developments
in Web technologies, social media, and mobile and sensing devices. For example, Twitter
processes over 70M tweets per day, thereby generating over 8TB daily.
The ability to extract value from Big Data depends on data analytics which consider
analytics to be the core of the Big Data revolution
Data analytics involves various approaches, technologies, and tools such as those from text
analytics, business intelligence, data visualization, and statistical analysis. The machine
learning (ML) as a fundamental component of data analytics. The ML will be one of the
main drivers of the Big Data revolution. The reason for this is its ability to learn from data
and provide data driven insights, decisions, and predictions.
ML Challenges Originating From Big Data
Definition
Big Data are often described by its dimensions, which are referred to as
its V’s.Earlier definitions of Big Data focussed on three V’s (volume,
velocity, and variety); however, a more commonly accepted definition
now relies upon the following four V’s : volume, velocity, variety, and
veracity.
It is important to note that other Vs can also be found in the
literature. For example, value is often added as a 5th V. However, value
is defined as the desired outcome of Big Data processing and not as
defining characteristics of Big Data itself
This section identifies machine learning challenges and associates each
challenge with a specific dimension of Big Data.
Charecteristics associated with challengs
VOLUME:
The first and the most talked about characteristic of Big Data
is volume: it is the amount, size, and scale of the data.
In the machine learning context, size can be defined either
Vertically by the number of records or samples in a dataset or
horizontally by the number of features or attributes it contains.
1) Processing Performance:-
One of the main challenges encountered in computations with
Big Data comes from the simple principle that scale, or
volume, adds computational complexity.
Consequently, as the scale becomes large, even trivial
operations can become costly.
volume
For example, the standard support vector machine (SVM)
algorithm has a training time complexity of O(m3
) and a
space complexity of O(m2
), where m is the number of
training samples. Therefore, an increase in the size m will
drastically affect the time and memory needed to train the
SVM algorithm and may even become computationally
infeasible on very large datasets.
Many other ML algorithms also exhibit high time complexity:
for example, the time complexity of principal component
analysis is O(mn2
+n3
) , that of logistic regression O(mn2
+n3
) ,
that of locally weighted linear regression O(mn2+n3) , and
that of Gaussian discriminative analysis O(mn2+n3), where
m is the number of samples and n the number of features.
Volume
Class imbalance:-
As datasets grow larger, the assumption that the data are
uniformly distributed across all classes is often broken.
This leads to a challenge referred to as class imbalance
The performance of a machine learning algorithm can be
negatively affected when datasets contain data from classes
with various probabilities of occurrence.
This problem is especially prominent when some classes are
represented by a large number of samples and some by very
few.
volume
Curse of Dimensionality:-
It refers to difficulties encountered when working in high
dimensional space. Specifically, the dimensionality describes
the number of features or attributes present in the dataset.
In addition, dimensionality affects processing performance:
the time and space complexity of ML algorithms is closely
related to data dimensionality. The time complexity of
many ML algorithms is polynomial in the number of
dimensions.
volume
Feature Engineering:-
This is the process of creating features, typically using
domain knowledge, to make machine learning perform better.
Indeed, the selection of the most appropriate features is one
of the most time consuming pre-processing tasks in machine
Learning.
As the dataset grows, both vertically and horizontally, it
becomes more difficult to create new, highly relevant features.
volume
Bonferonni’s principle :
It embodies the idea that if one is looking for a specific type
of event within a certain amount of data, the likelihood of
finding this event is high.
However, more often than not these occurrences are bogus,
meaning that they have no cause and are therefore
meaningless
instances within a Dataset.
In statistics, the Bonferonni correction theorem provides a
means of
avoiding those bogus positive searches within a dataset. It
suggests
That :- if testing m hypotheses with a desired significance of α ,
each
volume
Variance and Bias
Machine learning relies upon the idea of generalization;
through
observations and manipulations of data, representations can
be
generalized to enable analysis and prediction. Generalization
error
can be broken down into two components: variance and bias
Variance describes the consistency of a learner’s ability to
predict
random things, whereas bias describes the ability of a learner
to learn
the wrong thing
Variance and Bias:
APPROACHES
This paper reviews and organizes various proposed
machine learning approaches and discusses how they
address the identified challenges.
The big picture of approach-challenge correlations is
presented in Table.
It includes a list of approaches along with the challenges
that each best addresses. Symbol ‘√’ indicate high degree
of remedy while ‘*’ represents partial resolution.
Approaches ...
The following sub-sections introduce techniques and
methodologies being developed and used to handle the
challenges associated with machine learning with Big Data.
First, manipulation techniques used in conjunction with
existing algorithms are presented.
Second, various machine learning paradigms that are
especially well suited to handle Big Data challenge
A. Manipulations for Big Data
Data analytics using machine learning relies on an
established suite of events, also known as the data analytics
pipeline. The approaches presented in this section discuss
possible manipulations in various steps of the existing
Pipeline.
Dimensionality reduction aims to map high dimensionality
space onto lower-dimensionality one without significant loss
of information.
Data Analytics pipeline:
These three categories, along with their corresponding sub
categories and sample solutions
Manipulations
B. Machine Learning Paradigms for Big
Data
A variety of learning paradigms exists in the field of machine
learning; however, not all types are relevant to all areas of
Research
1) Deep Learning
Deep learning is an approach from the representation
learning family of machine learning. Representation learning
is also often referred to as feature learning
It transforms data into abstract representations that enable
the
features to be learnt. In a deep learning architecture, these
representations are subsequently used to accomplish the
machine learning tasks
2) Online Learning
Because it responds well to large-scale processing by nature,
online learning is another machine learning paradigm that has
been explored to bridge efficiency gaps created by Big Data.
Online learning can be seen as an alternative to batch
learning, the paradigm typically used in conventional machine
learning.
Local Learning
Local learning is a strategy that offers an alternative to typical
global learning. Conventionally, ML algorithms make use of
global learning through strategies such as generative learning
The idea behind it is to separate the input space into clusters
and then build a separate model for each cluster. This
reduces overall cost and complexity.
Transfer Learning
Transfer learning is an approach for improving learning in a
particular domain, referred to as the target domain, by
training
the model with other datasets from multiple domains, denoted
as source domains, with similar attributes or features, such as
the problem and constraints.
This type of learning is used when the data size within the
target domain is insufficient or the learning task is different .
Figure shows an abstract view of transfer learning.
Lifelong learning
Lifelong learning mimics human learning; learning is
continuous; knowledge is retained and used to solve different
problems. It is directed to maximize overall learning, to be
able to solve a new task by training either on one single
domain or on heterogeneous domains collectively
The learning outcomes from the training process are
collected and combined together in a space called the topic
model or knowledge model.
Ensemble learning
Conclusion
This paper has provided a systematic review of the
challenges associated with machine learning in the context
Of Big Data and categorized them according to the V
dimensions of Big Data. Moreover, it has presented an
overview of ML approaches and discussed how these
techniques overcome the various challenges identified.
Thank you

More Related Content

What's hot

Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning Presentation
Ankit Gupta
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
Marina Santini
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Simplilearn
 
A brief history of machine learning
A brief history of  machine learningA brief history of  machine learning
A brief history of machine learning
Robert Colner
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Rahul Jain
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
Jason Geng
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Sampath Kumar
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
Dr Ganesh Iyer
 
Machine Learning ppt
Machine Learning pptMachine Learning ppt
Machine Learning ppt
Student Conscious Club
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
mrizwan969
 
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete DeckAI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
SlideTeam
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
Lippo Group Digital
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
Srinath Perera
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)
SwatiTripathi44
 
Machine Learning Tutorial | Machine Learning Basics | Machine Learning Algori...
Machine Learning Tutorial | Machine Learning Basics | Machine Learning Algori...Machine Learning Tutorial | Machine Learning Basics | Machine Learning Algori...
Machine Learning Tutorial | Machine Learning Basics | Machine Learning Algori...
Simplilearn
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
Prakash Pimpale
 
Machine Learning techniques
Machine Learning techniques Machine Learning techniques
Machine Learning techniques Jigar Patel
 
Machine learning Algorithms
Machine learning AlgorithmsMachine learning Algorithms
Machine learning Algorithms
Walaa Hamdy Assy
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine Learning
VARUN KUMAR
 

What's hot (20)

Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning Presentation
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
A brief history of machine learning
A brief history of  machine learningA brief history of  machine learning
A brief history of machine learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
 
Machine Learning ppt
Machine Learning pptMachine Learning ppt
Machine Learning ppt
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete DeckAI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)
 
Machine Learning Tutorial | Machine Learning Basics | Machine Learning Algori...
Machine Learning Tutorial | Machine Learning Basics | Machine Learning Algori...Machine Learning Tutorial | Machine Learning Basics | Machine Learning Algori...
Machine Learning Tutorial | Machine Learning Basics | Machine Learning Algori...
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Machine Learning techniques
Machine Learning techniques Machine Learning techniques
Machine Learning techniques
 
Machine learning Algorithms
Machine learning AlgorithmsMachine learning Algorithms
Machine learning Algorithms
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine Learning
 

Similar to Machine learning with Big Data power point presentation

Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
PhD Assistance
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
NeeleEilers
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
Pranya Prabhakar
 
ML crash course
ML crash courseML crash course
ML crash course
mikaelhuss
 
Cognitive automation
Cognitive automationCognitive automation
Cognitive automation
Trideeb Kumar Das
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
ijdpsjournal
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
ijdpsjournal
 
Technovision
TechnovisionTechnovision
Technovision
SayantanGhosh58
 
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
cscpconf
 
Machine Learning: Need of Machine Learning, Its Challenges and its Applications
Machine Learning: Need of Machine Learning, Its Challenges and its ApplicationsMachine Learning: Need of Machine Learning, Its Challenges and its Applications
Machine Learning: Need of Machine Learning, Its Challenges and its Applications
Arpana Awasthi
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
AnanthReddy38
 
Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...butest
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
Machine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business RevolutionMachine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business Revolution
Cognizant
 
Anirban part1
Anirban part1Anirban part1
Anirban part1
kamatchi priya
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
kavinilavuG
 
McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docx
andreecapon
 
Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
iaeronlineexm
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
Shanmugasundaram M
 
How Does Data Create Economic Value_ Foundations For Valuation Models.pdf
How Does Data Create Economic Value_ Foundations For Valuation Models.pdfHow Does Data Create Economic Value_ Foundations For Valuation Models.pdf
How Does Data Create Economic Value_ Foundations For Valuation Models.pdf
Dr. Anke Nestler
 

Similar to Machine learning with Big Data power point presentation (20)

Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
 
ML crash course
ML crash courseML crash course
ML crash course
 
Cognitive automation
Cognitive automationCognitive automation
Cognitive automation
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
 
Technovision
TechnovisionTechnovision
Technovision
 
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
 
Machine Learning: Need of Machine Learning, Its Challenges and its Applications
Machine Learning: Need of Machine Learning, Its Challenges and its ApplicationsMachine Learning: Need of Machine Learning, Its Challenges and its Applications
Machine Learning: Need of Machine Learning, Its Challenges and its Applications
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
 
Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
Machine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business RevolutionMachine Learning: The First Salvo of the AI Business Revolution
Machine Learning: The First Salvo of the AI Business Revolution
 
Anirban part1
Anirban part1Anirban part1
Anirban part1
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
 
McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docx
 
Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
How Does Data Create Economic Value_ Foundations For Valuation Models.pdf
How Does Data Create Economic Value_ Foundations For Valuation Models.pdfHow Does Data Create Economic Value_ Foundations For Valuation Models.pdf
How Does Data Create Economic Value_ Foundations For Valuation Models.pdf
 

Recently uploaded

一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 

Recently uploaded (20)

一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 

Machine learning with Big Data power point presentation

  • 1. MACHINE LEARNING WITH BIG DATA CHALLENGES & APPROACHES BY K.DAVID DMCA-40
  • 2. Introduction The Big Data revolution promises to transform how we live, work, and think by enabling process optimization, empowering insight discovery and improving decision making. Today, the amount of data is exploding at an unprecedented rate as a result of developments in Web technologies, social media, and mobile and sensing devices. For example, Twitter processes over 70M tweets per day, thereby generating over 8TB daily. The ability to extract value from Big Data depends on data analytics which consider analytics to be the core of the Big Data revolution Data analytics involves various approaches, technologies, and tools such as those from text analytics, business intelligence, data visualization, and statistical analysis. The machine learning (ML) as a fundamental component of data analytics. The ML will be one of the main drivers of the Big Data revolution. The reason for this is its ability to learn from data and provide data driven insights, decisions, and predictions.
  • 3. ML Challenges Originating From Big Data Definition Big Data are often described by its dimensions, which are referred to as its V’s.Earlier definitions of Big Data focussed on three V’s (volume, velocity, and variety); however, a more commonly accepted definition now relies upon the following four V’s : volume, velocity, variety, and veracity. It is important to note that other Vs can also be found in the literature. For example, value is often added as a 5th V. However, value is defined as the desired outcome of Big Data processing and not as defining characteristics of Big Data itself This section identifies machine learning challenges and associates each challenge with a specific dimension of Big Data.
  • 5. VOLUME: The first and the most talked about characteristic of Big Data is volume: it is the amount, size, and scale of the data. In the machine learning context, size can be defined either Vertically by the number of records or samples in a dataset or horizontally by the number of features or attributes it contains. 1) Processing Performance:- One of the main challenges encountered in computations with Big Data comes from the simple principle that scale, or volume, adds computational complexity. Consequently, as the scale becomes large, even trivial operations can become costly.
  • 6. volume For example, the standard support vector machine (SVM) algorithm has a training time complexity of O(m3 ) and a space complexity of O(m2 ), where m is the number of training samples. Therefore, an increase in the size m will drastically affect the time and memory needed to train the SVM algorithm and may even become computationally infeasible on very large datasets. Many other ML algorithms also exhibit high time complexity: for example, the time complexity of principal component analysis is O(mn2 +n3 ) , that of logistic regression O(mn2 +n3 ) , that of locally weighted linear regression O(mn2+n3) , and that of Gaussian discriminative analysis O(mn2+n3), where m is the number of samples and n the number of features.
  • 7. Volume Class imbalance:- As datasets grow larger, the assumption that the data are uniformly distributed across all classes is often broken. This leads to a challenge referred to as class imbalance The performance of a machine learning algorithm can be negatively affected when datasets contain data from classes with various probabilities of occurrence. This problem is especially prominent when some classes are represented by a large number of samples and some by very few.
  • 8. volume Curse of Dimensionality:- It refers to difficulties encountered when working in high dimensional space. Specifically, the dimensionality describes the number of features or attributes present in the dataset. In addition, dimensionality affects processing performance: the time and space complexity of ML algorithms is closely related to data dimensionality. The time complexity of many ML algorithms is polynomial in the number of dimensions.
  • 9. volume Feature Engineering:- This is the process of creating features, typically using domain knowledge, to make machine learning perform better. Indeed, the selection of the most appropriate features is one of the most time consuming pre-processing tasks in machine Learning. As the dataset grows, both vertically and horizontally, it becomes more difficult to create new, highly relevant features.
  • 10. volume Bonferonni’s principle : It embodies the idea that if one is looking for a specific type of event within a certain amount of data, the likelihood of finding this event is high. However, more often than not these occurrences are bogus, meaning that they have no cause and are therefore meaningless instances within a Dataset. In statistics, the Bonferonni correction theorem provides a means of avoiding those bogus positive searches within a dataset. It suggests That :- if testing m hypotheses with a desired significance of α , each
  • 11. volume Variance and Bias Machine learning relies upon the idea of generalization; through observations and manipulations of data, representations can be generalized to enable analysis and prediction. Generalization error can be broken down into two components: variance and bias Variance describes the consistency of a learner’s ability to predict random things, whereas bias describes the ability of a learner to learn the wrong thing
  • 13. APPROACHES This paper reviews and organizes various proposed machine learning approaches and discusses how they address the identified challenges. The big picture of approach-challenge correlations is presented in Table. It includes a list of approaches along with the challenges that each best addresses. Symbol ‘√’ indicate high degree of remedy while ‘*’ represents partial resolution.
  • 14.
  • 15. Approaches ... The following sub-sections introduce techniques and methodologies being developed and used to handle the challenges associated with machine learning with Big Data. First, manipulation techniques used in conjunction with existing algorithms are presented. Second, various machine learning paradigms that are especially well suited to handle Big Data challenge
  • 16. A. Manipulations for Big Data Data analytics using machine learning relies on an established suite of events, also known as the data analytics pipeline. The approaches presented in this section discuss possible manipulations in various steps of the existing Pipeline. Dimensionality reduction aims to map high dimensionality space onto lower-dimensionality one without significant loss of information.
  • 17. Data Analytics pipeline: These three categories, along with their corresponding sub categories and sample solutions
  • 19. B. Machine Learning Paradigms for Big Data A variety of learning paradigms exists in the field of machine learning; however, not all types are relevant to all areas of Research 1) Deep Learning Deep learning is an approach from the representation learning family of machine learning. Representation learning is also often referred to as feature learning It transforms data into abstract representations that enable the features to be learnt. In a deep learning architecture, these representations are subsequently used to accomplish the machine learning tasks
  • 20.
  • 21. 2) Online Learning Because it responds well to large-scale processing by nature, online learning is another machine learning paradigm that has been explored to bridge efficiency gaps created by Big Data. Online learning can be seen as an alternative to batch learning, the paradigm typically used in conventional machine learning.
  • 22. Local Learning Local learning is a strategy that offers an alternative to typical global learning. Conventionally, ML algorithms make use of global learning through strategies such as generative learning The idea behind it is to separate the input space into clusters and then build a separate model for each cluster. This reduces overall cost and complexity.
  • 23.
  • 24. Transfer Learning Transfer learning is an approach for improving learning in a particular domain, referred to as the target domain, by training the model with other datasets from multiple domains, denoted as source domains, with similar attributes or features, such as the problem and constraints. This type of learning is used when the data size within the target domain is insufficient or the learning task is different . Figure shows an abstract view of transfer learning.
  • 25.
  • 26. Lifelong learning Lifelong learning mimics human learning; learning is continuous; knowledge is retained and used to solve different problems. It is directed to maximize overall learning, to be able to solve a new task by training either on one single domain or on heterogeneous domains collectively The learning outcomes from the training process are collected and combined together in a space called the topic model or knowledge model.
  • 28. Conclusion This paper has provided a systematic review of the challenges associated with machine learning in the context Of Big Data and categorized them according to the V dimensions of Big Data. Moreover, it has presented an overview of ML approaches and discussed how these techniques overcome the various challenges identified.