BIG DATA AND MACHINE LEARNING
Big Data & IoT
Lecture #3
Umair Shafique (03246441789)
Scholar MS Information Technology - University of Gujrat
Table of contents
• Define big data
• Big data as 10 V's
• Some pros and cons of big data
• Perceived challenges of big data
• Define machine learning
• Real-world examples
• Working flow of ML
• Types of ML
• Challenges of ML
• Relate big data with ML
• Features of ML with big data
• Framework based on ML for big data processing
• Tools and technologies for big data and ML
• Difference between ML and big data
• Research challenges and open issues
• Summary
• References
What is Big Data?
Big Data is a collection of data that is huge in volume and keeps growing exponentially with time. Its size and complexity are so large that no traditional data management tool can store or process it efficiently. In short, big data is simply data at an enormous scale.
Who’s Generating Big Data?
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely and scalable fashion.
Big Data as 10 V's:
Some Pros of Big Data:
• Better decision-making
• Increased productivity
• Reduced costs
• Improved customer service
• Fraud detection
• Greater innovation
Cons of Big Data:
• Need for talent
• Data quality
• Need for cultural change
• Rapid change
• Hardware needs
• Costs
Perceived Challenges of Big Data
What is Machine Learning?
Machine learning is an application of AI that gives systems the ability to learn on their own and improve from experience without being explicitly programmed. If your computer had machine learning, it might be able to play difficult parts of a game or solve a complicated mathematical equation for you.
Real world examples of machine learning
Machine learning is relevant in many fields and industries, and its use continues to grow over time. Here are five real-life examples of how machine learning is being used.
1. Image recognition
Image recognition is a well-known and widespread example of machine learning in the real world. It can identify an object in a digital image, based on the intensity of the pixels in black-and-white or colour images.
e.g.
• Label an x-ray as cancerous or not
• Assign a name to a photographed face (aka “tagging” on social media)
• Recognise handwriting by segmenting a single letter into smaller images
• Machine learning is also frequently used for facial recognition within an image. Using a database of
people, the system can identify commonalities and match them to faces. This is often used in law
enforcement.
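As a toy illustration of recognising images from pixel intensities, the sketch below classifies tiny flattened "images" with a nearest-neighbour rule. The 3x3 patterns and their labels are invented for this example; real systems work on far larger images with learned models.

```python
# A toy nearest-neighbour classifier over pixel-intensity vectors.
# The 3x3 "images" and labels are made up for illustration.

def distance(a, b):
    """Squared Euclidean distance between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train, test_image):
    """Return the label of the training image closest to test_image."""
    best_label, best_dist = None, float("inf")
    for image, label in train:
        d = distance(image, test_image)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Flattened 3x3 grayscale images: a bright cross vs. a dark corner blob.
train = [
    ([0, 9, 0, 9, 9, 9, 0, 9, 0], "cross"),
    ([9, 9, 0, 9, 0, 0, 0, 0, 0], "corner"),
]

print(predict(train, [0, 8, 0, 8, 9, 8, 0, 8, 0]))  # a slightly noisy cross
```

The same idea, scaled up to learned feature spaces instead of raw pixels, underlies face matching against a database.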
2. Speech recognition
Machine learning can translate speech into text. Certain software applications can convert live voice and recorded
speech into a text file. The speech can be segmented by intensities on time-frequency bands as well.
• Voice search
• Voice dialling
• Appliance control
• Some of the most common uses of speech recognition software are devices like Google Home or Amazon Alexa.
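The idea of segmenting speech by intensity can be sketched in a few lines: split a signal into short frames and keep the stretches whose energy crosses a threshold. The signal, frame length, and threshold below are illustrative assumptions, not a real recogniser, which would operate on time-frequency bands of sampled audio.

```python
# A minimal sketch of intensity-based segmentation: find the regions of a
# 1-D signal where short-frame energy exceeds a threshold ("speech" vs. silence).

def segment(signal, frame=4, threshold=1.0):
    """Return (start, end) index pairs of contiguous high-energy frames."""
    regions, start = [], None
    for i in range(0, len(signal), frame):
        energy = sum(x * x for x in signal[i:i + frame]) / frame
        if energy >= threshold and start is None:
            start = i
        elif energy < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(signal)))
    return regions

quiet, loud = [0.1] * 8, [2.0] * 8
print(segment(quiet + loud + quiet))  # one loud region in the middle
```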
3. Medical diagnosis
Machine learning can help with the diagnosis of diseases. Many physicians use chatbots with speech recognition
capabilities to discern patterns in symptoms.
• Assisting in formulating a diagnosis or recommending a treatment option
• Oncology and pathology use machine learning to recognise cancerous tissue
• Analyse bodily fluids
• In the case of rare diseases, the joint use of facial recognition software and machine learning helps scan patient
photos and identify phenotypes that correlate with rare genetic diseases.
4. Predictive analytics
Machine learning can classify available data into groups, which are then defined by rules set by analysts. When the
classification is complete, the analysts can calculate the probability of a fault.
• Predicting whether a transaction is fraudulent or legitimate
• Improve prediction systems to calculate the possibility of fault
• Predictive analytics is one of the most promising examples of machine learning. It is applicable to everything from product development to real-estate pricing.
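As a minimal sketch of such a probability calculation, the snippet below scores a transaction with a logistic model. The feature names, weights, and bias are invented for illustration rather than learned from real data.

```python
import math

# A hedged sketch of predictive scoring: a logistic model turns a few
# transaction features into a fraud probability. Weights are made up.

WEIGHTS = {"amount_usd": 0.002, "foreign_country": 1.5, "night_time": 0.8}
BIAS = -4.0

def fraud_probability(tx):
    """Probability in (0, 1) that a transaction is fraudulent."""
    score = BIAS + sum(WEIGHTS[k] * tx[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-score))

routine = {"amount_usd": 40, "foreign_country": 0, "night_time": 0}
odd = {"amount_usd": 900, "foreign_country": 1, "night_time": 1}

print(round(fraud_probability(routine), 3))  # small probability
print(round(fraud_probability(odd), 3))      # much larger probability
```

In practice the weights would be fitted to labelled historical transactions; the scoring step stays exactly this simple.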
5. Extraction
Machine learning can extract structured information from unstructured data. Organizations amass huge volumes of
data from customers. A machine learning algorithm automates the process of annotating datasets for predictive
analytics tools.
• Generate a model to predict vocal cord disorders
• Develop methods to prevent, diagnose, and treat the disorders
• Help physicians diagnose and treat problems quickly
• Typically, these processes are tedious. But machine learning can track and extract information to obtain billions
of data samples.
How Does Machine Learning Work?
Consider a system with input data that contains photos of various kinds of fruits. You want the system to
group the data according to the different types of fruits.
First, the system will analyze the input data. Next, it tries to find patterns, like shapes, size, and color. Based
on these patterns, the system will try to predict the different types of fruit and segregate them. Finally, it
keeps track of all the decisions it made during the process to ensure it is learning. The next time you ask
the same system to predict and segregate the different types of fruits, it won't have to go through the
entire process again. That’s how machine learning works.
Types of Machine Learning
• Supervised machine learning: You supervise the machine while training it to work on its own. This requires labeled training data.
• Unsupervised learning: There is training data, but it is not labeled.
• Reinforcement learning: The system learns on its own through trial and error, guided by rewards and penalties.
Supervised Learning
To understand how supervised learning works, look at the example
below, where you have to train a model or system to recognize an
apple.
• First, you have to provide a data set that contains pictures of a
kind of fruit, e.g., apples.
• Then, provide another data set that lets the model know that
these are pictures of apples. This completes the training phase.
• Next, provide a new set of data that only contains pictures of
apples. At this point, the system can recognize what the fruit is and
will remember it.
• That's how supervised learning works. You are training the model
to perform a specific operation on its own. This kind of model is
often used in filtering spam mail from your email accounts.
Supervised learning includes:
Classification: Classification is a typical supervised learning task. The spam filter we spoke about above is one such example. It is trained on many example emails along with their class (spam or not spam) and then automatically classifies new emails.
Used for:
• Spam filtering
• Sentiment analysis
• Recognition of handwritten characters and numbers
• Fraud detection
Popular algorithms: Naive Bayes, Decision Tree, Linear Regression, Logistic Regression, K-Nearest
Neighbors, Support Vector Machine, Neural Networks
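To make the spam-filter example concrete, here is a from-scratch Naive Bayes sketch with Laplace smoothing. The four training emails are invented; a real filter would train on many thousands of labelled messages.

```python
import math
from collections import Counter

# A compact Naive Bayes spam filter, built from scratch for illustration.

def train(emails):
    """emails: list of (word list, label). Returns per-class counts and priors."""
    counts = {"spam": Counter(), "ham": Counter()}
    priors = Counter()
    for words, label in emails:
        counts[label].update(words)
        priors[label] += 1
    return counts, priors

def classify(counts, priors, words):
    """Pick the class with the highest log posterior, Laplace-smoothed."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best, best_score = None, -math.inf
    for label in counts:
        total = sum(counts[label].values())
        score = math.log(priors[label])
        for w in words:
            score += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

emails = [
    ("win cash prize now".split(), "spam"),
    ("cheap prize click now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lunch on monday".split(), "ham"),
]
counts, priors = train(emails)
print(classify(counts, priors, "win a prize".split()))  # prints: spam
```

Laplace smoothing (the `+ 1`) keeps a single unseen word from zeroing out a whole class, which is the standard fix for sparse training data.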
Regression: Regression is essentially classification where we forecast a number instead of a category. Examples are a car's price given its mileage, traffic given the time of day, or demand volume given the growth of the company. Regression is a natural fit when something depends on time.
Used for:
• Stock price forecasts
• Demand and sales volume analysis
• Medical diagnosis
• Any number-time correlations
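Regression can be illustrated with a one-feature least-squares fit, e.g. predicting a car's price from its mileage. The numbers below are made up and happen to lie exactly on a line, so the fit is exact; real data would scatter around the fitted line.

```python
# A minimal least-squares fit (one feature): price as a function of mileage.
# The data points are invented for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimising the squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

mileage = [10, 30, 50, 70]   # thousands of km
price = [20, 16, 12, 8]      # thousands of dollars

slope, intercept = fit_line(mileage, price)
print(slope, intercept)            # the line through the toy data
print(intercept + slope * 40)      # predicted price at 40k km
```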
Unsupervised Learning
Consider a cluttered dataset: a collection of pictures of different fruits. You feed this data to the model, and the model analyzes it to recognize patterns. In the end, the machine categorizes the photos into groups based on their similarities. Flipkart uses this kind of model to find and recommend products that are well suited to you.
It includes:
• Clustering: A clustering algorithm tries to find objects that are similar (by some features) and merges them into a cluster. Objects with many similar features are joined in one class. With some algorithms, you can even specify the exact number of clusters you want.
Used:
• For market segmentation (types of customers, loyalty)
• For image compression
• To analyze and label new data
• To detect abnormal behavior
Popular Clustering algorithms are:
• K-Means
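A from-scratch K-Means sketch on 2-D points (imagine fruit size vs. colour intensity) shows the assign-then-update loop at the heart of the algorithm. The six points and the choice of k = 2 are illustrative assumptions.

```python
import random

# K-Means from scratch: alternate between assigning points to their
# nearest centre and moving each centre to its cluster's mean.

def kmeans(points, k, steps=10, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(steps):
        # Assignment step: each point joins its nearest centre's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                  + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each centre to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # two clusters of three points each
```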
Reinforcement Learning
In reinforcement learning, the system learns on its own through trial and error: it takes actions in an environment and receives rewards or penalties, rather than learning from labeled examples.
Used today for:
• Game playing (chess, Go, video games)
• Robotics (e.g., robot vacuums)
• Self-driving vehicles
• Automated trading
• Resource management
Main Challenges of Machine Learning
• Poor-Quality Data
• Irrelevant Features
• Testing and Validating
Big Data & Machine Learning (How Do They Relate?)
To recap, big data refers to vast amounts of data that traditional storage methods cannot handle. Machine learning is the ability of computer systems to learn to make predictions from observations and data. Machine learning can use the information provided by the study of big data to generate valuable business insights.
Machine learning tools use data-driven algorithms and statistical models to analyze data
sets and then draw inferences from identified patterns or make predictions based on them.
The algorithms learn from the data as they run against it, as opposed to traditional rules-
based analytics systems that follow explicit instructions.
Big data provides ample amounts of raw material from which machine learning systems
can derive insights. By combining them, organizations are producing significant analytics
findings and results.
Features of Machine Learning with Big Data
• Sparse Representation
• Mining Structured Relations
• High Scalability and High Speed
Reference Framework Based on Machine Learning for Big Data Processing
Big data processing procedure with machine learning:
We suppose the big data processing procedure mainly consists of the following four phases:
• Pre-processing phase
• Analysis phase
• Model establishment phase
• Model updating phase
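The four phases can be sketched as plain functions to show how they chain together. The cleaning rule, the statistic, and the threshold "model" below are placeholders invented for illustration; a real pipeline would substitute proper cleaning logic and a learned model at each step.

```python
# Schematic of the four-phase big data processing procedure.

def preprocess(records):
    """Pre-processing: drop invalid/dirty records and redundancies."""
    seen, clean = set(), []
    for r in records:
        key = tuple(r.items())
        if r.get("value") is not None and key not in seen:
            seen.add(key)
            clean.append(r)
    return clean

def analyze(records):
    """Analysis: extract a simple statistic to guide model building."""
    values = [r["value"] for r in records]
    return sum(values) / len(values)

def establish_model(mean):
    """Model establishment: a placeholder model flagging outlying values."""
    return lambda v: abs(v - mean) > 2

def update_model(new_mean):
    """Model updating: rebuild with parameters from fresh, real-time data."""
    return establish_model(new_mean)

raw = [{"value": 1}, {"value": 1}, {"value": None}, {"value": 2}, {"value": 9}]
clean = preprocess(raw)             # duplicates and invalid entries removed
model = establish_model(analyze(clean))
print([r["value"] for r in clean if model(r["value"])])  # flagged outliers
```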
Tools and technologies for big data and ML:
• Snowflake
• Matplotlib
• TensorFlow
• BigML
• Apache Spark
• KNIME
• Cloudera
Key differences between big data and ML:
Summary of lecture
• In this lecture, we first provided an overview of big data and summarized its characteristics.
• We then gave an overview of machine learning. To highlight the differences of machine learning techniques in the context of big data, we analyzed the new features of machine learning with big data.
• Next, we related big data and machine learning.
• We also proposed a reference framework for processing big data based on machine learning techniques with the power of distributed storage and parallel computing. Finally, we presented several research challenges and open issues.
• We hope that this lecture can stimulate more interest in research and development of machine learning techniques for big data processing.
References
• https://towardsdatascience.com/machine-learning-and-big-data-real-world-applications-3ba3a3345cf5
• https://www.salesforce.com/eu/blog/2020/06/real-world-examples-of-machine-learning.html
• https://www.techtarget.com/searchbusinessanalytics/tip/Big-data-vs-machine-learning-How-they-differ-and-relate
• https://geekflare.com/big-data-tools-for-data-scientist/


Editor's Notes

• #7 Better decision-making: In the NewVantage Partners survey, 36.2 percent of respondents said that better decision-making was the number one goal of their big data analytics efforts. In addition, 84.1 percent had started working toward that goal, and 59.0 percent had experienced some measurable success, for an overall success rate of 69.0 percent. Analytics can give business decision-makers the data-driven insights they need to help their companies compete and grow.
Increased productivity: A separate survey from vendor Syncsort found that 59.9 percent of respondents were using big data tools like Hadoop and Spark to increase business user productivity. Modern big data tools are allowing analysts to analyze more data, more quickly, which increases their personal productivity. In addition, the insights gained from those analytics often allow organizations to increase productivity more broadly throughout the company.
Reduced costs: Both the Syncsort and the NewVantage surveys found that big data analytics were helping companies decrease their expenses. Nearly six out of ten (59.4 percent) respondents told Syncsort big data tools had helped them increase operational efficiency and reduce costs, and about two thirds (66.7 percent) of respondents to the NewVantage survey said they had started using big data to decrease expenses. Interestingly, however, only 13.0 percent of respondents selected cost reduction as their primary goal for big data analytics, suggesting that for many this is merely a very welcome side benefit.
Improved customer service: Among respondents to the NewVantage survey, improving customer service was the second most common primary goal for big data analytics projects, and 53.4 percent of companies had experienced some success in this regard. Social media, customer relationship management (CRM) systems and other points of customer contact give today's enterprises a wealth of information about their customers, and it is only natural that they would use this data to better serve those customers.
Fraud detection: Another common use for big data analytics, particularly in the financial services industry, is fraud detection. One of the big advantages of big data analytics systems that rely on machine learning is that they are excellent at detecting patterns and anomalies. These abilities can give banks and credit card companies the ability to spot stolen credit cards or fraudulent purchases, often before the cardholder even knows that something is wrong.
Greater innovation: Innovation is another common benefit of big data, and the NewVantage survey found that 11.6 percent of executives are investing in analytics primarily as a means to innovate and disrupt their markets. They reason that if they can glean insights that their competitors don't have, they may be able to get out ahead of the rest of the market with new products and services.
• #8 Need for talent: Data scientists and big data experts are among the most highly coveted, and highly paid, workers in the IT field. The AtScale survey found that the lack of a big data skill set has been the number one big data challenge for the past three years. And in the Syncsort survey, respondents ranked skills and staff as the second biggest challenge when creating a data lake. Hiring or training staff can increase costs considerably, and the process of acquiring big data skills can take considerable time.
Data quality: In the Syncsort survey, the number one disadvantage to working with big data was the need to address data quality issues. Before they can use big data for analytics efforts, data scientists and analysts need to ensure that the information they are using is accurate, relevant and in the proper format for analysis. That slows the reporting process considerably, but if enterprises don't address data quality issues, they may find that the insights generated by their analytics are worthless, or even harmful if acted upon.
Need for cultural change: Many of the organizations that are utilizing big data analytics don't just want to get a little bit better at reporting; they want to use analytics to create a data-driven culture throughout the company. In fact, in the NewVantage survey, a full 98.6 percent of executives said that their firms were in the process of creating this new type of corporate culture. However, changing culture is a tall order. So far, only 32.4 percent were reporting success on this front.
Rapid change: Another potential drawback to big data analytics is that the technology is changing rapidly. Organizations face the very real possibility that they will invest in a particular technology only to have something much better come along a few months later. Syncsort respondents ranked this disadvantage of big data fourth among all the potential challenges they faced.
Hardware needs: Another significant issue for organizations is the IT infrastructure necessary to support big data analytics initiatives. Storage space to house the data, networking bandwidth to transfer it to and from analytics systems, and compute resources to perform those analytics are all expensive to purchase and maintain. Some organizations can offset this problem by using cloud-based analytics, but that usually doesn't eliminate the infrastructure problems entirely.
Costs: Many of today's big data tools rely on open source technology, which dramatically reduces software costs, but enterprises still face significant expenses related to staffing, hardware, maintenance and related services. It's not uncommon for big data analytics initiatives to run significantly over budget and to take more time to deploy than IT managers had originally anticipated.
• #20 Main Challenges of Machine Learning: In short, since our main task is to select a learning algorithm and train it on some data, the two things that can go wrong are "bad algorithm" and "bad data." It takes a lot of data for most machine learning algorithms to work properly.
Poor-Quality Data: Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will be harder for the system to detect the underlying patterns, so your system is less likely to perform well.
Irrelevant Features: Your system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones.
Testing and Validating: The only way to know how well a model will generalize to new cases is to try it out on new cases. The recommended option is to split your data into two sets: the training set and the test set. As these names imply, you train the model using the training set, and you test it using the test set. The error rate on new cases is called the generalization error, and by evaluating your model on the test set, you get an estimate of this error. This value tells you how well your model will perform on instances it has never seen before.
• #22 In this section, we highlight three abilities of machine learning techniques that are useful for big data problems: sparse representation and feature selection, mining structured relations, and high scalability and high speed.
Sparse Representation: High-dimensional data is difficult to handle with traditional data processing methods. Therefore, effective dimension reduction is increasingly viewed as a necessary step. For high-dimensional big data, we highlight feature selection and sparse representation, two commonly adopted approaches. Feature selection is a key issue in building robust data processing models through selecting a subset of meaningful features. It should help visualize the data, construct better statistical models, and improve prediction accuracy by mapping the high-dimensional data onto its underlying low-dimensional manifold. For high-dimensional big data, a sparse data representation is more and more important for many algorithms.
Mining Structured Relations: Big data generally comes from different sources with heterogeneous types, including structured, unstructured and semi-structured representation forms. In dealing with such heterogeneous datasets, the challenge is that a machine learning system needs to infer the structure behind the data when it is not known beforehand. One way of structuring data is to discover relevance based on inherent data properties through structured learning and structured prediction. The main purpose of mining structured relations from a set of data is to aggregate massive amounts of data and divide it into smaller chunks that machine learning systems can easily handle.
High Scalability and High Speed: The unprecedented volumes of big data require high scalability of data mining and processing tools. Current techniques to enhance the scalability of machine learning algorithms mainly focus on two aspects: i) the scalability of cloud computing makes it possible to analyze enormous datasets by aggregating multiple workloads with varying performance goals into multi-tenanted computing clusters, making machine learning with cloud computing more efficient and higher-performing for processing and analyzing big data; ii) distributed storage and parallel computing have helped to solve machine learning algorithms' scalability problems. A useful approach to boost the speed of big data processing is to maximally identify and exploit the potential parallelism in the machine learning algorithms. High scalability and high speed give machine learning the power to handle big data.
• #24 Pre-processing phase: Because data sources cover many different domains, raw big data collected from the environment is highly complex and contains tremendous redundancies. Therefore, in the pre-processing phase we first need to delete invalid and dirty data. In addition, we frequently face massive uncertain and incomplete data in real life, and we need to append some important attributes to improve its practicability for processing.
Analysis phase: After the pre-processing phase, we need to analyze the valid and useful data to find out how to utilize it through trial and error. Data visualization is a fundamental problem in the analysis of big data, and we can adopt sparse representation to achieve effective dimension reduction for high-dimensional data.
Model establishment phase: Through analysis of the essential parameters, we should be able to select some important features to establish a feasible model for dealing with real problems. In this phase, we first try to mine the structured relations between data to obtain statistical information and trends, then split the data into training and testing sets, decide what kind of model should be generated, and build up the corresponding model.
Model updating phase: Once the model is established, we need to configure its parameters and apply the generated model in actual operations to test the performance of the big data processing model. In this phase, we emphasize that the input data is real-time, and we should make dynamic adjustments to update the model based on the effects of its application.
Of the four phases, the first three are offline processing. In these phases, we can adopt offline learning methods, which include the two categories of supervised learning and unsupervised learning. In the model testing and updating phase, we mainly focus on the real-time characteristic of the input data; to deal with real-time processing, online learning methods are necessary and reinforcement learning is preferred.