Unit 4
Learning
L.A.Bewoor
laxmi.bewoor@viit.ac.in
Department of Computer Engineering
BRACT’S, Vishwakarma Institute of Information Technology, Pune-48
(An Autonomous Institute affiliated to Savitribai Phule Pune University)
(NBA and NAAC accredited, ISO 9001:2015 certified)
Objective/s of this session
Discuss learning components and types of learning in AI
1. Differentiate between supervised, unsupervised and reinforcement learning
2. Implement applications for supervised, unsupervised and reinforcement algorithms
3. Learn & implement perceptron & neural networks
4. Learn & implement ensemble learning
Contents
• Sequential and time series analysis
• Speech Recognizer
• Natural Language Processing
• Chatbots
• Perceptron based classifier
Ensemble Learning
• Ensemble Learning is a method of reaching a consensus in predictions by fusing the salient properties of two or more models. The final ensemble learning framework is more robust than the individual models that constitute the ensemble, because ensembling reduces the variance in the prediction errors.
• Ensemble Learning tries to capture complementary information from its different contributing models; an ensemble framework is successful when the contributing models are statistically diverse.
• For example, a model may be well adapted to differentiate between cats and dogs, but not so much when distinguishing between dogs and wolves. On the other hand, a second model can accurately differentiate between dogs and wolves while producing wrong predictions on the "cat" class. An ensemble of these two models might draw a more discriminative decision boundary between all three classes of the data.
• In learning models, noise, variance, and bias are the major sources of
error. The ensemble methods in machine learning help minimize these
error-causing factors, thereby ensuring the accuracy and stability of
machine learning (ML) algorithms.
Ensemble Learning
• We may have trained one cat/dog classifier on high-quality images taken by a professional photographer, while another classifier has been trained on low-quality photos captured on mobile phones. When predicting a new sample, integrating the decisions from both these classifiers will be more robust and less biased.
Bias and Variance
• Bias is the difference between the value predicted by the model and the actual value. Bias is introduced when the model doesn't consider the variation in the data and creates an overly simple model.
• Such a simple model doesn't follow the patterns in the data, and hence gives errors on training as well as testing data: this is a model with high bias (underfitting).
• When the model treats even random quirks of the data as patterns, it may do very well on the training dataset (low bias), but it fails on test data: this is a model with high variance (overfitting).
• In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. It happens when we have too little data to build an accurate model, or when we try to build a linear model with nonlinear data. Models of this kind are too simple to capture the complex patterns in the data, e.g., linear and logistic regression.
• In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model a lot on a noisy dataset. These models have low bias and high variance. They are very complex models, like decision trees, which are prone to overfitting.
Ensemble learning techniques
1. Bagging: Bagging (short for "Bootstrap Aggregating") trains similar learners on small sample populations and then takes the mean of all the predictions. In generalized bagging, you can use different learners on different populations. As you would expect, this helps reduce the variance error.
• Multiple different training datasets can be prepared,
used to estimate a predictive model, and make
predictions. Averaging the predictions across the
models typically results in better predictions than a
single model fit on the training dataset directly.
• Bagging is a parallel method, which means
several weak learners learn the data pattern
independently and simultaneously
• Bagging reduces variance
• Popular ensemble methods based on this approach include (see the sketch after the list):
Bagged Decision Trees
Random Forest Classifiers
Extra Trees
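As a minimal illustrative sketch (not from the slides), the snippet below bags decision trees with scikit-learn on a synthetic dataset; BaggingClassifier bootstraps the training set for each tree, and RandomForestClassifier adds random feature selection on top.

```python
# Hedged sketch: bagging with scikit-learn (assumed installed); the
# dataset is synthetic and only for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagged decision trees: each tree fits a bootstrap sample; predictions
# are averaged, which mainly reduces variance.
bag = BaggingClassifier(n_estimators=50, random_state=42)
print("Bagging accuracy:", bag.fit(X_train, y_train).score(X_test, y_test))

# Random forest = bagging plus random feature subsets at each split.
rf = RandomForestClassifier(n_estimators=50, random_state=42)
print("Random forest accuracy:", rf.fit(X_train, y_train).score(X_test, y_test))
```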
2. Boosting
• Instead of parallel processing of the data, sequential processing of the dataset occurs. The first classifier is fed with the entire dataset, and its predictions are analyzed.
• The instances where Classifier-1 fails to produce correct predictions (typically samples near the decision boundary of the feature space) are fed to the second classifier.
• This is done so that Classifier-2 can specifically focus on the
problematic areas of feature space and learn an appropriate
decision boundary. Similarly, further steps of the same idea
are employed, and then the ensemble of all these previous
classifiers is computed to make the final prediction on the test
data.
• The main aim of the boosting method is to reduce bias in the ensemble decision. Thus, the classifiers chosen for the ensemble usually need to have low variance and high bias, i.e., simpler models with fewer trainable parameters. Popular methods based on this approach include the following (a sketch follows the list):
– Adaptive Boosting
– Stochastic Gradient Boosting
– Gradient Boosting Machines
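A hedged sketch of boosting with scikit-learn (assumed available) follows; AdaBoost re-weights misclassified samples each round, while gradient boosting fits each new tree to the remaining errors. The synthetic data is only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost: sequential; each round up-weights the samples that the
# previous (high-bias, low-variance) learners got wrong.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
print("AdaBoost:", ada.fit(X_train, y_train).score(X_test, y_test))

# Gradient boosting: each new tree is fit to the residual errors of the
# ensemble built so far.
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0)
print("GBM:", gbm.fit(X_train, y_train).score(X_test, y_test))
```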
3. Stacking
• The stacking ensemble method also involves creating bootstrapped data subsets, like the bagging ensemble mechanism, for training multiple models.
• However, the outputs of all such models are used as inputs to another classifier, called the meta-classifier, which finally predicts the samples. The intuition behind using two layers of classifiers is to determine whether the training data have been appropriately learned.
• For example, in the cat/dog/wolf classifier above: if, say, Classifier-1 can distinguish between cats and dogs, but not between dogs and wolves, the meta-classifier present in the second layer will be able to capture this behavior from Classifier-1 and correct it before making the final prediction.
In summary, the stacking procedure is:
1. Split the training set into two disjoint sets.
2. Train several base learners on the first part.
3. Test the base learners on the second part.
4. Using the predictions from step 3 as the inputs, and the correct responses as the outputs, train a higher-level learner.
Example: Voting Classifier (see the sketch below).
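A hedged sketch of both ideas with scikit-learn (assumed available); the base learners and synthetic data are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

base = [("tree", DecisionTreeClassifier(random_state=1)),
        ("knn", KNeighborsClassifier())]

# Stacking: the base learners' predictions become features for a
# meta-classifier that makes the final call.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression())
print("Stacking:", stack.fit(X_train, y_train).score(X_test, y_test))

# Voting: no meta-classifier; the ensemble takes a majority vote.
vote = VotingClassifier(estimators=base, voting="hard")
print("Voting:", vote.fit(X_train, y_train).score(X_test, y_test))
```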
Reinforcement Learning (RL)
• Drawbacks of (traditional) machine learning algorithms:
– Need a huge amount of data for training the model
– Data may be missing, false, or unavailable
• Requirement of the system:
– Machines need to learn to perform actions by themselves, not just learn from data.
• Reinforcement Learning
▪ Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions.
▪ If the model performs an action that brings it closer to its goal, it receives a positive reward; if the action takes it away from its goal, a negative reward.
▪ Returns an optimum solution for a problem by taking a sequence of decisions by itself (without human interference)
▪ Works on a trial-and-error basis
▪ Sequential decision making
▪ Feedback is not instantaneous
▪ A type of dynamic programming
Important Terminologies in RL
• Agent: the model that is being trained via reinforcement learning.
• Environment: the training situation that the model must optimize against.
• Action: all possible steps that can be taken by the model.
• State: the current position/condition returned by the model.
• Reward: to help the model move in the right direction, it is given rewards/points to appraise some action.
• Policy: determines how an agent will behave at any time. It is the strategy the agent applies to choose the next action based on the current state.
• Value: the expected long-term return with discounting, as opposed to the short-term reward.
• Q-value: similar to the value, but it takes one additional parameter, the current action.
• Discount factor: helps adjust the importance of rewards over time. It exponentially decreases the value of later rewards, so the agent weighs immediate rewards against long-term ones.
RL algorithms Categorization
Learning models of RL
• Markov Decision Process (MDP):
• Most Reinforcement Learning tasks can be framed as MDPs. The following parameters are used to get a solution:
– Set of actions, A
– Set of states, S
– Reward, R
– Policy, π
– Value, V
Mathematically, the Markov (no "memory") property states that the next state depends only on the current state, not on the full history:
P(s[t+1] | s[t]) = P(s[t+1] | s[1], s[2], ..., s[t])
Bellman Equation & Dynamic Programming
The Bellman equation expresses the value of a state as the best achievable immediate reward plus the discounted value of the resulting next state:
V(s) = max over actions a of [ R(s, a) + γ * V(s') ]
where the discount factor γ lies between 0 and 1.
Iterating this update (value iteration), the solution is the largest value in the array after computing n iterations, as sketched below.
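To make the iteration concrete, here is a small value-iteration sketch in Python; the four-state chain, rewards and transitions are invented for illustration and are not the example from the slides.

```python
import numpy as np

gamma, n_iters = 0.9, 50
# Hypothetical deterministic toy MDP: R[s][a] is the reward for action a
# in state s; T[s][a] is the resulting next state.
R = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 0.0], [10.0, 10.0]])
T = np.array([[0, 1], [1, 2], [2, 3], [3, 3]])

V = np.zeros(4)
for _ in range(n_iters):
    # Bellman update: V(s) = max_a [ R(s,a) + gamma * V(s') ]
    V = np.max(R + gamma * V[T], axis=1)

print(V)        # per-state values after n iterations
print(V.max())  # the largest value in the array
```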
Q-Learning:
Markov Decision Process + Reinforcement Learning
Q-Learning is a reinforcement learning policy that finds the next best action given a current state. During training it may choose actions at random while aiming to maximize the total reward.
Q Learning
• The objective of the model is to find the best course of action given its current state. To do this, it may come up with rules of its own, or it may operate outside the policy given to it to follow. This means that there is no actual need for a fixed policy, hence we call it off-policy.
• Model-free means that the agent does not build an explicit model of the environment's dynamics or expected responses; it learns by trial and error, from the rewards it actually observes.
Important Terms in Q-Learning
• States: The State, S, represents the current position of
an agent in an environment.
• Action: The Action, A, is the step taken by the agent
when it is in a particular state.
• Rewards: For every action, the agent will get a positive
or negative reward.
• Episodes: When an agent ends up in a terminating
state and can’t take a new action.
• Q-Values: Used to determine how good an Action, A,
taken at a particular state, S, is. Q (A, S).
• Temporal Difference: a formula used to find the Q-value using the values of the current state and action and the previous state and action.
Q-Learning
Robot Navigation
• A robot has to cross a maze and reach the end point. There are mines, and the robot can only move one tile at a time. If the robot steps onto a mine, the robot is dead. The robot has to reach the end point in the shortest time possible.
• The scoring/reward system is as below:
• The robot loses 1 point at each step. This
is done so that the robot takes the
shortest path and reaches the goal as
fast as possible.
• If the robot steps on a mine, the point
loss is 100 and the game ends.
• If the robot gets power ⚡, it gains 1
point.
• If the robot reaches the end goal, the
robot gets 100 points.
Q Table
In the Q-Table, the columns are the actions and the rows are the states. Each Q-Table score is the maximum expected future reward that the robot will get if it takes that action in that state. Each value of the Q-Table is calculated with the Q-Learning algorithm. The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
Q Learning Algorithm
Q Learning
• Q is brain of agent. Initialize it with 0
• Set gamma and environment rewards in R
• Each episode is one training session
• In each training session agent explores
enviornment (with R) and receives reward until it
reaches goal.
• Purpose is to enhance brain represented with Q.
More training results in more optimized Q.
• Gamma is set between 0 to1. Closer to 0 means
agent considers immediate rewards whereas closer
to 1 means future rewards
• Q(State,Action)= R(State,Action)+ Gamma*
max[Q(state,all actions)]
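A minimal tabular Q-learning sketch of exactly this update rule; the three-state corridor below (move right to reach the goal, reward 100) is a made-up toy environment, not the maze from the slides.

```python
import numpy as np

gamma = 0.8
# Hypothetical toy world: states 0, 1, 2; action 0 = stay, 1 = move right.
# Entering the goal state 2 pays 100, everything else pays 0.
R = np.array([[0.0, 0.0], [0.0, 100.0], [0.0, 0.0]])
next_state = np.array([[0, 1], [1, 2], [2, 2]])

Q = np.zeros((3, 2))                 # the agent's "brain", initialised to 0
rng = np.random.default_rng(0)

for episode in range(200):           # each episode is one training session
    s = 0
    while s != 2:                    # explore until the goal is reached
        a = rng.integers(2)          # pick an action at random
        s2 = next_state[s, a]
        # Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)]
        Q[s, a] = R[s, a] + gamma * Q[s2].max()
        s = s2

print(Q)  # rows = states, cols = actions; Q[0,1] converges to 80, Q[1,1] to 100
```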
Perceptron
Introduction
• A perceptron is a neural network unit (an artificial neuron) that performs certain computations to detect features or business intelligence in the input data.
• It closely resembles a biological neuron.
• Warren McCulloch and Walter Pitts first introduced the nerve cell as a simple logic gate with binary outputs.
• The perceptron is a simple model of the biological neuron in the form of an ANN. It is a supervised learning algorithm designed for binary classification.
Biological Neuron vs Artificial Neuron
Multiple signals arrive at the dendrites and are then integrated in the cell body; if the accumulated signal exceeds a certain threshold, an output signal is generated that will be passed on by the axon.
An artificial neuron is a mathematical function based on a model of biological neurons, where each neuron takes inputs, weighs them separately, sums them up and passes this sum through a nonlinear function to produce the output.
The parts correspond as follows: dendrites → inputs, cell nucleus (soma) → node, synapses → weights, axon → output.
The artificial neuron has the following
characteristics:
– A neuron is a mathematical function modeled on the
working of biological neurons
– It is an elementary unit in an artificial neural network
– One or more inputs are separately weighted
– Inputs are summed and passed through a nonlinear
function to produce output
– Every neuron holds an internal state called activation
signal
– Each connection link carries information about the
input signal
– Every neuron is connected to another neuron via
connection link
Perceptron
• The Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a Perceptron learning rule based on the original MCP neuron. A Perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and process elements in the training set one at a time.
• There are two types of Perceptrons:
• Single layer – single-layer perceptrons can learn only linearly separable patterns.
• Multilayer – multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power.
• The Perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary.
• It takes an input, aggregates it (weighted sum), and returns 1 only if the aggregated sum is more than some threshold, else returns 0.
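A from-scratch sketch of that rule: weighted sum, threshold at zero, and Rosenblatt's weight update on each misclassified sample. Training on logical AND (linearly separable) is an arbitrary illustrative choice.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                 # logical AND, linearly separable

w, b, lr = np.zeros(2), 0.0, 0.1

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)         # 1 iff the weighted sum exceeds the threshold
        w += lr * (target - pred) * xi     # Rosenblatt's learning rule
        b += lr * (target - pred)

print(w, b)
print([int(w @ xi + b > 0) for xi in X])   # matches y after training
```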
Multiple-Layer Networks and Backpropagation Algorithms
Backpropagation is the generalization of the Widrow-Hoff learning rule to
multiple-layer networks and nonlinear differentiable transfer functions.
Input vectors and the corresponding target vectors are used to train a
network until it can approximate a function, associate input vectors with
specific output vectors, or classify input vectors in an appropriate way as
defined by you.
Architecture
This section presents the architecture of the network that is most
commonly used with the backpropagation algorithm –
the multilayer feedforward network
Neuron Model
An elementary neuron with R inputs is shown below. Each input is
weighted with an appropriate w. The sum of the weighted inputs and the
bias forms the input to the transfer function f. Neurons can use any
differentiable transfer function f to generate their output.
Transfer Functions (Activation Functions)
Multilayer networks often use the log-sigmoid transfer function logsig.
The function logsig generates outputs between 0 and 1 as the neuron's
net input goes from negative to positive infinity
Alternatively, multilayer networks can use the tan-sigmoid transfer function tansig.
The function tansig generates outputs between -1 and +1 as the neuron's net input goes from negative to positive infinity.
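Written out in NumPy, the two transfer functions look as follows (the names logsig/tansig follow the slides' MATLAB-style terminology):

```python
import numpy as np

def logsig(n):
    """Log-sigmoid: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def tansig(n):
    """Tan-sigmoid: squashes the net input into (-1, 1); same as np.tanh(n)."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

n = np.linspace(-5.0, 5.0, 5)
print(logsig(n))   # values approach 0 and 1 at the extremes
print(tansig(n))   # values approach -1 and +1 at the extremes
```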
Feedforward Network
A single-layer network of S logsig neurons having R inputs is shown
below in full detail on the left and with a layer diagram on the right.
Feedforward networks often have one or more hidden layers of sigmoid neurons followed
by an output layer of linear neurons.
Multiple layers of neurons with nonlinear transfer functions allow the network to learn
nonlinear and linear relationships between input and output vectors.
The linear output layer lets the network produce values outside the range -1 to +1. On the
other hand, if you want to constrain the outputs of a network (such as between 0 and 1),
then the output layer should use a sigmoid transfer function (such as logsig).
Learning Algorithm: Backpropagation
The following slides describe the teaching process of a multi-layer neural network employing the backpropagation algorithm. To illustrate this process, a three-layer neural network with two inputs and one output, shown in the picture below, is used:
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realizes the nonlinear function, called the neuron transfer (activation) function. Signal e is the adder's output signal, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.
To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) assigned with the corresponding target (desired output) z.
Network training is an iterative process. In each iteration the weight coefficients of the nodes are modified using new data from the training data set. The modification is calculated using the algorithm described below: each teaching step starts with forcing both input signals from the training set. After this stage we can determine the output signal values for each neuron in each network layer.
The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.
Propagation of signals through the hidden layer. Symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.
Propagation of signals through the output layer.
In the next algorithm step, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal d of the output-layer neuron.
The idea is to propagate the error signal d (computed in a single teaching step) back to all neurons whose output signals were inputs for the discussed neuron.
The weight coefficients wmn used to propagate errors back are equal to those used when computing the output value; only the direction of data flow is changed (signals are propagated from outputs to inputs one after the other). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:
When the error signal for each neuron is computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
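As a compact, hedged sketch of the whole procedure, the NumPy loop below trains a tiny 2-2-1 network with logistic activations on a single made-up sample: forward pass, output error signal d, error propagated back through the same weights, then each weight updated by eta * d * df(e)/de * input.

```python
import numpy as np

def f(e):                      # logistic transfer function, f'(e) = f(e) * (1 - f(e))
    return 1.0 / (1.0 + np.exp(-e))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))   # input layer -> hidden layer weights
W2 = rng.normal(size=(1, 2))   # hidden layer -> output weights
eta = 0.5                      # learning rate

x = np.array([1.0, 0.0])       # one hypothetical training sample...
z = 1.0                        # ...with desired output z

for step in range(1000):
    y1 = f(W1 @ x)             # forward pass: hidden outputs y = f(e)
    y2 = f(W2 @ y1)            # network output
    d2 = z - y2                # error signal d of the output neuron
    d1 = W2.T @ d2             # propagate d back through the same weights
    # weight update: w += eta * d * df(e)/de * input
    W2 += eta * np.outer(d2 * y2 * (1 - y2), y1)
    W1 += eta * np.outer(d1 * y1 * (1 - y1), x)

print(f(W2 @ f(W1 @ x)))       # output approaches the target z
```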
Thank you
Unit 5
AI Applications
L.A.Bewoor
laxmi.bewoor@viit.ac.in
Department of Computer Engineering
BRACT’S, Vishwakarma Institute of Information Technology, Pune-48
(An Autonomous Institute affiliated to Savitribai Phule Pune University)
(NBA and NAAC accredited, ISO 9001:2015 certified)
Objective/s of this session
Discuss real life applications of AI
Apply AI techniques for real world application
1. AI application for NLP
2. AI application for time series analysis
3. AI application for speech recognition
4. AI application for chatbots
5. AI application for perceptron based classifier
Contents
• Sequential and time series analysis
• Speech Recognizer
• Natural Language Processing
• Chatbots
• Perceptron based classifier
Time series analysis
■ A Time Series is a sequence of measures of a given
phenomenon taken at regular time intervals such as hourly,
daily, weekly, monthly, quarterly, annually, or every so many
years
– Stock series are measures of activity at a point in time
– Flow series are series which are a measure of activity to a date (e.g.
Retail, Current Account Deficit, Balance of Payments)
– price of a particular commodity like gold, silver, any eatables, petrol,
diesel etc.
– rate of interest, The rate of interest for home loans
▪ A set of observations ordered with respect to the successive
time periods is a time series. In other words, the arrangement
of data in accordance with their time of occurrence is a time
series. It is the chronological arrangement of data. Here, time
is just a way in which one can relate the entire phenomenon
to suitable reference points.
• A time series depicts the relationship between two
variables. Time is one of those variables and the second is
any quantitative variable.
Uses of Time Series
• The most important use of studying time series is that it
helps us to predict the future behaviour of the variable
based on past experience
• It is helpful for business planning as it helps in comparing
the actual current performance with the expected one
• From time series, we get to study the past behaviour of the
phenomenon or the variable under consideration
• We can compare the changes in the values of different
variables at different times or places, etc.
Components for Time Series Analysis
• Trend
• Seasonal Variations
• Cyclic Variations
• Random or Irregular movements
Trend
• The trend shows the general tendency of the data to increase or decrease
during a long period of time. A trend is a smooth, general, long-term,
average tendency. It is not always necessary that the increase or decrease
is in the same direction throughout the given period of time.
• It is observable that the tendencies may increase, decrease or remain stable in different sections of time, but the overall trend must be upward, downward or stable. Population, agricultural production, items manufactured, the number of births and deaths, the number of industries or factories, and the number of schools or colleges are some examples showing this kind of tendency of movement.
• Seasonal Variations
• These are the rhythmic forces which operate in a regular and periodic
manner over a span of less than a year. They have the same or almost the
same pattern during a period of 12 months. This variation will be present
in a time series if the data are recorded hourly, daily, weekly, quarterly, or
monthly.
• These variations come into play either because of natural forces or man-made conventions. The various seasons and climatic conditions play an important role in seasonal variations: the production of crops depends on the seasons, the sale of umbrellas and raincoats rises in the rainy season, and the sale of electric fans and A.C.s shoots up in summer.
• The effect of man-made conventions such as festivals, customs, habits, fashions, and occasions like marriage is easily noticeable. They recur year after year. An upswing in a season should not be taken as an indicator of better business conditions.
Cyclic Variations
• The variations in a time series which operate themselves over a
span of more than one year are the cyclic variations. This
oscillatory movement has a period of oscillation of more than a
year. One complete period is a cycle. This cyclic movement is
sometimes called the ‘Business Cycle’.
• It is a four-phase cycle comprising the phases of prosperity, recession, depression, and recovery. The cyclic variation may be regular but is not periodic. The upswings and downswings in business depend upon the joint nature of the economic forces and the interaction between them.
Random or Irregular Movements
• There is another factor which causes variation in the variable under study: variations that are not regular but purely random or irregular. These fluctuations are unforeseen, uncontrollable, unpredictable, and erratic. The forces behind them include earthquakes, wars, floods, famines, and other disasters.
Fundamental Rule of Time Series Analysis
• Stationarity is an important concept in the field of time series analysis with
tremendous influence on how the data is perceived and predicted.
• When forecasting or predicting the future, most time series models
assume that each point is independent of one another. The best indication
of this is when the dataset of past instances is stationary.
• For data to be stationary, the statistical properties of a system do not
change over time. This does not mean that the values for each data point
have to be the same, but the overall behavior of the data should remain
constant. From a purely visual assessment, time plots that do not show
trends or seasonality can be considered stationary. More numerical factors
in support of stationarity include a constant mean and a constant variance.
• Non-stationary time series
A non-stationary time series's statistical properties, like mean and variance, will not be constant over time. An example of a non-stationary time series is a series with a trend: something that grows over time, for instance. The sample mean and variance of such a series will grow as you increase the size of the sample.
• We can perform a transformation to convert such data into a stationary dataset. The most common transforms are the difference and logarithmic transforms, as sketched below.
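A hedged pandas sketch of those transforms (the trending series below is synthetic):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic non-stationary series: linear trend plus noise.
trend = pd.Series(50 + 0.5 * np.arange(200) + rng.normal(0, 1, 200))

diff = trend.diff().dropna()               # difference transform removes the trend
log_diff = np.log(trend).diff().dropna()   # log transform, then difference

print(trend.mean(), diff.mean())           # the differenced mean hovers near the slope
print(trend.var(), diff.var())             # variance no longer grows with sample size
```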
Time Series Decomposition
• Additive time series
• The equation for an additive time series is simply:
  Ot = Tt + St + Rt
  where Ot = output, Tt = trend, St = seasonality, Rt = residual, and the subscript t denotes a particular point in time.
• additive = trend + seasonal + residual
• Multiplicative time series
• The equation for a multiplicative time series is simply:
  Ot = Tt * St * Rt
  with the same symbols as above.
• multiplicative = trend * seasonal * residual
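A hedged sketch using statsmodels' seasonal_decompose (assumed installed); the monthly series is synthetic: trend + seasonality + noise.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(0)
t = np.arange(48)
values = 10 + 0.2 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 48)
series = pd.Series(values, index=pd.date_range("2020-01-01", periods=48, freq="MS"))

# Additive model: Ot = Tt + St + Rt; pass model="multiplicative" for Ot = Tt * St * Rt.
parts = seasonal_decompose(series, model="additive")
print(parts.trend.dropna().head())
print(parts.seasonal.head())
print(parts.resid.dropna().head())
```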
FORECASTING AND TIME SERIES ANALYSIS
Forecasting is based on past recorded data and helps in the determination of future plans with respect to any desired objective. It helps in the fixing of strategies.
[Flow diagram: forecast → desired performance compared with planned performance → analysis of deviation → strategy-making decision]
TYPES OF FORECAST
1. Demand Forecast – Prediction of demand for products or services.
2. Environmental Forecast – Prediction of social, political and economic changes.
3. Technological Forecast – Prediction of technological changes.
TIMING OF FORECASTS
Forecasts are usually classified according to time period.
1. Short-range forecast – commonly up to one year, and usually less than three months. E.g. purchasing, job scheduling, workforce, production levels, regional production, seasonal production, etc.
2. Medium-range forecast – commonly one to three years. E.g. cash budgeting, sales planning, etc.
3. Long-range forecast – commonly three or more years. E.g. R&D, capital expenditure, establishment of new plants, labor facilities, etc.
Forecasting Methods
Forecasting methods are based either on data (quantitative) or on opinion and judgment (qualitative). The quantitative methods are further divided into two, namely time series and causal.
A time series is a set of measurements of a variable that are ordered through time. The time variable itself does not fluctuate arbitrarily; it moves uniformly, always in the same direction.
The time series forecasting methods attempt to account for changes over a period of time at regular intervals by examining patterns, cycles or trends to predict the outcome for a future time period.
Causal methods are based on the assumption that the variable under consideration has a cause-effect relationship with one or more other variables.
Methods of Forecasting
1. Define objective
2. Select the variable of interest
3. Determine the time for forecasting
4. Select appropriate model
5. Collect the relevant data
6. Make the forecast
TYPES OF FORECASTING TECHNIQUES
A fixed and suitable technique for forecasting is a primary necessity for the validity of forecasts. In the last few decades some forecasting techniques have been developed; they can be classified into three broad categories.
1. NAÏVE METHODS –
Based on the assumption that the future is just an extension of the past.
2. BAROMETRIC METHODS –
Based on the assumption that a forecast can be made on the basis of certain happenings in the past. In this method a factor-dependent series is constructed, and thereafter statistical analysis yields the forecast.
3. ANALYTICAL METHODS –
Based on the analysis of the causative forces operating on the variable to be forecasted. Analytical techniques may be non-mathematical, like factor listing or opinion, or mathematical.
TIME SERIES ANALYSIS
A time series is an orderly arrangement of the numerical values of a desired variable with respect to time. It can be represented in tabular as well as graphical form.
Objectives: 1: To identify the pattern and isolate the influencing factors (or effects) for prediction as well as for future planning and control.
2: To review and evaluate plan progress.
Pattern: It is assumed that time series data consist of a uniform pattern plus random fluctuations:
Actual value of variable per unit time = mean value (pattern) per unit time + random deviation per unit time, i.e. Ŷ = pattern + e
Components:
1: Trend – Sometimes a time series displays either upward or downward movement in the average value of the variable of interest.
2: Cycles – Upward or downward movements in the variable of interest over a period of time. A cycle may have four phases: peak, contraction, trough and expansion.
3: Seasonal – Upward and downward movements within a year that follow a regular pattern.
4: Irregular – Rapid upward or downward movements caused by short-term, unanticipated and non-recurring factors.
Time Series Methods – The available time series data are used for mathematical analysis to derive future inferences. These processes have the limitation that they cannot guarantee accurate future values. This limitation of the time series approach is taken care of by the application of causal methods. The time series methods are as follows:
A. Freehand Methods
B. Smoothing Methods – Smoothing is a process that often improves our ability to
forecast series by reducing the impact of noise
(i) Moving Averages (ii) Weighted Moving Averages (iii) Semi Averages.
C. Exponential Smoothing Methods – (i) Simple exponential Smoothing (ii) Adjusted
Exponential Smoothing
D. Quadratic Trend Model
A. Freehand Methods
A freehand curve is drawn as a straight line from the value at the lowest time limit to the value at the highest time limit of the series. The forecast can be obtained simply by extending the trend line. A trend line fitted by the freehand method should conform to the conditions mentioned below:
(i) It is smooth and straight.
(ii) The sums of the vertical deviations above and below the trend line are equal.
(iii) The sum of squares of the vertical deviations from the trend line is as small as possible.
(iv) The trend line bisects the cycles.
Limitations: 1: This method is highly subjective.
2: The trend line drawn cannot have much value.
3: It is very time consuming to construct a freehand trend.
B. Smoothing Methods
The objective of smoothing methods is to smooth out the random variations due to the irregular component of the time series.
(i) Moving Averages
It is a quantitative method of forecasting or smoothing a time series by averaging each successive group of data values. It is a subjective method and depends on the length of the period chosen for calculating the moving average.
The moving average, which serves as an estimate of the next period's value of a variable given a period of length n, is expressed as:
Moving average MA(t+1) = [Dt + D(t-1) + D(t-2) + ... + D(t-n+1)] / n
where t = current time period, D = actual data value (exchanged each period), and n = length of the time period.
• In this method the term “moving” is used because it is obtained by
summing and averaging the values from a given number of periods, each
time deleting the oldest value and adding a new value.
Limitations – It is highly subjective and dependent on the length of the period chosen for calculating the average. The method has three important limitations:
(a) Increasing the size of n increases the smoothness of variation, but it also makes the method less sensitive to real changes in the data.
(b) It is difficult to choose the optimal length of time over which to compute the moving average. Moving averages cannot be found for the first and last k/2 periods of a k-period moving average.
(c) Moving averages cannot pick up trends very well.
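The same computation via pandas rolling windows (a hedged sketch; the data are from the illustration that follows, with n = 4):

```python
import pandas as pd

values = pd.Series([205, 316, 340, 446, 396, 450, 515, 575, 495, 605],
                   index=range(2001, 2011))

ma4 = values.rolling(window=4).mean()   # each entry averages the latest 4 values
print(ma4.dropna())                     # first value: (205+316+340+446)/4 = 326.75
```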
Illustration – Calculation of trend and short-term fluctuations (4-year moving averages):

Year:     2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Variable:  205  316  340  446  396  450  515  575  495  605

Sample 4-year moving totals and averages: 1307/4 = 326.75, 1632/4 = 408, 1807/4 = 451.75, 1936/4 = 484, 2035/4 = 508.75, 2190/4 = 547.5.

(ii) Weighted Moving Averages – In a moving average, each observation is given equal importance (weight). However, it may be desired to place more weight (importance) on certain periods of time than on others. A moving average in which some time periods are weighted differently than others is called a weighted moving average. Commonly, the more recent observations receive more weight, and the weight decreases for older data values.
Weighted moving average = Σ (weight for period n × data value in period n) / Σ weights
Illustration – Forecasting of sales by weighting the past three months:

Weights applied: 3 to last month, 2 to two months ago, 1 to three months ago.

Weighted forecast = [3 × sales last month + 2 × sales two months ago + 1 × sales three months ago] / 6

Month: Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Sales:  20  24  38  42  56  52  40  38  45  40
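The same weighted forecast in Python (a hedged sketch using the illustration's data and 3/2/1 weights):

```python
import numpy as np
import pandas as pd

sales = pd.Series([20, 24, 38, 42, 56, 52, 40, 38, 45, 40],
                  index=["Mar", "Apr", "May", "Jun", "Jul",
                         "Aug", "Sep", "Oct", "Nov", "Dec"])
weights = np.array([1, 2, 3]) / 6.0     # oldest month -> most recent month

# Weighted average of each 3-month window; it forecasts the month that
# follows the window, e.g. Mar/Apr/May -> forecast for June.
wma = sales.rolling(window=3).apply(lambda w: np.dot(w, weights))
print(wma.dropna())   # (1*20 + 2*24 + 3*38) / 6 = 30.33 for the May window
```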
(iii) SEMI AVERAGE METHOD
The semi-average method permits us to estimate the slope and intercept of the trend line quite easily, provided a linear function adequately describes the data. The trend line is determined simply by means of the lower and upper halves of the data. In a continuous series these points are determined at the mid-point of the class interval. The arithmetic mean of the first half is the intercept value, and the slope is determined by the ratio of the difference between the arithmetic means of the two halves to the number of years between them, that is, the change per unit time.
The resulting time series is represented by the equation
Ỹ = a + bx
where Ỹ = calculated trend value, a = intercept, b = slope value.
The equation should always be stated completely with reference to the year where x = 0 and a description of the units of x and y. With an odd number of observations it is customary to ignore the middle time-series value.
It may be satisfactory if the trend is linear. If the data deviate much from linearity, the forecast will be biased and less reliable.
ILLUSTRATION
The production of a company (tons per year) is given below. Determine the trend line.

Year:        2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Production:   115  120  130  160  145  155  160  155  170  175

To calculate the time series, Ỹ = a + bx.
The mean of the first half (2001-2005) is 670/5 = 134, centred at 2003; the mean of the second half (2006-2010) is 815/5 = 163, centred at 2008.
Slope b = Δy / Δx = (163 – 134) / (2008 – 2003) = 29/5 = 5.8
Intercept a = 134 at 2003.
Thus the trend line is Ỹ = 134 + 5.8x.
To predict production in 2012: x = 2012 – 2003 = 9, so Ỹ = 134 + 5.8 × 9 = 186.2 tons.
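The same illustration computed in Python (a hedged sketch of the semi-average method):

```python
import numpy as np

production = np.array([115, 120, 130, 160, 145, 155, 160, 155, 170, 175])

lower, upper = production[:5], production[5:]    # the two halves
a = lower.mean()                                 # 134.0, anchored at 2003
b = (upper.mean() - a) / (2008 - 2003)           # (163 - 134) / 5 = 5.8

x = 2012 - 2003                                  # years from the x = 0 origin
print(a + b * x)                                 # 186.2 tons forecast for 2012
```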
Natural Language Processing steps
1. Segmentation:
Break the entire document down into its constituent sentences, using punctuation such as full stops and commas.
"I am in VIIT. I am learning AI at TY." → "I am in VIIT." / "I am learning AI at TY."
2. Tokenizing:
Split each sentence into its constituent words (tokens).
"I am learning AI at TY" → I / am / learning / AI / at / TY
3. Removing Stop Words:
Remove common filler words that carry little meaning.
"I am learning AI at TY" → I / learning / AI
4. Stemming:
The process of obtaining the word stem of a word. The word stem gives new words upon adding affixes to it.
learning → learn
5. Lemmatization:
The process of obtaining the root stem of a word. The root stem gives the base form of a word that is present in the dictionary and from which the word is derived.
"intelligence", "intelligent", and "intelligently" have the root word "intelligent", which has a meaning.
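A hedged NLTK sketch of these steps (NLTK is assumed installed, with one-time downloads of the "punkt", "stopwords" and "wordnet" corpora):

```python
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "I am in VIIT. I am learning AI at TY."

sentences = sent_tokenize(text)                   # 1. segmentation
tokens = word_tokenize(sentences[1])              # 2. tokenizing
stops = set(stopwords.words("english"))
content = [t for t in tokens if t.lower() not in stops]   # 3. stop-word removal

stemmer = PorterStemmer()
print([stemmer.stem(t) for t in content])         # 4. stemming: learning -> learn

lemmatizer = WordNetLemmatizer()
print([lemmatizer.lemmatize(t, pos="v") for t in content])   # 5. lemmatization
```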
Speech Recognition
• Speech Recognition (also known as Automatic Speech Recognition (ASR), or computer speech recognition) is the process of converting a speech signal into a sequence of words by means of an algorithm implemented as a computer program.
• The main goal of the speech recognition area is to develop techniques and systems for speech input to machines.
Applications of speech recognition

Problem Domain | Application | Input | Pattern classes
Speech/Telephone/Communication sector | Telephone directory enquiry without an operator | Speech waveform | Spoken words
Education assistance | Teaching students of foreign languages to pronounce vocabulary correctly; teaching overseas students to pronounce English correctly | Speech waveform | Spoken words
Approaches to speech recognition:
• Acoustic Phonetic Approach
– The earliest approaches to speech recognition were based on
finding speech sounds and providing appropriate labels to these
sounds.
– This is the basis of the acoustic-phonetic approach (Hemdal and Hughes 1967), which postulates that there exist finite, distinctive phonetic units (phonemes) in spoken language, and that these units are broadly characterized by a set of acoustic properties that are manifested in the speech signal over time.
– Even though the acoustic properties of phonetic units are highly variable, both across speakers and with neighboring sounds, the acoustic-phonetic approach assumes that the rules governing this variability are straightforward and can be readily learned by a machine.
• Artificial Intelligence Approach
• Pattern Recognition Approach
– The pattern-matching approach (Itakura 1975; Rabiner 1989; Rabiner and Juang 1993) involves two essential steps, namely pattern training and pattern comparison.
– The essential feature of this approach is that it uses a well formulated
mathematical framework and establishes consistent speech pattern
representations, for reliable pattern comparison, from a set of labeled
training samples via a formal training algorithm.
– A speech pattern representation can be in the form of a speech
template or a statistical model (e.g., a HIDDEN MARKOV MODEL or
HMM) and can be applied to a sound (smaller than a word), a word,
or a phrase.
– In the pattern-comparison stage of the approach, the unknown speech (the speech to be recognized) is directly compared with each possible pattern learned in the training stage, in order to determine the identity of the unknown according to the goodness of match of the patterns. The pattern-matching approach has become the predominant method for speech recognition in the last six decades.
Artificial Intelligence approach (Knowledge-Based approach)
• The Artificial Intelligence approach [97] is a hybrid of the acoustic-phonetic approach and the pattern-recognition approach, exploiting the ideas and concepts of both.
• The knowledge-based approach uses information regarding linguistics, phonetics and spectrograms.
Perceptron
• A perceptron is a neural network unit that performs computations to detect features in the input data.
• It links artificial neurons modeled as simple logic gates with binary outputs.
• An artificial neuron computes a mathematical function and has a node, inputs, weights, and an output, corresponding respectively to the cell nucleus, dendrites, synapses, and axon of a biological neuron.
Perceptron
• As noted earlier, the Perceptron (Rosenblatt, 1957) is a supervised learning algorithm for binary classifiers, built on a learning rule for the original MCP neuron.
• It overcomes some of the limitations of the M-P neuron by introducing the concept of numerical weights (a measure of importance) for inputs, and a mechanism for learning those weights. Inputs are no longer limited to boolean values as in the case of an M-P neuron; it supports real inputs as well, which makes it more useful and generalized.
Perceptron Working
Unit 5
AI Applications
L.A.Bewoor
laxmi.bewoor@viit.ac.in
Department of Computer Engineering
BRACT’S, Vishwakarma Institute of Information Technology, Pune-48
(An Autonomous Institute affiliated to Savitribai Phule Pune University)
(NBA and NAAC accredited, ISO 9001:2015 certified)
Objective/s of this session
Discuss real life applications of AI
1. AI application for NLP
2. AI application for time series analysis
3. AI application for speech recognition
4. AI application for chatbots
5. AI application for perceptron based classifier
Learning Outcome/Course Outcome
Dr. L. A. Bewoor Department of Computer Engineering, VIIT , Pune-48 2
Contents
• Sequential and time series analysis
• Speech Recognizer
• Natural Language Processing
• Chatbots
• Perceptron based classifier
Time series analysis
■ A Time Series is a sequence of measures of a given
phenomenon taken at regular time intervals such as hourly,
daily, weekly, monthly, quarterly, annually, or every so many
years
– Stock series are measures of activity at a point in time
– Flow series are series which are a measure of activity to a date (e.g.
Retail, Current Account Deficit, Balance of Payments)
– price of a particular commodity like gold, silver, any eatables, petrol,
diesel etc.
– rate of interest, The rate of interest for home loans
▪ A set of observations ordered with respect to the successive
time periods is a time series. In other words, the arrangement
of data in accordance with their time of occurrence is a time
series. It is the chronological arrangement of data. Here, time
is just a way in which one can relate the entire phenomenon
to suitable reference points.
• A time series depicts the relationship between two
variables. Time is one of those variables and the second is
any quantitative variable.
Uses of Time Series
• The most important use of studying time series is that it
helps us to predict the future behaviour of the variable
based on past experience
• It is helpful for business planning as it helps in comparing
the actual current performance with the expected one
• From time series, we get to study the past behaviour of the
phenomenon or the variable under consideration
• We can compare the changes in the values of different
variables at different times or places, etc.
Time series analysis
Components for Time Series Analysis
• Trend
• Seasonal Variations
• Cyclic Variations
• Random or Irregular movements
Trend
• The trend shows the general tendency of the data to increase or decrease
during a long period of time. A trend is a smooth, general, long-term,
average tendency. It is not always necessary that the increase or decrease
is in the same direction throughout the given period of time.
• It is observable that the tendencies may increase, decrease or are stable in
different sections of time. But the overall trend must be upward,
downward or stable. The population, agricultural production, items
manufactured, number of births and deaths, number of industry or any
factory, number of schools or colleges are some of its example showing
some kind of tendencies of movement.
Components for Time Series Analysis
• Seasonal Variations
• These are the rhythmic forces which operate in a regular and periodic
manner over a span of less than a year. They have the same or almost the
same pattern during a period of 12 months. This variation will be present
in a time series if the data are recorded hourly, daily, weekly, quarterly, or
monthly.
• These variations come into play either because of the natural forces or
man-made conventions. The various seasons or climatic conditions play an
important role in seasonal variations. Such as production of crops depends
on seasons, the sale of umbrella and raincoats in the rainy season, and the
sale of electric fans and A.C. shoots up in summer seasons.
• The effect of man-made conventions such as some festivals, customs,
habits, fashions, and some occasions like marriage is easily noticeable.
They recur themselves year after year. An upswing in a season should not
be taken as an indicator of better business conditions.
Components for Time Series Analysis
Cyclic Variations
• The variations in a time series which operate themselves over a
span of more than one year are the cyclic variations. This
oscillatory movement has a period of oscillation of more than a
year. One complete period is a cycle. This cyclic movement is
sometimes called the ‘Business Cycle’.
• It is a four-phase cycle comprising of the phases of prosperity,
recession, depression, and recovery. The cyclic variation may be
regular are not periodic. The upswings and the downswings in
business depend upon the joint nature of the economic forces and
the interaction between them.
Random or Irregular Movements
• There is another factor which causes the variation in the variable
under study. They are not regular variations and are purely
random or irregular. These fluctuations are unforeseen,
uncontrollable, unpredictable, and are erratic. These forces are
earthquakes, wars, flood, famines, and any other disasters.
Components for Time Series Analysis
Fundamental Rule of Time Series Analysis
• Stationarity is an important concept in the field of time series analysis with
tremendous influence on how the data is perceived and predicted.
• When forecasting or predicting the future, most time series models
assume that each point is independent of one another. The best indication
of this is when the dataset of past instances is stationary.
• For data to be stationary, the statistical properties of a system do not
change over time. This does not mean that the values for each data point
have to be the same, but the overall behavior of the data should remain
constant. From a purely visual assessment, time plots that do not show
trends or seasonality can be considered stationary. More numerical factors
in support of stationarity include a constant mean and a constant variance.
• Non-stationary time series
A non-stationary time series's statistical properties like mean,
variance etc will not be constant over time An example of a
non stationary time series is a series with a trend - something
that grows over time for instance. The sample mean and
variance of such a series will grow as you increase the size of
the sample.
• perform a transformation to convert into a stationary dataset.
The most common transforms are the difference and
logarithmic transform.
Fundamental Rule of Time Series Analysis
Time Series Decomposition
• Additive time series
• Remember the equation for additive time series is
simply: Ot
= Tt
+ St
+ Rt
• Ot
= output
Tt
= trend
St
= seasonality
Rt
= residual
t
= variable representing a particular point in time
• additive = trend + seasonal + residual
Time Series Decomposition
• Multiplicative time series
• Remember the equation for additive time series is
simply: Ot
= Tt
* St
* Rt
• Ot
= output
Tt
= trend
St
= seasonality
Rt
= residual
t
= variable representing a particular point in time
• multiplicative = trend * seasonal * residual
FORCASTING AND TIME SERIES ANALYSIS
The forecasting is based on the past recorded data and help in the
determination of future plan with respect to any desired objective. It helps
in the fixing of strategies.
STRATEGY MAKING DECISION
PLANNED
PERFORMANCE
ANALYSIS
DEVIATION
DESIRED
PERFORMANCE
FORECASTE
TYPES OF FORECAST
1. Demand Forecast – Prediction of demand for products or services.
2. Environmental Forecast – Prediction of social, political and economic changes.
3. Technological Forecast – Prediction of technological changes.
TIMING OF FORECASTS
Forecasts are usually classified accordingly to time period.
1. Short range forecast – commonly one year and usually less than the three
months. Eg purchasing of job scheduling, workforce, production level,
regional production, seasonal production etc.
2. Medium range forecast – commonly one to three years. Eg cash
budgeting, sale planning etc.
3. Long range forecast – commonly three to more years. Eg R and D capital
expenditure, establishment of new plants, facilities of labor etc.
Forecasting Methods
Forecasting methods are based on opinion (quantitative) or judgment
(qualitative). The quantitative methods are further divided into two
namely, time series and casual.
A time series is a set of measurements of a variable that are ordered through
time to time. The time variables does not fluctuate arbitrarily. It moves
uniformly always in the same direction.
The time series forecasting methods attempt to account for changes over a
period of time at regular intervals by examining patterns, cycles or trends to
product the outcome for a future time period.
Causal methods are based on the assumptions that the variable value under
consideration has a cause effect relationship with one or more other values.
Methods of Forecasting
1. Define objective
2. Select the variable of interest
3. Determine the time for forecasting
4. Select appropriate model
5. Collect the relevant data
6. Make the forecast
TYPES OF FORECASTING TECHNIQUES
A fixed and suitable technique for forecasting is primary necessity for the
validity of forecasts. In last few decades some forecasting techniques have
been developed and can be classified into three broad categories.
1. NAÏVE METHODS –
It is based on the assumption that future is just an extension of past.
2. BAROMETRIC METHODS –
It is based on assumption that forecast can be made on the basis of
certain happenings on the past. In this method a factor dependent series
has been constructed and there after statistical analysis can yields
forecast.
3. ANALYTICAL METHODS –
It is based on the analysis of causative forces operative on the variable to
be forecasted. Analytical techniques may be non-mathematical like factor
listing or opinion or mathematical.
TIME SERIES ANALYSIS
A time series is orderly arranged numerical values of desired variables with
respect to time. It is represented both in tabular as well as graphical manner.
Objectives : 1 : To identify the pattern and isolate the influencing factors (or
effects) for prediction purpose as well as for future planning and control.
: 2 : To review and evaluate plan progress
Pattern : It is assumed that time series data consists of an uniform pattern
with random fluctuations.
• Actual value of variable per unit time
= Mean value of variable per unit time + Random deviation/unit time
Ŷ = (r) pattern + e
Components :
1 : Trend – Sometimes a time series displayed either upward or downward
movements in the average value of the variable of interest.
2 : Cycles – An upward or downward movements in the variable of interest over
a period of time. It may has four phases peak, contradiction, trough and
expansion
3 : Seasonal – An upward and downward movements within year and follow
regular pattern.
4 : Irregular – rapid upward or downward movements caused by short term
unanticipated and non-recurring factors.
Time Series Methods - The available data of time series is used for the
mathematical analysis to derive future inferences. These processes have
limitations that they have no accurate future values. This limitations of the
time series approach is taken care by the application of causal methods. The
time series methods are as follows -
A. Freehand Methods
B. Smoothing Methods – Smoothing is a process that often improves our ability to
forecast series by reducing the impact of noise
(i) Moving Averages (ii) Weighted Moving Averages (iii) Semi Averages.
C. Exponential Smoothing Methods – (i) Simple exponential Smoothing (ii) Adjusted
Exponential Smoothing
D. Quadratic Trend Model
A. Freehand Methods
A freehand curve draws as a straight line from value of lowest time limit to
value of highest time limit of series. The forecast can be obtained simply by
extending the trend line. A trend line fitted by the freehand method should
confirmed the conditions mentioned below.
(i) It is smooth and straight
(ii) The sum of the vertical deviations above and below the trend line are equal.
(iii) The sum of squares of the vertical deviations from the trend line is as small as
possible.
(iv) The trend line bisects the cycles
Limitation : 1 : This method is highly subjective
: 2 : The trend line drawn cannot have much value
: 3 : It is very time consuming to constant a freehand trend.
B. Smoothing Methods
The objective of smoothing methods is to smoothes out the random
variations due to irregular components of the time series
(i) Moving Averages
It is a quantitative method of forecasting or smoothing a time series by
averaging each successive groups of data values. It is an subjective
method and depends on the length of the period chosen for calculating
moving average.
The moving averages which serve as an estimate of the next periods
value of a variable given a period of length n is expressed as –
Σ {D1
+ Dt-1
+Dt-2
+-----+ Dt- (n+1/
}
Moving average (MAt +1
) = ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
n
Where – t = current time period; D = actual data which is exchanged each
period and n = length of time period.
• In this method the term “moving” is used because the average is obtained by
summing and averaging the values from a given number of periods, each
time deleting the oldest value and adding a new value.
Limitation – It is highly subjective and dependent on the length of the period
chosen for calculating the average. The method has three important
limitations:
(a) Increasing the size of n increases the smoothness of variation, but it also
makes the method less sensitive to real changes in the data.
(b) It is difficult to choose the optimal length of time for which to
compute the moving average. Moving averages cannot be found for the
first and last k/2 periods in a k-period moving average.
(c) Moving averages cannot pick up trends very well.
Illustration - Calculation of Trend and Short term fluctuations
Year      2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Variable   205   316   340   446   396   450   515   575   495   605
4-year moving averages: 1307/4 = 326.75, 1498/4 = 374.5, 1632/4 = 408,
1807/4 = 451.75, 1936/4 = 484, 2035/4 = 508.75, 2190/4 = 547.5
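The arithmetic above can be reproduced with a short, plain-Python sketch (the function name and the printed formatting are illustrative, not from the source):

```python
# Minimal sketch: n-period moving averages of the illustration series.
values = [205, 316, 340, 446, 396, 450, 515, 575, 495, 605]

def moving_averages(data, n):
    """Return the list of n-period moving averages of data."""
    return [sum(data[i:i + n]) / n for i in range(len(data) - n + 1)]

print(moving_averages(values, 4))
# -> [326.75, 374.5, 408.0, 451.75, 484.0, 508.75, 547.5]
```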
(ii) Weighted Moving Averages - In a moving average, each observation is
given equal importance (weight). However, it may be desirable to place
more weight (importance) on certain periods of time than on others. A
moving average in which some time periods are weighted differently
than others is called a weighted moving average. Commonly, the more
recent observations receive more weight, and the weight decreases
for older data values.
Weighted moving average = Σ (weight for period n × data value in period n) / Σ (weights)
Illustration - Forecasting of sales by weighting the past three months
Weights applied: 3 (last month), 2 (two months ago), 1 (three months ago)

X-weighted = [3 × M(t-1) + 2 × M(t-2) + 1 × M(t-3)] / 6
           = [3 × sales last month + 2 × sales two months ago
              + 1 × sales three months ago] / 6

MONTH  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
SALE    20   24   38   42   56   52   40   38   45   40

The three-month weighted forecasts for this series are computed in the sketch below.
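A small plain-Python sketch of the weighted forecast referenced above (the list layout and function name are illustrative):

```python
# Minimal sketch: 3-month weighted moving average with weights 3, 2, 1
# (the most recent month weighted highest), matching the formula above.
sales = [20, 24, 38, 42, 56, 52, 40, 38, 45, 40]  # March .. December

def weighted_forecast(last, two_ago, three_ago):
    """Forecast = (3*last + 2*two_ago + 1*three_ago) / sum of the weights."""
    return (3 * last + 2 * two_ago + 1 * three_ago) / 6

for i in range(3, len(sales)):
    f = weighted_forecast(sales[i - 1], sales[i - 2], sales[i - 3])
    print(f"forecast for month {i + 1} (actual {sales[i]}): {f:.2f}")
```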
(iii) SEMI-AVERAGE METHOD
The semi-average method permits us to estimate the slope and intercept
of the trend line quite easily, provided a linear function will adequately describe
the data. The trend line is determined simply by means of the lower and
upper halves of the data. In a continuous series these points are determined
at the mid-point of the class interval. The arithmetic mean of the first part is the
intercept value, and the slope is determined by the ratio of the difference
between the arithmetic means of the two halves to the number of years between
them, that is, the change per unit time.
The resulting time series is represented by the equation
Ỹ = a + bx
where Ỹ = calculated trend value
a = intercept
b = slope value
The equation should always be stated completely with reference to the
year where x = 0 and a description of the units of x and y. In the case of an
odd number of observations it is customary to ignore the middle value of the series.
The method is satisfactory if the trend is linear. If the data deviate much from
linearity, the forecast will be biased and less reliable.
ILLUSTRATION
The production of a company, in tons per year, is as follows.
Determine the trend line.
Year  Production (tons)
2001 115
2002 120
2003 130
2004 160
2005 145
2006 155
2007 160
2008 155
2009 170
2010 175
To calculate the time series Ỹ = a + bx:
Mean of first half (2001–2005) = (115 + 120 + 130 + 160 + 145) / 5 = 134, centred at 2003
Mean of second half (2006–2010) = (155 + 160 + 155 + 170 + 175) / 5 = 163, centred at 2008
Slope b = Δy / Δx = change in series / change in years
        = (163 – 134) / (2008 – 2003) = 29 / 5 = 5.8
Intercept a = 134 at 2003
Thus the trend line is Ỹ = 134 + 5.8x
To predict production in 2012: x = 2012 – 2003 = 9
Ỹ = 134 + 5.8 × 9 = 186.2 tons
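A minimal Python sketch reproducing the semi-average computation above (variable names are illustrative):

```python
# Minimal sketch of the semi-average method for the production series above.
years = list(range(2001, 2011))
tons = [115, 120, 130, 160, 145, 155, 160, 155, 170, 175]

half = len(tons) // 2
mean_first = sum(tons[:half]) / half    # 134.0, centred at 2003
mean_second = sum(tons[half:]) / half   # 163.0, centred at 2008
centre_first, centre_second = years[2], years[7]  # 2003 and 2008

b = (mean_second - mean_first) / (centre_second - centre_first)  # slope 5.8
a = mean_first                          # intercept, where x = 0 at 2003

x = 2012 - centre_first                 # x = 9 for the year 2012
print(a + b * x)                        # 186.2
```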
Natural Language Processing (NLP)
NLP approaches for Text Analysis
• Conduct basic text processing
• Categorize and tag the words
• Classify text
• Extract information
• Analyze sentence structure
• Build feature vector
• Analyze meaning
NLP Libraries
• Natural Language Toolkit (NLTK)
• GenSim
• SpaCy
• CoreNLP
• TextBlob
• scikit-learn
NLP Components
NLP Phases
Natural Language Processing steps
1. Segmentation:
Break the entire document down into its constituent sentences,
using punctuation such as full stops and commas.
"I am in VIIT. I am learning AI at TY." →
"I am in VIIT." | "I am learning AI at TY."
2. Tokenizing:
Break each sentence further into its constituent words (tokens).
"I am learning AI at TY" → I | am | learning | AI | at | TY
• Syntactic Analysis
• Removing Stop Words:
Stop words (e.g. "am", "at") carry little meaning and are removed:
"I am learning AI at TY" → I | learning | AI
• Stemming:
The process of obtaining the word stem of a word. The word stem is the base form
that gives new words upon adding affixes to it, e.g. learning → learn.
• Lemmatization:
The process of obtaining the root stem (lemma) of a word. The root stem is the base
form of a word that is present in the dictionary and from which the word is derived,
e.g. intelligence, intelligent, and intelligently have the root word intelligent, which
has a meaning of its own.
• POS tagging:
POS stands for parts of speech, which include noun, verb, adverb, and adjective.
A POS tag indicates how a word functions, in meaning as well as grammatically,
within a sentence. A word can have one or more parts of speech depending on the
context in which it is used.
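A minimal sketch of these preprocessing steps using NLTK (one of the libraries listed earlier); the nltk.download(...) resource names in the comments are the usual prerequisites:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time resource downloads (assumed prerequisites):
# nltk.download("punkt"); nltk.download("stopwords")
# nltk.download("wordnet"); nltk.download("averaged_perceptron_tagger")

text = "I am learning AI at TY"

tokens = nltk.word_tokenize(text)                           # tokenizing
stop_set = set(stopwords.words("english"))
content = [t for t in tokens if t.lower() not in stop_set]  # stop-word removal

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print([stemmer.stem(t) for t in content])          # stemming: learning -> learn
print([lemmatizer.lemmatize(t) for t in content])  # lemmatization
print(nltk.pos_tag(tokens))                        # POS tagging: (word, tag) pairs
```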
Semantic Analysis
Semantics involves the use of, and meaning behind, words.
• Word sense disambiguation. This derives the meaning of a word based
on context, e.g. resolving the sense of "bank" in "A pleasant breeze was
experienced at the river bank".
• Named entity recognition. This determines words that can be
categorized into groups/entities like people, values, locations,
and so on. For example, in the sentence “Mark Zuckerberg is one
of the founders of Facebook, a company from the United
States” we can identify three types of entities:
• “Person”: Mark Zuckerberg
• “Company”: Facebook
• “Location”: United States
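A short sketch of named entity recognition on this example sentence using spaCy (also listed among the NLP libraries earlier); it assumes the small English model has been installed:

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Mark Zuckerberg is one of the founders of Facebook, "
          "a company from the United States")

for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. PERSON, ORG, GPE
```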
• Discourse Integration
Discourse integration depends upon the sentences that precede a given sentence
and also invokes the meaning of the sentences that follow it. E.g. in "Students
were asking for the same", "the same" refers to something mentioned earlier.
Pragmatic Analysis
• Pragmatics is the last phase of NLP. It helps you to discover the intended
effect by applying a set of rules that characterize cooperative dialogues.
E.g. "Shut the door" is a request, not an order.
Artificial Intelligence
on the Cloud
In this chapter, we are going to learn about the cloud and artificial intelligence
workloads on the cloud. We will discuss the benefits and the risks of migrating AI
projects to the cloud. We will also learn about the offerings provided by the major
cloud providers. We will learn about the services and features that they offer and
hopefully get an understanding of why those providers are the market leaders.
By the end of this chapter, you will have a better understanding of the following:
• The benefits, risks, and costs of migrating to the cloud
• Fundamental cloud concepts such as elasticity
• The top cloud providers
• Amazon Web Services:
° Amazon SageMaker
° Alexa, Lex, and Polly – conversational agents
° Amazon Comprehend – natural language processing
° Amazon Rekognition – image and video
° Amazon Translate
° Amazon machine learning
° Amazon Transcribe – transcription
° Amazon Textract – document analysis
• Microsoft Azure:
° Machine Learning Studio
° Azure Machine Learning interactive workspace
° Azure Cognitive Services
• Google AI and its machine learning products:
° AI Hub
° AI building blocks
Why are companies migrating to
the cloud?
It is hard turning anywhere these days without being hit with the term "the cloud."
Our present-day society has hit a tipping point where businesses big and small are
seeing that the benefits of moving their workloads to the cloud outweigh the costs
and risks. As an example, the US Department of Defense, as of 2019, is in the process
of selecting a cloud provider and awarding a 10-year $10 billion-dollar contract.
Moving your systems to the cloud has many advantages, but one of the main
reasons companies move to the cloud is its elastic capabilities.
When deploying a new project in an on-premises environment, we always start with
capacity planning. Capacity planning is the exercise that enterprises go through to
determine how much hardware they will need for a new system to run efficiently.
Depending on the size of the project, the cost of this hardware can run into the
millions. For that reason, it could take months to complete the process. One of the
reasons it can take so long is because many approvals might be required to complete
the purchase. We can't blame business for being so slow and judicious with these
kinds of decisions.
Even though great planning and thought might go into these purchases, it is not
uncommon to either buy less equipment than required or to buy underpowered
equipment. Maybe just as often, too much equipment is bought or equipment that
is overkill for the project at hand. The reason this happens is because in many cases,
it is difficult to determine demand a priori.
Additionally, even if we get the capacity required properly at the beginning,
the demand might continue to grow and force us to go through the provisioning
process all over again. Or the demand might be variable. For example, we might
have a website that gets a lot of traffic during the day, but demand drops way down
at night. In this case, when using on-premises environments, we have no choice
but to account for the worst-case scenario and buy enough resources so that we
can handle peak periods of demand, but resources will be wasted when demand
decreases in slow periods.
All these issues are non-existent in a cloud environment. All the major cloud
providers, in different ways, provide elastic environments. Not only can we
easily scale up, but we can just as easily scale down.
If we have a website that has variable traffic, we could put the servers that handle
the traffic behind a load balancer and set up alerts that automatically add more
servers to handle traffic spikes and other alerts to terminate the servers once
the storm passes.
The top cloud providers
Given the tsunami that is the cloud, many vendors are vying to quench the demand
for cloud services. However, as is often the case in technology markets, only a few
have bubbled to the top and dominate the space. In this section, we'll analyze the
top players.
Amazon Web Services (AWS)
Amazon Web Services is one of the cloud pioneers. Since it launched in 2006, AWS
has ranked highly in the greatly respected Gartner's Magic Quadrant in both vision
and execution. Since its inception, AWS has held a big chunk of the cloud market.
AWS is an appealing option both for legacy players as well as start-ups. According
to Gartner:
"AWS is the provider most commonly chosen for strategic, organization-wide
adoption"
AWS also has an army of consultants and advisors dedicated to helping its
customers deploy AWS services as well as to teach them how to best leverage
the services available. In summary, it is safe to say that AWS is the most mature,
most advanced cloud provider, with a strong track record of customer success,
as well as a strong stable of partners in AWS Marketplace.
On the flip side, since AWS is the leader and they know it, they are not always the
least expensive option. Another knock for AWS is that since they highly value being
first to market with new services and features, it seems like they are willing to launch
services quickly that might not be fully mature and feature-complete, and work out
the kinks once they are released. In fairness, this is not a tactic exclusive to AWS
and other cloud providers also release beta versions of their services. In addition,
since Amazon competes in markets other than the cloud, it is not uncommon for
some potential customers to go with other providers in order to not "feed the beast."
For example, Walmart is well known for avoiding using AWS at all costs because
of their fierce competition in the e-commerce space.
Microsoft Azure
For the past few years, Microsoft Azure has held the second position in the Gartner
Magic Quadrant, trailing AWS and lagging significantly behind it in ability to
execute. But the good news is that they only trail AWS and they are a strong
number two.
Microsoft's solution is appealing to customers hosting legacy workloads as well
as brand new cloud deployments, but for different reasons.
Legacy workloads are normally run on Azure by clients that have traditionally
been Microsoft customers and are trying to leverage their previous investments
in that technology stack.
For new cloud deployments, Azure cloud services hold appeal because of
Microsoft's strong offerings for application development, specialized Platform
as a Service (PaaS) capabilities, data storage, machine learning, and Internet of
Things (IoT) services.
Enterprises that are strategically committed to the Microsoft technology stack have
been able to deploy many large-scale applications in production. Azure specifically
shines when developers fully commit to the suite of Microsoft products, such as
.NET applications, and then deploy them on Azure. Another reason Microsoft has
deep market penetration is its experienced sales staff and its extensive partner
network.
In addition, Microsoft realizes that the next battle in technology will not revolve
around operating systems but rather in the cloud and they have become increasingly
open to adopting non-Microsoft operating systems. As proof of this, as of now, about
half of Azure workloads run on Linux or other open source operating systems and
technology stacks.
A Gartner report notes "Microsoft has a unique vision for the future that involves bringing
in technology partners through native, first-party offerings such as those from VMware,
NetApp, Red Hat, Cray, and Databricks."
On the downside, there have been some reports of reliability, downtime, and service
disruptions as well as some customers taking issue with the quality of Microsoft's
technical support.
Google Cloud Platform (GCP)
In 2018, Google broke into the prestigious Gartner's leaders' quadrant with its GCP
offering, joining only AWS and Azure in the exclusive club. In 2019, GCP remained
in the same quadrant with its two fierce competitors. However, in terms of market
share, GCP is a distant third.
They recently beefed up their sales staff, they have deep pockets, and they have
a strong incentive to not be left behind so don't discount them yet.
Google's reputation as a leader in machine learning is undisputed so it is no surprise
that GCP has strong big data and machine learning offerings. But GCP is also making
some headway, attracting bigger enterprises looking to host legacy workloads such
as SAP and other traditional customer relationship management (CRMs) systems.
Google's internal innovations around machine learning, automation, containers,
and networking, with offerings such as TensorFlow and Kubernetes, have advanced
cloud development. GCP's technology offerings revolve around its contributions
to open source.
Be careful about centering your cloud strategy exclusively around GCP, however.
In a recent report, Gartner declared:
"Google demonstrates an immaturity of process and procedures when dealing
with enterprise accounts, which can make the company difficult to transact
with at times."
And:
"Google has a much smaller pool of experienced Managed Service Providers (MSP)
and infrastructure-centric professional services partners than other vendors in this
Magic Quadrant."
However, Gartner also states:
"Google is aggressively targeting these shortcomings."
Gartner also notes that Google's channel needs development.
Alibaba Cloud
Alibaba Cloud made its first appearance in Gartner's Magic Quadrant in 2017, and as
of 2019, Alibaba's cloud offering called Aliyun remains in the Niche Player category.
Gartner only evaluated the company's international service, headquartered in
Singapore.
Alibaba Cloud is the market leader in China, and many Chinese businesses, as
well as the Chinese government, have been served well by using Alibaba as their
cloud provider. However, a big part of this market share leadership might be given
up if China ever decides to remove some of the restrictions on other international
cloud vendors.
The company provides support in China for building hybrid clouds. But, outside
of China, it's mostly used by cloud-centric workloads. In 2018, it forged partnerships
with VMware and SAP.
Alibaba has a suite of services that is comparable in scope to the service portfolios
of other global providers.
The company's close relationship with the Alibaba Group helps the cloud service
act as a bridge for international companies looking to do business in China, and
for Chinese companies doing business outside of China.
Alibaba does not yet seem to have the service and feature depth of competitors
such as AWS, Azure, and GCP. And in many regions, services are only available
for specific compute instances. They also need to strengthen their MSP ecosystem,
third-party enterprise software integration, and operational tools.
Oracle Cloud Infrastructure (OCI)
In 2017, Oracle's cloud offering made a debut on Gartner's Magic Quadrant as
a Visionary. But in 2018, due to a change to Gartner's evaluation criteria, Oracle
was moved to Niche Player status. It remained there as of 2019.
Oracle Cloud Infrastructure, or OCI, was a second-generation service launched in
2016 to phase out the legacy offering, now referred to as Oracle Cloud Infrastructure
Classic.
OCI offers both virtualized and bare-metal servers, with one-click installation and
configuration of Oracle databases and container services.
OCI appeals to customers with Oracle workloads that don't need more than basic
Infrastructure as a Service (IaaS) capabilities.
Oracle's cloud strategy relies on its applications, database, and middleware.
Oracle has made some headway in attracting talent from other cloud providers to
beef up its offerings. It's also made some progress in winning new business and
getting existing Oracle customers to move to the OCI cloud. However, Oracle still
has a long road ahead of it before it can catch up with the big three.
IBM Cloud
In the mainframe era, IBM was the undisputed computing king of the hill. It lost
that title when we started moving away from mainframes and personal computers
became ubiquitous. IBM is again trying to reclaim a leadership position in this new
paradigm shift. IBM Cloud is IBM's answer to this challenge.
The company's diversified cloud services include container platforms, serverless
services, and PaaS offerings. They are complemented by IBM Cloud Private for
hybrid architectures.
Like some of the other lower-tier cloud providers, IBM appeals to its existing
customers who have a strong preference to purchase most of their technology from
Big Blue (IBM's nickname).
These existing customers usually have traditional workloads. IBM is also leveraging
these long relationships to transition these customers into emerging IBM solutions,
such as Watson's artificial intelligence.
IBM benefits from a large base of existing customers running critical production
services and that are just starting to get comfortable with cloud adoption. This
existing customer base positions IBM well to assist these customers as they embrace
the cloud and begin their transformation journeys.
Like Oracle, IBM is fighting an uphill battle to gain market share from AWS, Azure,
and Google.
Amazon Web Services (AWS)
We'll now focus on the top three cloud providers. As you are probably already
aware, cloud providers offer much more than artificial intelligence services, starting with
barebones compute and storage services, all the way to very sophisticated high-
level services. As with everything else in this book, we will specifically drill into
the artificial intelligence and machine learning services that cloud providers offer,
starting with AWS.
Amazon SageMaker
Amazon SageMaker was launched at Amazon's annual re:Invent conference in
Las Vegas, Nevada in 2017. SageMaker is a machine learning platform that enables
developers and data scientists to create, train, and deploy machine learning (ML)
models in the cloud.
A common tool used by data scientists in their day-to-day work is a Jupyter
Notebook. These notebooks are documents that contain a combination of computer
code such as Python, rich text elements such as paragraphs, equations, graphs, and
URLs. Jupyter notebooks can easily be understood by humans because they contain
analysis, descriptions, and results (figures, graphs, tables, and so on), and they are
also executable programs that can be processed online or on a laptop.
You can think of Amazon SageMaker as a Jupyter Notebook on steroids. These are
some of the advantages of SageMaker over traditional Jupyter notebooks. In other
words, these are the different steroid flavors:
• Like many of the machine learning services offered by Amazon, SageMaker
is a fully managed machine learning service so you do not have to worry
about upgrading operating systems or installing drivers.
• Amazon SageMaker provides implementations of some of the most common
machine learning models, but these implementations are highly optimized
and, in some cases, run up to 10 times faster than other implementations
of the same algorithm. In addition, you can bring in your own algorithms
if the machine learning model is not provided out of the box by SageMaker.
• Amazon SageMaker provides the right amount of muscle for a variety of
workloads. The type of machine that can be used to either train or deploy
your algorithm can be selected from the wide variety of machine types
that Amazon provides. If you are just experimenting with SageMaker, you
might decide to use an ml.t2.medium machine, which is one of the smallest
machines you can use with SageMaker. If you require some real power,
you can use their accelerated compute instances, such as an ml.p3dn.24xlarge
machine. The power delivered by such an instance is equivalent to what just
a few years ago was considered a supercomputer and would cost millions
of dollars to purchase.
Amazon SageMaker allows developers to increase their productivity across the
entire machine learning pipeline, including:
Data preparation – Amazon SageMaker can seamlessly integrate with many other
AWS services, including S3, RDS, DynamoDB, and Lambda, making it simple
to ingest and prepare data for consumption by machine learning algorithms.
Algorithm selection and training – Out of the box, Amazon SageMaker has a variety
of high-performance, scalable machine learning algorithms optimized for speed and
accuracy. These algorithms can perform training on petabyte-size datasets and can
increase performance by up to 10 times the performance of similar implementations.
These are some of the algorithms that are included with SageMaker:
• BlazingText
• DeepAR forecasting
• Factorization machines
• K-Means
• Random Cut Forest (RCF)
• Object detection
• Image classification
• Neural Topic Model (NTM)
• IP Insights
• K-Nearest Neighbors (k-NN)
• Latent Dirichlet Allocation (LDA)
• Linear Learner
• Object2Vec
• Principal Component Analysis (PCA)
• Semantic segmentation
• Sequence-to-sequence
• XGBoost
Algorithm tuning and optimizing – Amazon SageMaker offers automatic model
tuning, also known as hyperparameter tuning. The tuning finds the best parameter
set for a model by running multiple training iterations of the same algorithm on the
same input dataset over a range of specified hyperparameter values. As the training jobs
run, a scorecard is kept of the best performing version of the model. The definition
of "best" is based on a pre-defined metric.
As an example, let's assume we are trying to solve a binary classification problem.
The goal is to maximize the area under the curve (AUC) metric of the algorithm by
training an XGBoost algorithm model. We can tune the following hyperparameters
for the algorithm:
• alpha
• eta
• min_child_weight
• max_depth
In order to find the best values for these hyperparameters, we can specify a range of
values for the hyperparameter tuning. A series of training jobs will be kicked off and
the best set of hyperparameters will be stored depending on which version provides
the highest AUC.
Amazon SageMaker's automatic model tuning can be used both with SageMaker's
built-in algorithms as well as with custom algorithms.
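A hedged sketch of what such a tuning job can look like with the SageMaker Python SDK; the xgb_estimator variable, the metric name, the ranges, and the job counts below are illustrative assumptions rather than details from the text:

```python
from sagemaker.tuner import (HyperparameterTuner, ContinuousParameter,
                             IntegerParameter)

# `xgb_estimator` is assumed to be a previously configured SageMaker
# XGBoost estimator; the ranges below are illustrative.
ranges = {
    "alpha": ContinuousParameter(0, 1000),
    "eta": ContinuousParameter(0.1, 0.5),
    "min_child_weight": ContinuousParameter(1, 10),
    "max_depth": IntegerParameter(3, 10),
}

tuner = HyperparameterTuner(
    estimator=xgb_estimator,
    objective_metric_name="validation:auc",  # maximize AUC, as in the example
    hyperparameter_ranges=ranges,
    objective_type="Maximize",
    max_jobs=20,           # total training jobs (assumption)
    max_parallel_jobs=3,   # concurrent jobs (assumption)
)

# tuner.fit({"train": train_s3_uri, "validation": val_s3_uri})
```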
Algorithm deployment – Deploying a model in Amazon SageMaker is a two-step
process:
1. Create an endpoint configuration specifying the ML compute instances that
are used to deploy the model.
2. Launch one or more ML compute instances to deploy the model and
expose the URI to invoke, which will allow users to make predictions.
The endpoint configuration API accepts the ML instance type and the initial count
of instances. In the case of neural networks, the configuration may include the type
of GPU-backed instance. The endpoint API provisions the infrastructure as defined
in the previous step.
SageMaker deployment supports both one-off and batch predictions. Batch
predictions make predictions on datasets that can be stored in Amazon S3 or other
AWS storage solutions.
Integration and invocation – Amazon SageMaker provides a variety of ways and
interfaces to interact with the service:
• Web API – Sagemaker has a web API that can be used to control and invoke
a SageMaker server instance.
• SageMaker API – As with other services, Amazon has an API for SageMaker
that supports the following list of programming languages:
° Go
° C++
° Java
° JavaScript
° Python
° PHP
° Ruby
• Web interface – If you are familiar with Jupyter Notebooks, you will feel
right at home with Amazon SageMaker since the web interface to interact
with SageMaker is Jupyter Notebooks.
• AWS CLI – The AWS command-line interface (CLI).
Alexa, Lex, and Polly – conversational agents
In previous chapters, we discussed Alexa and its increasingly pervasive presence
in homes. We'll now delve into the technologies that power Alexa and allow you
to create your own conversational bots.
Amazon Lex is a service for building conversational agents. Amazon Lex, along
with other chatbots, is our generation's attempt at passing the Turing Test, which
we discussed in previous chapters. It will be a while before anyone confuses a
conversation with Alexa with a human conversation. However, Amazon and other
companies keep on making strides in making these conversations more and more
natural. Amazon Lex, which uses the same technologies that power Amazon Alexa,
allows developers to quickly build sophisticated, natural language, conversational
agents or chatbots. For simple cases, it's possible to build some of these chatbots
without any programming. However, it is possible to integrate Lex with other
services in the AWS stack with AWS Lambda as the integration technology.
We will devote a whole chapter to creating chatbots later, so we will keep this
section short for now.
Amazon Comprehend – natural language
processing
Amazon Comprehend is a natural language processing (NLP) service provided
by AWS. It uses machine learning to analyze content, perform entity recognition,
and find implicit and explicit relationships. Companies are starting to realize that
they have valuable information in the mounds of data that they generate every
day. Valuable insights can be ascertained from customer emails, support tickets,
product reviews, call center conversations, and social media interactions. Up until
recently, it was cost-prohibitive to try to obtain these insights, but tools like Amazon
Comprehend make it cost-effective to perform analysis on vast amounts of data.
Another advantage of this service is that it is yet another AWS service that is fully
managed, so there is no need to provision servers, install drivers, or upgrade
software. It is simple to use and deep experience in NLP is not required to quickly
become productive with it.
Like other AWS AI/ML services, Amazon Comprehend integrates with other AWS
services such as AWS Lambda and AWS Glue.
Use cases – Amazon Comprehend can be used to scan documents and identify
patterns in those documents. This capability can be applied to a range of use cases,
such as sentiment analysis, entity extraction, and document organization by topic.
As an example, Amazon Comprehend could analyze text from a social media
interaction with a customer, identify key phrases, and determine whether the
customer's experience was positive or negative.
Console Access – Amazon Comprehend can be accessed from the AWS Management
Console. One of the easiest ways to ingest data into the service is by using Amazon
S3. We can then make a call to the Comprehend service to analyze text for key
phrases and relationships. Comprehend can return a confidence score for each
user request to determine the confidence level of accuracy; the higher the percentage,
the more confident the service is. Comprehend can easily process a single request
or multiple requests in a batch.
Available Application Programming Interfaces (APIs) – As of this writing,
Comprehend provides six different APIs to enable insights. They are:
• Key phrase Extraction API – Identifies key phrases and terms.
• Sentiment Analysis API – Returns the overall meaning and feeling of the
text, either positive, negative, neutral, or mixed.
• Syntax API – Allows a user to tokenize text to define word boundaries and
label words in their different parts of speech, such as nouns and verbs.
• Entity Recognition API – Identifies and labels different entities in the text,
such as people, places, and companies.
• Language Detection API – Identifies the primary language in which a text
is written. The service can identify over a hundred languages.
• Custom Classification API – Enables a user to build a custom text
classification model.
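A minimal boto3 sketch exercising three of these APIs (it assumes AWS credentials and a default region are already configured; the sample text is made up):

```python
import boto3

comprehend = boto3.client("comprehend")

text = "I love this product, but shipping was slow."

sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"])  # e.g. MIXED, with per-class confidence scores

phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
print([p["Text"] for p in phrases["KeyPhrases"]])

entities = comprehend.detect_entities(Text=text, LanguageCode="en")
print([(e["Text"], e["Type"]) for e in entities["Entities"]])
```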
Industry-specific services – Amazon Comprehend Medical was released at AWS
re:Invent in 2018. It is built specifically for the medical industry and can identify
industry-specific terminology. Comprehend also offers a specific Medical Named
Entity and Relationship Extraction API. AWS does not store or use any text inputs
from Amazon Comprehend Medical for future machine learning training.
Amazon Rekognition – image and video
No, it's not a typo. Amazon named its recognition service with a k and not a c.
Amazon Rekognition can perform image and video analysis and enables users to
add this functionality to their applications. Amazon Rekognition has been pretrained
with millions of labeled images. Because of this, the service can quickly recognize:
• Object types – Chairs, tables, cars, and so on
• Celebrities – Actors, politicians, athletes, and so on
• People – Facial analysis, facial expressions, facial quality, user verification,
and so on
• Text – Recognize an image as text and convert it to text
• Scenes – Dancing, celebrating, eating, and so on
• Inappropriate content – Adult, violent, or visually disturbing content
Amazon Rekognition has already recognized billions of images and videos and it
uses them to continuously get better and better. The application of deep learning
in the domain of image recognition might arguably be the most successful machine
learning application in the last few years and Amazon Rekognition leverages deep
learning to deliver impressive results. To use it, it is not required to have a high level
of machine learning expertise. Amazon Rekognition provides a simple API. To use
it, an image is passed along to the service along with a few parameters, and that is it.
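As a hedged illustration of that simple API, here is a minimal boto3 sketch; the bucket and object names are hypothetical:

```python
import boto3

# Assumes credentials/region are configured and the image is in S3.
rekognition = boto3.client("rekognition")

response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photo.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))
```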
Amazon Rekognition will only continue to get better. The more it gets used, the more
inputs it receives, and the more it learns from those inputs. In addition, Amazon
continues to enhance and to add new features and functionality to the service.
Some of the most popular use cases and applications for Amazon Rekognition are:
Object, scene, and activity detection – With Amazon Rekognition, you can identify
thousands of different types of objects (for example, cars, houses, chairs, and so
on) and scenes (for example, cities, malls, beaches, and so on). When analyzing
video, specific activities that are happening in the frame can be identified, such as
"emptying a car trunk" or "children playing."
Gender recognition – Amazon Rekognition can be used to make an educated guess
to determine whether a person in an image is a male or a female. The functionality
should not be used as the sole determinant of a person's gender. It is not meant
to be used in such a way. For example, if a male actor is wearing a long-haired
wig and earrings for a role, they might be identified as a female.
Facial recognition and analysis – One of the uses of facial recognition systems is
to identify and authenticate a person from an image or video. This technology has
been around for decades, but it's not until recently that its application has become
more popular, cheaper, and more available, due in no small part to deep learning
techniques and the ubiquity of services such as Rekognition. Facial recognition
technologies power many of today's applications, such as photo sharing and storage
services and as a second factor in authentication workflows for smartphones.
Once we recognize that an object is a face, we might want to perform further
facial analysis. Some of the attributes that Amazon Rekognition can assist in
determining are:
• Eyes open or closed
• Mood:
° Happy
° Sad
° Angry
° Surprised
° Disgusted
° Calm
° Confused
° Fear
• Hair color
• Eye color
• Beards or mustaches
• Glasses
• Age range
• Gender
• Visual geometry of a face
These detected attributes are useful when there is a need to search through and
organize millions of images in seconds, generating metadata tags such as a person's
mood or to identify a person.
Pathing – The path of a person can be tracked in the scene using Amazon
Rekognition using video files. For example, if we see an image that contains a
person with bags around a trunk, we might not know whether the person is taking
the bags out of the trunk and arriving or if they are putting the bags into the trunk
and leaving. By analyzing the video using pathing, we will be able to make this
determination.
Unsafe content detection – Amazon Rekognition can assist in identifying potentially
unsafe or inappropriate content in images and video content and it can provide
detailed labels that accurately control access to those assets based on previously
determined criteria.
Celebrity recognition – Celebrities and famous people can be quickly identified in
image and video libraries to catalog photos and footage. This functionality can be
used in marketing, advertising, and media industry use cases.
Text in images – Once we identify that an image contains text, it is only natural
to want to convert the letters and words in that image into text. As an example,
if Rekognition is able to not only recognize that an object is a license plate but
additionally convert the image into text, it will then be easy to index that against
Department of Motor Vehicle records and track individuals and their whereabouts.
Amazon Translate
Amazon Translate is another Amazon service that can be used to translate large
amounts of text written in one language to another language. Amazon Translate is
pay-per-use, so you will only be charged when you submit something that needs
translation. As of October 2019, Amazon Translate supports 32 languages:
Language Language Code
Arabic ar
Chinese (Simplified) zh
Chinese (Traditional) zh-TW
Czech cs
Danish da
Dutch nl
English en
Finnish fi
French fr
German de
Greek el
Hebrew he
Hindi hi
Hungarian hu
Indonesian id
Italian it
Japanese ja
Korean ko
Malay ms
Norwegian no
Persian fa
Polish pl
Portuguese pt
Romanian ro
Russian ru
Spanish es
Swedish sv
Thai th
Turkish tr
Ukrainian uk
Urdu ur
Vietnamese vi
With a few exceptions, most of these languages can be translated from one to the
other. Users can also add items to the dictionary to customize the terminology and
include terms that are specific to their organization or use case, such as brand and
product names.
Amazon Translate uses machine learning and a continuous learning model to
improve the performance of its translation over time.
The service can be accessed in three different ways, in the same way that many of the
AWS services can be accessed:
• From the AWS console, to translate small snippets of text and to sample the
service.
• Using the AWS API (supported languages are C++, Go, Java, JavaScript,
.NET, Node.js, PHP, Python, and Ruby).
• Amazon Translate can be accessed via the AWS CLI.
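A minimal boto3 sketch of the API route (credentials and region are assumed to be configured; the language codes come from the table above):

```python
import boto3

translate = boto3.client("translate")

result = translate.translate_text(
    Text="Amazon Translate is pay-per-use.",
    SourceLanguageCode="en",
    TargetLanguageCode="es",
)
print(result["TranslatedText"])
```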
Uses for Amazon Translate
Many companies use Amazon Translate together with other external services.
Additionally, Amazon Translate can be integrated with other AWS services. For
example, Translate can be used in conjunction with Amazon Comprehend to pull
out predetermined entities, sentiments, or keywords from a social media feed and
then translate the extracted terms. In another example, the service can be paired with
Amazon S3 to translate document repositories and speak a translated language with
Amazon Polly.
However, using Amazon Translate does not mean that human translators don't
have a role anymore. Some companies are pairing Amazon Translate with human
translators to increase the speed of the translation process.
Endsem AI merged.pdf

  • 7. • In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. Such models usually have high bias and low variance. It happens when we have too little data to build an accurate model, or when we try to fit a linear model to nonlinear data. Models like linear and logistic regression are also too simple to capture complex patterns in the data. • In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train the model for too long on a noisy dataset. These models have low bias and high variance. Very complex models, like decision trees, are prone to overfitting. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 8. Ensemble learning techniques 1. Bagging: Bagging (an acronym for “Bootstrap Aggregating”) trains similar learners on small bootstrapped subsamples of the data and then takes the mean of all their predictions. In generalized bagging, you can use different learners on different subsamples. As you would expect, this helps reduce the variance error. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 9. • Multiple different training datasets can be prepared, used to estimate a predictive model, and make predictions. Averaging the predictions across the models typically results in better predictions than a single model fit on the training dataset directly. Ensemble learning techniques • Bagging is a parallel method, which means several weak learners learn the data pattern independently and simultaneously • Bagging reduces variance • Popular ensemble methods based on this approach include: Bagged Decision Trees Random Forest Classifiers Extra Trees Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
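As an illustration, here is a minimal bagging sketch, assuming scikit-learn and its bundled Iris dataset are available (the parameter name estimator= assumes scikit-learn 1.2 or newer; older versions call it base_estimator=):

# Bagging: many trees trained in parallel on bootstrap samples, predictions combined.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the contributing "weak" learner
    n_estimators=50,                     # 50 independent learners
    bootstrap=True,                      # sample with replacement: "bootstrap aggregating"
    random_state=0,
)
print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())

The bagged score is typically equal or better, because averaging over bootstrap replicas reduces the variance of the individual trees.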
  • 10. 2. Boosting • Instead of parallel processing of the data, sequential processing of the dataset occurs. The first classifier is fed the entire dataset, and its predictions are analyzed. • The instances where Classifier-1 fails to produce correct predictions (typically samples near the decision boundary of the feature space) are fed to the second classifier. • This is done so that Classifier-2 can specifically focus on the problematic areas of the feature space and learn an appropriate decision boundary. Further steps apply the same idea, and the ensemble of all these classifiers is then used to make the final prediction on the test data. Ensemble learning techniques Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 11. Ensemble learning techniques Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 12. • The main aim of the boosting method is to reduce bias in the ensemble decision. Thus, the classifiers chosen for the ensemble usually need to have low variance and high bias, i.e., simpler models with fewer trainable parameters. – Adaptive Boosting – Stochastic Gradient Boosting – Gradient Boosting Machines Ensemble learning techniques Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
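A comparable sketch for boosting, under the same scikit-learn assumption: AdaBoost fits shallow, high-bias decision stumps sequentially, re-weighting the samples that earlier learners got wrong:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
# Depth-1 trees ("stumps") are the classic simple, high-bias weak learner.
boosted = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,    # 100 stumps trained one after another
    learning_rate=0.5,   # shrinks each stump's contribution
    random_state=0,
)
print("AdaBoost accuracy:", cross_val_score(boosted, X, y, cv=5).mean())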
  • 13. 3. Stacking • The stacking ensemble method also involves creating bootstrapped data subsets, like the bagging mechanism, for training multiple models. However, the outputs of all such models are used as input to another classifier, called the meta-classifier, which finally predicts the samples. The intuition behind using two layers of classifiers is to determine whether the training data have been appropriately learned. • For example, in the cat/dog/wolf example above, if, say, Classifier-1 can distinguish between cats and dogs but not between dogs and wolves, the meta-classifier in the second layer will be able to capture this behavior from Classifier-1 and correct it before making the final prediction. Ensemble learning techniques Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 14. Ensemble learning techniques • Split the training set into two disjoint sets. • Train several base learners on the first part. • Test the base learners on the second part. • Using the predictions from step 3 as the inputs, and the correct responses as the outputs, train a higher-level learner. Example: Voting Classifier. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
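A minimal stacking sketch, again assuming scikit-learn (which provides both the StackingClassifier used here and the VotingClassifier named above); the base learners' out-of-fold predictions become the meta-classifier's training inputs:

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-classifier
    cv=5,  # base learners are fit and evaluated on disjoint internal folds
)
print("stacking accuracy:", cross_val_score(stack, X, y, cv=5).mean())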
  • 15. Reinforcement Learning (RL) • Drawbacks of classical machine learning algorithms: – Need for a huge amount of data to train the model – Data may be missing, false, or unavailable • Requirement of the system – Machines need to learn to perform actions by themselves, not just learn from data. • Reinforcement Learning ▪ Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. ▪ If the model performs an action that brings it closer to its goal, it receives a positive reward; if the action takes it away from its goal, it receives a negative reward. ▪ It returns an optimum solution for a problem by taking a sequence of decisions by itself (without human interference) ▪ Works on a trial-and-error basis ▪ Sequential decision making ▪ Feedback is not instantaneous ▪ A form of dynamic programming Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 16. Important Terminologies in RL • Agent: The model that is being trained via reinforcement learning • Environment: The training situation that the model must optimize over • Action: All possible steps that can be taken by the model • State: The current position/condition returned by the model • Reward: To help the model move in the right direction, points are given to it to appraise an action • Policy: The policy determines how an agent behaves at any time. It is the strategy applied by the agent to choose the next action based on the current state. • Value: The expected long-term return with discounting, as opposed to the short-term reward. • Q-value: Similar to the value, but it takes the current action as an additional parameter. • Discount factor: Helps adjust the importance of rewards over time. It exponentially decreases the value of later rewards so agents don't take actions with no long-term impact.
  • 17. RL algorithms Categorization Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 18. Learning models of RL • Markov Decision Process (MDP): • Most reinforcement learning tasks can be framed as MDPs. The following parameters are used to get a solution: – Set of actions – A – Set of states – S – Reward – R – Policy – π – Value – V Mathematically, the Markov (“no memory”) property states that the next state depends only on the current state and action, not on the full history: P(St+1 | St, At) = P(St+1 | S1, A1, ..., St, At) Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 19. Bellman Equation & Dynamic Programming Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune V(s) = max over actions a of [ R(s, a) + γ · V(s′) ], where the discount factor γ lies between 0 and 1
  • 20. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune Bellman Equation & Dynamic Programming
  • 21. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune The solution is the largest value in the value array after computing n iterations of the Bellman update
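To make the iteration concrete, here is a minimal value-iteration sketch on a hypothetical three-state chain (the states, rewards and transitions are invented purely for illustration); each sweep applies the Bellman update V(s) = max over a of [R(s, a) + γ·V(s′)]:

import numpy as np

gamma = 0.9
# Hypothetical deterministic MDP: action 0 = "stay", action 1 = "move right".
next_state = [[0, 1], [1, 2], [2, 2]]              # next_state[s][a]
reward = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]      # reward[s][a]; entering state 2 pays 1

V = np.zeros(3)                                    # value array, initialised to 0
for _ in range(50):                                # n iterations of the Bellman update
    V = np.array([max(reward[s][a] + gamma * V[next_state[s][a]] for a in (0, 1))
                  for s in range(3)])
print(V)  # converges to [0.9, 1.0, 0.0]: values grow as states approach the reward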
  • 22. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune Q-Learning: Markov Decision Process + Reinforcement Learning Q-Learning is a reinforcement learning algorithm that finds the next best action given the current state. During training it may choose actions at random (exploration) while aiming to maximize the total reward.
  • 23. Q Learning • The objective of the model is to find the best course of action given its current state. To do this, it may come up with rules of its own, or it may operate outside the policy given to it to follow. Because learning does not have to follow the behavior policy, Q-learning is called off-policy. • Model-free means that the agent does not build an explicit model of the environment's dynamics; instead, it learns directly through trial and error, from the rewards it receives. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 24. Important Terms in Q-Learning • States: The State, S, represents the current position of an agent in an environment. • Action: The Action, A, is the step taken by the agent when it is in a particular state. • Rewards: For every action, the agent will get a positive or negative reward. • Episodes: When an agent ends up in a terminating state and can’t take a new action. • Q-Values: Used to determine how good an Action, A, taken at a particular state, S, is. Q (A, S). • Temporal Difference: A formula used to find the Q-Value by using the value of current state and action and previous state and action. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 26. Robot Navigation • a robot has to cross a maze and reach the end point. There are mines, and the robot can only move one tile at a time. If the robot steps onto a mine, the robot is dead. The robot has to reach the end point in the shortest time possible. • The scoring/reward system is as below: • The robot loses 1 point at each step. This is done so that the robot takes the shortest path and reaches the goal as fast as possible. • If the robot steps on a mine, the point loss is 100 and the game ends. • If the robot gets power ⚡, it gains 1 point. • If the robot reaches the end goal, the robot gets 100 points.
  • 27. Q Table In the Q-Table, the columns are the actions and the rows are the states. Each Q-table score is the maximum expected future reward that the robot will get if it takes that action in that state. Each value of the Q-table is calculated with the Q-Learning algorithm. The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
  • 29. Q Learning • Q is the brain of the agent. Initialize it with 0. • Set gamma and the environment rewards in R. • Each episode is one training session. • In each training session the agent explores the environment (using R) and receives rewards until it reaches the goal. • The purpose is to enhance the brain represented by Q. More training results in a more optimized Q. • Gamma is set between 0 and 1. Closer to 0 means the agent considers mainly immediate rewards, whereas closer to 1 means it weighs future rewards more heavily. • Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)]
  • 30. Q Learning Algorithm Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
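A minimal tabular Q-learning sketch that follows the update rule above (the three-room environment is invented purely for illustration; R holds the immediate rewards and gamma the discount factor):

import numpy as np

# Invented example: rooms 0-1-2 in a row; reaching room 2 (the goal) pays 100.
# R[state, action]: action 0 = move left, action 1 = move right; -1 marks an invalid move.
R = np.array([[-1,   0],
              [ 0, 100],
              [-1,  -1]])
step = [[0, 1], [0, 2], [2, 2]]      # resulting state for each (state, action)
gamma = 0.8
Q = np.zeros((3, 2))                 # the "brain" of the agent, initialised to 0

rng = np.random.default_rng(0)
for episode in range(200):           # each episode is one training session
    s = 0                            # start in room 0
    while s != 2:                    # explore until the goal is reached
        a = rng.integers(2)          # pick an action at random (pure exploration)
        if R[s, a] < 0:
            continue                 # skip invalid moves
        s2 = step[s][a]
        # Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)]
        Q[s, a] = R[s, a] + gamma * Q[s2].max()
        s = s2
print(Q)  # converges to [[0, 80], [64, 100], [0, 0]]: more training, more optimized Q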
  • 32. Introduction • A perceptron is a neural network unit (an artificial neuron) that performs certain computations to detect features or business intelligence in the input data. • It closely resembles a biological neuron. • McCulloch and Walter Pitts first introduced the nerve cell as a simple logic gate with binary outputs. • The perceptron is a simple model of the biological neuron in the form of an ANN. It is a supervised learning algorithm designed for binary classification.
  • 33. Biological Neuron vs Artificial Neuron Multiple signals arrive at the dendrites and are then integrated into the cell body, and, if the accumulated signal exceeds a certain threshold, an output signal is generated that will be passed on by the axon. An artificial neuron is a mathematical function based on a model of biological neurons, where each neuron takes inputs, weighs them separately, sums them up and passes this sum through a nonlinear function to produce output. The correspondence is: cell nucleus (soma) ↔ node, dendrites ↔ inputs, synapse ↔ weights, axon ↔ output.
  • 34. Artificial Neuron The artificial neuron has the following characteristics: – A neuron is a mathematical function modeled on the working of biological neurons – It is an elementary unit in an artificial neural network – One or more inputs are separately weighted – Inputs are summed and passed through a nonlinear function to produce output – Every neuron holds an internal state called activation signal – Each connection link carries information about the input signal – Every neuron is connected to another neuron via connection link
  • 35. Perceptron • The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP neuron. A perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and process elements in the training set one at a time. • There are two types of perceptrons: • Single layer – single-layer perceptrons can learn only linearly separable patterns • Multilayer – multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power • The perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary. • It takes an input, aggregates it (weighted sum) and returns 1 only if the aggregated sum is more than some threshold, else returns 0.
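Here is a minimal single-layer perceptron sketch in plain Python/NumPy, trained with the perceptron learning rule on the linearly separable AND function (the learning rate and epoch count are arbitrary illustrative choices):

import numpy as np

# AND gate: linearly separable, so a single-layer perceptron can learn it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias (plays the role of the threshold)
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        out = 1 if (w @ xi + b) > 0 else 0   # weighted sum + step activation
        w += lr * (target - out) * xi        # adjust weights in proportion to the error
        b += lr * (target - out)

print([1 if (w @ xi + b) > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]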
  • 38. Multiple-Layer Networks and Backpropagation Algorithms Backpropagation is the generalization of the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differentiable transfer functions. Input vectors and the corresponding target vectors are used to train a network until it can approximate a function, associate input vectors with specific output vectors, or classify input vectors in an appropriate way as defined by you.
  • 39. Architecture This section presents the architecture of the network that is most commonly used with the backpropagation algorithm – the multilayer feedforward network
  • 40. Architecture Neuron Model An elementary neuron with R inputs is shown below. Each input is weighted with an appropriate w. The sum of the weighted inputs and the bias forms the input to the transfer function f. Neurons can use any differentiable transfer function f to generate their output.
  • 41. Architecture Neuron Model Transfer Functions (Activation Function) Multilayer networks often use the log-sigmoid transfer function logsig. The function logsig generates outputs between 0 and 1 as the neuron's net input goes from negative to positive infinity
  • 42. Architecture Neuron Model Transfer Functions (Activation Function) Alternatively, multilayer networks can use the tan-sigmoid transfer function tansig. The function tansig generates outputs between -1 and +1 as the neuron's net input goes from negative to positive infinity
  • 43. Architecture Feedforward Network A single-layer network of S logsig neurons having R inputs is shown below in full detail on the left and with a layer diagram on the right.
  • 44. Architecture Feedforward Network Feedforward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear and linear relationships between input and output vectors. The linear output layer lets the network produce values outside the range -1 to +1. On the other hand, if you want to constrain the outputs of a network (such as between 0 and 1), then the output layer should use a sigmoid transfer function (such as logsig).
  • 45. Learning Algorithm: Backpropagation The following slides describe the training process of a multi-layer neural network employing the backpropagation algorithm. To illustrate this process, the three-layer neural network with two inputs and one output, which is shown in the picture below, is used:
  • 46. Learning Algorithm: Backpropagation Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realises a nonlinear function, called the neuron transfer (activation) function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.
  • 47. Learning Algorithm: Backpropagation To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) assigned with the corresponding target (desired output) z. Network training is an iterative process: in each iteration the weight coefficients of the nodes are modified using new data from the training data set. The modification is calculated using the algorithm described below. Each teaching step starts with forcing both input signals from the training set. After this stage we can determine the output signal values for each neuron in each network layer.
  • 48. Learning Algorithm: Backpropagation The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.
  • 51. Learning Algorithm: Backpropagation Propagation of signals through the hidden layer. Symbols wmn represent the weights of connections between the output of neuron m and the input of neuron n in the next layer.
  • 54. Learning Algorithm: Backpropagation Propagation of signals through the output layer.
  • 55. Learning Algorithm: Backpropagation In the next algorithm step, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal δ of the output-layer neuron.
  • 56. Learning Algorithm: Backpropagation The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs for the neuron under discussion.
  • 57. Learning Algorithm: Backpropagation The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs for the neuron under discussion.
  • 58. Learning Algorithm: Backpropagation The weight coefficients wmn used to propagate the errors back are the same as those used when computing the output value. Only the direction of data flow is changed (signals are propagated from outputs to inputs, one layer after the other). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:
  • 59. Learning Algorithm: Backpropagation When the error signal for each neuron is computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
  • 60. Learning Algorithm: Backpropagation When the error signal for each neuron is computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
  • 61. Learning Algorithm: Backpropagation When the error signal for each neuron is computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
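The whole procedure condenses into a small NumPy sketch: a two-input network with one sigmoid (logsig) hidden layer trained on XOR. The layer sizes, learning rate and iteration count are illustrative choices, not the exact network from the figures above, and convergence depends on the random initialisation:

import numpy as np

def sigmoid(e):                      # logsig transfer function, y = f(e)
    return 1.0 / (1.0 + np.exp(-e))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
z = np.array([[0], [1], [1], [0]], dtype=float)    # desired outputs (targets)

rng = np.random.default_rng(1)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)     # input  -> hidden weights
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)     # hidden -> output weights
eta = 0.5                                          # learning rate

for step in range(5000):
    h = sigmoid(X @ W1 + b1)                       # forward pass, hidden layer
    y = sigmoid(h @ W2 + b2)                       # forward pass, output layer
    d_out = (z - y) * y * (1 - y)                  # error signal; df(e)/de = y(1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)           # error propagated back through W2
    W2 += eta * h.T @ d_out; b2 += eta * d_out.sum(axis=0)   # modify weight coefficients
    W1 += eta * X.T @ d_hid; b1 += eta * d_hid.sum(axis=0)

print(y.round(2))   # approaches [[0], [1], [1], [0]]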
  • 62. Thank you Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 63. Unit 5 AI Applications L.A.Bewoor laxmi.bewoor@viit.ac.in Department of Computer Engineering BRACT’S, Vishwakarma Institute of Information Technology, Pune-48 (An Autonomous Institute affiliated to Savitribai Phule Pune University) (NBA and NAAC accredited, ISO 9001:2015 certified)
  • 64. Objective/s of this session Discuss real life applications of AI Apply AI techniques for real world application 1. AI application for NLP 2. AI application for time series analysis 3. AI application for speech recognition 4. AI application for chatbots 5. AI application for perceptron based classifier Learning Outcome/Course Outcome Dr. L. A. Bewoor Department of Computer Engineering, VIIT , Pune-48 2
  • 65. Contents • Sequential and time series analysis • Speech Recognizer • Natural Language Processing • Chatbots • Perceptron based classifier
  • 66. Time series analysis ■ A Time Series is a sequence of measures of a given phenomenon taken at regular time intervals such as hourly, daily, weekly, monthly, quarterly, annually, or every so many years – Stock series are measures of activity at a point in time – Flow series are series which are a measure of activity to a date (e.g. Retail, Current Account Deficit, Balance of Payments) – price of a particular commodity like gold, silver, any eatables, petrol, diesel etc. – rate of interest, The rate of interest for home loans ▪ A set of observations ordered with respect to the successive time periods is a time series. In other words, the arrangement of data in accordance with their time of occurrence is a time series. It is the chronological arrangement of data. Here, time is just a way in which one can relate the entire phenomenon to suitable reference points.
  • 67. • A time series depicts the relationship between two variables. Time is one of those variables and the second is any quantitative variable. Uses of Time Series • The most important use of studying time series is that it helps us to predict the future behaviour of the variable based on past experience • It is helpful for business planning as it helps in comparing the actual current performance with the expected one • From time series, we get to study the past behaviour of the phenomenon or the variable under consideration • We can compare the changes in the values of different variables at different times or places, etc. Time series analysis
  • 68. Components for Time Series Analysis • Trend • Seasonal Variations • Cyclic Variations • Random or Irregular movements
  • 69. Trend • The trend shows the general tendency of the data to increase or decrease during a long period of time. A trend is a smooth, general, long-term, average tendency. It is not always necessary that the increase or decrease is in the same direction throughout the given period of time. • It is observable that the tendencies may increase, decrease or be stable in different sections of time, but the overall trend must be upward, downward or stable. The population, agricultural production, items manufactured, the number of births and deaths, and the number of industries, factories, schools or colleges are some examples showing such tendencies of movement. Components for Time Series Analysis
  • 70. • Seasonal Variations • These are the rhythmic forces which operate in a regular and periodic manner over a span of less than a year. They have the same or almost the same pattern during a period of 12 months. This variation will be present in a time series if the data are recorded hourly, daily, weekly, quarterly, or monthly. • These variations come into play either because of natural forces or man-made conventions. The various seasons or climatic conditions play an important role in seasonal variations: crop production depends on the seasons, the sale of umbrellas and raincoats rises in the rainy season, and the sale of electric fans and air conditioners shoots up in summer. • The effect of man-made conventions such as festivals, customs, habits, fashions, and occasions like marriage is easily noticeable. They recur year after year. An upswing in a season should not be taken as an indicator of better business conditions. Components for Time Series Analysis
  • 71. Cyclic Variations • The variations in a time series which operate over a span of more than one year are the cyclic variations. This oscillatory movement has a period of oscillation of more than a year; one complete period is a cycle. This cyclic movement is sometimes called the ‘Business Cycle’. • It is a four-phase cycle comprising the phases of prosperity, recession, depression, and recovery. The cyclic variation may be regular but is not periodic. The upswings and downswings in business depend upon the joint nature of the economic forces and the interaction between them. Random or Irregular Movements • There is another factor which causes variation in the variable under study. These are not regular variations; they are purely random or irregular. These fluctuations are unforeseen, uncontrollable, unpredictable, and erratic. Such forces include earthquakes, wars, floods, famines, and other disasters. Components for Time Series Analysis
  • 72. Fundamental Rule of Time Series Analysis • Stationarity is an important concept in the field of time series analysis with tremendous influence on how the data is perceived and predicted. • When forecasting or predicting the future, most time series models assume that each point is independent of one another. The best indication of this is when the dataset of past instances is stationary. • For data to be stationary, the statistical properties of a system do not change over time. This does not mean that the values for each data point have to be the same, but the overall behavior of the data should remain constant. From a purely visual assessment, time plots that do not show trends or seasonality can be considered stationary. More numerical factors in support of stationarity include a constant mean and a constant variance.
  • 73. • Non-stationary time series: the statistical properties of a non-stationary time series, like mean and variance, are not constant over time. An example of a non-stationary time series is a series with a trend – something that grows over time, for instance. The sample mean and variance of such a series will grow as you increase the size of the sample. • In such cases, perform a transformation to convert the data into a stationary dataset. The most common transforms are differencing and the logarithmic transform. Fundamental Rule of Time Series Analysis
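In practice, stationarity is commonly checked with the augmented Dickey–Fuller test; a minimal sketch assuming statsmodels is installed (the synthetic trending series is invented for illustration):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200)) + 0.5 * np.arange(200)  # random walk + trend

# ADF null hypothesis: the series has a unit root, i.e. it is non-stationary.
p_before = adfuller(series)[1]
p_after = adfuller(np.diff(series))[1]   # first difference removes the trend
print(f"p-value before differencing: {p_before:.3f}, after: {p_after:.3f}")

A large p-value before differencing (non-stationarity cannot be rejected) and a small one afterwards illustrates why differencing is the most common transform.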
  • 74. Time Series Decomposition • Additive time series • The equation for an additive time series is simply: Ot = Tt + St + Rt, where Ot = output, Tt = trend, St = seasonality, Rt = residual, and t is a variable representing a particular point in time. • additive = trend + seasonal + residual
  • 75. Time Series Decomposition • Multiplicative time series • The equation for a multiplicative time series is simply: Ot = Tt * St * Rt, where Ot = output, Tt = trend, St = seasonality, Rt = residual, and t is a variable representing a particular point in time. • multiplicative = trend * seasonal * residual
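Both decompositions are available, for example, in statsmodels' seasonal_decompose; a minimal sketch on synthetic monthly data (the series itself is invented for illustration):

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

t = np.arange(48)
rng = np.random.default_rng(0)
data = 10 + 0.2 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, 48)
series = pd.Series(data, index=pd.date_range("2020-01-01", periods=48, freq="MS"))

result = seasonal_decompose(series, model="additive")  # or model="multiplicative"
print(result.seasonal.head(12))       # the repeating 12-month pattern St
print(result.trend.dropna().head())   # the smooth trend Tt
print(result.resid.dropna().head())   # the residual Rt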
  • 76. FORECASTING AND TIME SERIES ANALYSIS Forecasting is based on past recorded data and helps in determining a future plan with respect to any desired objective. It helps in fixing strategies. (Figure: flowchart linking FORECAST, STRATEGY MAKING, DECISION, PLANNED PERFORMANCE, DESIRED PERFORMANCE, DEVIATION and ANALYSIS.)
  • 77. TYPES OF FORECAST 1. Demand Forecast – Prediction of demand for products or services. 2. Environmental Forecast – Prediction of social, political and economic changes. 3. Technological Forecast – Prediction of technological changes. TIMING OF FORECASTS Forecasts are usually classified according to time period. 1. Short range forecast – commonly up to one year and usually less than three months, e.g. purchasing, job scheduling, workforce, production level, regional production, seasonal production, etc. 2. Medium range forecast – commonly one to three years, e.g. cash budgeting, sales planning, etc. 3. Long range forecast – commonly three or more years, e.g. R&D capital expenditure, establishment of new plants, labor facilities, etc. Forecasting Methods Forecasting methods are based either on data (quantitative) or on opinion and judgment (qualitative). The quantitative methods are further divided into two, namely time series and causal.
  • 78. A time series is a set of measurements of a variable that are ordered through time. The time variable does not fluctuate arbitrarily; it moves uniformly, always in the same direction. Time series forecasting methods attempt to account for changes over a period of time at regular intervals by examining patterns, cycles or trends to predict the outcome for a future time period. Causal methods are based on the assumption that the variable value under consideration has a cause–effect relationship with one or more other variables. Methods of Forecasting 1. Define the objective 2. Select the variable of interest 3. Determine the time horizon for forecasting 4. Select an appropriate model 5. Collect the relevant data 6. Make the forecast
  • 79. TYPES OF FORECASTING TECHNIQUES A fixed and suitable technique for forecasting is a primary necessity for the validity of forecasts. In the last few decades several forecasting techniques have been developed; they can be classified into three broad categories. 1. NAÏVE METHODS – Based on the assumption that the future is just an extension of the past. 2. BAROMETRIC METHODS – Based on the assumption that a forecast can be made on the basis of certain happenings in the past. In this method a factor-dependent series is constructed, and thereafter statistical analysis yields the forecast. 3. ANALYTICAL METHODS – Based on the analysis of the causative forces operating on the variable to be forecasted. Analytical techniques may be non-mathematical, like factor listing or opinion, or mathematical.
  • 80. TIME SERIES ANALYSIS A time series consists of orderly arranged numerical values of the desired variable with respect to time. It is represented both in tabular as well as graphical manner. Objectives: 1: To identify the pattern and isolate the influencing factors (or effects) for prediction purposes as well as for future planning and control. 2: To review and evaluate plan progress. Pattern: It is assumed that time series data consist of a uniform pattern with random fluctuations. • Actual value of variable per unit time = Mean value of variable per unit time + Random deviation per unit time, i.e. Ŷ = pattern + e Components: 1: Trend – Sometimes a time series displays either upward or downward movements in the average value of the variable of interest. 2: Cycles – Upward or downward movements in the variable of interest over a period of time. A cycle may have four phases: peak, contraction, trough and expansion. 3: Seasonal – Upward and downward movements within a year that follow a regular pattern. 4: Irregular – Rapid upward or downward movements caused by short-term, unanticipated and non-recurring factors.
  • 81. Time Series Methods – The available time series data are used for mathematical analysis to derive future inferences. These methods are limited in that they cannot guarantee accurate future values; this limitation of the time series approach is taken care of by the application of causal methods. The time series methods are as follows: A. Freehand Methods B. Smoothing Methods – Smoothing is a process that often improves our ability to forecast a series by reducing the impact of noise: (i) Moving Averages (ii) Weighted Moving Averages (iii) Semi Averages. C. Exponential Smoothing Methods – (i) Simple Exponential Smoothing (ii) Adjusted Exponential Smoothing D. Quadratic Trend Model A. Freehand Methods A freehand curve is drawn as a straight line from the value at the lowest time limit to the value at the highest time limit of the series. The forecast can be obtained simply by extending the trend line. A trend line fitted by the freehand method should conform to the conditions mentioned below.
  • 82. (i) It is smooth and straight. (ii) The sums of the vertical deviations above and below the trend line are equal. (iii) The sum of squares of the vertical deviations from the trend line is as small as possible. (iv) The trend line bisects the cycles. Limitations: 1: This method is highly subjective. 2: The trend line drawn cannot have much value. 3: It is very time consuming to construct a freehand trend. B. Smoothing Methods The objective of smoothing methods is to smooth out the random variations due to the irregular component of the time series. (i) Moving Averages A quantitative method of forecasting or smoothing a time series by averaging each successive group of data values. It is a subjective method and depends on the length of the period chosen for calculating the moving average. The moving average, which serves as an estimate of the next period's value of a variable given a period of length n, is expressed as –
  • 83. Moving average: MA(t+1) = (Dt + Dt-1 + Dt-2 + ... + Dt-n+1) / n where t = current time period, D = actual data, which is exchanged each period, and n = length of the time period. • In this method the term “moving” is used because the average is obtained by summing and averaging the values from a given number of periods, each time deleting the oldest value and adding a new value. Limitations – It is highly subjective and dependent on the length of the period chosen for calculating the average. The method has three important limitations: (a) Increasing the size of n increases the smoothness of variation but also makes the method less sensitive to real changes in the data. (b) It is difficult to choose the optimal length of time for which to compute the moving averages; the moving average cannot be found for the first and last k/2 periods of a k-period moving average. (c) The moving average cannot pick up trends very well.
  • 84. Illustration – Calculation of trend and short-term fluctuations (4-yearly moving averages):
Year:     2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Variable:  205  316  340  446  396  450  515  575  495  605
Moving averages shown on the slide: 1307/4 = 326.75, 1632/4 = 408, 1807/4 = 451.75, 1936/4 = 484, 2035/4 = 508.75, 2190/4 = 547.5
(ii) Weighted Moving Averages – In a moving average, each observation is given equal importance (weight). However, it may be desirable to place more weight (importance) on certain periods of time than on others. A moving average in which some time periods are weighted differently than others is called a weighted moving average. Commonly, the more recent observations receive more weight, and the weight decreases for older data values.
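Returning to the moving averages: the four-yearly averages in the illustration above can be reproduced with pandas (assumed available); rolling(4).mean() averages each successive group of four values, so the first entry appears against 2004 as 1307/4 = 326.75:

import pandas as pd

values = pd.Series([205, 316, 340, 446, 396, 450, 515, 575, 495, 605],
                   index=range(2001, 2011))
print(values.rolling(window=4).mean())  # four-yearly moving averages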
  • 85. Weighted moving average = Σ (weight for period n × data value in period n) / Σ weights Illustration – Forecasting sales by weighting the past three months, with weight 3 applied to last month, 2 to two months ago, and 1 to three months ago: Forecast = (3 × Mt-1 + 2 × Mt-2 + 1 × Mt-3) / 6 = [3 × sales last month + 2 × sales two months ago + 1 × sales three months ago] / 6
Month (March ’11 onward): March April May June July Aug Sep Oct Nov Dec
Sales:                      20    24  38   42   56  52  40  38  45  40
  • 86.
MONTH | SALE | THREE-MONTH MOVING AVERAGE
MARCH | 20   |
APRIL | 24   |
MAY   | 38   |
JUNE  | 42   |
JULY  | 56   |
AUG   | 52   |
SEP   | 40   |
OCT   | 38   |
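A short sketch of the three-month weighted forecast with weights 3, 2, 1 (most recent month weighted heaviest), using the sales figures above:

sales = [20, 24, 38, 42, 56, 52, 40, 38]   # March .. October
# Forecast for each month from June onward: (3*last + 2*second-last + 1*third-last) / 6
for i in range(3, len(sales)):
    f = (3 * sales[i - 1] + 2 * sales[i - 2] + 1 * sales[i - 3]) / 6
    print(f"month index {i}: forecast {f:.2f}, actual {sales[i]}")

For June, for example, the forecast is (3 × 38 + 2 × 24 + 1 × 20) / 6 = 182/6 ≈ 30.33.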
  • 87. (iii) SEMI AVERAGE METHOD The semi-average method permits us to estimate the slope and intercept of the trend line quite easily, provided a linear function will adequately describe the data. The trend line is determined simply by means of the lower and upper halves of the data. In continuous series these points are determined at the mid-point of the class interval. The arithmetic mean of the first part is the intercept value, and the slope is determined by the ratio of the difference in the arithmetic means to the number of years between them, that is, the change per unit time. The resulting time series is represented by the equation Ỹ = a + bx where Ỹ = calculated trend value, a = intercept, b = slope value. The equation should always be stated completely with reference to the year where x = 0 and a description of the units of x and y. When the number of observations is odd, it is customary to ignore the middle time series value. The method may be satisfactory if the trend is linear; if the data deviate much from linearity, the forecast will be biased and less reliable.
  • 88. ILLUSTRATION The production of a company (tons per year) is as follows. Determine the trend line.
Year: 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Tons:  115  120  130  160  145  155  160  155  170  175
  • 89. To calculate the time series Ỹ = a + bx: Slope b = Δy/Δx = (change in series)/(change in years) = (163 – 134)/(2008 – 2003) = 29/5 = 5.8 (here 134 and 163 are the means of the first and second halves of the data, centred on 2003 and 2008). Intercept a = 134 at 2003. Thus the trend line is Ỹ = 134 + 5.8x. If we want to predict production in 2012: x = 2012 – 2003 = 9, so Ỹ = 134 + 5.8 × 9 = 186.2 tons
  • 90. Natural Language Processing steps 1. Segmentation: break the entire document down into its constituent sentences, using punctuation marks such as full stops and commas. Example: “I am in VIIT. I am learning AI at TY.” 2. Tokenizing: split each sentence into its constituent words (tokens), e.g. “I am learning AI at TY” → I / am / learning / AI / at / TY
  • 91. Natural Language Processing steps • Removing Stop Words: dropping common words that add little meaning, e.g. “I am learning AI at TY” → “I learning AI” • Stemming: the process of obtaining the word stem of a word. A word stem gives new words upon adding affixes to it, e.g. learning → learn. • Lemmatization: the process of obtaining the root stem of a word. The root stem gives the base form of a word that is present in the dictionary and from which the word is derived, e.g. intelligence, intelligent, and intelligently have the root word intelligent, which has a meaning.
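These steps map directly onto NLTK, one of the common NLP libraries (a minimal sketch; it assumes the punkt, stopwords and wordnet resources have already been downloaded via nltk.download):

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "I am in VIIT. I am learning AI at TY."
sentences = sent_tokenize(text)                 # 1. segmentation into sentences
tokens = word_tokenize(sentences[1])            # 2. tokenizing one sentence
filtered = [w for w in tokens
            if w.lower() not in stopwords.words("english")]  # 3. stop-word removal
print(filtered)

print(PorterStemmer().stem("learning"))                    # stemming -> learn
print(WordNetLemmatizer().lemmatize("learning", pos="v"))  # lemmatization -> learn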
  • 92. • Speech Recognition (also known as Automatic Speech Recognition (ASR), or computer speech recognition) is the process of converting a speech signal into a sequence of words, by means of an algorithm implemented as a computer program. • The main goal of the speech recognition area is to develop techniques and systems for speech input to machines. Applications of speech recognition – Problem domain: speech/telephone/communication sector. Applications: education assistance; telephone directory enquiry without an operator; teaching students of foreign languages to pronounce vocabulary correctly; teaching overseas students to pronounce English correctly. Input: speech waveform. Pattern classes: spoken words.
  • 94. Approaches to speech recognition: • Acoustic Phonetic Approach – The earliest approaches to speech recognition were based on finding speech sounds and providing appropriate labels to these sounds. – This is the basis of the acoustic phonetic approach (Hemdal and Hughes 1967), which postulates that there exist finite, distinctive phonetic units (phonemes) in spoken language and that these units are broadly characterized by a set of acoustic properties that are manifested in the speech signal over time. – Even though the acoustic properties of phonetic units are highly variable, both across speakers and with neighboring sounds, the acoustic-phonetic approach assumes that the rules governing the variability are straightforward and can be readily learned by a machine. • Artificial Intelligence Approach
  • 95. • Pattern Recognition Approach – The pattern-matching approach (Itakura 1975; Rabiner 1989; Rabiner and Juang 1993) involves two essential steps, namely pattern training and pattern comparison. – The essential feature of this approach is that it uses a well-formulated mathematical framework and establishes consistent speech pattern representations, for reliable pattern comparison, from a set of labeled training samples via a formal training algorithm. – A speech pattern representation can be in the form of a speech template or a statistical model (e.g., a HIDDEN MARKOV MODEL or HMM) and can be applied to a sound (smaller than a word), a word, or a phrase. – In the pattern-comparison stage, a direct comparison is made between the unknown speech (the speech to be recognized) and each possible pattern learned in the training stage, in order to determine the identity of the unknown according to the goodness of match of the patterns. The pattern-matching approach has become the predominant method for speech recognition over the last six decades. Approaches to speech recognition:
  • 96. Artificial Intelligence Approach (Knowledge Based Approach) • The Artificial Intelligence approach [97] is a hybrid of the acoustic phonetic approach and the pattern recognition approach: it exploits the ideas and concepts of both acoustic-phonetic and pattern-recognition methods. • The knowledge-based approach uses information regarding linguistics, phonetics and spectrograms.
  • 97. Perceptron • A neural network unit that performs computations to detect features in the input data using Artificial Intelligence is known as a perceptron. • It links artificial neurons using simple logic gates with binary outputs. • An artificial neuron implements a mathematical function and has a node, inputs, weights, and an output, equivalent to the cell nucleus, dendrites, synapses, and axon, respectively, of a biological neuron.
  • 98. Perceptron • The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP neuron. A perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and process elements in the training set one at a time. • It overcomes some of the limitations of the M-P neuron by introducing the concept of numerical weights (a measure of importance) for inputs, and a mechanism for learning those weights. Inputs are no longer limited to boolean values as in the case of an M-P neuron; the perceptron supports real inputs as well, which makes it more useful and generalized.
  • 100. Unit 5 AI Applications L.A.Bewoor laxmi.bewoor@viit.ac.in Department of Computer Engineering BRACT’S, Vishwakarma Institute of Information Technology, Pune-48 (An Autonomous Institute affiliated to Savitribai Phule Pune University) (NBA and NAAC accredited, ISO 9001:2015 certified)
  • 101. Objective/s of this session Discuss real life applications of AI 1. AI application for NLP 2. AI application for time series analysis 3. AI application for speech recognition 4. AI application for chatbots 5. AI application for perceptron based classifier Learning Outcome/Course Outcome Dr. L. A. Bewoor Department of Computer Engineering, VIIT , Pune-48 2
  • 102. Contents • Sequential and time series analysis • Speech Recognizer • Natural Language Processing • Chatbots • Perceptron based classifier
  • 103. Time series analysis ■ A Time Series is a sequence of measures of a given phenomenon taken at regular time intervals such as hourly, daily, weekly, monthly, quarterly, annually, or every so many years – Stock series are measures of activity at a point in time – Flow series are series which are a measure of activity to a date (e.g. Retail, Current Account Deficit, Balance of Payments) – price of a particular commodity like gold, silver, any eatables, petrol, diesel etc. – rate of interest, The rate of interest for home loans ▪ A set of observations ordered with respect to the successive time periods is a time series. In other words, the arrangement of data in accordance with their time of occurrence is a time series. It is the chronological arrangement of data. Here, time is just a way in which one can relate the entire phenomenon to suitable reference points.
  • 104. • A time series depicts the relationship between two variables. Time is one of those variables and the second is any quantitative variable. Uses of Time Series • The most important use of studying time series is that it helps us to predict the future behaviour of the variable based on past experience • It is helpful for business planning as it helps in comparing the actual current performance with the expected one • From time series, we get to study the past behaviour of the phenomenon or the variable under consideration • We can compare the changes in the values of different variables at different times or places, etc. Time series analysis
  • 105. Components for Time Series Analysis • Trend • Seasonal Variations • Cyclic Variations • Random or Irregular movements
  • 106. Trend • The trend shows the general tendency of the data to increase or decrease during a long period of time. A trend is a smooth, general, long-term, average tendency. It is not always necessary that the increase or decrease is in the same direction throughout the given period of time. • It is observable that the tendencies may increase, decrease or are stable in different sections of time. But the overall trend must be upward, downward or stable. The population, agricultural production, items manufactured, number of births and deaths, number of industry or any factory, number of schools or colleges are some of its example showing some kind of tendencies of movement. Components for Time Series Analysis
  • 107. • Seasonal Variations • These are the rhythmic forces which operate in a regular and periodic manner over a span of less than a year. They have the same or almost the same pattern during a period of 12 months. This variation will be present in a time series if the data are recorded hourly, daily, weekly, quarterly, or monthly. • These variations come into play either because of the natural forces or man-made conventions. The various seasons or climatic conditions play an important role in seasonal variations. Such as production of crops depends on seasons, the sale of umbrella and raincoats in the rainy season, and the sale of electric fans and A.C. shoots up in summer seasons. • The effect of man-made conventions such as some festivals, customs, habits, fashions, and some occasions like marriage is easily noticeable. They recur themselves year after year. An upswing in a season should not be taken as an indicator of better business conditions. Components for Time Series Analysis
  • 108. Cyclic Variations • The variations in a time series which operate themselves over a span of more than one year are the cyclic variations. This oscillatory movement has a period of oscillation of more than a year. One complete period is a cycle. This cyclic movement is sometimes called the ‘Business Cycle’. • It is a four-phase cycle comprising of the phases of prosperity, recession, depression, and recovery. The cyclic variation may be regular are not periodic. The upswings and the downswings in business depend upon the joint nature of the economic forces and the interaction between them. Random or Irregular Movements • There is another factor which causes the variation in the variable under study. They are not regular variations and are purely random or irregular. These fluctuations are unforeseen, uncontrollable, unpredictable, and are erratic. These forces are earthquakes, wars, flood, famines, and any other disasters. Components for Time Series Analysis
  • 109. Fundamental Rule of Time Series Analysis • Stationarity is an important concept in the field of time series analysis with tremendous influence on how the data is perceived and predicted. • When forecasting or predicting the future, most time series models assume that each point is independent of one another. The best indication of this is when the dataset of past instances is stationary. • For data to be stationary, the statistical properties of a system do not change over time. This does not mean that the values for each data point have to be the same, but the overall behavior of the data should remain constant. From a purely visual assessment, time plots that do not show trends or seasonality can be considered stationary. More numerical factors in support of stationarity include a constant mean and a constant variance.
  • 110. • Non-stationary time series A non-stationary time series's statistical properties like mean, variance etc will not be constant over time An example of a non stationary time series is a series with a trend - something that grows over time for instance. The sample mean and variance of such a series will grow as you increase the size of the sample. • perform a transformation to convert into a stationary dataset. The most common transforms are the difference and logarithmic transform. Fundamental Rule of Time Series Analysis
  • 111. Time Series Decomposition • Additive time series • Remember the equation for additive time series is simply: Ot = Tt + St + Rt • Ot = output Tt = trend St = seasonality Rt = residual t = variable representing a particular point in time • additive = trend + seasonal + residual
  • 112. Time Series Decomposition • Multiplicative time series • Remember the equation for additive time series is simply: Ot = Tt * St * Rt • Ot = output Tt = trend St = seasonality Rt = residual t = variable representing a particular point in time • multiplicative = trend * seasonal * residual
  • 113. FORCASTING AND TIME SERIES ANALYSIS The forecasting is based on the past recorded data and help in the determination of future plan with respect to any desired objective. It helps in the fixing of strategies. STRATEGY MAKING DECISION PLANNED PERFORMANCE ANALYSIS DEVIATION DESIRED PERFORMANCE FORECASTE
  • 114. TYPES OF FORECAST 1. Demand Forecast – Prediction of demand for products or services. 2. Environmental Forecast – Prediction of social, political and economic changes. 3. Technological Forecast – Prediction of technological changes. TIMING OF FORECASTS Forecasts are usually classified accordingly to time period. 1. Short range forecast – commonly one year and usually less than the three months. Eg purchasing of job scheduling, workforce, production level, regional production, seasonal production etc. 2. Medium range forecast – commonly one to three years. Eg cash budgeting, sale planning etc. 3. Long range forecast – commonly three to more years. Eg R and D capital expenditure, establishment of new plants, facilities of labor etc. Forecasting Methods Forecasting methods are based on opinion (quantitative) or judgment (qualitative). The quantitative methods are further divided into two namely, time series and casual.
  • 115. A time series is a set of measurements of a variable that are ordered through time to time. The time variables does not fluctuate arbitrarily. It moves uniformly always in the same direction. The time series forecasting methods attempt to account for changes over a period of time at regular intervals by examining patterns, cycles or trends to product the outcome for a future time period. Causal methods are based on the assumptions that the variable value under consideration has a cause effect relationship with one or more other values. Methods of Forecasting 1. Define objective 2. Select the variable of interest 3. Determine the time for forecasting 4. Select appropriate model 5. Collect the relevant data 6. Make the forecast
  • 116. TYPES OF FORECASTING TECHNIQUES A fixed and suitable technique for forecasting is primary necessity for the validity of forecasts. In last few decades some forecasting techniques have been developed and can be classified into three broad categories. 1. NAÏVE METHODS – It is based on the assumption that future is just an extension of past. 2. BAROMETRIC METHODS – It is based on assumption that forecast can be made on the basis of certain happenings on the past. In this method a factor dependent series has been constructed and there after statistical analysis can yields forecast. 3. ANALYTICAL METHODS – It is based on the analysis of causative forces operative on the variable to be forecasted. Analytical techniques may be non-mathematical like factor listing or opinion or mathematical.
  • 117. TIME SERIES ANALYSIS A time series is orderly arranged numerical values of desired variables with respect to time. It is represented both in tabular as well as graphical manner. Objectives : 1 : To identify the pattern and isolate the influencing factors (or effects) for prediction purpose as well as for future planning and control. : 2 : To review and evaluate plan progress Pattern : It is assumed that time series data consists of an uniform pattern with random fluctuations. • Actual value of variable per unit time = Mean value of variable per unit time + Random deviation/unit time Ŷ = (r) pattern + e Components : 1 : Trend – Sometimes a time series displayed either upward or downward movements in the average value of the variable of interest. 2 : Cycles – An upward or downward movements in the variable of interest over a period of time. It may has four phases peak, contradiction, trough and expansion 3 : Seasonal – An upward and downward movements within year and follow regular pattern. 4 : Irregular – rapid upward or downward movements caused by short term unanticipated and non-recurring factors.
  • 118. Time Series Methods - The available data of time series is used for the mathematical analysis to derive future inferences. These processes have limitations that they have no accurate future values. This limitations of the time series approach is taken care by the application of causal methods. The time series methods are as follows - A. Freehand Methods B. Smoothing Methods – Smoothing is a process that often improves our ability to forecast series by reducing the impact of noise (i) Moving Averages (ii) Weighted Moving Averages (iii) Semi Averages. C. Exponential Smoothing Methods – (i) Simple exponential Smoothing (ii) Adjusted Exponential Smoothing D. Quadratic Trend Model A. Freehand Methods A freehand curve draws as a straight line from value of lowest time limit to value of highest time limit of series. The forecast can be obtained simply by extending the trend line. A trend line fitted by the freehand method should confirmed the conditions mentioned below.
  • 119. (i) It is smooth and straight (ii) The sum of the vertical deviations above and below the trend line are equal. (iii) The sum of squares of the vertical deviations from the trend line is as small as possible. (iv) The trend line bisects the cycles Limitation : 1 : This method is highly subjective : 2 : The trend line drawn cannot have much value : 3 : It is very time consuming to constant a freehand trend. B. Smoothing Methods The objective of smoothing methods is to smoothes out the random variations due to irregular components of the time series (i) Moving Averages It is a quantitative method of forecasting or smoothing a time series by averaging each successive groups of data values. It is an subjective method and depends on the length of the period chosen for calculating moving average. The moving averages which serve as an estimate of the next periods value of a variable given a period of length n is expressed as –
  • 120. Σ {D1 + Dt-1 +Dt-2 +-----+ Dt- (n+1/ } Moving average (MAt +1 ) = ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ n Where – t = current time period; D = actual data which is exchanged each period and n = length of time period. • In this method the term “moving” is used because it is obtained by summing and averaging the values from a given number of periods, each time deleting the oldest value and adding a new value. Limitation – It is highly subjective and dependent on the length of period chosen for calculation of average. The method has three important limitation. (a) The increase in size of n increase smoothness of variation but it also makes the method less sensitive to real changes in the date. (b) It is difficult to choose the options length of time for which to compute the moving averages. Moving average can not be found for the first and last K/2 periods in a k- period moving average. (c) Moving average cannot pick up trends very well.
  • 121. Illustration - Calculation of Trend and Short term fluctuations Year 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 Variable 205 316 340 446 396 450 515 575 495 605 205 316 340 446 396 450 515 575 495 605 (ii) Weighted Moving Averages - In moving average, each observation is given equal importance (weight) . However, it may be desired to place more weight (importance) on certain period of time than others. So a moving average; in which some time periods are weighted differently than others; is called a weighted moving average. Commonly, the more recent observations receives the more weight, and the weight decreases for older data values. 1307/4=326.75 1632/4=408 1807/4=451.75 1936/4=484 2035/4=508.75 2190/4=547.5
  • 122. 1 Weighted moving Average = ⎯ Σ (weight for period n) x (Data value in period n) Σ weights Illustration - Forecasting of sales by weighting in past three months Weight applied 1 2 3 Month Three months ago Two months ago Last month X-weighted = 3Mi – 1 + 2Mi – 2 + Mi – 3 1 = ⎯ [ 3 × sales last month + 2 × sales in two months ago + 1 × sale in three 6 months ago march 11 april mey june july aug sep oct nov Dec 20 24 38 42 56 52 40 38 45 40
  • 123. MONTH SALE THREE MONTH MOVING AVERAGE MARCH 20 APRIL 24 MAY 38 JUNE 42 JULY 56 AUG 52 SEP 40 0CT 38
  • 124. (ii) SEMI AVERAGE METHOD The semi average method permits us to estimate the slope and intercept of the trend line quite early of a linear function will adequately describe the data. The trend line is determined simply by means of lower and upper halves of data. In continuous series these points are determined at mid point of class interval. The arithmetic mean of the first part is the intercept value and the slope is determined by the ratio of the difference in the arithmetic mean of the number of years between them, that is the change per unit time. The resultant is the time series is represented by equation Ỹ = a – bx When Ỹ = calculated trend value a = intercept b = slop value The equation should always be stated completely with reference to the year where x = 0 and a description of the units x and y. In the condition of odd number it is customary to ignore middle time series value. It may be satisfactory if the trend is linear. If the data deviate much from linearity the forecast will be biased and less reliable.
• 125. ILLUSTRATION The production of a company (tons per year) is as follows. Determine the trend line.
Year: 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Production: 115 120 130 160 145 155 160 155 170 175
• 126. To calculate the trend line Ỹ = a + bx:
Slope b = Δy / Δx = (change in series) / (change in years) = (163 − 134) / (2008 − 2003) = 29 / 5 = 5.8
(134 is the mean of the first half of the data, centred on 2003; 163 is the mean of the second half, centred on 2008.)
Intercept a = 134 at x = 0 (year 2003). Thus the trend line is Ỹ = 134 + 5.8x.
To predict production in 2012: x = 2012 − 2003 = 9, so Ỹ = 134 + 5.8 × 9 = 186.2 tons.
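A short Python sketch of the semi-average computation for the production data above:

```python
# Semi-average trend line for the production illustration above
prod = [115, 120, 130, 160, 145, 155, 160, 155, 170, 175]  # years 2001-2010

half = len(prod) // 2
a = sum(prod[:half]) / half          # mean of first half = 134.0, centred on 2003
mean2 = sum(prod[half:]) / half      # mean of second half = 163.0, centred on 2008
b = (mean2 - a) / (2008 - 2003)      # slope = 29 / 5 = 5.8

def trend(year):
    """Y = a + b*x, with x measured in years from 2003 (where x = 0)."""
    return a + b * (year - 2003)

print(b, trend(2012))                # 5.8 and 186.2 tons predicted for 2012
```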
  • 128. NLP approaches for Text Analysis • Conduct basic text processing • Categorize and tag the words • Classify text • Extract information • Analyze sentence structure • Build feature vector • Analyze meaning
• 129. NLP Libraries • Natural Language Toolkit (NLTK) • GenSim • SpaCy • CoreNLP • TextBlob • scikit-learn NLP Components
• 131. Natural Language Processing steps 1. Segmentation: break the entire document down into its constituent sentences, using punctuation such as full stops and commas. E.g. "I am in VIIT. I am learning AI at TY." → "I am in VIIT." / "I am learning AI at TY." 2. Tokenizing: split each sentence into its constituent words (tokens), e.g. "I am learning AI at TY" → "I", "am", "learning", "AI", "at", "TY".
• 132. • Syntactic Analysis • Removing Stop Words: common words such as "am" and "at" carry little meaning and are removed, e.g. "I am learning AI at TY" → "I learning AI". • Stemming: the process of obtaining the word stem of a word; the stem is the part to which affixes are added to form new words, e.g. "learning" → "learn". • Lemmatization: the process of obtaining the root (lemma) of a word. The lemma is the base form of a word that is present in the dictionary and from which the word is derived, e.g. "intelligence", "intelligent", and "intelligently" have the root word "intelligent", which has a meaning.
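The preprocessing steps above can be reproduced with NLTK, one of the libraries listed earlier. A minimal sketch, assuming the standard NLTK data packages have not yet been downloaded; the sentence is the one used in the slides:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the required NLTK data packages
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "I am in VIIT. I am learning AI at TY."

# Segmentation: split the document into sentences
sentences = nltk.sent_tokenize(text)       # ['I am in VIIT.', 'I am learning AI at TY.']

# Tokenizing: split a sentence into words
tokens = nltk.word_tokenize(sentences[1])  # ['I', 'am', 'learning', 'AI', 'at', 'TY', '.']

# Removing stop words ('I', 'am', 'at' are dropped)
stop = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stop and t.isalpha()]

# Stemming and lemmatization
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in filtered])             # 'learning' -> 'learn'
print([lemmatizer.lemmatize(t, pos="v") for t in filtered])
```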
• 133. • POS tagging: POS stands for parts of speech, which include noun, verb, adverb, and adjective. It indicates how a word functions, both in meaning and grammatically, within a sentence. A word can have one or more parts of speech depending on the context in which it is used. Semantic Analysis Semantics involves the use of, and the meaning behind, words. Word sense disambiguation derives the meaning of a word based on context. E.g. in "A pleasant breeze was experienced at the river bank", the context tells us that "bank" means the side of a river, not a financial institution.
• 134. • Named entity recognition: this identifies words that can be categorized into groups/entities such as people, values, locations, and so on. For example, in the sentence "Mark Zuckerberg is one of the founders of Facebook, a company from the United States" we can identify three types of entities: • "Person": Mark Zuckerberg • "Company": Facebook • "Location": United States • Discourse Integration: the meaning of a sentence depends upon the sentences that precede it, and may also invoke the meaning of the sentences that follow it. E.g. "Students were asking for the same." ("the same" refers to something mentioned earlier.) Pragmatic Analysis • Pragmatics is the last phase of NLP. It helps you discover the intended effect of an utterance by applying a set of rules that characterize cooperative dialogues. E.g. "Shut the door" is a request, not an order.
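POS tagging and named entity recognition are both available out of the box in spaCy, another library from the earlier list. A minimal sketch, assuming the small English model has been installed with `python -m spacy download en_core_web_sm`; the sentence is the Facebook example above:

```python
import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load("en_core_web_sm")

doc = nlp("Mark Zuckerberg is one of the founders of Facebook, "
          "a company from the United States")

# POS tagging: how each word functions grammatically
for token in doc:
    print(token.text, token.pos_)        # e.g. 'founders' NOUN

# Named entity recognition: people, organizations, locations, ...
for ent in doc.ents:
    print(ent.text, ent.label_)          # Mark Zuckerberg PERSON, Facebook ORG, ...
```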
• 135. Artificial Intelligence on the Cloud In this chapter, we are going to learn about the cloud and artificial intelligence workloads on the cloud. We will discuss the benefits and the risks of migrating AI projects to the cloud. We will also learn about the offerings provided by the major cloud providers. We will learn about the services and features that they offer and hopefully get an understanding of why those providers are the market leaders. By the end of this chapter, you will have a better understanding of the following: • The benefits, risks, and costs of migrating to the cloud • Fundamental cloud concepts such as elasticity • The top cloud providers • Amazon Web Services: ° Amazon SageMaker ° Alexa, Lex, and Polly – conversational agents ° Amazon Comprehend – natural language processing ° Amazon Rekognition – image and video ° Amazon Translate ° Amazon machine learning ° Amazon Transcribe – transcription ° Amazon Textract – document analysis • Microsoft Azure: ° Machine Learning Studio
• 136. ° Azure Machine Learning interactive workspace ° Azure Cognitive Services • Google AI and its machine learning products: ° AI Hub ° AI building blocks Why are companies migrating to the cloud? It is hard turning anywhere these days without being hit with the term "the cloud." Our present-day society has hit a tipping point where businesses big and small are seeing that the benefits of moving their workloads to the cloud outweigh the costs and risks. As an example, the US Department of Defense, as of 2019, is in the process of selecting a cloud provider and awarding a 10-year, $10 billion contract. Moving your systems to the cloud has many advantages, but one of the main reasons companies move to the cloud is its elastic capabilities. When deploying a new project in an on-premises environment, we always start with capacity planning. Capacity planning is the exercise that enterprises go through to determine how much hardware they will need for a new system to run efficiently. Depending on the size of the project, the cost of this hardware can run into the millions. For that reason, it could take months to complete the process. One of the reasons it can take so long is that many approvals might be required to complete the purchase. We can't blame businesses for being so slow and judicious with these kinds of decisions. Even though great planning and thought might go into these purchases, it is not uncommon to either buy less equipment than required or to buy underpowered equipment. Maybe just as often, too much equipment is bought, or equipment that is overkill for the project at hand. The reason this happens is that, in many cases, it is difficult to determine demand a priori. Additionally, even if we get the capacity right at the beginning, the demand might continue to grow and force us to go through the provisioning process all over again. Or the demand might be variable. For example, we might have a website that gets a lot of traffic during the day, but demand drops way down at night. In this case, when using on-premises environments, we have no choice but to account for the worst-case scenario and buy enough resources to handle peak periods of demand, but resources will be wasted when demand decreases in slow periods.
• 137. All these issues are non-existent in a cloud environment. All the major cloud providers, in different ways, provide elastic environments. Not only can we easily scale up, but we can just as easily scale down. If we have a website with variable traffic, we could put the servers that handle the traffic behind a load balancer and set up alerts that automatically add more servers to handle traffic spikes, and other alerts to terminate the servers once the storm passes. The top cloud providers Given the tsunami that is the cloud, many vendors are vying to satisfy the demand for cloud services. However, as is often the case in technology markets, only a few have bubbled to the top and dominate the space. In this section, we'll analyze the top players. Amazon Web Services (AWS) Amazon Web Services is one of the cloud pioneers. Since it launched in 2006, AWS has ranked highly in the well-respected Gartner Magic Quadrant in both vision and execution. Since its inception, AWS has held a big chunk of the cloud market. AWS is an appealing option for legacy players as well as start-ups. According to Gartner: "AWS is the provider most commonly chosen for strategic, organization-wide adoption" AWS also has an army of consultants and advisors dedicated to helping its customers deploy AWS services as well as to teaching them how to best leverage the services available. In summary, it is safe to say that AWS is the most mature, most advanced cloud provider, with a strong track record of customer success, as well as a strong stable of partners in AWS Marketplace. On the flip side, since AWS is the leader and they know it, they are not always the least expensive option. Another knock against AWS is that, since they highly value being first to market with new services and features, they seem willing to launch services quickly that might not be fully mature and feature-complete, and to work out the kinks once they are released. In fairness, this is not a tactic exclusive to AWS, and other cloud providers also release beta versions of their services. In addition, since Amazon competes in markets other than the cloud, it is not uncommon for some potential customers to go with other providers in order to not "feed the beast." For example, Walmart is well known for avoiding AWS at all costs because of their fierce competition in the e-commerce space.
• 138. Microsoft Azure For the past few years, Microsoft Azure has held the second position in the Gartner Magic Quadrant, trailing AWS and lagging significantly behind AWS in ability to execute. But the good news for Microsoft is that they only trail AWS, and they are a strong number two. Microsoft's solution is appealing to customers hosting legacy workloads as well as brand new cloud deployments, but for different reasons. Legacy workloads are normally run on Azure by clients that have traditionally been Microsoft customers and are trying to leverage their previous investments in that technology stack. For new cloud deployments, Azure cloud services hold appeal because of Microsoft's strong offerings for application development, specialized Platform as a Service (PaaS) capabilities, data storage, machine learning, and Internet of Things (IoT) services. Enterprises that are strategically committed to the Microsoft technology stack have been able to deploy many large-scale applications in production. Azure specifically shines when developers fully commit to the suite of Microsoft products, such as .NET applications, and then deploy them on Azure. Another reason Microsoft has deep market penetration is its experienced sales staff and its extensive partner network. In addition, Microsoft realizes that the next battle in technology will not revolve around operating systems but rather around the cloud, and they have become increasingly open to adopting non-Microsoft operating systems. As proof of this, as of now, about half of Azure workloads run on Linux or other open source operating systems and technology stacks. A Gartner report notes "Microsoft has a unique vision for the future that involves bringing in technology partners through native, first-party offerings such as those from VMware, NetApp, Red Hat, Cray, and Databricks." On the downside, there have been some reports of reliability issues, downtime, and service disruptions, as well as some customers taking issue with the quality of Microsoft's technical support. Google Cloud Platform (GCP) In 2018, Google broke into the prestigious Gartner leaders' quadrant with its GCP offering, joining only AWS and Azure in the exclusive club. In 2019, GCP remained in the same quadrant with its two fierce competitors. However, in terms of market share, GCP is a distant third.
• 139. They recently beefed up their sales staff, they have deep pockets, and they have a strong incentive not to be left behind, so don't discount them yet. Google's reputation as a leader in machine learning is undisputed, so it is no surprise that GCP has strong big data and machine learning offerings. But GCP is also making some headway attracting bigger enterprises looking to host legacy workloads such as SAP and other traditional customer relationship management (CRM) systems. Google's internal innovations around machine learning, automation, containers, and networking, with offerings such as TensorFlow and Kubernetes, have advanced cloud development. GCP's technology offerings revolve around their contributions to open source. Be careful about centering your cloud strategy exclusively around GCP, however. In a recent report, Gartner declared: "Google demonstrates an immaturity of process and procedures when dealing with enterprise accounts, which can make the company difficult to transact with at times." And: "Google has a much smaller pool of experienced Managed Service Providers (MSP) and infrastructure-centric professional services partners than other vendors in this Magic Quadrant." However, Gartner also states: "Google is aggressively targeting these shortcomings." Gartner also notes that Google's channel needs development. Alibaba Cloud Alibaba Cloud made its first appearance in Gartner's Magic Quadrant in 2017, and as of 2019, Alibaba's cloud offering, called Aliyun, remains in the Niche Player category. Gartner only evaluated the company's international service, headquartered in Singapore. Alibaba Cloud is the market leader in China, and many Chinese businesses, as well as the Chinese government, have been served well by using Alibaba as their cloud provider. However, a big part of this market share leadership might be given up if China ever decides to remove some of the restrictions on other international cloud vendors.
  • 140. Artificial Intelligence on the Cloud [ 286 ] The company provides support in China for building hybrid clouds. But, outside of China, it's mostly used by cloud-centric workloads. In 2018, it forged partnerships with VMware and SAP. Alibaba has a suite of services that is comparable in scope to the service portfolios of other global providers. The company's close relationships with the Alibaba Group helps the cloud service to be a bridge for international companies looking to do business in China, and out of China for Chinese companies. Alibaba does not yet seem to have the service and feature depth of competitors such as AWS, Azure, and GCP. And in many regions, services are only available for specific compute instances. They also need to strengthen their MSP ecosystem, third-party enterprise software integration, and operational tools. Oracle Cloud Infrastructure (OCI) In 2017, Oracle's cloud offering made a debut on Gartner's Magic Quadrant as a Visionary. But in 2018, due to a change to Gartner's evaluation criteria, Oracle was moved to Niche Player status. It remained there as of 2019. Oracle Cloud Infrastructure, or OCI, was a second-generation service launched in 2016 to phase out the legacy offering, now referred to as Oracle Cloud Infrastructure Classic. OCI offers both virtualized and bare-metal servers, with one-click installation and configuration of Oracle databases and container services. OCI appeals to customers with Oracle workloads that don't need more than basic Infrastructure as a Service (IaaS) capabilities. Oracle's cloud strategy relies on its applications, database, and middleware. Oracle has made some headway in attracting talent from other cloud providers to beef up its offerings. It's also made some progress in winning new business and getting existing Oracle customers to move to the OCI cloud. However, Oracle still has a long road ahead of it before it can catch up with the big three. IBM Cloud In the mainframe era, IBM was the undisputed computing king of the hill. It lost that title when we started moving away from mainframes and personal computers became ubiquitous. IBM is again trying to reclaim a leadership position in this new paradigm shift. IBM Cloud is IBM's answer to this challenge.
• 141. The company's diversified cloud services include container platforms, serverless services, and PaaS offerings. They are complemented by IBM Cloud Private for hybrid architectures. Like some of the other lower-tier cloud providers, IBM appeals to its existing customers who have a strong preference to purchase most of their technology from Big Blue (IBM's nickname). These existing customers usually have traditional workloads. IBM is also leveraging these long relationships to transition these customers into emerging IBM solutions, such as Watson's artificial intelligence. IBM benefits from a large base of existing customers running critical production services that are just starting to get comfortable with cloud adoption. This existing customer base positions IBM well to assist these customers as they embrace the cloud and begin their transformation journeys. Like Oracle, IBM is fighting an uphill battle to gain market share from AWS, Azure, and Google. Amazon Web Services (AWS) We'll now focus on the top three cloud providers. As you are probably already aware, cloud providers offer much more than artificial intelligence services, starting with barebones compute and storage services, all the way to very sophisticated high-level services. As with everything else in this book, we will specifically drill into the artificial intelligence and machine learning services that cloud providers offer, starting with AWS. Amazon SageMaker Amazon SageMaker was launched at Amazon's annual re:Invent conference in Las Vegas, Nevada in 2017. SageMaker is a machine learning platform that enables developers and data scientists to create, train, and deploy machine learning (ML) models in the cloud. A common tool used by data scientists in their day-to-day work is the Jupyter Notebook. These notebooks are documents that contain a combination of computer code such as Python and rich text elements such as paragraphs, equations, graphs, and URLs. Jupyter notebooks can easily be understood by humans because they contain analysis, descriptions, and results (figures, graphs, tables, and so on), and they are also executable programs that can be processed online or on a laptop.
• 142. You can think of Amazon SageMaker as a Jupyter Notebook on steroids. These are some of the advantages of SageMaker over traditional Jupyter notebooks. In other words, these are the different steroid flavors: • Like many of the machine learning services offered by Amazon, SageMaker is a fully managed machine learning service, so you do not have to worry about upgrading operating systems or installing drivers. • Amazon SageMaker provides implementations of some of the most common machine learning models, but these implementations are highly optimized and, in some cases, run up to 10 times faster than other implementations of the same algorithm. In addition, you can bring in your own algorithms if the machine learning model is not provided out of the box by SageMaker. • Amazon SageMaker provides the right amount of muscle for a variety of workloads. The type of machine that can be used to either train or deploy your algorithm can be selected from the wide variety of machine types that Amazon provides. If you are just experimenting with SageMaker, you might decide to use an ml.t2.medium machine, which is one of the smallest machines you can use with SageMaker. If you require some real power, you can use their accelerated computing instances, such as an ml.p3dn.24xlarge machine. The power delivered by such an instance is equivalent to what just a few years ago would have been considered a supercomputer costing millions of dollars to purchase. Amazon SageMaker allows developers to increase their productivity across the entire machine learning pipeline, including: Data preparation – Amazon SageMaker can seamlessly integrate with many other AWS services, including S3, RDS, DynamoDB, and Lambda, making it simple to ingest and prepare data for consumption by machine learning algorithms. Algorithm selection and training – Out of the box, Amazon SageMaker has a variety of high-performance, scalable machine learning algorithms optimized for speed and accuracy. These algorithms can perform training on petabyte-size datasets and can increase performance by up to 10 times over similar implementations. These are some of the algorithms that are included with SageMaker: • BlazingText • DeepAR forecasting • Factorization machines • K-Means • Random Cut Forest (RCF)
• 143. • Object detection • Image classification • Neural Topic Model (NTM) • IP Insights • K-Nearest Neighbors (k-NN) • Latent Dirichlet Allocation (LDA) • Linear Learner • Object2Vec • Principal Component Analysis (PCA) • Semantic segmentation • Sequence-to-sequence • XGBoost Algorithm tuning and optimizing – Amazon SageMaker offers automatic model tuning, also known as hyperparameter tuning. The tuning finds the best parameter set for a model by running multiple training jobs on the same input dataset with the same algorithm over a range of specified hyperparameters. As the training jobs run, a scorecard is kept of the best-performing version of the model. The definition of "best" is based on a pre-defined metric. As an example, let's assume we are trying to solve a binary classification problem. The goal is to maximize the area under the curve (AUC) metric of the algorithm by training an XGBoost model. We can tune the following hyperparameters for the algorithm: • alpha • eta • min_child_weight • max_depth In order to find the best values for these hyperparameters, we can specify a range of values for the hyperparameter tuning. A series of training jobs will be kicked off, and the best set of hyperparameters will be stored, depending on which version provides the highest AUC. Amazon SageMaker's automatic model tuning can be used both with SageMaker's built-in algorithms and with custom algorithms.
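As an illustration of how such a tuning job might be set up with the SageMaker Python SDK, here is a minimal sketch for the XGBoost/AUC scenario just described. The role ARN, S3 paths, and range endpoints are placeholder assumptions, not values from the text:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()
region = session.boto_region_name

# Built-in XGBoost container; role and S3 paths below are placeholders
image_uri = sagemaker.image_uris.retrieve("xgboost", region, version="1.5-1")
xgb = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://my-bucket/output",                    # placeholder
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)

# Ranges for the four hyperparameters mentioned above (illustrative values)
tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:auc",   # the "best" model maximizes AUC
    hyperparameter_ranges={
        "alpha": ContinuousParameter(0, 100),
        "eta": ContinuousParameter(0.01, 0.5),
        "min_child_weight": ContinuousParameter(1, 10),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=20,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://my-bucket/train", "validation": "s3://my-bucket/val"})
```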
• 144. Algorithm deployment – Deploying a model in Amazon SageMaker is a two-step process: 1. Create an endpoint configuration specifying the ML compute instances that are used to deploy the model. 2. Launch one or more ML compute instances to deploy the model, exposing the URI that users invoke to make predictions. The endpoint configuration API accepts the ML instance type and the initial count of instances. In the case of neural networks, the configuration may include the type of GPU-backed instance. The endpoint API provisions the infrastructure as defined in the previous step. SageMaker deployment supports both one-off and batch predictions. Batch predictions make predictions on datasets that can be stored in Amazon S3 or other AWS storage solutions. Integration and invocation – Amazon SageMaker provides a variety of ways and interfaces to interact with the service: • Web API – SageMaker has a web API that can be used to control and invoke a SageMaker server instance. • SageMaker API – As with other services, Amazon has an API for SageMaker that supports the following list of programming languages: ° Go ° C++ ° Java ° JavaScript ° Python ° PHP ° Ruby • Web interface – If you are familiar with Jupyter Notebooks, you will feel right at home with Amazon SageMaker, since the web interface used to interact with SageMaker is Jupyter Notebooks. • AWS CLI – The AWS command-line interface (CLI).
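The two-step deployment just described maps directly onto the AWS SDK for Python (boto3). A minimal sketch, with the model, endpoint, and payload names as placeholder assumptions:

```python
import boto3

sm = boto3.client("sagemaker")

# Step 1: the endpoint configuration names the model and the ML compute instances
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",      # placeholder
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-trained-model",          # placeholder: an existing SageMaker model
        "InstanceType": "ml.t2.medium",
        "InitialInstanceCount": 1,
    }],
)

# Step 2: launch the instances and expose an invokable endpoint
sm.create_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config",
)

# Once the endpoint is InService, clients call it for predictions
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="text/csv",
    Body="0.5,1.2,3.4",   # a feature vector in whatever format the model expects
)
print(response["Body"].read())
```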
• 145. Alexa, Lex, and Polly – conversational agents In previous chapters, we discussed Alexa and its increasingly pervasive presence in homes. We'll now delve into the technologies that power Alexa and allow you to create your own conversational bots. Amazon Lex is a service for building conversational agents. Amazon Lex, along with other chatbots, is our generation's attempt at passing the Turing Test, which we discussed in previous chapters. It will be a while before anyone confuses a conversation with Alexa with a human conversation. However, Amazon and other companies keep making strides in making these conversations more and more natural. Amazon Lex, which uses the same technologies that power Amazon Alexa, allows developers to quickly build sophisticated, natural language, conversational agents or chatbots. For simple cases, it's possible to build some of these chatbots without any programming. However, it is also possible to integrate Lex with other services in the AWS stack, using AWS Lambda as the integration technology. We will devote a whole chapter to creating chatbots later, so we will keep this section short for now. Amazon Comprehend – natural language processing Amazon Comprehend is a natural language processing (NLP) service provided by AWS. It uses machine learning to analyze content, perform entity recognition, and find implicit and explicit relationships. Companies are starting to realize that they have valuable information in the mounds of data that they generate every day. Valuable insights can be ascertained from customer emails, support tickets, product reviews, call center conversations, and social media interactions. Up until recently, it was cost-prohibitive to try to obtain these insights, but tools like Amazon Comprehend make it cost-effective to perform analysis on vast amounts of data. Another advantage of this service is that it is yet another fully managed AWS service, so there is no need to provision servers, install drivers, or upgrade software. It is simple to use, and deep experience in NLP is not required to quickly become productive with it. Like other AWS AI/ML services, Amazon Comprehend integrates with other AWS services such as AWS Lambda and AWS Glue. Use cases – Amazon Comprehend can be used to scan documents and identify patterns in those documents. This capability can be applied to a range of use cases, such as sentiment analysis, entity extraction, and document organization by topic.
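A minimal sketch of what such an analysis looks like through boto3; the sample text is an invented customer interaction, not data from the text:

```python
import boto3

comprehend = boto3.client("comprehend")

# Invented sample of a customer interaction
text = "The delivery was late, but the support team resolved my issue quickly."

# Overall feeling of the text: POSITIVE, NEGATIVE, NEUTRAL, or MIXED
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"])

# Key phrases, each returned with a confidence score
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
print([p["Text"] for p in phrases["KeyPhrases"]])

# Entities such as people, places, and companies
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
print([(e["Text"], e["Type"]) for e in entities["Entities"]])
```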
• 146. As an example, Amazon Comprehend could analyze text from a social media interaction with a customer, identify key phrases, and determine whether the customer's experience was positive or negative. Console Access – Amazon Comprehend can be accessed from the AWS Management Console. One of the easiest ways to ingest data into the service is by using Amazon S3. We can then make a call to the Comprehend service to analyze text for key phrases and relationships. Comprehend returns a confidence score for each request to indicate the confidence level of its accuracy; the higher the percentage, the more confident the service is. Comprehend can easily process a single request or multiple requests in a batch. Available Application Programming Interfaces (APIs) – As of this writing, Comprehend provides six different APIs to enable insights. They are: • Key Phrase Extraction API – Identifies key phrases and terms. • Sentiment Analysis API – Returns the overall meaning and feeling of the text, either positive, negative, neutral, or mixed. • Syntax API – Allows a user to tokenize text to define word boundaries and label words in their different parts of speech, such as nouns and verbs. • Entity Recognition API – Identifies and labels different entities in the text, such as people, places, and companies. • Language Detection API – Identifies the primary language in which a text is written. The service can identify over a hundred languages. • Custom Classification API – Enables a user to build a custom text classification model. Industry-specific services – Amazon Comprehend Medical was released at AWS re:Invent in 2018. It is built specifically for the medical industry and can identify industry-specific terminology. Comprehend also offers a specific Medical Named Entity and Relationship Extraction API. AWS does not store or use any text inputs from Amazon Comprehend Medical for future machine learning training. Amazon Rekognition – image and video No, it's not a typo. Amazon named its recognition service with a k and not a c. Amazon Rekognition can perform image and video analysis and enables users to add this functionality to their applications. Amazon Rekognition has been pretrained with millions of labeled images. Because of this, the service can quickly recognize: • Object types – Chairs, tables, cars, and so on • Celebrities – Actors, politicians, athletes, and so on
• 147. • People – Facial analysis, facial expressions, facial quality, user verification, and so on • Text – Recognize text in an image and convert it to machine-readable text • Scenes – Dancing, celebrating, eating, and so on • Inappropriate content – Adult, violent, or visually disturbing content Amazon Rekognition has already recognized billions of images and videos and uses them to continuously get better and better. The application of deep learning in the domain of image recognition might arguably be the most successful machine learning application of the last few years, and Amazon Rekognition leverages deep learning to deliver impressive results. To use it, a high level of machine learning expertise is not required. Amazon Rekognition provides a simple API. To use it, an image is passed to the service along with a few parameters, and that is it. Amazon Rekognition will only continue to get better. The more it gets used, the more inputs it receives, and the more it learns from those inputs. In addition, Amazon continues to enhance the service and add new features and functionality. Some of the most popular use cases and applications for Amazon Rekognition are: Object, scene, and activity detection – With Amazon Rekognition, you can identify thousands of different types of objects (for example, cars, houses, chairs, and so on) and scenes (for example, cities, malls, beaches, and so on). When analyzing video, specific activities that are happening in the frame can be identified, such as "emptying a car trunk" or "children playing." Gender recognition – Amazon Rekognition can be used to make an educated guess as to whether a person in an image is male or female. The functionality should not be used as the sole determinant of a person's gender; it is not meant to be used in such a way. For example, if a male actor is wearing a long-haired wig and earrings for a role, he might be identified as female. Facial recognition and analysis – One use of facial recognition systems is to identify and authenticate a person from an image or video. This technology has been around for decades, but it's only recently that its application has become more popular, cheaper, and more available, due in no small part to deep learning techniques and the ubiquity of services such as Rekognition. Facial recognition technologies power many of today's applications, such as photo sharing and storage services, and serve as a second factor in authentication workflows for smartphones. Once we recognize that an object is a face, we might want to perform further facial analysis. Some of the attributes that Amazon Rekognition can assist in determining are: • Eyes open or closed
• 148. • Mood: ° Happy ° Sad ° Angry ° Surprised ° Disgusted ° Calm ° Confused ° Fear • Hair color • Eye color • Beard or mustache • Glasses • Age range • Gender • Visual geometry of a face These detected attributes are useful when there is a need to search through and organize millions of images in seconds, generating metadata tags such as a person's mood, or to identify a person. Pathing – The path of a person in a scene can be tracked with Amazon Rekognition using video files. For example, if we see an image that contains a person with bags around a trunk, we might not know whether the person is taking the bags out of the trunk and arriving, or putting the bags into the trunk and leaving. By analyzing the video using pathing, we will be able to make this determination. Unsafe content detection – Amazon Rekognition can assist in identifying potentially unsafe or inappropriate content in images and video, and it can provide detailed labels that accurately control access to those assets based on previously determined criteria. Celebrity recognition – Celebrities and famous people can be quickly identified in image and video libraries to catalog photos and footage. This functionality can be used in marketing, advertising, and media industry use cases.
• 149. Text in images – Once we identify that an image contains text, it is only natural to want to convert the letters and words in that image into text. As an example, if Rekognition is able not only to recognize that an object is a license plate but additionally to convert the image into text, it will then be easy to index that against Department of Motor Vehicle records and track individuals and their whereabouts. Amazon Translate Amazon Translate is another Amazon service that can be used to translate large amounts of text written in one language into another language. Amazon Translate is pay-per-use, so you will only be charged when you submit something that needs translation. As of October 2019, Amazon Translate supports 32 languages:
Language – Language Code
Arabic – ar
Chinese (Simplified) – zh
Chinese (Traditional) – zh-TW
Czech – cs
Danish – da
Dutch – nl
English – en
Finnish – fi
French – fr
German – de
Greek – el
Hebrew – he
Hindi – hi
Hungarian – hu
Indonesian – id
Italian – it
Japanese – ja
Korean – ko
Malay – ms
Norwegian – no
Persian – fa
Polish – pl
Portuguese – pt
• 150. Romanian – ro
Russian – ru
Spanish – es
Swedish – sv
Thai – th
Turkish – tr
Ukrainian – uk
Urdu – ur
Vietnamese – vi
With a few exceptions, most of these languages can be translated from one to the other. Users can also add items to the dictionary to customize the terminology and include terms that are specific to their organization or use case, such as brand and product names. Amazon Translate uses machine learning and a continuous learning model to improve the performance of its translation over time. The service can be accessed in three different ways, in the same way that many of the AWS services can be accessed: • From the AWS console, to translate small snippets of text and to sample the service. • Using the AWS API (supported languages are C++, Go, Java, JavaScript, .NET, Node.js, PHP, Python, and Ruby). • Via the AWS CLI. Uses for Amazon Translate Many companies use Amazon Translate together with other external services. Additionally, Amazon Translate can be integrated with other AWS services. For example, Translate can be used in conjunction with Amazon Comprehend to pull out predetermined entities, sentiments, or keywords from a social media feed and then translate the extracted terms. In another example, the service can be paired with Amazon S3 to translate document repositories, and with Amazon Polly to speak the translated text. However, using Amazon Translate does not mean that human translators no longer have a role. Some companies pair Amazon Translate with human translators to increase the speed of the translation process.
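A sketch of a single translation call through boto3; the sentence is an invented sample, and Spanish is chosen here purely for illustration:

```python
import boto3

translate = boto3.client("translate")

# Pay-per-use: you are only charged for the text you actually submit
result = translate.translate_text(
    Text="Machine learning makes large-scale translation affordable.",
    SourceLanguageCode="en",   # see the language-code table above
    TargetLanguageCode="es",
)
print(result["TranslatedText"])
```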