Unit 4
Learning
L.A.Bewoor
laxmi.bewoor@viit.ac.in
Department of Computer Engineering
BRACT’S, Vishwakarma Institute of Information Technology, Pune-48
(An Autonomous Institute affiliated to Savitribai Phule Pune University)
(NBA and NAAC accredited, ISO 9001:2015 certified)
Objective/s of this session
Discuss learning components and types of learning in AI
1. Differentiate between supervised, unsupervised and reinforcement learning
2. Implement applications for supervised, unsupervised and reinforcement algorithms
3. Learn & implement perceptron & neural networks
4. Learn & implement ensemble learning
Contents
• Sequential and time series analysis
• Speech Recognizer
• Natural Language Processing
• Chatbots
• Perceptron based classifier
Ensemble Learning
• Ensemble Learning is a method of reaching a consensus in predictions by fusing the salient properties of two or more models. The final ensemble learning framework is more robust than the individual models that constitute the ensemble, because ensembling reduces the variance in the prediction errors.
• Ensemble Learning tries to capture complementary information from its different contributing models; an ensemble framework is successful when the contributing models are statistically diverse.
• For example, a model may be well adapted to differentiate between cats and dogs, but not so much when distinguishing between dogs and wolves. On the other hand, a second model can accurately differentiate between dogs and wolves while producing wrong predictions on the "cat" class. An ensemble of these two models might draw a more discriminative decision boundary between all three classes of the data.
• In learning models, noise, variance, and bias are the major sources of
error. The ensemble methods in machine learning help minimize these
error-causing factors, thereby ensuring the accuracy and stability of
machine learning (ML) algorithms.
Ensemble Learning
• We may have trained one cat/dog classifier on high-quality images taken by a professional photographer, while another classifier has been trained on low-quality photos captured on mobile phones. When predicting a new sample, integrating the decisions from both these classifiers will be more robust and less biased.
Bias and Variance
• Bias is the difference between the value predicted by the model and the actual value. Bias is introduced when the model doesn't consider the variation in the data and creates an overly simple model.
• Such a simple model doesn't follow the patterns in the data, and hence gives errors on training as well as testing data: this is a model with high bias (underfitting).
• When the model treats even random quirks of the data as patterns, it may do very well on the training dataset (low bias), but it fails on test data: this is a model with high variance (overfitting).
• In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. It happens when we have too little data to build an accurate model, or when we try to build a linear model with nonlinear data. Models of this kind are too simple to capture the complex patterns in the data, e.g., linear and logistic regression.
• In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model a lot on a noisy dataset. These models have low bias and high variance. They are very complex models, like decision trees, which are prone to overfitting.
Ensemble learning techniques
1. Bagging: Bagging (short for "Bootstrap Aggregating") trains similar learners on small sample populations and then takes the mean of all the predictions. In generalized bagging, you can use different learners on different populations. As you would expect, this helps reduce the variance error.
• Multiple different training datasets can be prepared,
used to estimate a predictive model, and make
predictions. Averaging the predictions across the
models typically results in better predictions than a
single model fit on the training dataset directly.
• Bagging is a parallel method, which means
several weak learners learn the data pattern
independently and simultaneously
• Bagging reduces variance
• Popular ensemble methods based on this approach include (see the sketch after the list):
Bagged Decision Trees
Random Forest Classifiers
Extra Trees
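As a minimal illustrative sketch (not from the slides), the snippet below bags decision trees with scikit-learn on a synthetic dataset; BaggingClassifier bootstraps the training set for each tree, and RandomForestClassifier adds random feature selection on top.

```python
# Hedged sketch: bagging with scikit-learn (assumed installed); the
# dataset is synthetic and only for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagged decision trees: each tree fits a bootstrap sample; predictions
# are averaged, which mainly reduces variance.
bag = BaggingClassifier(n_estimators=50, random_state=42)
print("Bagging accuracy:", bag.fit(X_train, y_train).score(X_test, y_test))

# Random forest = bagging plus random feature subsets at each split.
rf = RandomForestClassifier(n_estimators=50, random_state=42)
print("Random forest accuracy:", rf.fit(X_train, y_train).score(X_test, y_test))
```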
2. Boosting
• Instead of parallel processing of the data, sequential processing of the dataset occurs. The first classifier is fed with the entire dataset, and its predictions are analyzed.
• The instances where Classifier-1 fails to produce correct predictions (typically samples near the decision boundary of the feature space) are fed to the second classifier.
• This is done so that Classifier-2 can specifically focus on the
problematic areas of feature space and learn an appropriate
decision boundary. Similarly, further steps of the same idea
are employed, and then the ensemble of all these previous
classifiers is computed to make the final prediction on the test
data.
• The main aim of the boosting method is to reduce bias in the ensemble decision. Thus, the classifiers chosen for the ensemble usually need to have low variance and high bias, i.e., simpler models with fewer trainable parameters. Popular methods based on this approach include the following (a sketch follows the list):
– Adaptive Boosting
– Stochastic Gradient Boosting
– Gradient Boosting Machines
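A hedged sketch of boosting with scikit-learn (assumed available) follows; AdaBoost re-weights misclassified samples each round, while gradient boosting fits each new tree to the remaining errors. The synthetic data is only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost: sequential; each round up-weights the samples that the
# previous (high-bias, low-variance) learners got wrong.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
print("AdaBoost:", ada.fit(X_train, y_train).score(X_test, y_test))

# Gradient boosting: each new tree is fit to the residual errors of the
# ensemble built so far.
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0)
print("GBM:", gbm.fit(X_train, y_train).score(X_test, y_test))
```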
3. Stacking
• The stacking ensemble method also involves creating bootstrapped data subsets, like the bagging ensemble mechanism, for training multiple models.
• However, the outputs of all such models are used as inputs to another classifier, called the meta-classifier, which finally predicts the samples. The intuition behind using two layers of classifiers is to determine whether the training data have been appropriately learned.
• For example, in the cat/dog/wolf classifier above: if, say, Classifier-1 can distinguish between cats and dogs, but not between dogs and wolves, the meta-classifier present in the second layer will be able to capture this behavior from Classifier-1 and correct it before making the final prediction.
In summary, the stacking procedure is:
1. Split the training set into two disjoint sets.
2. Train several base learners on the first part.
3. Test the base learners on the second part.
4. Using the predictions from step 3 as the inputs, and the correct responses as the outputs, train a higher-level learner.
Example: Voting Classifier (see the sketch below).
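A hedged sketch of both ideas with scikit-learn (assumed available); the base learners and synthetic data are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

base = [("tree", DecisionTreeClassifier(random_state=1)),
        ("knn", KNeighborsClassifier())]

# Stacking: the base learners' predictions become features for a
# meta-classifier that makes the final call.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression())
print("Stacking:", stack.fit(X_train, y_train).score(X_test, y_test))

# Voting: no meta-classifier; the ensemble takes a majority vote.
vote = VotingClassifier(estimators=base, voting="hard")
print("Voting:", vote.fit(X_train, y_train).score(X_test, y_test))
```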
Reinforcement Learning (RL)
• Drawbacks of (traditional) machine learning algorithms:
– Need a huge amount of data for training the model
– Data may be missing, false, or unavailable
• Requirement of the system:
– Machines need to learn to perform actions by themselves, not just learn from data.
• Reinforcement Learning
▪ Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions.
▪ If the model performs an action that brings it closer to its goal, it receives a positive reward; if the action takes it away from its goal, a negative reward.
▪ Returns an optimum solution for a problem by taking a sequence of decisions by itself (without human interference)
▪ Works on a trial-and-error basis
▪ Sequential decision making
▪ Feedback is not instantaneous
▪ A type of dynamic programming
Important Terminologies in RL
• Agent: the model that is being trained via reinforcement learning.
• Environment: the training situation that the model must optimize against.
• Action: all possible steps that can be taken by the model.
• State: the current position/condition returned by the model.
• Reward: to help the model move in the right direction, it is given rewards/points to appraise some action.
• Policy: determines how an agent will behave at any time. It is the strategy the agent applies to choose the next action based on the current state.
• Value: the expected long-term return with discounting, as opposed to the short-term reward.
• Q-value: similar to the value, but it takes one additional parameter, the current action.
• Discount factor: helps adjust the importance of rewards over time. It exponentially decreases the value of later rewards, so the agent weighs immediate rewards against long-term ones.
RL algorithms Categorization
Learning models of RL
• Markov Decision Process (MDP):
• Most Reinforcement Learning tasks can be framed as MDPs. The following parameters are used to get a solution:
– Set of actions, A
– Set of states, S
– Reward, R
– Policy, π
– Value, V
Mathematically, the Markov (no "memory") property states that the next state depends only on the current state, not on the full history:
P(s[t+1] | s[t]) = P(s[t+1] | s[1], s[2], ..., s[t])
Bellman Equation & Dynamic Programming
The Bellman equation expresses the value of a state as the best achievable immediate reward plus the discounted value of the resulting next state:
V(s) = max over actions a of [ R(s, a) + γ * V(s') ]
where the discount factor γ lies between 0 and 1.
Iterating this update (value iteration), the solution is the largest value in the array after computing n iterations, as sketched below.
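To make the iteration concrete, here is a small value-iteration sketch in Python; the four-state chain, rewards and transitions are invented for illustration and are not the example from the slides.

```python
import numpy as np

gamma, n_iters = 0.9, 50
# Hypothetical deterministic toy MDP: R[s][a] is the reward for action a
# in state s; T[s][a] is the resulting next state.
R = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 0.0], [10.0, 10.0]])
T = np.array([[0, 1], [1, 2], [2, 3], [3, 3]])

V = np.zeros(4)
for _ in range(n_iters):
    # Bellman update: V(s) = max_a [ R(s,a) + gamma * V(s') ]
    V = np.max(R + gamma * V[T], axis=1)

print(V)        # per-state values after n iterations
print(V.max())  # the largest value in the array
```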
Q-Learning:
Markov Decision Process + Reinforcement Learning
Q-Learning is a reinforcement learning policy that finds the next best action given a current state. During training it may choose actions at random while aiming to maximize the total reward.
Q Learning
• The objective of the model is to find the best course of action given its current state. To do this, it may come up with rules of its own, or it may operate outside the policy given to it to follow. This means that there is no actual need for a fixed policy, hence we call it off-policy.
• Model-free means that the agent does not build an explicit model of the environment's dynamics or expected responses; it learns by trial and error, from the rewards it actually observes.
Important Terms in Q-Learning
• States: The State, S, represents the current position of
an agent in an environment.
• Action: The Action, A, is the step taken by the agent
when it is in a particular state.
• Rewards: For every action, the agent will get a positive
or negative reward.
• Episodes: When an agent ends up in a terminating
state and can’t take a new action.
• Q-Values: Used to determine how good an Action, A,
taken at a particular state, S, is. Q (A, S).
• Temporal Difference: a formula used to find the Q-value using the values of the current state and action and the previous state and action.
Q-Learning
Robot Navigation
• A robot has to cross a maze and reach the end point. There are mines, and the robot can only move one tile at a time. If the robot steps onto a mine, the robot is dead. The robot has to reach the end point in the shortest time possible.
• The scoring/reward system is as below:
• The robot loses 1 point at each step. This
is done so that the robot takes the
shortest path and reaches the goal as
fast as possible.
• If the robot steps on a mine, the point
loss is 100 and the game ends.
• If the robot gets power ⚡, it gains 1
point.
• If the robot reaches the end goal, the
robot gets 100 points.
Q Table
In the Q-Table, the columns are the actions and the rows are the states. Each Q-Table score is the maximum expected future reward that the robot will get if it takes that action in that state. Each value of the Q-Table is calculated with the Q-Learning algorithm. The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
Q Learning Algorithm
Q Learning
• Q is brain of agent. Initialize it with 0
• Set gamma and environment rewards in R
• Each episode is one training session
• In each training session agent explores
enviornment (with R) and receives reward until it
reaches goal.
• Purpose is to enhance brain represented with Q.
More training results in more optimized Q.
• Gamma is set between 0 to1. Closer to 0 means
agent considers immediate rewards whereas closer
to 1 means future rewards
• Q(State,Action)= R(State,Action)+ Gamma*
max[Q(state,all actions)]
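A minimal tabular Q-learning sketch of exactly this update rule; the three-state corridor below (move right to reach the goal, reward 100) is a made-up toy environment, not the maze from the slides.

```python
import numpy as np

gamma = 0.8
# Hypothetical toy world: states 0, 1, 2; action 0 = stay, 1 = move right.
# Entering the goal state 2 pays 100, everything else pays 0.
R = np.array([[0.0, 0.0], [0.0, 100.0], [0.0, 0.0]])
next_state = np.array([[0, 1], [1, 2], [2, 2]])

Q = np.zeros((3, 2))                 # the agent's "brain", initialised to 0
rng = np.random.default_rng(0)

for episode in range(200):           # each episode is one training session
    s = 0
    while s != 2:                    # explore until the goal is reached
        a = rng.integers(2)          # pick an action at random
        s2 = next_state[s, a]
        # Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)]
        Q[s, a] = R[s, a] + gamma * Q[s2].max()
        s = s2

print(Q)  # rows = states, cols = actions; Q[0,1] converges to 80, Q[1,1] to 100
```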
Perceptron
Introduction
• A perceptron is a neural network unit (an artificial neuron) that performs certain computations to detect features or business intelligence in the input data.
• It closely resembles a biological neuron.
• Warren McCulloch and Walter Pitts first introduced the nerve cell as a simple logic gate with binary outputs.
• The perceptron is a simple model of the biological neuron in the form of an ANN. It is a supervised learning algorithm designed for binary classification.
Biological Neuron vs Artificial Neuron
Multiple signals arrive at the dendrites and are then integrated in the cell body; if the accumulated signal exceeds a certain threshold, an output signal is generated that will be passed on by the axon.
An artificial neuron is a mathematical function based on a model of biological neurons, where each neuron takes inputs, weighs them separately, sums them up and passes this sum through a nonlinear function to produce the output.
The parts correspond as follows: dendrites → inputs, cell nucleus (soma) → node, synapses → weights, axon → output.
The artificial neuron has the following
characteristics:
– A neuron is a mathematical function modeled on the
working of biological neurons
– It is an elementary unit in an artificial neural network
– One or more inputs are separately weighted
– Inputs are summed and passed through a nonlinear
function to produce output
– Every neuron holds an internal state called activation
signal
– Each connection link carries information about the
input signal
– Every neuron is connected to another neuron via
connection link
Perceptron
• The Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a Perceptron learning rule based on the original MCP neuron. A Perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and process elements in the training set one at a time.
• There are two types of Perceptrons:
• Single layer – single-layer perceptrons can learn only linearly separable patterns.
• Multilayer – multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power.
• The Perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary.
• It takes an input, aggregates it (weighted sum), and returns 1 only if the aggregated sum is more than some threshold, else returns 0.
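A from-scratch sketch of that rule: weighted sum, threshold at zero, and Rosenblatt's weight update on each misclassified sample. Training on logical AND (linearly separable) is an arbitrary illustrative choice.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                 # logical AND, linearly separable

w, b, lr = np.zeros(2), 0.0, 0.1

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)         # 1 iff the weighted sum exceeds the threshold
        w += lr * (target - pred) * xi     # Rosenblatt's learning rule
        b += lr * (target - pred)

print(w, b)
print([int(w @ xi + b > 0) for xi in X])   # matches y after training
```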
Multiple-Layer Networks and Backpropagation Algorithms
Backpropagation is the generalization of the Widrow-Hoff learning rule to
multiple-layer networks and nonlinear differentiable transfer functions.
Input vectors and the corresponding target vectors are used to train a
network until it can approximate a function, associate input vectors with
specific output vectors, or classify input vectors in an appropriate way as
defined by you.
Architecture
This section presents the architecture of the network that is most
commonly used with the backpropagation algorithm –
the multilayer feedforward network
Neuron Model
An elementary neuron with R inputs is shown below. Each input is
weighted with an appropriate w. The sum of the weighted inputs and the
bias forms the input to the transfer function f. Neurons can use any
differentiable transfer function f to generate their output.
Transfer Functions (Activation Functions)
Multilayer networks often use the log-sigmoid transfer function logsig.
The function logsig generates outputs between 0 and 1 as the neuron's
net input goes from negative to positive infinity
Alternatively, multilayer networks can use the tan-sigmoid transfer function tansig.
The function tansig generates outputs between -1 and +1 as the neuron's net input goes from negative to positive infinity.
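Written out in NumPy, the two transfer functions look as follows (the names logsig/tansig follow the slides' MATLAB-style terminology):

```python
import numpy as np

def logsig(n):
    """Log-sigmoid: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def tansig(n):
    """Tan-sigmoid: squashes the net input into (-1, 1); same as np.tanh(n)."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

n = np.linspace(-5.0, 5.0, 5)
print(logsig(n))   # values approach 0 and 1 at the extremes
print(tansig(n))   # values approach -1 and +1 at the extremes
```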
Feedforward Network
A single-layer network of S logsig neurons having R inputs is shown
below in full detail on the left and with a layer diagram on the right.
Feedforward networks often have one or more hidden layers of sigmoid neurons followed
by an output layer of linear neurons.
Multiple layers of neurons with nonlinear transfer functions allow the network to learn
nonlinear and linear relationships between input and output vectors.
The linear output layer lets the network produce values outside the range -1 to +1. On the
other hand, if you want to constrain the outputs of a network (such as between 0 and 1),
then the output layer should use a sigmoid transfer function (such as logsig).
Learning Algorithm: Backpropagation
The following slides describe the teaching process of a multi-layer neural network employing the backpropagation algorithm. To illustrate this process, a three-layer neural network with two inputs and one output, shown in the picture below, is used:
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realizes the nonlinear function, called the neuron transfer (activation) function. Signal e is the adder's output signal, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.
To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) assigned with the corresponding target (desired output) z.
Network training is an iterative process. In each iteration the weight coefficients of the nodes are modified using new data from the training data set. The modification is calculated using the algorithm described below: each teaching step starts with forcing both input signals from the training set. After this stage we can determine the output signal values for each neuron in each network layer.
The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.
Propagation of signals through the hidden layer. Symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.
Propagation of signals through the output layer.
In the next algorithm step, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal d of the output-layer neuron.
The idea is to propagate the error signal d (computed in a single teaching step) back to all neurons whose output signals were inputs for the discussed neuron.
The weight coefficients wmn used to propagate errors back are equal to those used when computing the output value; only the direction of data flow is changed (signals are propagated from outputs to inputs one after the other). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:
When the error signal for each neuron is computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
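As a compact, hedged sketch of the whole procedure, the NumPy loop below trains a tiny 2-2-1 network with logistic activations on a single made-up sample: forward pass, output error signal d, error propagated back through the same weights, then each weight updated by eta * d * df(e)/de * input.

```python
import numpy as np

def f(e):                      # logistic transfer function, f'(e) = f(e) * (1 - f(e))
    return 1.0 / (1.0 + np.exp(-e))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))   # input layer -> hidden layer weights
W2 = rng.normal(size=(1, 2))   # hidden layer -> output weights
eta = 0.5                      # learning rate

x = np.array([1.0, 0.0])       # one hypothetical training sample...
z = 1.0                        # ...with desired output z

for step in range(1000):
    y1 = f(W1 @ x)             # forward pass: hidden outputs y = f(e)
    y2 = f(W2 @ y1)            # network output
    d2 = z - y2                # error signal d of the output neuron
    d1 = W2.T @ d2             # propagate d back through the same weights
    # weight update: w += eta * d * df(e)/de * input
    W2 += eta * np.outer(d2 * y2 * (1 - y2), y1)
    W1 += eta * np.outer(d1 * y1 * (1 - y1), x)

print(f(W2 @ f(W1 @ x)))       # output approaches the target z
```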
Thank you
Unit 5
AI Applications
L.A.Bewoor
laxmi.bewoor@viit.ac.in
Department of Computer Engineering
BRACT’S, Vishwakarma Institute of Information Technology, Pune-48
(An Autonomous Institute affiliated to Savitribai Phule Pune University)
(NBA and NAAC accredited, ISO 9001:2015 certified)
Objective/s of this session
Discuss real life applications of AI
Apply AI techniques for real world application
1. AI application for NLP
2. AI application for time series analysis
3. AI application for speech recognition
4. AI application for chatbots
5. AI application for perceptron based classifier
Contents
• Sequential and time series analysis
• Speech Recognizer
• Natural Language Processing
• Chatbots
• Perceptron based classifier
Time series analysis
■ A Time Series is a sequence of measures of a given
phenomenon taken at regular time intervals such as hourly,
daily, weekly, monthly, quarterly, annually, or every so many
years
– Stock series are measures of activity at a point in time
– Flow series are series which are a measure of activity to a date (e.g.
Retail, Current Account Deficit, Balance of Payments)
– price of a particular commodity like gold, silver, any eatables, petrol,
diesel etc.
– rate of interest, The rate of interest for home loans
▪ A set of observations ordered with respect to the successive
time periods is a time series. In other words, the arrangement
of data in accordance with their time of occurrence is a time
series. It is the chronological arrangement of data. Here, time
is just a way in which one can relate the entire phenomenon
to suitable reference points.
• A time series depicts the relationship between two
variables. Time is one of those variables and the second is
any quantitative variable.
Uses of Time Series
• The most important use of studying time series is that it
helps us to predict the future behaviour of the variable
based on past experience
• It is helpful for business planning as it helps in comparing
the actual current performance with the expected one
• From time series, we get to study the past behaviour of the
phenomenon or the variable under consideration
• We can compare the changes in the values of different
variables at different times or places, etc.
Components for Time Series Analysis
• Trend
• Seasonal Variations
• Cyclic Variations
• Random or Irregular movements
Trend
• The trend shows the general tendency of the data to increase or decrease
during a long period of time. A trend is a smooth, general, long-term,
average tendency. It is not always necessary that the increase or decrease
is in the same direction throughout the given period of time.
• It is observable that the tendencies may increase, decrease or remain stable in different sections of time, but the overall trend must be upward, downward or stable. Population, agricultural production, items manufactured, the number of births and deaths, the number of industries or factories, and the number of schools or colleges are some examples showing this kind of tendency of movement.
• Seasonal Variations
• These are the rhythmic forces which operate in a regular and periodic
manner over a span of less than a year. They have the same or almost the
same pattern during a period of 12 months. This variation will be present
in a time series if the data are recorded hourly, daily, weekly, quarterly, or
monthly.
• These variations come into play either because of natural forces or man-made conventions. The various seasons and climatic conditions play an important role in seasonal variations: the production of crops depends on the seasons, the sale of umbrellas and raincoats rises in the rainy season, and the sale of electric fans and A.C.s shoots up in summer.
• The effect of man-made conventions such as festivals, customs, habits, fashions, and occasions like marriage is easily noticeable. They recur year after year. An upswing in a season should not be taken as an indicator of better business conditions.
Cyclic Variations
• The variations in a time series which operate themselves over a
span of more than one year are the cyclic variations. This
oscillatory movement has a period of oscillation of more than a
year. One complete period is a cycle. This cyclic movement is
sometimes called the ‘Business Cycle’.
• It is a four-phase cycle comprising the phases of prosperity, recession, depression, and recovery. The cyclic variation may be regular but is not periodic. The upswings and downswings in business depend upon the joint nature of the economic forces and the interaction between them.
Random or Irregular Movements
• There is another factor which causes variation in the variable under study: variations that are not regular but purely random or irregular. These fluctuations are unforeseen, uncontrollable, unpredictable, and erratic. The forces behind them include earthquakes, wars, floods, famines, and other disasters.
Fundamental Rule of Time Series Analysis
• Stationarity is an important concept in the field of time series analysis with
tremendous influence on how the data is perceived and predicted.
• When forecasting or predicting the future, most time series models
assume that each point is independent of one another. The best indication
of this is when the dataset of past instances is stationary.
• For data to be stationary, the statistical properties of a system do not
change over time. This does not mean that the values for each data point
have to be the same, but the overall behavior of the data should remain
constant. From a purely visual assessment, time plots that do not show
trends or seasonality can be considered stationary. More numerical factors
in support of stationarity include a constant mean and a constant variance.
• Non-stationary time series
A non-stationary time series's statistical properties, like mean and variance, will not be constant over time. An example of a non-stationary time series is a series with a trend: something that grows over time, for instance. The sample mean and variance of such a series will grow as you increase the size of the sample.
• We can perform a transformation to convert such data into a stationary dataset. The most common transforms are the difference and logarithmic transforms, as sketched below.
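A hedged pandas sketch of those transforms (the trending series below is synthetic):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic non-stationary series: linear trend plus noise.
trend = pd.Series(50 + 0.5 * np.arange(200) + rng.normal(0, 1, 200))

diff = trend.diff().dropna()               # difference transform removes the trend
log_diff = np.log(trend).diff().dropna()   # log transform, then difference

print(trend.mean(), diff.mean())           # the differenced mean hovers near the slope
print(trend.var(), diff.var())             # variance no longer grows with sample size
```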
Time Series Decomposition
• Additive time series
• The equation for an additive time series is simply:
  Ot = Tt + St + Rt
  where Ot = output, Tt = trend, St = seasonality, Rt = residual, and the subscript t denotes a particular point in time.
• additive = trend + seasonal + residual
• Multiplicative time series
• The equation for a multiplicative time series is simply:
  Ot = Tt * St * Rt
  with the same symbols as above.
• multiplicative = trend * seasonal * residual
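A hedged sketch using statsmodels' seasonal_decompose (assumed installed); the monthly series is synthetic: trend + seasonality + noise.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(0)
t = np.arange(48)
values = 10 + 0.2 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 48)
series = pd.Series(values, index=pd.date_range("2020-01-01", periods=48, freq="MS"))

# Additive model: Ot = Tt + St + Rt; pass model="multiplicative" for Ot = Tt * St * Rt.
parts = seasonal_decompose(series, model="additive")
print(parts.trend.dropna().head())
print(parts.seasonal.head())
print(parts.resid.dropna().head())
```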
FORECASTING AND TIME SERIES ANALYSIS
Forecasting is based on past recorded data and helps in the determination of future plans with respect to any desired objective. It helps in the fixing of strategies.
[Flow diagram: forecast → desired performance compared with planned performance → analysis of deviation → strategy-making decision]
TYPES OF FORECAST
1. Demand Forecast – Prediction of demand for products or services.
2. Environmental Forecast – Prediction of social, political and economic changes.
3. Technological Forecast – Prediction of technological changes.
TIMING OF FORECASTS
Forecasts are usually classified according to time period.
1. Short-range forecast – commonly up to one year, and usually less than three months. E.g. purchasing, job scheduling, workforce, production levels, regional production, seasonal production, etc.
2. Medium-range forecast – commonly one to three years. E.g. cash budgeting, sales planning, etc.
3. Long-range forecast – commonly three or more years. E.g. R&D, capital expenditure, establishment of new plants, labor facilities, etc.
Forecasting Methods
Forecasting methods are based either on data (quantitative) or on opinion and judgment (qualitative). The quantitative methods are further divided into two, namely time series and causal.
A time series is a set of measurements of a variable that are ordered through time. The time variable itself does not fluctuate arbitrarily; it moves uniformly, always in the same direction.
The time series forecasting methods attempt to account for changes over a period of time at regular intervals by examining patterns, cycles or trends to predict the outcome for a future time period.
Causal methods are based on the assumption that the variable under consideration has a cause-effect relationship with one or more other variables.
Methods of Forecasting
1. Define objective
2. Select the variable of interest
3. Determine the time for forecasting
4. Select appropriate model
5. Collect the relevant data
6. Make the forecast
TYPES OF FORECASTING TECHNIQUES
A fixed and suitable technique for forecasting is a primary necessity for the validity of forecasts. In the last few decades some forecasting techniques have been developed; they can be classified into three broad categories.
1. NAÏVE METHODS –
Based on the assumption that the future is just an extension of the past.
2. BAROMETRIC METHODS –
Based on the assumption that a forecast can be made on the basis of certain happenings in the past. In this method a factor-dependent series is constructed, and thereafter statistical analysis yields the forecast.
3. ANALYTICAL METHODS –
Based on the analysis of the causative forces operating on the variable to be forecasted. Analytical techniques may be non-mathematical, like factor listing or opinion, or mathematical.
TIME SERIES ANALYSIS
A time series is an orderly arrangement of the numerical values of a desired variable with respect to time. It can be represented in tabular as well as graphical form.
Objectives: 1: To identify the pattern and isolate the influencing factors (or effects) for prediction as well as for future planning and control.
2: To review and evaluate plan progress.
Pattern: It is assumed that time series data consist of a uniform pattern plus random fluctuations:
Actual value of variable per unit time = mean value (pattern) per unit time + random deviation per unit time, i.e. Ŷ = pattern + e
Components:
1: Trend – Sometimes a time series displays either upward or downward movement in the average value of the variable of interest.
2: Cycles – Upward or downward movements in the variable of interest over a period of time. A cycle may have four phases: peak, contraction, trough and expansion.
3: Seasonal – Upward and downward movements within a year that follow a regular pattern.
4: Irregular – Rapid upward or downward movements caused by short-term, unanticipated and non-recurring factors.
Time Series Methods – The available time series data are used for mathematical analysis to derive future inferences. These processes have the limitation that they cannot guarantee accurate future values. This limitation of the time series approach is taken care of by the application of causal methods. The time series methods are as follows:
A. Freehand Methods
B. Smoothing Methods – Smoothing is a process that often improves our ability to
forecast series by reducing the impact of noise
(i) Moving Averages (ii) Weighted Moving Averages (iii) Semi Averages.
C. Exponential Smoothing Methods – (i) Simple exponential Smoothing (ii) Adjusted
Exponential Smoothing
D. Quadratic Trend Model
A. Freehand Methods
A freehand curve is drawn as a straight line from the value at the lowest time limit to the value at the highest time limit of the series. The forecast can be obtained simply by extending the trend line. A trend line fitted by the freehand method should conform to the conditions mentioned below:
(i) It is smooth and straight.
(ii) The sums of the vertical deviations above and below the trend line are equal.
(iii) The sum of squares of the vertical deviations from the trend line is as small as possible.
(iv) The trend line bisects the cycles.
Limitations: 1: This method is highly subjective.
2: The trend line drawn cannot have much value.
3: It is very time consuming to construct a freehand trend.
B. Smoothing Methods
The objective of smoothing methods is to smooth out the random variations due to the irregular component of the time series.
(i) Moving Averages
It is a quantitative method of forecasting or smoothing a time series by averaging each successive group of data values. It is a subjective method and depends on the length of the period chosen for calculating the moving average.
The moving average, which serves as an estimate of the next period's value of a variable given a period of length n, is expressed as:
Moving average MA(t+1) = [Dt + D(t-1) + D(t-2) + ... + D(t-n+1)] / n
where t = current time period, D = actual data value (exchanged each period), and n = length of the time period.
• In this method the term “moving” is used because it is obtained by
summing and averaging the values from a given number of periods, each
time deleting the oldest value and adding a new value.
Limitations – It is highly subjective and dependent on the length of the period chosen for calculating the average. The method has three important limitations:
(a) Increasing the size of n increases the smoothness of variation, but it also makes the method less sensitive to real changes in the data.
(b) It is difficult to choose the optimal length of time over which to compute the moving average. Moving averages cannot be found for the first and last k/2 periods of a k-period moving average.
(c) Moving averages cannot pick up trends very well.
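The same computation via pandas rolling windows (a hedged sketch; the data are from the illustration that follows, with n = 4):

```python
import pandas as pd

values = pd.Series([205, 316, 340, 446, 396, 450, 515, 575, 495, 605],
                   index=range(2001, 2011))

ma4 = values.rolling(window=4).mean()   # each entry averages the latest 4 values
print(ma4.dropna())                     # first value: (205+316+340+446)/4 = 326.75
```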
Illustration – Calculation of trend and short-term fluctuations (4-year moving averages):

Year:     2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Variable:  205  316  340  446  396  450  515  575  495  605

Sample 4-year moving totals and averages: 1307/4 = 326.75, 1632/4 = 408, 1807/4 = 451.75, 1936/4 = 484, 2035/4 = 508.75, 2190/4 = 547.5.

(ii) Weighted Moving Averages – In a moving average, each observation is given equal importance (weight). However, it may be desired to place more weight (importance) on certain periods of time than on others. A moving average in which some time periods are weighted differently than others is called a weighted moving average. Commonly, the more recent observations receive more weight, and the weight decreases for older data values.
Weighted moving average = Σ (weight for period n × data value in period n) / Σ weights
Illustration – Forecasting of sales by weighting the past three months:

Weights applied: 3 to last month, 2 to two months ago, 1 to three months ago.

Weighted forecast = [3 × sales last month + 2 × sales two months ago + 1 × sales three months ago] / 6

Month: Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Sales:  20  24  38  42  56  52  40  38  45  40
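The same weighted forecast in Python (a hedged sketch using the illustration's data and 3/2/1 weights):

```python
import numpy as np
import pandas as pd

sales = pd.Series([20, 24, 38, 42, 56, 52, 40, 38, 45, 40],
                  index=["Mar", "Apr", "May", "Jun", "Jul",
                         "Aug", "Sep", "Oct", "Nov", "Dec"])
weights = np.array([1, 2, 3]) / 6.0     # oldest month -> most recent month

# Weighted average of each 3-month window; it forecasts the month that
# follows the window, e.g. Mar/Apr/May -> forecast for June.
wma = sales.rolling(window=3).apply(lambda w: np.dot(w, weights))
print(wma.dropna())   # (1*20 + 2*24 + 3*38) / 6 = 30.33 for the May window
```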
(iii) SEMI AVERAGE METHOD
The semi-average method permits us to estimate the slope and intercept of the trend line quite easily, provided a linear function adequately describes the data. The trend line is determined simply by means of the lower and upper halves of the data. In a continuous series these points are determined at the mid-point of the class interval. The arithmetic mean of the first half is the intercept value, and the slope is determined by the ratio of the difference between the arithmetic means of the two halves to the number of years between them, that is, the change per unit time.
The resulting time series is represented by the equation
Ỹ = a + bx
where Ỹ = calculated trend value, a = intercept, b = slope value.
The equation should always be stated completely with reference to the year where x = 0 and a description of the units of x and y. With an odd number of observations it is customary to ignore the middle time-series value.
It may be satisfactory if the trend is linear. If the data deviate much from linearity, the forecast will be biased and less reliable.
ILLUSTRATION
The production of a company (tons per year) is given below. Determine the trend line.

Year:        2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Production:   115  120  130  160  145  155  160  155  170  175

To calculate the time series, Ỹ = a + bx.
The mean of the first half (2001-2005) is 670/5 = 134, centred at 2003; the mean of the second half (2006-2010) is 815/5 = 163, centred at 2008.
Slope b = Δy / Δx = (163 – 134) / (2008 – 2003) = 29/5 = 5.8
Intercept a = 134 at 2003.
Thus the trend line is Ỹ = 134 + 5.8x.
To predict production in 2012: x = 2012 – 2003 = 9, so Ỹ = 134 + 5.8 × 9 = 186.2 tons.
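The same illustration computed in Python (a hedged sketch of the semi-average method):

```python
import numpy as np

production = np.array([115, 120, 130, 160, 145, 155, 160, 155, 170, 175])

lower, upper = production[:5], production[5:]    # the two halves
a = lower.mean()                                 # 134.0, anchored at 2003
b = (upper.mean() - a) / (2008 - 2003)           # (163 - 134) / 5 = 5.8

x = 2012 - 2003                                  # years from the x = 0 origin
print(a + b * x)                                 # 186.2 tons forecast for 2012
```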
Natural Language Processing steps
1. Segmentation:
Break the entire document down into its constituent sentences, using punctuation such as full stops and commas.
"I am in VIIT. I am learning AI at TY." → "I am in VIIT." / "I am learning AI at TY."
2. Tokenizing:
Split each sentence into its constituent words (tokens).
"I am learning AI at TY" → I / am / learning / AI / at / TY
3. Removing Stop Words:
Remove common filler words that carry little meaning.
"I am learning AI at TY" → I / learning / AI
4. Stemming:
The process of obtaining the word stem of a word. The word stem gives new words upon adding affixes to it.
learning → learn
5. Lemmatization:
The process of obtaining the root stem of a word. The root stem gives the base form of a word that is present in the dictionary and from which the word is derived.
"intelligence", "intelligent", and "intelligently" have the root word "intelligent", which has a meaning.
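A hedged NLTK sketch of these steps (NLTK is assumed installed, with one-time downloads of the "punkt", "stopwords" and "wordnet" corpora):

```python
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "I am in VIIT. I am learning AI at TY."

sentences = sent_tokenize(text)                   # 1. segmentation
tokens = word_tokenize(sentences[1])              # 2. tokenizing
stops = set(stopwords.words("english"))
content = [t for t in tokens if t.lower() not in stops]   # 3. stop-word removal

stemmer = PorterStemmer()
print([stemmer.stem(t) for t in content])         # 4. stemming: learning -> learn

lemmatizer = WordNetLemmatizer()
print([lemmatizer.lemmatize(t, pos="v") for t in content])   # 5. lemmatization
```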
Speech Recognition
• Speech Recognition (also known as Automatic Speech Recognition (ASR), or computer speech recognition) is the process of converting a speech signal into a sequence of words by means of an algorithm implemented as a computer program.
• The main goal of the speech recognition area is to develop techniques and systems for speech input to machines.
Applications of speech recognition

Problem Domain | Application | Input | Pattern classes
Speech/Telephone/Communication sector | Telephone directory enquiry without an operator | Speech waveform | Spoken words
Education assistance | Teaching students of foreign languages to pronounce vocabulary correctly; teaching overseas students to pronounce English correctly | Speech waveform | Spoken words
Approaches to speech recognition:
• Acoustic Phonetic Approach
– The earliest approaches to speech recognition were based on
finding speech sounds and providing appropriate labels to these
sounds.
– This is the basis of the acoustic-phonetic approach (Hemdal and Hughes 1967), which postulates that there exist finite, distinctive phonetic units (phonemes) in spoken language, and that these units are broadly characterized by a set of acoustic properties that are manifested in the speech signal over time.
– Even though the acoustic properties of phonetic units are highly variable, both across speakers and with neighboring sounds, the acoustic-phonetic approach assumes that the rules governing this variability are straightforward and can be readily learned by a machine.
• Artificial Intelligence Approach
• Pattern Recognition Approach
– The pattern-matching approach (Itakura 1975; Rabiner 1989; Rabiner and Juang 1993) involves two essential steps, namely pattern training and pattern comparison.
– The essential feature of this approach is that it uses a well formulated
mathematical framework and establishes consistent speech pattern
representations, for reliable pattern comparison, from a set of labeled
training samples via a formal training algorithm.
– A speech pattern representation can be in the form of a speech
template or a statistical model (e.g., a HIDDEN MARKOV MODEL or
HMM) and can be applied to a sound (smaller than a word), a word,
or a phrase.
– In the pattern-comparison stage of the approach, the unknown speech (the speech to be recognized) is directly compared with each possible pattern learned in the training stage, in order to determine the identity of the unknown according to the goodness of match of the patterns. The pattern-matching approach has become the predominant method for speech recognition in the last six decades.
Artificial Intelligence approach (Knowledge-Based approach)
• The Artificial Intelligence approach [97] is a hybrid of the acoustic-phonetic approach and the pattern-recognition approach, exploiting the ideas and concepts of both.
• The knowledge-based approach uses information regarding linguistics, phonetics and spectrograms.
Perceptron
• A perceptron is a neural network unit that performs computations to detect features in the input data.
• It links artificial neurons modeled as simple logic gates with binary outputs.
• An artificial neuron computes a mathematical function and has a node, inputs, weights, and an output, corresponding respectively to the cell nucleus, dendrites, synapses, and axon of a biological neuron.
Perceptron
• As noted earlier, the Perceptron (Rosenblatt, 1957) is a supervised learning algorithm for binary classifiers, built on a learning rule for the original MCP neuron.
• It overcomes some of the limitations of the M-P neuron by introducing the concept of numerical weights (a measure of importance) for inputs, and a mechanism for learning those weights. Inputs are no longer limited to boolean values as in the case of an M-P neuron; it supports real inputs as well, which makes it more useful and generalized.
Perceptron Working
Unit 5
AI Applications
L.A.Bewoor
laxmi.bewoor@viit.ac.in
Department of Computer Engineering
BRACT’S, Vishwakarma Institute of Information Technology, Pune-48
(An Autonomous Institute affiliated to Savitribai Phule Pune University)
(NBA and NAAC accredited, ISO 9001:2015 certified)
Objective/s of this session
Discuss real life applications of AI
1. AI application for NLP
2. AI application for time series analysis
3. AI application for speech recognition
4. AI application for chatbots
5. AI application for perceptron based classifier
Learning Outcome/Course Outcome
Dr. L. A. Bewoor Department of Computer Engineering, VIIT , Pune-48 2
Contents
• Sequential and time series analysis
• Speech Recognizer
• Natural Language Processing
• Chatbots
• Perceptron based classifier
Time series analysis
■ A Time Series is a sequence of measures of a given
phenomenon taken at regular time intervals such as hourly,
daily, weekly, monthly, quarterly, annually, or every so many
years
– Stock series are measures of activity at a point in time
– Flow series are series which are a measure of activity to a date (e.g.
Retail, Current Account Deficit, Balance of Payments)
– price of a particular commodity like gold, silver, any eatables, petrol,
diesel etc.
– rate of interest, The rate of interest for home loans
▪ A set of observations ordered with respect to the successive
time periods is a time series. In other words, the arrangement
of data in accordance with their time of occurrence is a time
series. It is the chronological arrangement of data. Here, time
is just a way in which one can relate the entire phenomenon
to suitable reference points.
• A time series depicts the relationship between two
variables. Time is one of those variables and the second is
any quantitative variable.
Uses of Time Series
• The most important use of studying time series is that it
helps us to predict the future behaviour of the variable
based on past experience
• It is helpful for business planning as it helps in comparing
the actual current performance with the expected one
• From time series, we get to study the past behaviour of the
phenomenon or the variable under consideration
• We can compare the changes in the values of different
variables at different times or places, etc.
Time series analysis
Components for Time Series Analysis
• Trend
• Seasonal Variations
• Cyclic Variations
• Random or Irregular movements
Trend
• The trend shows the general tendency of the data to increase or decrease
during a long period of time. A trend is a smooth, general, long-term,
average tendency. It is not always necessary that the increase or decrease
is in the same direction throughout the given period of time.
• It is observable that the tendencies may increase, decrease or are stable in
different sections of time. But the overall trend must be upward,
downward or stable. The population, agricultural production, items
manufactured, number of births and deaths, number of industry or any
factory, number of schools or colleges are some of its example showing
some kind of tendencies of movement.
Components for Time Series Analysis
• Seasonal Variations
• These are the rhythmic forces which operate in a regular and periodic
manner over a span of less than a year. They have the same or almost the
same pattern during a period of 12 months. This variation will be present
in a time series if the data are recorded hourly, daily, weekly, quarterly, or
monthly.
• These variations come into play either because of the natural forces or
man-made conventions. The various seasons or climatic conditions play an
important role in seasonal variations. Such as production of crops depends
on seasons, the sale of umbrella and raincoats in the rainy season, and the
sale of electric fans and A.C. shoots up in summer seasons.
• The effect of man-made conventions such as some festivals, customs,
habits, fashions, and some occasions like marriage is easily noticeable.
They recur themselves year after year. An upswing in a season should not
be taken as an indicator of better business conditions.
Components for Time Series Analysis
Cyclic Variations
• The variations in a time series which operate themselves over a
span of more than one year are the cyclic variations. This
oscillatory movement has a period of oscillation of more than a
year. One complete period is a cycle. This cyclic movement is
sometimes called the ‘Business Cycle’.
• It is a four-phase cycle comprising of the phases of prosperity,
recession, depression, and recovery. The cyclic variation may be
regular are not periodic. The upswings and the downswings in
business depend upon the joint nature of the economic forces and
the interaction between them.
Random or Irregular Movements
• There is another factor which causes the variation in the variable
under study. They are not regular variations and are purely
random or irregular. These fluctuations are unforeseen,
uncontrollable, unpredictable, and are erratic. These forces are
earthquakes, wars, flood, famines, and any other disasters.
Components for Time Series Analysis
Fundamental Rule of Time Series Analysis
• Stationarity is an important concept in the field of time series analysis with
tremendous influence on how the data is perceived and predicted.
• When forecasting or predicting the future, most time series models
assume that each point is independent of one another. The best indication
of this is when the dataset of past instances is stationary.
• For data to be stationary, the statistical properties of a system do not
change over time. This does not mean that the values for each data point
have to be the same, but the overall behavior of the data should remain
constant. From a purely visual assessment, time plots that do not show
trends or seasonality can be considered stationary. More numerical factors
in support of stationarity include a constant mean and a constant variance.
• Non-stationary time series
A non-stationary time series's statistical properties like mean,
variance etc will not be constant over time An example of a
non stationary time series is a series with a trend - something
that grows over time for instance. The sample mean and
variance of such a series will grow as you increase the size of
the sample.
• perform a transformation to convert into a stationary dataset.
The most common transforms are the difference and
logarithmic transform.
Fundamental Rule of Time Series Analysis
Time Series Decomposition
• Additive time series
• Remember the equation for additive time series is
simply: Ot
= Tt
+ St
+ Rt
• Ot
= output
Tt
= trend
St
= seasonality
Rt
= residual
t
= variable representing a particular point in time
• additive = trend + seasonal + residual
Time Series Decomposition
• Multiplicative time series
• Remember the equation for additive time series is
simply: Ot
= Tt
* St
* Rt
• Ot
= output
Tt
= trend
St
= seasonality
Rt
= residual
t
= variable representing a particular point in time
• multiplicative = trend * seasonal * residual
FORCASTING AND TIME SERIES ANALYSIS
The forecasting is based on the past recorded data and help in the
determination of future plan with respect to any desired objective. It helps
in the fixing of strategies.
STRATEGY MAKING DECISION
PLANNED
PERFORMANCE
ANALYSIS
DEVIATION
DESIRED
PERFORMANCE
FORECASTE
TYPES OF FORECAST
1. Demand Forecast – Prediction of demand for products or services.
2. Environmental Forecast – Prediction of social, political and economic changes.
3. Technological Forecast – Prediction of technological changes.
TIMING OF FORECASTS
Forecasts are usually classified accordingly to time period.
1. Short range forecast – commonly one year and usually less than the three
months. Eg purchasing of job scheduling, workforce, production level,
regional production, seasonal production etc.
2. Medium range forecast – commonly one to three years. Eg cash
budgeting, sale planning etc.
3. Long range forecast – commonly three to more years. Eg R and D capital
expenditure, establishment of new plants, facilities of labor etc.
Forecasting Methods
Forecasting methods are based on opinion (quantitative) or judgment
(qualitative). The quantitative methods are further divided into two
namely, time series and casual.
A time series is a set of measurements of a variable that are ordered through
time to time. The time variables does not fluctuate arbitrarily. It moves
uniformly always in the same direction.
The time series forecasting methods attempt to account for changes over a
period of time at regular intervals by examining patterns, cycles or trends to
product the outcome for a future time period.
Causal methods are based on the assumptions that the variable value under
consideration has a cause effect relationship with one or more other values.
Methods of Forecasting
1. Define objective
2. Select the variable of interest
3. Determine the time for forecasting
4. Select appropriate model
5. Collect the relevant data
6. Make the forecast
TYPES OF FORECASTING TECHNIQUES
A fixed and suitable technique for forecasting is primary necessity for the
validity of forecasts. In last few decades some forecasting techniques have
been developed and can be classified into three broad categories.
1. NAÏVE METHODS –
It is based on the assumption that future is just an extension of past.
2. BAROMETRIC METHODS –
It is based on assumption that forecast can be made on the basis of
certain happenings on the past. In this method a factor dependent series
has been constructed and there after statistical analysis can yields
forecast.
3. ANALYTICAL METHODS –
It is based on the analysis of causative forces operative on the variable to
be forecasted. Analytical techniques may be non-mathematical like factor
listing or opinion or mathematical.
TIME SERIES ANALYSIS
A time series is orderly arranged numerical values of desired variables with
respect to time. It is represented both in tabular as well as graphical manner.
Objectives : 1 : To identify the pattern and isolate the influencing factors (or
effects) for prediction purpose as well as for future planning and control.
: 2 : To review and evaluate plan progress
Pattern : It is assumed that time series data consists of an uniform pattern
with random fluctuations.
• Actual value of variable per unit time
= Mean value of variable per unit time + Random deviation/unit time
Ŷ = (r) pattern + e
Components :
1 : Trend – Sometimes a time series displayed either upward or downward
movements in the average value of the variable of interest.
2 : Cycles – An upward or downward movements in the variable of interest over
a period of time. It may has four phases peak, contradiction, trough and
expansion
3 : Seasonal – An upward and downward movements within year and follow
regular pattern.
4 : Irregular – rapid upward or downward movements caused by short term
unanticipated and non-recurring factors.
Time Series Methods - The available data of time series is used for the
mathematical analysis to derive future inferences. These processes have
limitations that they have no accurate future values. This limitations of the
time series approach is taken care by the application of causal methods. The
time series methods are as follows -
A. Freehand Methods
B. Smoothing Methods – Smoothing is a process that often improves our ability to
forecast series by reducing the impact of noise
(i) Moving Averages (ii) Weighted Moving Averages (iii) Semi Averages.
C. Exponential Smoothing Methods – (i) Simple exponential Smoothing (ii) Adjusted
Exponential Smoothing
D. Quadratic Trend Model
A. Freehand Methods
A freehand curve draws as a straight line from value of lowest time limit to
value of highest time limit of series. The forecast can be obtained simply by
extending the trend line. A trend line fitted by the freehand method should
confirmed the conditions mentioned below.
(i) It is smooth and straight
(ii) The sum of the vertical deviations above and below the trend line are equal.
(iii) The sum of squares of the vertical deviations from the trend line is as small as
possible.
(iv) The trend line bisects the cycles
Limitation : 1 : This method is highly subjective
: 2 : The trend line drawn cannot have much value
: 3 : It is very time consuming to constant a freehand trend.
B. Smoothing Methods
The objective of smoothing methods is to smoothes out the random
variations due to irregular components of the time series
(i) Moving Averages
It is a quantitative method of forecasting or smoothing a time series by
averaging each successive groups of data values. It is an subjective
method and depends on the length of the period chosen for calculating
moving average.
The moving averages which serve as an estimate of the next periods
value of a variable given a period of length n is expressed as –
Σ {D1
+ Dt-1
+Dt-2
+-----+ Dt- (n+1/
}
Moving average (MAt +1
) = ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
n
Where – t = current time period; D = actual data which is exchanged each
period and n = length of time period.
• In this method the term “moving” is used because the average is obtained by
summing and averaging the values from a given number of periods, each
time deleting the oldest value and adding a new value.
Limitation – It is highly subjective and dependent on the length of the period
chosen for calculating the average. The method has three important
limitations:
(a) Increasing the size of n increases the smoothness of variation, but it also
makes the method less sensitive to real changes in the data.
(b) It is difficult to choose the optimal length of time for which to
compute the moving average. Moving averages cannot be found for the
first and last k/2 periods in a k-period moving average.
(c) Moving averages cannot pick up trends very well.
Illustration - Calculation of Trend and Short term fluctuations
Year      2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Variable   205   316   340   446   396   450   515   575   495   605
4-year moving averages: 1307/4 = 326.75, 1498/4 = 374.5, 1632/4 = 408,
1807/4 = 451.75, 1936/4 = 484, 2035/4 = 508.75, 2190/4 = 547.5
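The arithmetic above can be reproduced with a short, plain-Python sketch (the function name and the printed formatting are illustrative, not from the source):

```python
# Minimal sketch: n-period moving averages of the illustration series.
values = [205, 316, 340, 446, 396, 450, 515, 575, 495, 605]

def moving_averages(data, n):
    """Return the list of n-period moving averages of data."""
    return [sum(data[i:i + n]) / n for i in range(len(data) - n + 1)]

print(moving_averages(values, 4))
# -> [326.75, 374.5, 408.0, 451.75, 484.0, 508.75, 547.5]
```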
(ii) Weighted Moving Averages - In a moving average, each observation is
given equal importance (weight). However, it may be desirable to place
more weight (importance) on certain periods of time than on others. A
moving average in which some time periods are weighted differently
than others is called a weighted moving average. Commonly, the more
recent observations receive more weight, and the weight decreases
for older data values.
Weighted moving average = Σ (weight for period n × data value in period n) / Σ (weights)
Illustration - Forecasting of sales by weighting the past three months
Weights applied: 3 (last month), 2 (two months ago), 1 (three months ago)

X-weighted = [3 × M(t-1) + 2 × M(t-2) + 1 × M(t-3)] / 6
           = [3 × sales last month + 2 × sales two months ago
              + 1 × sales three months ago] / 6

MONTH  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
SALE    20   24   38   42   56   52   40   38   45   40

The three-month weighted forecasts for this series are computed in the sketch below.
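A small plain-Python sketch of the weighted forecast referenced above (the list layout and function name are illustrative):

```python
# Minimal sketch: 3-month weighted moving average with weights 3, 2, 1
# (the most recent month weighted highest), matching the formula above.
sales = [20, 24, 38, 42, 56, 52, 40, 38, 45, 40]  # March .. December

def weighted_forecast(last, two_ago, three_ago):
    """Forecast = (3*last + 2*two_ago + 1*three_ago) / sum of the weights."""
    return (3 * last + 2 * two_ago + 1 * three_ago) / 6

for i in range(3, len(sales)):
    f = weighted_forecast(sales[i - 1], sales[i - 2], sales[i - 3])
    print(f"forecast for month {i + 1} (actual {sales[i]}): {f:.2f}")
```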
(iii) SEMI-AVERAGE METHOD
The semi-average method permits us to estimate the slope and intercept
of the trend line quite easily, provided a linear function will adequately describe
the data. The trend line is determined simply by means of the lower and
upper halves of the data. In a continuous series these points are determined
at the mid-point of the class interval. The arithmetic mean of the first part is the
intercept value, and the slope is determined by the ratio of the difference
between the arithmetic means of the two halves to the number of years between
them, that is, the change per unit time.
The resulting time series is represented by the equation
Ỹ = a + bx
where Ỹ = calculated trend value
a = intercept
b = slope value
The equation should always be stated completely with reference to the
year where x = 0 and a description of the units of x and y. In the case of an
odd number of observations it is customary to ignore the middle value of the series.
The method is satisfactory if the trend is linear. If the data deviate much from
linearity, the forecast will be biased and less reliable.
ILLUSTRATION
The production of a company, in tons per year, is as follows.
Determine the trend line.
Year  Production (tons)
2001 115
2002 120
2003 130
2004 160
2005 145
2006 155
2007 160
2008 155
2009 170
2010 175
To calculate the time series Ỹ = a + bx:
Mean of first half (2001–2005) = (115 + 120 + 130 + 160 + 145) / 5 = 134, centred at 2003
Mean of second half (2006–2010) = (155 + 160 + 155 + 170 + 175) / 5 = 163, centred at 2008
Slope b = Δy / Δx = change in series / change in years
        = (163 – 134) / (2008 – 2003) = 29 / 5 = 5.8
Intercept a = 134 at 2003
Thus the trend line is Ỹ = 134 + 5.8x
To predict production in 2012: x = 2012 – 2003 = 9
Ỹ = 134 + 5.8 × 9 = 186.2 tons
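A minimal Python sketch reproducing the semi-average computation above (variable names are illustrative):

```python
# Minimal sketch of the semi-average method for the production series above.
years = list(range(2001, 2011))
tons = [115, 120, 130, 160, 145, 155, 160, 155, 170, 175]

half = len(tons) // 2
mean_first = sum(tons[:half]) / half    # 134.0, centred at 2003
mean_second = sum(tons[half:]) / half   # 163.0, centred at 2008
centre_first, centre_second = years[2], years[7]  # 2003 and 2008

b = (mean_second - mean_first) / (centre_second - centre_first)  # slope 5.8
a = mean_first                          # intercept, where x = 0 at 2003

x = 2012 - centre_first                 # x = 9 for the year 2012
print(a + b * x)                        # 186.2
```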
Natural Language Processing (NLP)
NLP approaches for Text Analysis
• Conduct basic text processing
• Categorize and tag the words
• Classify text
• Extract information
• Analyze sentence structure
• Build feature vector
• Analyze meaning
NLP Libraries
• Natural Language Toolkit (NLTK)
• GenSim
• SpaCy
• CoreNLP
• TextBlob
• scikit-learn
NLP Components
NLP Phases
Natural Language Processing steps
1. Segmentation:
Break the entire document down into its constituent sentences,
using punctuation such as full stops and commas.
"I am in VIIT. I am learning AI at TY." →
"I am in VIIT." | "I am learning AI at TY."
2. Tokenizing:
Break each sentence further into its constituent words (tokens).
"I am learning AI at TY" → I | am | learning | AI | at | TY
• Syntactic Analysis
• Removing Stop Words:
Stop words (e.g. "am", "at") carry little meaning and are removed:
"I am learning AI at TY" → I | learning | AI
• Stemming:
The process of obtaining the word stem of a word. The word stem is the base form
that gives new words upon adding affixes to it, e.g. learning → learn.
• Lemmatization:
The process of obtaining the root stem (lemma) of a word. The root stem is the base
form of a word that is present in the dictionary and from which the word is derived,
e.g. intelligence, intelligent, and intelligently have the root word intelligent, which
has a meaning of its own.
• POS tagging:
POS stands for parts of speech, which include noun, verb, adverb, and adjective.
A POS tag indicates how a word functions, in meaning as well as grammatically,
within a sentence. A word can have one or more parts of speech depending on the
context in which it is used.
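A minimal sketch of these preprocessing steps using NLTK (one of the libraries listed earlier); the nltk.download(...) resource names in the comments are the usual prerequisites:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time resource downloads (assumed prerequisites):
# nltk.download("punkt"); nltk.download("stopwords")
# nltk.download("wordnet"); nltk.download("averaged_perceptron_tagger")

text = "I am learning AI at TY"

tokens = nltk.word_tokenize(text)                           # tokenizing
stop_set = set(stopwords.words("english"))
content = [t for t in tokens if t.lower() not in stop_set]  # stop-word removal

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print([stemmer.stem(t) for t in content])          # stemming: learning -> learn
print([lemmatizer.lemmatize(t) for t in content])  # lemmatization
print(nltk.pos_tag(tokens))                        # POS tagging: (word, tag) pairs
```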
Semantic Analysis
Semantics involves the use of, and meaning behind, words.
• Word sense disambiguation. This derives the meaning of a word based
on context, e.g. resolving the sense of "bank" in "A pleasant breeze was
experienced at the river bank".
• Named entity recognition. This determines words that can be
categorized into groups/entities like people, values, locations,
and so on. For example, in the sentence “Mark Zuckerberg is one
of the founders of Facebook, a company from the United
States” we can identify three types of entities:
• “Person”: Mark Zuckerberg
• “Company”: Facebook
• “Location”: United States
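A short sketch of named entity recognition on this example sentence using spaCy (also listed among the NLP libraries earlier); it assumes the small English model has been installed:

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Mark Zuckerberg is one of the founders of Facebook, "
          "a company from the United States")

for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. PERSON, ORG, GPE
```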
• Discourse Integration
Discourse integration depends upon the sentences that precede a given sentence
and also invokes the meaning of the sentences that follow it. E.g. in "Students
were asking for the same", "the same" refers to something mentioned earlier.
Pragmatic Analysis
• Pragmatics is the last phase of NLP. It helps you to discover the intended
effect by applying a set of rules that characterize cooperative dialogues.
E.g. "Shut the door" is a request, not an order.
Artificial Intelligence
on the Cloud
In this chapter, we are going to learn about the cloud and artificial intelligence
workloads on the cloud. We will discuss the benefits and the risks of migrating AI
projects to the cloud. We will also learn about the offerings provided by the major
cloud providers. We will learn about the services and features that they offer and
hopefully get an understanding of why those providers are the market leaders.
By the end of this chapter, you will have a better understanding of the following:
• The benefits, risks, and costs of migrating to the cloud
• Fundamental cloud concepts such as elasticity
• The top cloud providers
• Amazon Web Services:
° Amazon SageMaker
° Alexa, Lex, and Polly – conversational agents
° Amazon Comprehend – natural language processing
° Amazon Rekognition – image and video
° Amazon Translate
° Amazon machine learning
° Amazon Transcribe – transcription
° Amazon Textract – document analysis
• Microsoft Azure:
° Machine Learning Studio
° Azure Machine Learning interactive workspace
° Azure Cognitive Services
• Google AI and its machine learning products:
° AI Hub
° AI building blocks
Why are companies migrating to
the cloud?
It is hard turning anywhere these days without being hit with the term "the cloud."
Our present-day society has hit a tipping point where businesses big and small are
seeing that the benefits of moving their workloads to the cloud outweigh the costs
and risks. As an example, the US Department of Defense, as of 2019, is in the process
of selecting a cloud provider and awarding a 10-year $10 billion-dollar contract.
Moving your systems to the cloud has many advantages, but one of the main
reasons companies move to the cloud is its elastic capabilities.
When deploying a new project in an on-premises environment, we always start with
capacity planning. Capacity planning is the exercise that enterprises go through to
determine how much hardware they will need for a new system to run efficiently.
Depending on the size of the project, the cost of this hardware can run into the
millions. For that reason, it could take months to complete the process. One of the
reasons it can take so long is because many approvals might be required to complete
the purchase. We can't blame business for being so slow and judicious with these
kinds of decisions.
Even though great planning and thought might go into these purchases, it is not
uncommon to either buy less equipment than required or to buy underpowered
equipment. Maybe just as often, too much equipment is bought or equipment that
is overkill for the project at hand. The reason this happens is because in many cases,
it is difficult to determine demand a priori.
Additionally, even if we get the capacity required properly at the beginning,
the demand might continue to grow and force us to go through the provisioning
process all over again. Or the demand might be variable. For example, we might
have a website that gets a lot of traffic during the day, but demand drops way down
at night. In this case, when using on-premises environments, we have no choice
but to account for the worst-case scenario and buy enough resources so that we
can handle peak periods of demand, but resources will be wasted when demand
decreases in slow periods.
All these issues are non-existent in a cloud environment. All the major cloud
providers, in different ways, provide elastic environments. Not only can we
easily scale up, but we can just as easily scale down.
If we have a website that has variable traffic, we could put the servers that handle
the traffic behind a load balancer and set up alerts that automatically add more
servers to handle traffic spikes and other alerts to terminate the servers once
the storm passes.
The top cloud providers
Given the tsunami that is the cloud, many vendors are vying to quench the demand
for cloud services. However, as is often the case in technology markets, only a few
have bubbled to the top and dominate the space. In this section, we'll analyze the
top players.
Amazon Web Services (AWS)
Amazon Web Services is one of the cloud pioneers. Since it launched in 2006, AWS
has ranked highly in the greatly respected Gartner's Magic Quadrant in both vision
and execution. Since its inception, AWS has held a big chunk of the cloud market.
AWS is an appealing option both for legacy players as well as start-ups. According
to Gartner:
"AWS is the provider most commonly chosen for strategic, organization-wide
adoption"
AWS also has an army of consultants and advisors dedicated to helping its
customers deploy AWS services as well as to teach them how to best leverage
the services available. In summary, it is safe to say that AWS is the most mature,
most advanced cloud provider, with a strong track record of customer success,
as well as a strong stable of partners in AWS Marketplace.
On the flip side, since AWS is the leader and they know it, they are not always the
least expensive option. Another knock for AWS is that since they highly value being
first to market with new services and features, it seems like they are willing to launch
services quickly that might not be fully mature and feature-complete, and work out
the kinks once they are released. In fairness, this is not a tactic exclusive to AWS
and other cloud providers also release beta versions of their services. In addition,
since Amazon competes in markets other than the cloud, it is not uncommon for
some potential customers to go with other providers in order to not "feed the beast."
For example, Walmart is well known for avoiding using AWS at all costs because
of their fierce competition in the e-commerce space.
Microsoft Azure
For the past few years, Microsoft Azure has held the second position in the Gartner
Magic Quadrant, trailing AWS and lagging significantly behind it in ability to
execute. But the good news is that they only trail AWS and they are a strong
number two.
Microsoft's solution is appealing to customers hosting legacy workloads as well
as brand new cloud deployments, but for different reasons.
Legacy workloads are normally run on Azure by clients that have traditionally
been Microsoft customers and are trying to leverage their previous investments
in that technology stack.
For new cloud deployments, Azure cloud services hold appeal because of
Microsoft's strong offerings for application development, specialized Platform
as a Service (PaaS) capabilities, data storage, machine learning, and Internet of
Things (IoT) services.
Enterprises that are strategically committed to the Microsoft technology stack have
been able to deploy many large-scale applications in production. Azure specifically
shines when developers fully commit to the suite of Microsoft products, such as
.NET applications, and then deploy them on Azure. Another reason Microsoft has
deep market penetration is its experienced sales staff and its extensive partner
network.
In addition, Microsoft realizes that the next battle in technology will not revolve
around operating systems but rather in the cloud and they have become increasingly
open to adopting non-Microsoft operating systems. As proof of this, as of now, about
half of Azure workloads run on Linux or other open source operating systems and
technology stacks.
A Gartner report notes "Microsoft has a unique vision for the future that involves bringing
in technology partners through native, first-party offerings such as those from VMware,
NetApp, Red Hat, Cray, and Databricks."
On the downside, there have been some reports of reliability, downtime, and service
disruptions as well as some customers taking issue with the quality of Microsoft's
technical support.
Google Cloud Platform (GCP)
In 2018, Google broke into the prestigious Gartner's leaders' quadrant with its GCP
offering, joining only AWS and Azure in the exclusive club. In 2019, GCP remained
in the same quadrant with its two fierce competitors. However, in terms of market
share, GCP is a distant third.
They recently beefed up their sales staff, they have deep pockets, and they have
a strong incentive to not be left behind so don't discount them yet.
Google's reputation as a leader in machine learning is undisputed so it is no surprise
that GCP has strong big data and machine learning offerings. But GCP is also making
some headway, attracting bigger enterprises looking to host legacy workloads such
as SAP and other traditional customer relationship management (CRMs) systems.
Google's internal innovations around machine learning, automation, containers,
and networking, with offerings such as TensorFlow and Kubernetes, have advanced
cloud development. GCP's technology offerings revolve around its contributions
to open source.
Be careful about centering your cloud strategy exclusively around GCP, however.
In a recent report, Gartner declared:
"Google demonstrates an immaturity of process and procedures when dealing
with enterprise accounts, which can make the company difficult to transact
with at times."
And:
"Google has a much smaller pool of experienced Managed Service Providers (MSP)
and infrastructure-centric professional services partners than other vendors in this
Magic Quadrant."
However, Gartner also states:
"Google is aggressively targeting these shortcomings."
Gartner also notes that Google's channel needs development.
Alibaba Cloud
Alibaba Cloud made its first appearance in Gartner's Magic Quadrant in 2017, and as
of 2019, Alibaba's cloud offering called Aliyun remains in the Niche Player category.
Gartner only evaluated the company's international service, headquartered in
Singapore.
Alibaba Cloud is the market leader in China, and many Chinese businesses, as
well as the Chinese government, have been served well by using Alibaba as their
cloud provider. However, a big part of this market share leadership might be given
up if China ever decides to remove some of the restrictions on other international
cloud vendors.
The company provides support in China for building hybrid clouds. But, outside
of China, it's mostly used by cloud-centric workloads. In 2018, it forged partnerships
with VMware and SAP.
Alibaba has a suite of services that is comparable in scope to the service portfolios
of other global providers.
The company's close relationship with the Alibaba Group helps the cloud service
act as a bridge for international companies looking to do business in China, and
for Chinese companies doing business outside of China.
Alibaba does not yet seem to have the service and feature depth of competitors
such as AWS, Azure, and GCP. And in many regions, services are only available
for specific compute instances. They also need to strengthen their MSP ecosystem,
third-party enterprise software integration, and operational tools.
Oracle Cloud Infrastructure (OCI)
In 2017, Oracle's cloud offering made a debut on Gartner's Magic Quadrant as
a Visionary. But in 2018, due to a change to Gartner's evaluation criteria, Oracle
was moved to Niche Player status. It remained there as of 2019.
Oracle Cloud Infrastructure, or OCI, was a second-generation service launched in
2016 to phase out the legacy offering, now referred to as Oracle Cloud Infrastructure
Classic.
OCI offers both virtualized and bare-metal servers, with one-click installation and
configuration of Oracle databases and container services.
OCI appeals to customers with Oracle workloads that don't need more than basic
Infrastructure as a Service (IaaS) capabilities.
Oracle's cloud strategy relies on its applications, database, and middleware.
Oracle has made some headway in attracting talent from other cloud providers to
beef up its offerings. It's also made some progress in winning new business and
getting existing Oracle customers to move to the OCI cloud. However, Oracle still
has a long road ahead of it before it can catch up with the big three.
IBM Cloud
In the mainframe era, IBM was the undisputed computing king of the hill. It lost
that title when we started moving away from mainframes and personal computers
became ubiquitous. IBM is again trying to reclaim a leadership position in this new
paradigm shift. IBM Cloud is IBM's answer to this challenge.
The company's diversified cloud services include container platforms, serverless
services, and PaaS offerings. They are complemented by IBM Cloud Private for
hybrid architectures.
Like some of the other lower-tier cloud providers, IBM appeals to its existing
customers who have a strong preference to purchase most of their technology from
Big Blue (IBM's nickname).
These existing customers usually have traditional workloads. IBM is also leveraging
these long relationships to transition these customers into emerging IBM solutions,
such as Watson's artificial intelligence.
IBM benefits from a large base of existing customers running critical production
services and that are just starting to get comfortable with cloud adoption. This
existing customer base positions IBM well to assist these customers as they embrace
the cloud and begin their transformation journeys.
Like Oracle, IBM is fighting an uphill battle to gain market share from AWS, Azure,
and Google.
Amazon Web Services (AWS)
We'll now focus on the top three cloud providers. As you are probably already
aware, cloud providers offer much more than artificial intelligence services, starting with
barebones compute and storage services, all the way to very sophisticated high-
level services. As with everything else in this book, we will specifically drill into
the artificial intelligence and machine learning services that cloud providers offer,
starting with AWS.
Amazon SageMaker
Amazon SageMaker was launched at Amazon's annual re:Invent conference in
Las Vegas, Nevada in 2017. SageMaker is a machine learning platform that enables
developers and data scientists to create, train, and deploy machine learning (ML)
models in the cloud.
A common tool used by data scientists in their day-to-day work is a Jupyter
Notebook. These notebooks are documents that contain a combination of computer
code such as Python, rich text elements such as paragraphs, equations, graphs, and
URLs. Jupyter notebooks can easily be understood by humans because they contain
analysis, descriptions, and results (figures, graphs, tables, and so on), and they are
also executable programs that can be processed online or on a laptop.
You can think of Amazon SageMaker as a Jupyter Notebook on steroids. These are
some of the advantages of SageMaker over traditional Jupyter notebooks. In other
words, these are the different steroid flavors:
• Like many of the machine learning services offered by Amazon, SageMaker
is a fully managed machine learning service so you do not have to worry
about upgrading operating systems or installing drivers.
• Amazon SageMaker provides implementations of some of the most common
machine learning models, but these implementations are highly optimized
and, in some cases, run up to 10 times faster than other implementations
of the same algorithm. In addition, you can bring in your own algorithms
if the machine learning model is not provided out of the box by SageMaker.
• Amazon SageMaker provides the right amount of muscle for a variety of
workloads. The type of machine that can be used to either train or deploy
your algorithm can be selected from the wide variety of machine types
that Amazon provides. If you are just experimenting with SageMaker, you
might decide to use an ml.t2.medium machine, which is one of the smallest
machines you can use with SageMaker. If you require some real power,
you can use their accelerated compute instances, such as an ml.p3dn.24xlarge
machine. The power delivered by such an instance is equivalent to what just
a few years ago was considered a supercomputer and would cost millions
of dollars to purchase.
Amazon SageMaker allows developers to increase their productivity across the
entire machine learning pipeline, including:
Data preparation – Amazon SageMaker can seamlessly integrate with many other
AWS services, including S3, RDS, DynamoDB, and Lambda, making it simple
to ingest and prepare data for consumption by machine learning algorithms.
Algorithm selection and training – Out of the box, Amazon SageMaker has a variety
of high-performance, scalable machine learning algorithms optimized for speed and
accuracy. These algorithms can perform training on petabyte-size datasets and can
increase performance by up to 10 times the performance of similar implementations.
These are some of the algorithms that are included with SageMaker:
• BlazingText
• DeepAR forecasting
• Factorization machines
• K-Means
• Random Cut Forest (RCF)
• Object detection
• Image classification
• Neural Topic Model (NTM)
• IP Insights
• K-Nearest Neighbors (k-NN)
• Latent Dirichlet Allocation (LDA)
• Linear Learner
• Object2Vec
• Principal Component Analysis (PCA)
• Semantic segmentation
• Sequence-to-sequence
• XGBoost
Algorithm tuning and optimizing – Amazon SageMaker offers automatic model
tuning, also known as hyperparameter tuning. The tuning finds the best parameter
set for a model by running multiple training iterations of the same algorithm on the
same input dataset over a range of specified hyperparameter values. As the training jobs
run, a scorecard is kept of the best performing version of the model. The definition
of "best" is based on a pre-defined metric.
As an example, let's assume we are trying to solve a binary classification problem.
The goal is to maximize the area under the curve (AUC) metric of the algorithm by
training an XGBoost algorithm model. We can tune the following hyperparameters
for the algorithm:
• alpha
• eta
• min_child_weight
• max_depth
In order to find the best values for these hyperparameters, we can specify a range of
values for the hyperparameter tuning. A series of training jobs will be kicked off and
the best set of hyperparameters will be stored depending on which version provides
the highest AUC.
Amazon SageMaker's automatic model tuning can be used both with SageMaker's
built-in algorithms as well as with custom algorithms.
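A hedged sketch of what such a tuning job can look like with the SageMaker Python SDK; the xgb_estimator variable, the metric name, the ranges, and the job counts below are illustrative assumptions rather than details from the text:

```python
from sagemaker.tuner import (HyperparameterTuner, ContinuousParameter,
                             IntegerParameter)

# `xgb_estimator` is assumed to be a previously configured SageMaker
# XGBoost estimator; the ranges below are illustrative.
ranges = {
    "alpha": ContinuousParameter(0, 1000),
    "eta": ContinuousParameter(0.1, 0.5),
    "min_child_weight": ContinuousParameter(1, 10),
    "max_depth": IntegerParameter(3, 10),
}

tuner = HyperparameterTuner(
    estimator=xgb_estimator,
    objective_metric_name="validation:auc",  # maximize AUC, as in the example
    hyperparameter_ranges=ranges,
    objective_type="Maximize",
    max_jobs=20,           # total training jobs (assumption)
    max_parallel_jobs=3,   # concurrent jobs (assumption)
)

# tuner.fit({"train": train_s3_uri, "validation": val_s3_uri})
```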
Algorithm deployment – Deploying a model in Amazon SageMaker is a two-step
process:
1. Create an endpoint configuration specifying the ML compute instances that
are used to deploy the model.
2. Launch one or more ML compute instances to deploy the model and
expose the URI to invoke, which will allow users to make predictions.
The endpoint configuration API accepts the ML instance type and the initial count
of instances. In the case of neural networks, the configuration may include the type
of GPU-backed instance. The endpoint API provisions the infrastructure as defined
in the previous step.
SageMaker deployment supports both one-off and batch predictions. Batch
predictions make predictions on datasets that can be stored in Amazon S3 or other
AWS storage solutions.
Integration and invocation – Amazon SageMaker provides a variety of ways and
interfaces to interact with the service:
• Web API – Sagemaker has a web API that can be used to control and invoke
a SageMaker server instance.
• SageMaker API – As with other services, Amazon has an API for SageMaker
that supports the following list of programming languages:
° Go
° C++
° Java
° JavaScript
° Python
° PHP
° Ruby
• Web interface – If you are familiar with Jupyter Notebooks, you will feel
right at home with Amazon SageMaker since the web interface to interact
with SageMaker is Jupyter Notebooks.
• AWS CLI – The AWS command-line interface (CLI).
Alexa, Lex, and Polly – conversational agents
In previous chapters, we discussed Alexa and its increasingly pervasive presence
in homes. We'll now delve into the technologies that power Alexa and allow you
to create your own conversational bots.
Amazon Lex is a service for building conversational agents. Amazon Lex, along
with other chatbots, is our generation's attempt at passing the Turing Test, which
we discussed in previous chapters. It will be a while before anyone confuses a
conversation with Alexa with a human conversation. However, Amazon and other
companies keep on making strides in making these conversations more and more
natural. Amazon Lex, which uses the same technologies that power Amazon Alexa,
allows developers to quickly build sophisticated, natural language, conversational
agents or chatbots. For simple cases, it's possible to build some of these chatbots
without any programming. However, it is possible to integrate Lex with other
services in the AWS stack with AWS Lambda as the integration technology.
We will devote a whole chapter to creating chatbots later, so we will keep this
section short for now.
Amazon Comprehend – natural language
processing
Amazon Comprehend is a natural language processing (NLP) service provided
by AWS. It uses machine learning to analyze content, perform entity recognition,
and find implicit and explicit relationships. Companies are starting to realize that
they have valuable information in the mounds of data that they generate every
day. Valuable insights can be ascertained from customer emails, support tickets,
product reviews, call center conversations, and social media interactions. Up until
recently, it was cost-prohibitive to try to obtain these insights, but tools like Amazon
Comprehend make it cost-effective to perform analysis on vast amounts of data.
Another advantage of this service is that it is yet another AWS service that is fully
managed, so there is no need to provision servers, install drivers, or upgrade
software. It is simple to use and deep experience in NLP is not required to quickly
become productive with it.
Like other AWS AI/ML services, Amazon Comprehend integrates with other AWS
services such as AWS Lambda and AWS Glue.
Use cases – Amazon Comprehend can be used to scan documents and identify
patterns in those documents. This capability can be applied to a range of use cases,
such as sentiment analysis, entity extraction, and document organization by topic.
As an example, Amazon Comprehend could analyze text from a social media
interaction with a customer, identify key phrases, and determine whether the
customer's experience was positive or negative.
Console Access – Amazon Comprehend can be accessed from the AWS Management
Console. One of the easiest ways to ingest data into the service is by using Amazon
S3. We can then make a call to the Comprehend service to analyze text for key
phrases and relationships. Comprehend can return a confidence score for each
user request to determine the confidence level of accuracy; the higher the percentage,
the more confident the service is. Comprehend can easily process a single request
or multiple requests in a batch.
Available Application Programming Interfaces (APIs) – As of this writing,
Comprehend provides six different APIs to enable insights. They are:
• Key phrase Extraction API – Identifies key phrases and terms.
• Sentiment Analysis API – Returns the overall meaning and feeling of the
text, either positive, negative, neutral, or mixed.
• Syntax API – Allows a user to tokenize text to define word boundaries and
label words in their different parts of speech, such as nouns and verbs.
• Entity Recognition API – Identifies and labels different entities in the text,
such as people, places, and companies.
• Language Detection API – Identifies the primary language in which a text
is written. The service can identify over a hundred languages.
• Custom Classification API – Enables a user to build a custom text
classification model.
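A minimal boto3 sketch exercising three of these APIs (it assumes AWS credentials and a default region are already configured; the sample text is made up):

```python
import boto3

comprehend = boto3.client("comprehend")

text = "I love this product, but shipping was slow."

sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"])  # e.g. MIXED, with per-class confidence scores

phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
print([p["Text"] for p in phrases["KeyPhrases"]])

entities = comprehend.detect_entities(Text=text, LanguageCode="en")
print([(e["Text"], e["Type"]) for e in entities["Entities"]])
```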
Industry-specific services – Amazon Comprehend Medical was released at AWS
re:Invent in 2018. It is built specifically for the medical industry and can identify
industry-specific terminology. Comprehend also offers a specific Medical Named
Entity and Relationship Extraction API. AWS does not store or use any text inputs
from Amazon Comprehend Medical for future machine learning training.
Amazon Rekognition – image and video
No, it's not a typo. Amazon named its recognition service with a k and not a c.
Amazon Rekognition can perform image and video analysis and enables users to
add this functionality to their applications. Amazon Rekognition has been pretrained
with millions of labeled images. Because of this, the service can quickly recognize:
• Object types – Chairs, tables, cars, and so on
• Celebrities – Actors, politicians, athletes, and so on
• People – Facial analysis, facial expressions, facial quality, user verification,
and so on
• Text – Recognize an image as text and convert it to text
• Scenes – Dancing, celebrating, eating, and so on
• Inappropriate content – Adult, violent, or visually disturbing content
Amazon Rekognition has already recognized billions of images and videos and it
uses them to continuously get better and better. The application of deep learning
in the domain of image recognition might arguably be the most successful machine
learning application in the last few years and Amazon Rekognition leverages deep
learning to deliver impressive results. To use it, it is not required to have a high level
of machine learning expertise. Amazon Rekognition provides a simple API. To use
it, an image is passed along to the service along with a few parameters, and that is it.
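As a hedged illustration of that simple API, here is a minimal boto3 sketch; the bucket and object names are hypothetical:

```python
import boto3

# Assumes credentials/region are configured and the image is in S3.
rekognition = boto3.client("rekognition")

response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photo.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))
```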
Amazon Rekognition will only continue to get better. The more it gets used, the more
inputs it receives, and the more it learns from those inputs. In addition, Amazon
continues to enhance and to add new features and functionality to the service.
Some of the most popular use cases and applications for Amazon Rekognition are:
Object, scene, and activity detection – With Amazon Rekognition, you can identify
thousands of different types of objects (for example, cars, houses, chairs, and so
on) and scenes (for example, cities, malls, beaches, and so on). When analyzing
video, specific activities that are happening in the frame can be identified, such as
"emptying a car trunk" or "children playing."
Gender recognition – Amazon Rekognition can be used to make an educated guess
to determine whether a person in an image is a male or a female. The functionality
should not be used as the sole determinant of a person's gender. It is not meant
to be used in such a way. For example, if a male actor is wearing a long-haired
wig and earrings for a role, they might be identified as a female.
Facial recognition and analysis – One of the uses of facial recognition systems is
to identify and authenticate a person from an image or video. This technology has
been around for decades, but it's not until recently that its application has become
more popular, cheaper, and more available, due in no small part to deep learning
techniques and the ubiquity of services such as Rekognition. Facial recognition
technologies power many of today's applications, such as photo sharing and storage
services and as a second factor in authentication workflows for smartphones.
Once we recognize that an object is a face, we might want to perform further
facial analysis. Some of the attributes that Amazon Rekognition can assist in
determining are:
• Eyes open or closed
• Mood:
° Happy
° Sad
° Angry
° Surprised
° Disgusted
° Calm
° Confused
° Fear
• Hair color
• Eye color
• Beards or mustaches
• Glasses
• Age range
• Gender
• Visual geometry of a face
These detected attributes are useful when there is a need to search through and
organize millions of images in seconds, generating metadata tags such as a person's
mood or to identify a person.
Pathing – The path of a person can be tracked in the scene using Amazon
Rekognition using video files. For example, if we see an image that contains a
person with bags around a trunk, we might not know whether the person is taking
the bags out of the trunk and arriving or if they are putting the bags into the trunk
and leaving. By analyzing the video using pathing, we will be able to make this
determination.
Unsafe content detection – Amazon Rekognition can assist in identifying potentially
unsafe or inappropriate content in images and video content and it can provide
detailed labels that accurately control access to those assets based on previously
determined criteria.
Celebrity recognition – Celebrities and famous people can be quickly identified in
image and video libraries to catalog photos and footage. This functionality can be
used in marketing, advertising, and media industry use cases.
Text in images – Once we identify that an image contains text, it is only natural
to want to convert the letters and words in that image into text. As an example,
if Rekognition is able to not only recognize that an object is a license plate but
additionally convert the image into text, it will then be easy to index that against
Department of Motor Vehicle records and track individuals and their whereabouts.
Amazon Translate
Amazon Translate is another Amazon service that can be used to translate large
amounts of text written in one language to another language. Amazon Translate is
pay-per-use, so you will only be charged when you submit something that needs
translation. As of October 2019, Amazon Translate supports 32 languages:
Language Language Code
Arabic ar
Chinese (Simplified) zh
Chinese (Traditional) zh-TW
Czech cs
Danish da
Dutch nl
English en
Finnish fi
French fr
German de
Greek el
Hebrew he
Hindi hi
Hungarian hu
Indonesian id
Italian it
Japanese ja
Korean ko
Malay ms
Norwegian no
Persian fa
Polish pl
Portuguese pt
Romanian ro
Russian ru
Spanish es
Swedish sv
Thai th
Turkish tr
Ukrainian uk
Urdu ur
Vietnamese vi
With a few exceptions, most of these languages can be translated from one to the
other. Users can also add items to the dictionary to customize the terminology and
include terms that are specific to their organization or use case, such as brand and
product names.
Amazon Translate uses machine learning and a continuous learning model to
improve the performance of its translation over time.
The service can be accessed in three different ways, in the same way that many of the
AWS services can be accessed:
• From the AWS console, to translate small snippets of text and to sample the
service.
• Using the AWS API (supported languages are C++, Go, Java, JavaScript,
.NET, Node.js, PHP, Python, and Ruby).
• Amazon Translate can be accessed via the AWS CLI.
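A minimal boto3 sketch of the API route (credentials and region are assumed to be configured; the language codes come from the table above):

```python
import boto3

translate = boto3.client("translate")

result = translate.translate_text(
    Text="Amazon Translate is pay-per-use.",
    SourceLanguageCode="en",
    TargetLanguageCode="es",
)
print(result["TranslatedText"])
```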
Uses for Amazon Translate
Many companies use Amazon Translate together with other external services.
Additionally, Amazon Translate can be integrated with other AWS services. For
example, Translate can be used in conjunction with Amazon Comprehend to pull
out predetermined entities, sentiments, or keywords from a social media feed and
then translate the extracted terms. In another example, the service can be paired with
Amazon S3 to translate document repositories and speak a translated language with
Amazon Polly.
However, using Amazon Translate does not mean that human translators don't
have a role anymore. Some companies are pairing Amazon Translate with human
translators to increase the speed of the translation process.
Endsem AI merged.pdf

  • 7. • In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. Such models usually have high bias and low variance. It happens when we have too little data to build an accurate model, or when we try to fit a linear model to nonlinear data. Models like linear and logistic regression are also too simple to capture complex patterns in the data. • In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train the model for too long on a noisy dataset. These models have low bias and high variance. Very complex models, like decision trees, are prone to overfitting. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 8. Ensemble learning techniques 1. Bagging: Bagging (an acronym for “Bootstrap Aggregating”) trains similar learners on small bootstrapped subsamples of the data and then takes the mean of all their predictions. In generalized bagging, you can use different learners on different subsamples. As you would expect, this helps reduce the variance error. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 9. • Multiple different training datasets can be prepared, used to estimate a predictive model, and make predictions. Averaging the predictions across the models typically results in better predictions than a single model fit on the training dataset directly. Ensemble learning techniques • Bagging is a parallel method, which means several weak learners learn the data pattern independently and simultaneously • Bagging reduces variance • Popular ensemble methods based on this approach include: Bagged Decision Trees Random Forest Classifiers Extra Trees Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
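As an illustration, here is a minimal bagging sketch, assuming scikit-learn and its bundled Iris dataset are available (the parameter name estimator= assumes scikit-learn 1.2 or newer; older versions call it base_estimator=):

# Bagging: many trees trained in parallel on bootstrap samples, predictions combined.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the contributing "weak" learner
    n_estimators=50,                     # 50 independent learners
    bootstrap=True,                      # sample with replacement: "bootstrap aggregating"
    random_state=0,
)
print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())

The bagged score is typically equal or better, because averaging over bootstrap replicas reduces the variance of the individual trees.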
  • 10. 2. Boosting • Instead of parallel processing of the data, sequential processing of the dataset occurs. The first classifier is fed the entire dataset, and its predictions are analyzed. • The instances where Classifier-1 fails to produce correct predictions (typically samples near the decision boundary of the feature space) are fed to the second classifier. • This is done so that Classifier-2 can specifically focus on the problematic areas of the feature space and learn an appropriate decision boundary. Further steps apply the same idea, and the ensemble of all these classifiers is then used to make the final prediction on the test data. Ensemble learning techniques Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 11. Ensemble learning techniques Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 12. • The main aim of the boosting method is to reduce bias in the ensemble decision. Thus, the classifiers chosen for the ensemble usually need to have low variance and high bias, i.e., simpler models with fewer trainable parameters. – Adaptive Boosting – Stochastic Gradient Boosting – Gradient Boosting Machines Ensemble learning techniques Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
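A comparable sketch for boosting, under the same scikit-learn assumption: AdaBoost fits shallow, high-bias decision stumps sequentially, re-weighting the samples that earlier learners got wrong:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
# Depth-1 trees ("stumps") are the classic simple, high-bias weak learner.
boosted = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,    # 100 stumps trained one after another
    learning_rate=0.5,   # shrinks each stump's contribution
    random_state=0,
)
print("AdaBoost accuracy:", cross_val_score(boosted, X, y, cv=5).mean())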
  • 13. 3. Stacking • The stacking ensemble method also involves creating bootstrapped data subsets, like the bagging mechanism, for training multiple models. However, the outputs of all such models are used as input to another classifier, called the meta-classifier, which finally predicts the samples. The intuition behind using two layers of classifiers is to determine whether the training data have been appropriately learned. • For example, in the cat/dog/wolf example above, if, say, Classifier-1 can distinguish between cats and dogs but not between dogs and wolves, the meta-classifier in the second layer will be able to capture this behavior from Classifier-1 and correct it before making the final prediction. Ensemble learning techniques Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 14. Ensemble learning techniques • Split the training set into two disjoint sets. • Train several base learners on the first part. • Test the base learners on the second part. • Using the predictions from step 3 as the inputs, and the correct responses as the outputs, train a higher-level learner. Example: Voting Classifier. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
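A minimal stacking sketch, again assuming scikit-learn (which provides both the StackingClassifier used here and the VotingClassifier named above); the base learners' out-of-fold predictions become the meta-classifier's training inputs:

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-classifier
    cv=5,  # base learners are fit and evaluated on disjoint internal folds
)
print("stacking accuracy:", cross_val_score(stack, X, y, cv=5).mean())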
  • 15. Reinforcement Learning (RL) • Drawbacks of classical machine learning algorithms: – Need for a huge amount of data to train the model – Data may be missing, false, or unavailable • Requirement of the system – Machines need to learn to perform actions by themselves, not just learn from data. • Reinforcement Learning ▪ Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. ▪ If the model performs an action that brings it closer to its goal, it receives a positive reward; if the action takes it away from its goal, it receives a negative reward. ▪ It returns an optimum solution for a problem by taking a sequence of decisions by itself (without human interference) ▪ Works on a trial-and-error basis ▪ Sequential decision making ▪ Feedback is not instantaneous ▪ A form of dynamic programming Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 16. Important Terminologies in RL • Agent: The model that is being trained via reinforcement learning • Environment: The training situation that the model must optimize over • Action: All possible steps that can be taken by the model • State: The current position/condition returned by the model • Reward: To help the model move in the right direction, points are given to it to appraise an action • Policy: The policy determines how an agent behaves at any time. It is the strategy applied by the agent to choose the next action based on the current state. • Value: The expected long-term return with discounting, as opposed to the short-term reward. • Q-value: Similar to the value, but it takes the current action as an additional parameter. • Discount factor: Helps adjust the importance of rewards over time. It exponentially decreases the value of later rewards so agents don't take actions with no long-term impact.
  • 17. RL algorithms Categorization Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 18. Learning models of RL • Markov Decision Process (MDP): • Most reinforcement learning tasks can be framed as MDPs. The following parameters are used to get a solution: – Set of actions – A – Set of states – S – Reward – R – Policy – π – Value – V Mathematically, the Markov (“no memory”) property states that the next state depends only on the current state and action, not on the full history: P(St+1 | St, At) = P(St+1 | S1, A1, ..., St, At) Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 19. Bellman Equation & Dynamic Programming Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune V(s) = max over actions a of [ R(s, a) + γ · V(s′) ], where the discount factor γ lies between 0 and 1
  • 20. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune Bellman Equation & Dynamic Programming
  • 21. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune The solution is the largest value in the value array after computing n iterations of the Bellman update
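To make the iteration concrete, here is a minimal value-iteration sketch on a hypothetical three-state chain (the states, rewards and transitions are invented purely for illustration); each sweep applies the Bellman update V(s) = max over a of [R(s, a) + γ·V(s′)]:

import numpy as np

gamma = 0.9
# Hypothetical deterministic MDP: action 0 = "stay", action 1 = "move right".
next_state = [[0, 1], [1, 2], [2, 2]]              # next_state[s][a]
reward = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]      # reward[s][a]; entering state 2 pays 1

V = np.zeros(3)                                    # value array, initialised to 0
for _ in range(50):                                # n iterations of the Bellman update
    V = np.array([max(reward[s][a] + gamma * V[next_state[s][a]] for a in (0, 1))
                  for s in range(3)])
print(V)  # converges to [0.9, 1.0, 0.0]: values grow as states approach the reward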
  • 22. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune Q-Learning: Markov Decision Process + Reinforcement Learning Q-Learning is a reinforcement learning algorithm that finds the next best action given the current state. During training it may choose actions at random (exploration) while aiming to maximize the total reward.
  • 23. Q Learning • The objective of the model is to find the best course of action given its current state. To do this, it may come up with rules of its own, or it may operate outside the policy given to it to follow. Because learning does not have to follow the behavior policy, Q-learning is called off-policy. • Model-free means that the agent does not build an explicit model of the environment's dynamics; instead, it learns directly through trial and error, from the rewards it receives. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 24. Important Terms in Q-Learning • States: The State, S, represents the current position of an agent in an environment. • Action: The Action, A, is the step taken by the agent when it is in a particular state. • Rewards: For every action, the agent will get a positive or negative reward. • Episodes: When an agent ends up in a terminating state and can’t take a new action. • Q-Values: Used to determine how good an Action, A, taken at a particular state, S, is. Q (A, S). • Temporal Difference: A formula used to find the Q-Value by using the value of current state and action and previous state and action. Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 26. Robot Navigation • a robot has to cross a maze and reach the end point. There are mines, and the robot can only move one tile at a time. If the robot steps onto a mine, the robot is dead. The robot has to reach the end point in the shortest time possible. • The scoring/reward system is as below: • The robot loses 1 point at each step. This is done so that the robot takes the shortest path and reaches the goal as fast as possible. • If the robot steps on a mine, the point loss is 100 and the game ends. • If the robot gets power ⚡, it gains 1 point. • If the robot reaches the end goal, the robot gets 100 points.
  • 27. Q Table In the Q-Table, the columns are the actions and the rows are the states. Each Q-table score is the maximum expected future reward that the robot will get if it takes that action in that state. Each value of the Q-table is calculated with the Q-Learning algorithm. The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
  • 29. Q Learning • Q is the brain of the agent. Initialize it with 0. • Set gamma and the environment rewards in R. • Each episode is one training session. • In each training session the agent explores the environment (using R) and receives rewards until it reaches the goal. • The purpose is to enhance the brain represented by Q. More training results in a more optimized Q. • Gamma is set between 0 and 1. Closer to 0 means the agent considers mainly immediate rewards, whereas closer to 1 means it weighs future rewards more heavily. • Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)]
  • 30. Q Learning Algorithm Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
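A minimal tabular Q-learning sketch that follows the update rule above (the three-room environment is invented purely for illustration; R holds the immediate rewards and gamma the discount factor):

import numpy as np

# Invented example: rooms 0-1-2 in a row; reaching room 2 (the goal) pays 100.
# R[state, action]: action 0 = move left, action 1 = move right; -1 marks an invalid move.
R = np.array([[-1,   0],
              [ 0, 100],
              [-1,  -1]])
step = [[0, 1], [0, 2], [2, 2]]      # resulting state for each (state, action)
gamma = 0.8
Q = np.zeros((3, 2))                 # the "brain" of the agent, initialised to 0

rng = np.random.default_rng(0)
for episode in range(200):           # each episode is one training session
    s = 0                            # start in room 0
    while s != 2:                    # explore until the goal is reached
        a = rng.integers(2)          # pick an action at random (pure exploration)
        if R[s, a] < 0:
            continue                 # skip invalid moves
        s2 = step[s][a]
        # Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)]
        Q[s, a] = R[s, a] + gamma * Q[s2].max()
        s = s2
print(Q)  # converges to [[0, 80], [64, 100], [0, 0]]: more training, more optimized Q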
  • 32. Introduction • A perceptron is a neural network unit (an artificial neuron) that performs certain computations to detect features or business intelligence in the input data. • It closely resembles a biological neuron. • McCulloch and Walter Pitts first introduced the nerve cell as a simple logic gate with binary outputs. • The perceptron is a simple model of the biological neuron in the form of an ANN. It is a supervised learning algorithm designed for binary classification.
  • 33. Biological Neuron vs Artificial Neuron Multiple signals arrive at the dendrites and are then integrated into the cell body, and, if the accumulated signal exceeds a certain threshold, an output signal is generated that will be passed on by the axon. An artificial neuron is a mathematical function based on a model of biological neurons, where each neuron takes inputs, weighs them separately, sums them up and passes this sum through a nonlinear function to produce output. The correspondence is: cell nucleus (soma) ↔ node, dendrites ↔ inputs, synapse ↔ weights, axon ↔ output.
  • 34. Artificial Neuron The artificial neuron has the following characteristics: – A neuron is a mathematical function modeled on the working of biological neurons – It is an elementary unit in an artificial neural network – One or more inputs are separately weighted – Inputs are summed and passed through a nonlinear function to produce output – Every neuron holds an internal state called activation signal – Each connection link carries information about the input signal – Every neuron is connected to another neuron via connection link
  • 35. Perceptron • The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP neuron. A perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and process elements in the training set one at a time. • There are two types of perceptrons: • Single layer – single-layer perceptrons can learn only linearly separable patterns • Multilayer – multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power • The perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary. • It takes an input, aggregates it (weighted sum) and returns 1 only if the aggregated sum is more than some threshold, else returns 0.
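Here is a minimal single-layer perceptron sketch in plain Python/NumPy, trained with the perceptron learning rule on the linearly separable AND function (the learning rate and epoch count are arbitrary illustrative choices):

import numpy as np

# AND gate: linearly separable, so a single-layer perceptron can learn it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias (plays the role of the threshold)
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        out = 1 if (w @ xi + b) > 0 else 0   # weighted sum + step activation
        w += lr * (target - out) * xi        # adjust weights in proportion to the error
        b += lr * (target - out)

print([1 if (w @ xi + b) > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]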
  • 38. Multiple-Layer Networks and Backpropagation Algorithms Backpropagation is the generalization of the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differentiable transfer functions. Input vectors and the corresponding target vectors are used to train a network until it can approximate a function, associate input vectors with specific output vectors, or classify input vectors in an appropriate way as defined by you.
  • 39. Architecture This section presents the architecture of the network that is most commonly used with the backpropagation algorithm – the multilayer feedforward network
  • 40. Architecture Neuron Model An elementary neuron with R inputs is shown below. Each input is weighted with an appropriate w. The sum of the weighted inputs and the bias forms the input to the transfer function f. Neurons can use any differentiable transfer function f to generate their output.
  • 41. Architecture Neuron Model Transfer Functions (Activation Function) Multilayer networks often use the log-sigmoid transfer function logsig. The function logsig generates outputs between 0 and 1 as the neuron's net input goes from negative to positive infinity
  • 42. Architecture Neuron Model Transfer Functions (Activation Function) Alternatively, multilayer networks can use the tan-sigmoid transfer function tansig. The function tansig generates outputs between -1 and +1 as the neuron's net input goes from negative to positive infinity
  • 43. Architecture Feedforward Network A single-layer network of S logsig neurons having R inputs is shown below in full detail on the left and with a layer diagram on the right.
  • 44. Architecture Feedforward Network Feedforward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear and linear relationships between input and output vectors. The linear output layer lets the network produce values outside the range -1 to +1. On the other hand, if you want to constrain the outputs of a network (such as between 0 and 1), then the output layer should use a sigmoid transfer function (such as logsig).
  • 45. Learning Algorithm: Backpropagation The following slides describe the training process of a multi-layer neural network employing the backpropagation algorithm. To illustrate this process, the three-layer neural network with two inputs and one output, which is shown in the picture below, is used:
  • 46. Learning Algorithm: Backpropagation Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realises a nonlinear function, called the neuron transfer (activation) function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.
  • 47. Learning Algorithm: Backpropagation To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) assigned with the corresponding target (desired output) z. Network training is an iterative process: in each iteration the weight coefficients of the nodes are modified using new data from the training data set. The modification is calculated using the algorithm described below. Each teaching step starts with forcing both input signals from the training set. After this stage we can determine the output signal values for each neuron in each network layer.
  • 48. Learning Algorithm: Backpropagation The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.
  • 51. Learning Algorithm: Backpropagation Propagation of signals through the hidden layer. Symbols wmn represent the weights of connections between the output of neuron m and the input of neuron n in the next layer.
  • 54. Learning Algorithm: Backpropagation Propagation of signals through the output layer.
  • 55. Learning Algorithm: Backpropagation In the next algorithm step, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal δ of the output-layer neuron.
  • 56. Learning Algorithm: Backpropagation The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs for the neuron under discussion.
  • 57. Learning Algorithm: Backpropagation The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs for the neuron under discussion.
  • 58. Learning Algorithm: Backpropagation The weight coefficients wmn used to propagate the errors back are the same as those used when computing the output value. Only the direction of data flow is changed (signals are propagated from outputs to inputs, one layer after the other). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:
  • 59. Learning Algorithm: Backpropagation When the error signal for each neuron is computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
  • 60. Learning Algorithm: Backpropagation When the error signal for each neuron is computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
  • 61. Learning Algorithm: Backpropagation When the error signal for each neuron is computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
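The whole procedure condenses into a small NumPy sketch: a two-input network with one sigmoid (logsig) hidden layer trained on XOR. The layer sizes, learning rate and iteration count are illustrative choices, not the exact network from the figures above, and convergence depends on the random initialisation:

import numpy as np

def sigmoid(e):                      # logsig transfer function, y = f(e)
    return 1.0 / (1.0 + np.exp(-e))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
z = np.array([[0], [1], [1], [0]], dtype=float)    # desired outputs (targets)

rng = np.random.default_rng(1)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)     # input  -> hidden weights
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)     # hidden -> output weights
eta = 0.5                                          # learning rate

for step in range(5000):
    h = sigmoid(X @ W1 + b1)                       # forward pass, hidden layer
    y = sigmoid(h @ W2 + b2)                       # forward pass, output layer
    d_out = (z - y) * y * (1 - y)                  # error signal; df(e)/de = y(1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)           # error propagated back through W2
    W2 += eta * h.T @ d_out; b2 += eta * d_out.sum(axis=0)   # modify weight coefficients
    W1 += eta * X.T @ d_hid; b1 += eta * d_hid.sum(axis=0)

print(y.round(2))   # approaches [[0], [1], [1], [0]]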
  • 62. Thank you Dr. L.A.Bewoor, Dept. of Computer Engg. ,VIIT,Pune
  • 63. Unit 5 AI Applications L.A.Bewoor laxmi.bewoor@viit.ac.in Department of Computer Engineering BRACT’S, Vishwakarma Institute of Information Technology, Pune-48 (An Autonomous Institute affiliated to Savitribai Phule Pune University) (NBA and NAAC accredited, ISO 9001:2015 certified)
  • 64. Objective/s of this session Discuss real life applications of AI Apply AI techniques for real world application 1. AI application for NLP 2. AI application for time series analysis 3. AI application for speech recognition 4. AI application for chatbots 5. AI application for perceptron based classifier Learning Outcome/Course Outcome Dr. L. A. Bewoor Department of Computer Engineering, VIIT , Pune-48 2
  • 65. Contents • Sequential and time series analysis • Speech Recognizer • Natural Language Processing • Chatbots • Perceptron based classifier
  • 66. Time series analysis ■ A Time Series is a sequence of measures of a given phenomenon taken at regular time intervals such as hourly, daily, weekly, monthly, quarterly, annually, or every so many years – Stock series are measures of activity at a point in time – Flow series are series which are a measure of activity to a date (e.g. Retail, Current Account Deficit, Balance of Payments) – price of a particular commodity like gold, silver, any eatables, petrol, diesel etc. – rate of interest, The rate of interest for home loans ▪ A set of observations ordered with respect to the successive time periods is a time series. In other words, the arrangement of data in accordance with their time of occurrence is a time series. It is the chronological arrangement of data. Here, time is just a way in which one can relate the entire phenomenon to suitable reference points.
  • 67. • A time series depicts the relationship between two variables. Time is one of those variables and the second is any quantitative variable. Uses of Time Series • The most important use of studying time series is that it helps us to predict the future behaviour of the variable based on past experience • It is helpful for business planning as it helps in comparing the actual current performance with the expected one • From time series, we get to study the past behaviour of the phenomenon or the variable under consideration • We can compare the changes in the values of different variables at different times or places, etc. Time series analysis
  • 68. Components for Time Series Analysis • Trend • Seasonal Variations • Cyclic Variations • Random or Irregular movements
  • 69. Trend • The trend shows the general tendency of the data to increase or decrease during a long period of time. A trend is a smooth, general, long-term, average tendency. It is not always necessary that the increase or decrease is in the same direction throughout the given period of time. • It is observable that the tendencies may increase, decrease or be stable in different sections of time, but the overall trend must be upward, downward or stable. The population, agricultural production, items manufactured, the number of births and deaths, and the number of industries, factories, schools or colleges are some examples showing such tendencies of movement. Components for Time Series Analysis
  • 70. • Seasonal Variations • These are the rhythmic forces which operate in a regular and periodic manner over a span of less than a year. They have the same or almost the same pattern during a period of 12 months. This variation will be present in a time series if the data are recorded hourly, daily, weekly, quarterly, or monthly. • These variations come into play either because of natural forces or man-made conventions. The various seasons or climatic conditions play an important role in seasonal variations: crop production depends on the seasons, the sale of umbrellas and raincoats rises in the rainy season, and the sale of electric fans and air conditioners shoots up in summer. • The effect of man-made conventions such as festivals, customs, habits, fashions, and occasions like marriage is easily noticeable. They recur year after year. An upswing in a season should not be taken as an indicator of better business conditions. Components for Time Series Analysis
  • 71. Cyclic Variations • The variations in a time series which operate over a span of more than one year are the cyclic variations. This oscillatory movement has a period of oscillation of more than a year; one complete period is a cycle. This cyclic movement is sometimes called the ‘Business Cycle’. • It is a four-phase cycle comprising the phases of prosperity, recession, depression, and recovery. The cyclic variation may be regular but is not periodic. The upswings and downswings in business depend upon the joint nature of the economic forces and the interaction between them. Random or Irregular Movements • There is another factor which causes variation in the variable under study. These are not regular variations; they are purely random or irregular. These fluctuations are unforeseen, uncontrollable, unpredictable, and erratic. Such forces include earthquakes, wars, floods, famines, and other disasters. Components for Time Series Analysis
  • 72. Fundamental Rule of Time Series Analysis • Stationarity is an important concept in the field of time series analysis with tremendous influence on how the data is perceived and predicted. • When forecasting or predicting the future, most time series models assume that each point is independent of one another. The best indication of this is when the dataset of past instances is stationary. • For data to be stationary, the statistical properties of a system do not change over time. This does not mean that the values for each data point have to be the same, but the overall behavior of the data should remain constant. From a purely visual assessment, time plots that do not show trends or seasonality can be considered stationary. More numerical factors in support of stationarity include a constant mean and a constant variance.
  • 73. • Non-stationary time series: the statistical properties of a non-stationary time series, like mean and variance, are not constant over time. An example of a non-stationary time series is a series with a trend – something that grows over time, for instance. The sample mean and variance of such a series will grow as you increase the size of the sample. • In such cases, perform a transformation to convert the data into a stationary dataset. The most common transforms are differencing and the logarithmic transform. Fundamental Rule of Time Series Analysis
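In practice, stationarity is commonly checked with the augmented Dickey–Fuller test; a minimal sketch assuming statsmodels is installed (the synthetic trending series is invented for illustration):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200)) + 0.5 * np.arange(200)  # random walk + trend

# ADF null hypothesis: the series has a unit root, i.e. it is non-stationary.
p_before = adfuller(series)[1]
p_after = adfuller(np.diff(series))[1]   # first difference removes the trend
print(f"p-value before differencing: {p_before:.3f}, after: {p_after:.3f}")

A large p-value before differencing (non-stationarity cannot be rejected) and a small one afterwards illustrates why differencing is the most common transform.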
  • 74. Time Series Decomposition • Additive time series • The equation for an additive time series is simply: Ot = Tt + St + Rt, where Ot = output, Tt = trend, St = seasonality, Rt = residual, and t is a variable representing a particular point in time. • additive = trend + seasonal + residual
  • 75. Time Series Decomposition • Multiplicative time series • The equation for a multiplicative time series is simply: Ot = Tt * St * Rt, where Ot = output, Tt = trend, St = seasonality, Rt = residual, and t is a variable representing a particular point in time. • multiplicative = trend * seasonal * residual
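Both decompositions are available, for example, in statsmodels' seasonal_decompose; a minimal sketch on synthetic monthly data (the series itself is invented for illustration):

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

t = np.arange(48)
rng = np.random.default_rng(0)
data = 10 + 0.2 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, 48)
series = pd.Series(data, index=pd.date_range("2020-01-01", periods=48, freq="MS"))

result = seasonal_decompose(series, model="additive")  # or model="multiplicative"
print(result.seasonal.head(12))       # the repeating 12-month pattern St
print(result.trend.dropna().head())   # the smooth trend Tt
print(result.resid.dropna().head())   # the residual Rt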
  • 76. FORECASTING AND TIME SERIES ANALYSIS Forecasting is based on past recorded data and helps in determining a future plan with respect to any desired objective. It helps in fixing strategies. (Figure: flowchart linking FORECAST, STRATEGY MAKING, DECISION, PLANNED PERFORMANCE, DESIRED PERFORMANCE, DEVIATION and ANALYSIS.)
  • 77. TYPES OF FORECAST 1. Demand Forecast – Prediction of demand for products or services. 2. Environmental Forecast – Prediction of social, political and economic changes. 3. Technological Forecast – Prediction of technological changes. TIMING OF FORECASTS Forecasts are usually classified according to time period. 1. Short range forecast – commonly up to one year and usually less than three months, e.g. purchasing, job scheduling, workforce, production level, regional production, seasonal production, etc. 2. Medium range forecast – commonly one to three years, e.g. cash budgeting, sales planning, etc. 3. Long range forecast – commonly three or more years, e.g. R&D capital expenditure, establishment of new plants, labor facilities, etc. Forecasting Methods Forecasting methods are based either on data (quantitative) or on opinion and judgment (qualitative). The quantitative methods are further divided into two, namely time series and causal.
  • 78. A time series is a set of measurements of a variable that are ordered through time. The time variable does not fluctuate arbitrarily; it moves uniformly, always in the same direction. Time series forecasting methods attempt to account for changes over a period of time at regular intervals by examining patterns, cycles or trends to predict the outcome for a future time period. Causal methods are based on the assumption that the variable value under consideration has a cause–effect relationship with one or more other variables. Methods of Forecasting 1. Define the objective 2. Select the variable of interest 3. Determine the time horizon for forecasting 4. Select an appropriate model 5. Collect the relevant data 6. Make the forecast
  • 79. TYPES OF FORECASTING TECHNIQUES A fixed and suitable technique for forecasting is a primary necessity for the validity of forecasts. In the last few decades several forecasting techniques have been developed; they can be classified into three broad categories. 1. NAÏVE METHODS – Based on the assumption that the future is just an extension of the past. 2. BAROMETRIC METHODS – Based on the assumption that a forecast can be made on the basis of certain happenings in the past. In this method a factor-dependent series is constructed, and thereafter statistical analysis yields the forecast. 3. ANALYTICAL METHODS – Based on the analysis of the causative forces operating on the variable to be forecasted. Analytical techniques may be non-mathematical, like factor listing or opinion, or mathematical.
  • 80. TIME SERIES ANALYSIS A time series consists of orderly arranged numerical values of the desired variable with respect to time. It is represented both in tabular as well as graphical manner. Objectives: 1: To identify the pattern and isolate the influencing factors (or effects) for prediction purposes as well as for future planning and control. 2: To review and evaluate plan progress. Pattern: It is assumed that time series data consist of a uniform pattern with random fluctuations. • Actual value of variable per unit time = Mean value of variable per unit time + Random deviation per unit time, i.e. Ŷ = pattern + e Components: 1: Trend – Sometimes a time series displays either upward or downward movements in the average value of the variable of interest. 2: Cycles – Upward or downward movements in the variable of interest over a period of time. A cycle may have four phases: peak, contraction, trough and expansion. 3: Seasonal – Upward and downward movements within a year that follow a regular pattern. 4: Irregular – Rapid upward or downward movements caused by short-term, unanticipated and non-recurring factors.
  • 81. Time Series Methods – The available time series data are used for mathematical analysis to derive future inferences. These methods are limited in that they cannot guarantee accurate future values; this limitation of the time series approach is taken care of by the application of causal methods. The time series methods are as follows: A. Freehand Methods B. Smoothing Methods – Smoothing is a process that often improves our ability to forecast a series by reducing the impact of noise: (i) Moving Averages (ii) Weighted Moving Averages (iii) Semi Averages. C. Exponential Smoothing Methods – (i) Simple Exponential Smoothing (ii) Adjusted Exponential Smoothing D. Quadratic Trend Model A. Freehand Methods A freehand curve is drawn as a straight line from the value at the lowest time limit to the value at the highest time limit of the series. The forecast can be obtained simply by extending the trend line. A trend line fitted by the freehand method should conform to the conditions mentioned below.
  • 82. (i) It is smooth and straight. (ii) The sums of the vertical deviations above and below the trend line are equal. (iii) The sum of squares of the vertical deviations from the trend line is as small as possible. (iv) The trend line bisects the cycles. Limitations: 1: This method is highly subjective. 2: The trend line drawn cannot have much value. 3: It is very time consuming to construct a freehand trend. B. Smoothing Methods The objective of smoothing methods is to smooth out the random variations due to the irregular component of the time series. (i) Moving Averages A quantitative method of forecasting or smoothing a time series by averaging each successive group of data values. It is a subjective method and depends on the length of the period chosen for calculating the moving average. The moving average, which serves as an estimate of the next period's value of a variable given a period of length n, is expressed as –
  • 83. Moving average: MA(t+1) = (Dt + Dt-1 + Dt-2 + ... + Dt-n+1) / n where t = current time period, D = actual data, which is exchanged each period, and n = length of the time period. • In this method the term “moving” is used because the average is obtained by summing and averaging the values from a given number of periods, each time deleting the oldest value and adding a new value. Limitations – It is highly subjective and dependent on the length of the period chosen for calculating the average. The method has three important limitations: (a) Increasing the size of n increases the smoothness of variation but also makes the method less sensitive to real changes in the data. (b) It is difficult to choose the optimal length of time for which to compute the moving averages; the moving average cannot be found for the first and last k/2 periods of a k-period moving average. (c) The moving average cannot pick up trends very well.
  • 84. Illustration – Calculation of trend and short-term fluctuations (4-yearly moving averages):
Year:     2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Variable:  205  316  340  446  396  450  515  575  495  605
Moving averages shown on the slide: 1307/4 = 326.75, 1632/4 = 408, 1807/4 = 451.75, 1936/4 = 484, 2035/4 = 508.75, 2190/4 = 547.5
(ii) Weighted Moving Averages – In a moving average, each observation is given equal importance (weight). However, it may be desirable to place more weight (importance) on certain periods of time than on others. A moving average in which some time periods are weighted differently than others is called a weighted moving average. Commonly, the more recent observations receive more weight, and the weight decreases for older data values.
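Returning to the moving averages: the four-yearly averages in the illustration above can be reproduced with pandas (assumed available); rolling(4).mean() averages each successive group of four values, so the first entry appears against 2004 as 1307/4 = 326.75:

import pandas as pd

values = pd.Series([205, 316, 340, 446, 396, 450, 515, 575, 495, 605],
                   index=range(2001, 2011))
print(values.rolling(window=4).mean())  # four-yearly moving averages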
  • 85. Weighted moving average = Σ (weight for period n × data value in period n) / Σ weights Illustration – Forecasting sales by weighting the past three months, with weight 3 applied to last month, 2 to two months ago, and 1 to three months ago: Forecast = (3 × Mt-1 + 2 × Mt-2 + 1 × Mt-3) / 6 = [3 × sales last month + 2 × sales two months ago + 1 × sales three months ago] / 6
Month (March ’11 onward): March April May June July Aug Sep Oct Nov Dec
Sales:                      20    24  38   42   56  52  40  38  45  40
  • 86.
MONTH | SALE | THREE-MONTH MOVING AVERAGE
MARCH | 20   |
APRIL | 24   |
MAY   | 38   |
JUNE  | 42   |
JULY  | 56   |
AUG   | 52   |
SEP   | 40   |
OCT   | 38   |
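A short sketch of the three-month weighted forecast with weights 3, 2, 1 (most recent month weighted heaviest), using the sales figures above:

sales = [20, 24, 38, 42, 56, 52, 40, 38]   # March .. October
# Forecast for each month from June onward: (3*last + 2*second-last + 1*third-last) / 6
for i in range(3, len(sales)):
    f = (3 * sales[i - 1] + 2 * sales[i - 2] + 1 * sales[i - 3]) / 6
    print(f"month index {i}: forecast {f:.2f}, actual {sales[i]}")

For June, for example, the forecast is (3 × 38 + 2 × 24 + 1 × 20) / 6 = 182/6 ≈ 30.33.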
  • 87. (iii) SEMI AVERAGE METHOD The semi-average method permits us to estimate the slope and intercept of the trend line quite easily, provided a linear function will adequately describe the data. The trend line is determined simply by means of the lower and upper halves of the data. In continuous series these points are determined at the mid-point of the class interval. The arithmetic mean of the first part is the intercept value, and the slope is determined by the ratio of the difference in the arithmetic means to the number of years between them, that is, the change per unit time. The resulting time series is represented by the equation Ỹ = a + bx where Ỹ = calculated trend value, a = intercept, b = slope value. The equation should always be stated completely with reference to the year where x = 0 and a description of the units of x and y. When the number of observations is odd, it is customary to ignore the middle time series value. The method may be satisfactory if the trend is linear; if the data deviate much from linearity, the forecast will be biased and less reliable.
  • 88. ILLUSTRATION The production of a company (tons per year) is as follows. Determine the trend line.
Year: 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Tons:  115  120  130  160  145  155  160  155  170  175
  • 89. To calculate the time series Ỹ = a + bx: Slope b = Δy/Δx = (change in series)/(change in years) = (163 – 134)/(2008 – 2003) = 29/5 = 5.8 (here 134 and 163 are the means of the first and second halves of the data, centred on 2003 and 2008). Intercept a = 134 at 2003. Thus the trend line is Ỹ = 134 + 5.8x. If we want to predict production in 2012: x = 2012 – 2003 = 9, so Ỹ = 134 + 5.8 × 9 = 186.2 tons
  • 90. Natural Language Processing steps 1. Segmentation: break the entire document down into its constituent sentences, using punctuation marks such as full stops and commas. Example: “I am in VIIT. I am learning AI at TY.” 2. Tokenizing: split each sentence into its constituent words (tokens), e.g. “I am learning AI at TY” → I / am / learning / AI / at / TY
  • 91. Natural Language Processing steps • Removing Stop Words: dropping common words that add little meaning, e.g. “I am learning AI at TY” → “I learning AI” • Stemming: the process of obtaining the word stem of a word. A word stem gives new words upon adding affixes to it, e.g. learning → learn. • Lemmatization: the process of obtaining the root stem of a word. The root stem gives the base form of a word that is present in the dictionary and from which the word is derived, e.g. intelligence, intelligent, and intelligently have the root word intelligent, which has a meaning.
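These steps map directly onto NLTK, one of the common NLP libraries (a minimal sketch; it assumes the punkt, stopwords and wordnet resources have already been downloaded via nltk.download):

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "I am in VIIT. I am learning AI at TY."
sentences = sent_tokenize(text)                 # 1. segmentation into sentences
tokens = word_tokenize(sentences[1])            # 2. tokenizing one sentence
filtered = [w for w in tokens
            if w.lower() not in stopwords.words("english")]  # 3. stop-word removal
print(filtered)

print(PorterStemmer().stem("learning"))                    # stemming -> learn
print(WordNetLemmatizer().lemmatize("learning", pos="v"))  # lemmatization -> learn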
  • 92. • Speech Recognition (also known as Automatic Speech Recognition (ASR), or computer speech recognition) is the process of converting a speech signal into a sequence of words, by means of an algorithm implemented as a computer program. • The main goal of the speech recognition area is to develop techniques and systems for speech input to machines. Applications of speech recognition – Problem domain: speech/telephone/communication sector. Applications: education assistance; telephone directory enquiry without an operator; teaching students of foreign languages to pronounce vocabulary correctly; teaching overseas students to pronounce English correctly. Input: speech waveform. Pattern classes: spoken words.
  • 94. Approaches to speech recognition: • Acoustic Phonetic Approach – The earliest approaches to speech recognition were based on finding speech sounds and providing appropriate labels to these sounds. – This is the basis of the acoustic phonetic approach (Hemdal and Hughes 1967), which postulates that there exist finite, distinctive phonetic units (phonemes) in spoken language and that these units are broadly characterized by a set of acoustic properties that are manifested in the speech signal over time. – Even though the acoustic properties of phonetic units are highly variable, both across speakers and with neighboring sounds, the acoustic-phonetic approach assumes that the rules governing the variability are straightforward and can be readily learned by a machine. • Artificial Intelligence Approach
  • 95. • Pattern Recognition Approach – The pattern-matching approach (Itakura 1975; Rabiner 1989; Rabiner and Juang 1993) involves two essential steps, namely pattern training and pattern comparison. – The essential feature of this approach is that it uses a well-formulated mathematical framework and establishes consistent speech pattern representations, for reliable pattern comparison, from a set of labeled training samples via a formal training algorithm. – A speech pattern representation can be in the form of a speech template or a statistical model (e.g., a HIDDEN MARKOV MODEL or HMM) and can be applied to a sound (smaller than a word), a word, or a phrase. – In the pattern-comparison stage, a direct comparison is made between the unknown speech (the speech to be recognized) and each possible pattern learned in the training stage, in order to determine the identity of the unknown according to the goodness of match of the patterns. The pattern-matching approach has become the predominant method for speech recognition over the last six decades. Approaches to speech recognition:
  • 96. Artificial Intelligence Approach (Knowledge Based Approach) • The Artificial Intelligence approach [97] is a hybrid of the acoustic phonetic approach and the pattern recognition approach: it exploits the ideas and concepts of both acoustic-phonetic and pattern-recognition methods. • The knowledge-based approach uses information regarding linguistics, phonetics and spectrograms.
  • 97. Perceptron • A neural network unit that performs computations to detect features in the input data using Artificial Intelligence is known as a perceptron. • It links artificial neurons using simple logic gates with binary outputs. • An artificial neuron implements a mathematical function and has a node, inputs, weights, and an output, equivalent to the cell nucleus, dendrites, synapses, and axon, respectively, of a biological neuron.
  • 98. Perceptron • The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP neuron. A perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and process elements in the training set one at a time. • It overcomes some of the limitations of the M-P neuron by introducing the concept of numerical weights (a measure of importance) for inputs, and a mechanism for learning those weights. Inputs are no longer limited to boolean values as in the case of an M-P neuron; the perceptron supports real inputs as well, which makes it more useful and generalized.
  • 100. Unit 5 AI Applications L.A.Bewoor laxmi.bewoor@viit.ac.in Department of Computer Engineering BRACT’S, Vishwakarma Institute of Information Technology, Pune-48 (An Autonomous Institute affiliated to Savitribai Phule Pune University) (NBA and NAAC accredited, ISO 9001:2015 certified)
  • 101. Objective/s of this session Discuss real life applications of AI 1. AI application for NLP 2. AI application for time series analysis 3. AI application for speech recognition 4. AI application for chatbots 5. AI application for perceptron based classifier Learning Outcome/Course Outcome Dr. L. A. Bewoor Department of Computer Engineering, VIIT , Pune-48 2
  • 102. Contents • Sequential and time series analysis • Speech Recognizer • Natural Language Processing • Chatbots • Perceptron based classifier
  • 103. Time series analysis ■ A Time Series is a sequence of measures of a given phenomenon taken at regular time intervals such as hourly, daily, weekly, monthly, quarterly, annually, or every so many years – Stock series are measures of activity at a point in time – Flow series are series which are a measure of activity to a date (e.g. Retail, Current Account Deficit, Balance of Payments) – price of a particular commodity like gold, silver, any eatables, petrol, diesel etc. – rate of interest, The rate of interest for home loans ▪ A set of observations ordered with respect to the successive time periods is a time series. In other words, the arrangement of data in accordance with their time of occurrence is a time series. It is the chronological arrangement of data. Here, time is just a way in which one can relate the entire phenomenon to suitable reference points.
  • 104. • A time series depicts the relationship between two variables. Time is one of those variables and the second is any quantitative variable. Uses of Time Series • The most important use of studying time series is that it helps us to predict the future behaviour of the variable based on past experience • It is helpful for business planning as it helps in comparing the actual current performance with the expected one • From time series, we get to study the past behaviour of the phenomenon or the variable under consideration • We can compare the changes in the values of different variables at different times or places, etc. Time series analysis
  • 105. Components for Time Series Analysis • Trend • Seasonal Variations • Cyclic Variations • Random or Irregular movements
  • 106. Trend • The trend shows the general tendency of the data to increase or decrease during a long period of time. A trend is a smooth, general, long-term, average tendency. It is not always necessary that the increase or decrease is in the same direction throughout the given period of time. • It is observable that the tendencies may increase, decrease or are stable in different sections of time. But the overall trend must be upward, downward or stable. The population, agricultural production, items manufactured, number of births and deaths, number of industry or any factory, number of schools or colleges are some of its example showing some kind of tendencies of movement. Components for Time Series Analysis
  • 107. • Seasonal Variations • These are the rhythmic forces which operate in a regular and periodic manner over a span of less than a year. They have the same or almost the same pattern during a period of 12 months. This variation will be present in a time series if the data are recorded hourly, daily, weekly, quarterly, or monthly. • These variations come into play either because of the natural forces or man-made conventions. The various seasons or climatic conditions play an important role in seasonal variations. Such as production of crops depends on seasons, the sale of umbrella and raincoats in the rainy season, and the sale of electric fans and A.C. shoots up in summer seasons. • The effect of man-made conventions such as some festivals, customs, habits, fashions, and some occasions like marriage is easily noticeable. They recur themselves year after year. An upswing in a season should not be taken as an indicator of better business conditions. Components for Time Series Analysis
  • 108. Cyclic Variations • The variations in a time series which operate themselves over a span of more than one year are the cyclic variations. This oscillatory movement has a period of oscillation of more than a year. One complete period is a cycle. This cyclic movement is sometimes called the ‘Business Cycle’. • It is a four-phase cycle comprising of the phases of prosperity, recession, depression, and recovery. The cyclic variation may be regular are not periodic. The upswings and the downswings in business depend upon the joint nature of the economic forces and the interaction between them. Random or Irregular Movements • There is another factor which causes the variation in the variable under study. They are not regular variations and are purely random or irregular. These fluctuations are unforeseen, uncontrollable, unpredictable, and are erratic. These forces are earthquakes, wars, flood, famines, and any other disasters. Components for Time Series Analysis
  • 109. Fundamental Rule of Time Series Analysis • Stationarity is an important concept in the field of time series analysis with tremendous influence on how the data is perceived and predicted. • When forecasting or predicting the future, most time series models assume that each point is independent of one another. The best indication of this is when the dataset of past instances is stationary. • For data to be stationary, the statistical properties of a system do not change over time. This does not mean that the values for each data point have to be the same, but the overall behavior of the data should remain constant. From a purely visual assessment, time plots that do not show trends or seasonality can be considered stationary. More numerical factors in support of stationarity include a constant mean and a constant variance.
  • 110. • Non-stationary time series A non-stationary time series's statistical properties like mean, variance etc will not be constant over time An example of a non stationary time series is a series with a trend - something that grows over time for instance. The sample mean and variance of such a series will grow as you increase the size of the sample. • perform a transformation to convert into a stationary dataset. The most common transforms are the difference and logarithmic transform. Fundamental Rule of Time Series Analysis
  • 111. Time Series Decomposition • Additive time series • Remember the equation for additive time series is simply: Ot = Tt + St + Rt • Ot = output Tt = trend St = seasonality Rt = residual t = variable representing a particular point in time • additive = trend + seasonal + residual
  • 112. Time Series Decomposition • Multiplicative time series • Remember the equation for additive time series is simply: Ot = Tt * St * Rt • Ot = output Tt = trend St = seasonality Rt = residual t = variable representing a particular point in time • multiplicative = trend * seasonal * residual
  • 113. FORCASTING AND TIME SERIES ANALYSIS The forecasting is based on the past recorded data and help in the determination of future plan with respect to any desired objective. It helps in the fixing of strategies. STRATEGY MAKING DECISION PLANNED PERFORMANCE ANALYSIS DEVIATION DESIRED PERFORMANCE FORECASTE
  • 114. TYPES OF FORECAST 1. Demand Forecast – Prediction of demand for products or services. 2. Environmental Forecast – Prediction of social, political and economic changes. 3. Technological Forecast – Prediction of technological changes. TIMING OF FORECASTS Forecasts are usually classified accordingly to time period. 1. Short range forecast – commonly one year and usually less than the three months. Eg purchasing of job scheduling, workforce, production level, regional production, seasonal production etc. 2. Medium range forecast – commonly one to three years. Eg cash budgeting, sale planning etc. 3. Long range forecast – commonly three to more years. Eg R and D capital expenditure, establishment of new plants, facilities of labor etc. Forecasting Methods Forecasting methods are based on opinion (quantitative) or judgment (qualitative). The quantitative methods are further divided into two namely, time series and casual.
  • 115. A time series is a set of measurements of a variable that are ordered through time to time. The time variables does not fluctuate arbitrarily. It moves uniformly always in the same direction. The time series forecasting methods attempt to account for changes over a period of time at regular intervals by examining patterns, cycles or trends to product the outcome for a future time period. Causal methods are based on the assumptions that the variable value under consideration has a cause effect relationship with one or more other values. Methods of Forecasting 1. Define objective 2. Select the variable of interest 3. Determine the time for forecasting 4. Select appropriate model 5. Collect the relevant data 6. Make the forecast
  • 116. TYPES OF FORECASTING TECHNIQUES A fixed and suitable technique for forecasting is primary necessity for the validity of forecasts. In last few decades some forecasting techniques have been developed and can be classified into three broad categories. 1. NAÏVE METHODS – It is based on the assumption that future is just an extension of past. 2. BAROMETRIC METHODS – It is based on assumption that forecast can be made on the basis of certain happenings on the past. In this method a factor dependent series has been constructed and there after statistical analysis can yields forecast. 3. ANALYTICAL METHODS – It is based on the analysis of causative forces operative on the variable to be forecasted. Analytical techniques may be non-mathematical like factor listing or opinion or mathematical.
  • 117. TIME SERIES ANALYSIS A time series is orderly arranged numerical values of desired variables with respect to time. It is represented both in tabular as well as graphical manner. Objectives : 1 : To identify the pattern and isolate the influencing factors (or effects) for prediction purpose as well as for future planning and control. : 2 : To review and evaluate plan progress Pattern : It is assumed that time series data consists of an uniform pattern with random fluctuations. • Actual value of variable per unit time = Mean value of variable per unit time + Random deviation/unit time Ŷ = (r) pattern + e Components : 1 : Trend – Sometimes a time series displayed either upward or downward movements in the average value of the variable of interest. 2 : Cycles – An upward or downward movements in the variable of interest over a period of time. It may has four phases peak, contradiction, trough and expansion 3 : Seasonal – An upward and downward movements within year and follow regular pattern. 4 : Irregular – rapid upward or downward movements caused by short term unanticipated and non-recurring factors.
  • 118. Time Series Methods - The available data of time series is used for the mathematical analysis to derive future inferences. These processes have limitations that they have no accurate future values. This limitations of the time series approach is taken care by the application of causal methods. The time series methods are as follows - A. Freehand Methods B. Smoothing Methods – Smoothing is a process that often improves our ability to forecast series by reducing the impact of noise (i) Moving Averages (ii) Weighted Moving Averages (iii) Semi Averages. C. Exponential Smoothing Methods – (i) Simple exponential Smoothing (ii) Adjusted Exponential Smoothing D. Quadratic Trend Model A. Freehand Methods A freehand curve draws as a straight line from value of lowest time limit to value of highest time limit of series. The forecast can be obtained simply by extending the trend line. A trend line fitted by the freehand method should confirmed the conditions mentioned below.
  • 119. (i) It is smooth and straight (ii) The sum of the vertical deviations above and below the trend line are equal. (iii) The sum of squares of the vertical deviations from the trend line is as small as possible. (iv) The trend line bisects the cycles Limitation : 1 : This method is highly subjective : 2 : The trend line drawn cannot have much value : 3 : It is very time consuming to constant a freehand trend. B. Smoothing Methods The objective of smoothing methods is to smoothes out the random variations due to irregular components of the time series (i) Moving Averages It is a quantitative method of forecasting or smoothing a time series by averaging each successive groups of data values. It is an subjective method and depends on the length of the period chosen for calculating moving average. The moving averages which serve as an estimate of the next periods value of a variable given a period of length n is expressed as –
  • 120. Σ {D1 + Dt-1 +Dt-2 +-----+ Dt- (n+1/ } Moving average (MAt +1 ) = ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ n Where – t = current time period; D = actual data which is exchanged each period and n = length of time period. • In this method the term “moving” is used because it is obtained by summing and averaging the values from a given number of periods, each time deleting the oldest value and adding a new value. Limitation – It is highly subjective and dependent on the length of period chosen for calculation of average. The method has three important limitation. (a) The increase in size of n increase smoothness of variation but it also makes the method less sensitive to real changes in the date. (b) It is difficult to choose the options length of time for which to compute the moving averages. Moving average can not be found for the first and last K/2 periods in a k- period moving average. (c) Moving average cannot pick up trends very well.
  • 121. Illustration - Calculation of Trend and Short term fluctuations Year 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 Variable 205 316 340 446 396 450 515 575 495 605 205 316 340 446 396 450 515 575 495 605 (ii) Weighted Moving Averages - In moving average, each observation is given equal importance (weight) . However, it may be desired to place more weight (importance) on certain period of time than others. So a moving average; in which some time periods are weighted differently than others; is called a weighted moving average. Commonly, the more recent observations receives the more weight, and the weight decreases for older data values. 1307/4=326.75 1632/4=408 1807/4=451.75 1936/4=484 2035/4=508.75 2190/4=547.5
  • 122. 1 Weighted moving Average = ⎯ Σ (weight for period n) x (Data value in period n) Σ weights Illustration - Forecasting of sales by weighting in past three months Weight applied 1 2 3 Month Three months ago Two months ago Last month X-weighted = 3Mi – 1 + 2Mi – 2 + Mi – 3 1 = ⎯ [ 3 × sales last month + 2 × sales in two months ago + 1 × sale in three 6 months ago march 11 april mey june july aug sep oct nov Dec 20 24 38 42 56 52 40 38 45 40
  • 123. MONTH SALE THREE MONTH MOVING AVERAGE MARCH 20 APRIL 24 MAY 38 JUNE 42 JULY 56 AUG 52 SEP 40 0CT 38
  • 124. (ii) SEMI AVERAGE METHOD The semi average method permits us to estimate the slope and intercept of the trend line quite early of a linear function will adequately describe the data. The trend line is determined simply by means of lower and upper halves of data. In continuous series these points are determined at mid point of class interval. The arithmetic mean of the first part is the intercept value and the slope is determined by the ratio of the difference in the arithmetic mean of the number of years between them, that is the change per unit time. The resultant is the time series is represented by equation Ỹ = a – bx When Ỹ = calculated trend value a = intercept b = slop value The equation should always be stated completely with reference to the year where x = 0 and a description of the units x and y. In the condition of odd number it is customary to ignore middle time series value. It may be satisfactory if the trend is linear. If the data deviate much from linearity the forecast will be biased and less reliable.
• 125. ILLUSTRATION The production of a company (tons per year) is as follows. Determine the trend line.
Year: 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Production: 115 120 130 160 145 155 160 155 170 175
• 126. To calculate the trend line Ỹ = a + bx:
Slope b = Δy / Δx = (change in series) / (change in years) = (163 − 134) / (2008 − 2003) = 29 / 5 = 5.8
(134 is the mean of the first half of the data, centred on 2003; 163 is the mean of the second half, centred on 2008.)
Intercept a = 134 at x = 0 (year 2003). Thus the trend line is Ỹ = 134 + 5.8x.
To predict production in 2012: x = 2012 − 2003 = 9, so Ỹ = 134 + 5.8 × 9 = 186.2 tons.
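A short Python sketch of the semi-average computation for the production data above:

```python
# Semi-average trend line for the production illustration above
prod = [115, 120, 130, 160, 145, 155, 160, 155, 170, 175]  # years 2001-2010

half = len(prod) // 2
a = sum(prod[:half]) / half          # mean of first half = 134.0, centred on 2003
mean2 = sum(prod[half:]) / half      # mean of second half = 163.0, centred on 2008
b = (mean2 - a) / (2008 - 2003)      # slope = 29 / 5 = 5.8

def trend(year):
    """Y = a + b*x, with x measured in years from 2003 (where x = 0)."""
    return a + b * (year - 2003)

print(b, trend(2012))                # 5.8 and 186.2 tons predicted for 2012
```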
  • 128. NLP approaches for Text Analysis • Conduct basic text processing • Categorize and tag the words • Classify text • Extract information • Analyze sentence structure • Build feature vector • Analyze meaning
• 129. NLP Libraries • Natural Language Toolkit (NLTK) • GenSim • SpaCy • CoreNLP • TextBlob • scikit-learn NLP Components
• 131. Natural Language Processing steps 1. Segmentation: break the entire document down into its constituent sentences, using punctuation such as full stops and commas. E.g. "I am in VIIT. I am learning AI at TY." → "I am in VIIT." / "I am learning AI at TY." 2. Tokenizing: split each sentence into its constituent words (tokens), e.g. "I am learning AI at TY" → "I", "am", "learning", "AI", "at", "TY".
• 132. • Syntactic Analysis • Removing Stop Words: common words such as "am" and "at" carry little meaning and are removed, e.g. "I am learning AI at TY" → "I learning AI". • Stemming: the process of obtaining the word stem of a word; the stem is the part to which affixes are added to form new words, e.g. "learning" → "learn". • Lemmatization: the process of obtaining the root (lemma) of a word. The lemma is the base form of a word that is present in the dictionary and from which the word is derived, e.g. "intelligence", "intelligent", and "intelligently" have the root word "intelligent", which has a meaning.
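The preprocessing steps above can be reproduced with NLTK, one of the libraries listed earlier. A minimal sketch, assuming the standard NLTK data packages have not yet been downloaded; the sentence is the one used in the slides:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the required NLTK data packages
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "I am in VIIT. I am learning AI at TY."

# Segmentation: split the document into sentences
sentences = nltk.sent_tokenize(text)       # ['I am in VIIT.', 'I am learning AI at TY.']

# Tokenizing: split a sentence into words
tokens = nltk.word_tokenize(sentences[1])  # ['I', 'am', 'learning', 'AI', 'at', 'TY', '.']

# Removing stop words ('I', 'am', 'at' are dropped)
stop = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stop and t.isalpha()]

# Stemming and lemmatization
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in filtered])             # 'learning' -> 'learn'
print([lemmatizer.lemmatize(t, pos="v") for t in filtered])
```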
• 133. • POS tagging: POS stands for parts of speech, which include noun, verb, adverb, and adjective. It indicates how a word functions, both in meaning and grammatically, within a sentence. A word can have one or more parts of speech depending on the context in which it is used. Semantic Analysis Semantics involves the use of, and the meaning behind, words. Word sense disambiguation derives the meaning of a word based on context. E.g. in "A pleasant breeze was experienced at the river bank", the context tells us that "bank" means the side of a river, not a financial institution.
• 134. • Named entity recognition: this identifies words that can be categorized into groups/entities such as people, values, locations, and so on. For example, in the sentence "Mark Zuckerberg is one of the founders of Facebook, a company from the United States" we can identify three types of entities: • "Person": Mark Zuckerberg • "Company": Facebook • "Location": United States • Discourse Integration: the meaning of a sentence depends upon the sentences that precede it, and may also invoke the meaning of the sentences that follow it. E.g. "Students were asking for the same." ("the same" refers to something mentioned earlier.) Pragmatic Analysis • Pragmatics is the last phase of NLP. It helps you discover the intended effect of an utterance by applying a set of rules that characterize cooperative dialogues. E.g. "Shut the door" is a request, not an order.
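POS tagging and named entity recognition are both available out of the box in spaCy, another library from the earlier list. A minimal sketch, assuming the small English model has been installed with `python -m spacy download en_core_web_sm`; the sentence is the Facebook example above:

```python
import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load("en_core_web_sm")

doc = nlp("Mark Zuckerberg is one of the founders of Facebook, "
          "a company from the United States")

# POS tagging: how each word functions grammatically
for token in doc:
    print(token.text, token.pos_)        # e.g. 'founders' NOUN

# Named entity recognition: people, organizations, locations, ...
for ent in doc.ents:
    print(ent.text, ent.label_)          # Mark Zuckerberg PERSON, Facebook ORG, ...
```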
• 135. Artificial Intelligence on the Cloud In this chapter, we are going to learn about the cloud and artificial intelligence workloads on the cloud. We will discuss the benefits and the risks of migrating AI projects to the cloud. We will also learn about the offerings provided by the major cloud providers. We will learn about the services and features that they offer and hopefully get an understanding of why those providers are the market leaders. By the end of this chapter, you will have a better understanding of the following: • The benefits, risks, and costs of migrating to the cloud • Fundamental cloud concepts such as elasticity • The top cloud providers • Amazon Web Services: ° Amazon SageMaker ° Alexa, Lex, and Polly – conversational agents ° Amazon Comprehend – natural language processing ° Amazon Rekognition – image and video ° Amazon Translate ° Amazon machine learning ° Amazon Transcribe – transcription ° Amazon Textract – document analysis • Microsoft Azure: ° Machine Learning Studio
• 136. ° Azure Machine Learning interactive workspace ° Azure Cognitive Services • Google AI and its machine learning products: ° AI Hub ° AI building blocks Why are companies migrating to the cloud? It is hard turning anywhere these days without being hit with the term "the cloud." Our present-day society has hit a tipping point where businesses big and small are seeing that the benefits of moving their workloads to the cloud outweigh the costs and risks. As an example, the US Department of Defense, as of 2019, is in the process of selecting a cloud provider and awarding a 10-year, $10 billion contract. Moving your systems to the cloud has many advantages, but one of the main reasons companies move to the cloud is its elastic capabilities. When deploying a new project in an on-premises environment, we always start with capacity planning. Capacity planning is the exercise that enterprises go through to determine how much hardware they will need for a new system to run efficiently. Depending on the size of the project, the cost of this hardware can run into the millions. For that reason, it could take months to complete the process. One of the reasons it can take so long is that many approvals might be required to complete the purchase. We can't blame businesses for being so slow and judicious with these kinds of decisions. Even though great planning and thought might go into these purchases, it is not uncommon to either buy less equipment than required or to buy underpowered equipment. Maybe just as often, too much equipment is bought, or equipment that is overkill for the project at hand. The reason this happens is that, in many cases, it is difficult to determine demand a priori. Additionally, even if we get the capacity right at the beginning, the demand might continue to grow and force us to go through the provisioning process all over again. Or the demand might be variable. For example, we might have a website that gets a lot of traffic during the day, but demand drops way down at night. In this case, when using on-premises environments, we have no choice but to account for the worst-case scenario and buy enough resources to handle peak periods of demand, but resources will be wasted when demand decreases in slow periods.
• 137. All these issues are non-existent in a cloud environment. All the major cloud providers, in different ways, provide elastic environments. Not only can we easily scale up, but we can just as easily scale down. If we have a website with variable traffic, we could put the servers that handle the traffic behind a load balancer and set up alerts that automatically add more servers to handle traffic spikes, and other alerts to terminate the servers once the storm passes. The top cloud providers Given the tsunami that is the cloud, many vendors are vying to satisfy the demand for cloud services. However, as is often the case in technology markets, only a few have bubbled to the top and dominate the space. In this section, we'll analyze the top players. Amazon Web Services (AWS) Amazon Web Services is one of the cloud pioneers. Since it launched in 2006, AWS has ranked highly in the well-respected Gartner Magic Quadrant in both vision and execution. Since its inception, AWS has held a big chunk of the cloud market. AWS is an appealing option for legacy players as well as start-ups. According to Gartner: "AWS is the provider most commonly chosen for strategic, organization-wide adoption" AWS also has an army of consultants and advisors dedicated to helping its customers deploy AWS services as well as to teaching them how to best leverage the services available. In summary, it is safe to say that AWS is the most mature, most advanced cloud provider, with a strong track record of customer success, as well as a strong stable of partners in AWS Marketplace. On the flip side, since AWS is the leader and they know it, they are not always the least expensive option. Another knock against AWS is that, since they highly value being first to market with new services and features, they seem willing to launch services quickly that might not be fully mature and feature-complete, and to work out the kinks once they are released. In fairness, this is not a tactic exclusive to AWS, and other cloud providers also release beta versions of their services. In addition, since Amazon competes in markets other than the cloud, it is not uncommon for some potential customers to go with other providers in order to not "feed the beast." For example, Walmart is well known for avoiding AWS at all costs because of their fierce competition in the e-commerce space.
• 138. Microsoft Azure For the past few years, Microsoft Azure has held the second position in the Gartner Magic Quadrant, trailing AWS and lagging significantly behind AWS in ability to execute. But the good news for Microsoft is that they only trail AWS, and they are a strong number two. Microsoft's solution is appealing to customers hosting legacy workloads as well as brand new cloud deployments, but for different reasons. Legacy workloads are normally run on Azure by clients that have traditionally been Microsoft customers and are trying to leverage their previous investments in that technology stack. For new cloud deployments, Azure cloud services hold appeal because of Microsoft's strong offerings for application development, specialized Platform as a Service (PaaS) capabilities, data storage, machine learning, and Internet of Things (IoT) services. Enterprises that are strategically committed to the Microsoft technology stack have been able to deploy many large-scale applications in production. Azure specifically shines when developers fully commit to the suite of Microsoft products, such as .NET applications, and then deploy them on Azure. Another reason Microsoft has deep market penetration is its experienced sales staff and its extensive partner network. In addition, Microsoft realizes that the next battle in technology will not revolve around operating systems but rather around the cloud, and they have become increasingly open to adopting non-Microsoft operating systems. As proof of this, as of now, about half of Azure workloads run on Linux or other open source operating systems and technology stacks. A Gartner report notes "Microsoft has a unique vision for the future that involves bringing in technology partners through native, first-party offerings such as those from VMware, NetApp, Red Hat, Cray, and Databricks." On the downside, there have been some reports of reliability issues, downtime, and service disruptions, as well as some customers taking issue with the quality of Microsoft's technical support. Google Cloud Platform (GCP) In 2018, Google broke into the prestigious Gartner leaders' quadrant with its GCP offering, joining only AWS and Azure in the exclusive club. In 2019, GCP remained in the same quadrant with its two fierce competitors. However, in terms of market share, GCP is a distant third.
• 139. They recently beefed up their sales staff, they have deep pockets, and they have a strong incentive not to be left behind, so don't discount them yet. Google's reputation as a leader in machine learning is undisputed, so it is no surprise that GCP has strong big data and machine learning offerings. But GCP is also making some headway attracting bigger enterprises looking to host legacy workloads such as SAP and other traditional customer relationship management (CRM) systems. Google's internal innovations around machine learning, automation, containers, and networking, with offerings such as TensorFlow and Kubernetes, have advanced cloud development. GCP's technology offerings revolve around their contributions to open source. Be careful about centering your cloud strategy exclusively around GCP, however. In a recent report, Gartner declared: "Google demonstrates an immaturity of process and procedures when dealing with enterprise accounts, which can make the company difficult to transact with at times." And: "Google has a much smaller pool of experienced Managed Service Providers (MSP) and infrastructure-centric professional services partners than other vendors in this Magic Quadrant." However, Gartner also states: "Google is aggressively targeting these shortcomings." Gartner also notes that Google's channel needs development. Alibaba Cloud Alibaba Cloud made its first appearance in Gartner's Magic Quadrant in 2017, and as of 2019, Alibaba's cloud offering, called Aliyun, remains in the Niche Player category. Gartner only evaluated the company's international service, headquartered in Singapore. Alibaba Cloud is the market leader in China, and many Chinese businesses, as well as the Chinese government, have been served well by using Alibaba as their cloud provider. However, a big part of this market share leadership might be given up if China ever decides to remove some of the restrictions on other international cloud vendors.
  • 140. Artificial Intelligence on the Cloud [ 286 ] The company provides support in China for building hybrid clouds. But, outside of China, it's mostly used by cloud-centric workloads. In 2018, it forged partnerships with VMware and SAP. Alibaba has a suite of services that is comparable in scope to the service portfolios of other global providers. The company's close relationships with the Alibaba Group helps the cloud service to be a bridge for international companies looking to do business in China, and out of China for Chinese companies. Alibaba does not yet seem to have the service and feature depth of competitors such as AWS, Azure, and GCP. And in many regions, services are only available for specific compute instances. They also need to strengthen their MSP ecosystem, third-party enterprise software integration, and operational tools. Oracle Cloud Infrastructure (OCI) In 2017, Oracle's cloud offering made a debut on Gartner's Magic Quadrant as a Visionary. But in 2018, due to a change to Gartner's evaluation criteria, Oracle was moved to Niche Player status. It remained there as of 2019. Oracle Cloud Infrastructure, or OCI, was a second-generation service launched in 2016 to phase out the legacy offering, now referred to as Oracle Cloud Infrastructure Classic. OCI offers both virtualized and bare-metal servers, with one-click installation and configuration of Oracle databases and container services. OCI appeals to customers with Oracle workloads that don't need more than basic Infrastructure as a Service (IaaS) capabilities. Oracle's cloud strategy relies on its applications, database, and middleware. Oracle has made some headway in attracting talent from other cloud providers to beef up its offerings. It's also made some progress in winning new business and getting existing Oracle customers to move to the OCI cloud. However, Oracle still has a long road ahead of it before it can catch up with the big three. IBM Cloud In the mainframe era, IBM was the undisputed computing king of the hill. It lost that title when we started moving away from mainframes and personal computers became ubiquitous. IBM is again trying to reclaim a leadership position in this new paradigm shift. IBM Cloud is IBM's answer to this challenge.
• 141. The company's diversified cloud services include container platforms, serverless services, and PaaS offerings. They are complemented by IBM Cloud Private for hybrid architectures. Like some of the other lower-tier cloud providers, IBM appeals to its existing customers who have a strong preference to purchase most of their technology from Big Blue (IBM's nickname). These existing customers usually have traditional workloads. IBM is also leveraging these long relationships to transition these customers into emerging IBM solutions, such as Watson's artificial intelligence. IBM benefits from a large base of existing customers running critical production services that are just starting to get comfortable with cloud adoption. This existing customer base positions IBM well to assist these customers as they embrace the cloud and begin their transformation journeys. Like Oracle, IBM is fighting an uphill battle to gain market share from AWS, Azure, and Google. Amazon Web Services (AWS) We'll now focus on the top three cloud providers. As you are probably already aware, cloud providers offer much more than artificial intelligence services, starting with barebones compute and storage services, all the way to very sophisticated high-level services. As with everything else in this book, we will specifically drill into the artificial intelligence and machine learning services that cloud providers offer, starting with AWS. Amazon SageMaker Amazon SageMaker was launched at Amazon's annual re:Invent conference in Las Vegas, Nevada in 2017. SageMaker is a machine learning platform that enables developers and data scientists to create, train, and deploy machine learning (ML) models in the cloud. A common tool used by data scientists in their day-to-day work is the Jupyter Notebook. These notebooks are documents that contain a combination of computer code such as Python and rich text elements such as paragraphs, equations, graphs, and URLs. Jupyter notebooks can easily be understood by humans because they contain analysis, descriptions, and results (figures, graphs, tables, and so on), and they are also executable programs that can be processed online or on a laptop.
• 142. You can think of Amazon SageMaker as a Jupyter Notebook on steroids. These are some of the advantages of SageMaker over traditional Jupyter notebooks. In other words, these are the different steroid flavors: • Like many of the machine learning services offered by Amazon, SageMaker is a fully managed machine learning service, so you do not have to worry about upgrading operating systems or installing drivers. • Amazon SageMaker provides implementations of some of the most common machine learning models, but these implementations are highly optimized and, in some cases, run up to 10 times faster than other implementations of the same algorithm. In addition, you can bring in your own algorithms if the machine learning model is not provided out of the box by SageMaker. • Amazon SageMaker provides the right amount of muscle for a variety of workloads. The type of machine that can be used to either train or deploy your algorithm can be selected from the wide variety of machine types that Amazon provides. If you are just experimenting with SageMaker, you might decide to use an ml.t2.medium machine, which is one of the smallest machines you can use with SageMaker. If you require some real power, you can use their accelerated computing instances, such as an ml.p3dn.24xlarge machine. The power delivered by such an instance is equivalent to what just a few years ago would have been considered a supercomputer costing millions of dollars to purchase. Amazon SageMaker allows developers to increase their productivity across the entire machine learning pipeline, including: Data preparation – Amazon SageMaker can seamlessly integrate with many other AWS services, including S3, RDS, DynamoDB, and Lambda, making it simple to ingest and prepare data for consumption by machine learning algorithms. Algorithm selection and training – Out of the box, Amazon SageMaker has a variety of high-performance, scalable machine learning algorithms optimized for speed and accuracy. These algorithms can perform training on petabyte-size datasets and can increase performance by up to 10 times over similar implementations. These are some of the algorithms that are included with SageMaker: • BlazingText • DeepAR forecasting • Factorization machines • K-Means • Random Cut Forest (RCF)
• 143. • Object detection • Image classification • Neural Topic Model (NTM) • IP Insights • K-Nearest Neighbors (k-NN) • Latent Dirichlet Allocation (LDA) • Linear Learner • Object2Vec • Principal Component Analysis (PCA) • Semantic segmentation • Sequence-to-sequence • XGBoost Algorithm tuning and optimizing – Amazon SageMaker offers automatic model tuning, also known as hyperparameter tuning. The tuning finds the best parameter set for a model by running multiple training jobs on the same input dataset with the same algorithm over a range of specified hyperparameters. As the training jobs run, a scorecard is kept of the best-performing version of the model. The definition of "best" is based on a pre-defined metric. As an example, let's assume we are trying to solve a binary classification problem. The goal is to maximize the area under the curve (AUC) metric of the algorithm by training an XGBoost model. We can tune the following hyperparameters for the algorithm: • alpha • eta • min_child_weight • max_depth In order to find the best values for these hyperparameters, we can specify a range of values for the hyperparameter tuning. A series of training jobs will be kicked off, and the best set of hyperparameters will be stored, depending on which version provides the highest AUC. Amazon SageMaker's automatic model tuning can be used both with SageMaker's built-in algorithms and with custom algorithms.
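As an illustration of how such a tuning job might be set up with the SageMaker Python SDK, here is a minimal sketch for the XGBoost/AUC scenario just described. The role ARN, S3 paths, and range endpoints are placeholder assumptions, not values from the text:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()
region = session.boto_region_name

# Built-in XGBoost container; role and S3 paths below are placeholders
image_uri = sagemaker.image_uris.retrieve("xgboost", region, version="1.5-1")
xgb = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://my-bucket/output",                    # placeholder
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)

# Ranges for the four hyperparameters mentioned above (illustrative values)
tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:auc",   # the "best" model maximizes AUC
    hyperparameter_ranges={
        "alpha": ContinuousParameter(0, 100),
        "eta": ContinuousParameter(0.01, 0.5),
        "min_child_weight": ContinuousParameter(1, 10),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=20,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://my-bucket/train", "validation": "s3://my-bucket/val"})
```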
• 144. Algorithm deployment – Deploying a model in Amazon SageMaker is a two-step process: 1. Create an endpoint configuration specifying the ML compute instances that are used to deploy the model. 2. Launch one or more ML compute instances to deploy the model, exposing the URI that users invoke to make predictions. The endpoint configuration API accepts the ML instance type and the initial count of instances. In the case of neural networks, the configuration may include the type of GPU-backed instance. The endpoint API provisions the infrastructure as defined in the previous step. SageMaker deployment supports both one-off and batch predictions. Batch predictions make predictions on datasets that can be stored in Amazon S3 or other AWS storage solutions. Integration and invocation – Amazon SageMaker provides a variety of ways and interfaces to interact with the service: • Web API – SageMaker has a web API that can be used to control and invoke a SageMaker server instance. • SageMaker API – As with other services, Amazon has an API for SageMaker that supports the following list of programming languages: ° Go ° C++ ° Java ° JavaScript ° Python ° PHP ° Ruby • Web interface – If you are familiar with Jupyter Notebooks, you will feel right at home with Amazon SageMaker, since the web interface used to interact with SageMaker is Jupyter Notebooks. • AWS CLI – The AWS command-line interface (CLI).
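The two-step deployment just described maps directly onto the AWS SDK for Python (boto3). A minimal sketch, with the model, endpoint, and payload names as placeholder assumptions:

```python
import boto3

sm = boto3.client("sagemaker")

# Step 1: the endpoint configuration names the model and the ML compute instances
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",      # placeholder
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-trained-model",          # placeholder: an existing SageMaker model
        "InstanceType": "ml.t2.medium",
        "InitialInstanceCount": 1,
    }],
)

# Step 2: launch the instances and expose an invokable endpoint
sm.create_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config",
)

# Once the endpoint is InService, clients call it for predictions
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="text/csv",
    Body="0.5,1.2,3.4",   # a feature vector in whatever format the model expects
)
print(response["Body"].read())
```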
• 145. Alexa, Lex, and Polly – conversational agents In previous chapters, we discussed Alexa and its increasingly pervasive presence in homes. We'll now delve into the technologies that power Alexa and allow you to create your own conversational bots. Amazon Lex is a service for building conversational agents. Amazon Lex, along with other chatbots, is our generation's attempt at passing the Turing Test, which we discussed in previous chapters. It will be a while before anyone confuses a conversation with Alexa with a human conversation. However, Amazon and other companies keep making strides in making these conversations more and more natural. Amazon Lex, which uses the same technologies that power Amazon Alexa, allows developers to quickly build sophisticated, natural language, conversational agents or chatbots. For simple cases, it's possible to build some of these chatbots without any programming. However, it is also possible to integrate Lex with other services in the AWS stack, using AWS Lambda as the integration technology. We will devote a whole chapter to creating chatbots later, so we will keep this section short for now. Amazon Comprehend – natural language processing Amazon Comprehend is a natural language processing (NLP) service provided by AWS. It uses machine learning to analyze content, perform entity recognition, and find implicit and explicit relationships. Companies are starting to realize that they have valuable information in the mounds of data that they generate every day. Valuable insights can be ascertained from customer emails, support tickets, product reviews, call center conversations, and social media interactions. Up until recently, it was cost-prohibitive to try to obtain these insights, but tools like Amazon Comprehend make it cost-effective to perform analysis on vast amounts of data. Another advantage of this service is that it is yet another fully managed AWS service, so there is no need to provision servers, install drivers, or upgrade software. It is simple to use, and deep experience in NLP is not required to quickly become productive with it. Like other AWS AI/ML services, Amazon Comprehend integrates with other AWS services such as AWS Lambda and AWS Glue. Use cases – Amazon Comprehend can be used to scan documents and identify patterns in those documents. This capability can be applied to a range of use cases, such as sentiment analysis, entity extraction, and document organization by topic.
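A minimal sketch of what such an analysis looks like through boto3; the sample text is an invented customer interaction, not data from the text:

```python
import boto3

comprehend = boto3.client("comprehend")

# Invented sample of a customer interaction
text = "The delivery was late, but the support team resolved my issue quickly."

# Overall feeling of the text: POSITIVE, NEGATIVE, NEUTRAL, or MIXED
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"])

# Key phrases, each returned with a confidence score
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
print([p["Text"] for p in phrases["KeyPhrases"]])

# Entities such as people, places, and companies
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
print([(e["Text"], e["Type"]) for e in entities["Entities"]])
```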
• 146. As an example, Amazon Comprehend could analyze text from a social media interaction with a customer, identify key phrases, and determine whether the customer's experience was positive or negative. Console Access – Amazon Comprehend can be accessed from the AWS Management Console. One of the easiest ways to ingest data into the service is by using Amazon S3. We can then make a call to the Comprehend service to analyze text for key phrases and relationships. Comprehend returns a confidence score for each request to indicate the confidence level of its accuracy; the higher the percentage, the more confident the service is. Comprehend can easily process a single request or multiple requests in a batch. Available Application Programming Interfaces (APIs) – As of this writing, Comprehend provides six different APIs to enable insights. They are: • Key Phrase Extraction API – Identifies key phrases and terms. • Sentiment Analysis API – Returns the overall meaning and feeling of the text, either positive, negative, neutral, or mixed. • Syntax API – Allows a user to tokenize text to define word boundaries and label words in their different parts of speech, such as nouns and verbs. • Entity Recognition API – Identifies and labels different entities in the text, such as people, places, and companies. • Language Detection API – Identifies the primary language in which a text is written. The service can identify over a hundred languages. • Custom Classification API – Enables a user to build a custom text classification model. Industry-specific services – Amazon Comprehend Medical was released at AWS re:Invent in 2018. It is built specifically for the medical industry and can identify industry-specific terminology. Comprehend also offers a specific Medical Named Entity and Relationship Extraction API. AWS does not store or use any text inputs from Amazon Comprehend Medical for future machine learning training. Amazon Rekognition – image and video No, it's not a typo. Amazon named its recognition service with a k and not a c. Amazon Rekognition can perform image and video analysis and enables users to add this functionality to their applications. Amazon Rekognition has been pretrained with millions of labeled images. Because of this, the service can quickly recognize: • Object types – Chairs, tables, cars, and so on • Celebrities – Actors, politicians, athletes, and so on
• 147. • People – Facial analysis, facial expressions, facial quality, user verification, and so on • Text – Recognize text in an image and convert it to machine-readable text • Scenes – Dancing, celebrating, eating, and so on • Inappropriate content – Adult, violent, or visually disturbing content Amazon Rekognition has already recognized billions of images and videos and uses them to continuously get better and better. The application of deep learning in the domain of image recognition might arguably be the most successful machine learning application of the last few years, and Amazon Rekognition leverages deep learning to deliver impressive results. To use it, a high level of machine learning expertise is not required. Amazon Rekognition provides a simple API. To use it, an image is passed to the service along with a few parameters, and that is it. Amazon Rekognition will only continue to get better. The more it gets used, the more inputs it receives, and the more it learns from those inputs. In addition, Amazon continues to enhance the service and add new features and functionality. Some of the most popular use cases and applications for Amazon Rekognition are: Object, scene, and activity detection – With Amazon Rekognition, you can identify thousands of different types of objects (for example, cars, houses, chairs, and so on) and scenes (for example, cities, malls, beaches, and so on). When analyzing video, specific activities that are happening in the frame can be identified, such as "emptying a car trunk" or "children playing." Gender recognition – Amazon Rekognition can be used to make an educated guess as to whether a person in an image is male or female. The functionality should not be used as the sole determinant of a person's gender; it is not meant to be used in such a way. For example, if a male actor is wearing a long-haired wig and earrings for a role, he might be identified as female. Facial recognition and analysis – One use of facial recognition systems is to identify and authenticate a person from an image or video. This technology has been around for decades, but it's only recently that its application has become more popular, cheaper, and more available, due in no small part to deep learning techniques and the ubiquity of services such as Rekognition. Facial recognition technologies power many of today's applications, such as photo sharing and storage services, and serve as a second factor in authentication workflows for smartphones. Once we recognize that an object is a face, we might want to perform further facial analysis. Some of the attributes that Amazon Rekognition can assist in determining are: • Eyes open or closed
• 148. • Mood: ° Happy ° Sad ° Angry ° Surprised ° Disgusted ° Calm ° Confused ° Fear • Hair color • Eye color • Beard or mustache • Glasses • Age range • Gender • Visual geometry of a face These detected attributes are useful when there is a need to search through and organize millions of images in seconds, generating metadata tags such as a person's mood, or to identify a person. Pathing – The path of a person in a scene can be tracked with Amazon Rekognition using video files. For example, if we see an image that contains a person with bags around a trunk, we might not know whether the person is taking the bags out of the trunk and arriving, or putting the bags into the trunk and leaving. By analyzing the video using pathing, we will be able to make this determination. Unsafe content detection – Amazon Rekognition can assist in identifying potentially unsafe or inappropriate content in images and video, and it can provide detailed labels that accurately control access to those assets based on previously determined criteria. Celebrity recognition – Celebrities and famous people can be quickly identified in image and video libraries to catalog photos and footage. This functionality can be used in marketing, advertising, and media industry use cases.
• 149. Text in images – Once we identify that an image contains text, it is only natural to want to convert the letters and words in that image into text. As an example, if Rekognition is able not only to recognize that an object is a license plate but additionally to convert the image into text, it will then be easy to index that against Department of Motor Vehicle records and track individuals and their whereabouts. Amazon Translate Amazon Translate is another Amazon service that can be used to translate large amounts of text written in one language into another language. Amazon Translate is pay-per-use, so you will only be charged when you submit something that needs translation. As of October 2019, Amazon Translate supports 32 languages:
Language – Language Code
Arabic – ar
Chinese (Simplified) – zh
Chinese (Traditional) – zh-TW
Czech – cs
Danish – da
Dutch – nl
English – en
Finnish – fi
French – fr
German – de
Greek – el
Hebrew – he
Hindi – hi
Hungarian – hu
Indonesian – id
Italian – it
Japanese – ja
Korean – ko
Malay – ms
Norwegian – no
Persian – fa
Polish – pl
Portuguese – pt
• 150. Romanian – ro
Russian – ru
Spanish – es
Swedish – sv
Thai – th
Turkish – tr
Ukrainian – uk
Urdu – ur
Vietnamese – vi
With a few exceptions, most of these languages can be translated from one to the other. Users can also add items to the dictionary to customize the terminology and include terms that are specific to their organization or use case, such as brand and product names. Amazon Translate uses machine learning and a continuous learning model to improve the performance of its translation over time. The service can be accessed in three different ways, in the same way that many of the AWS services can be accessed: • From the AWS console, to translate small snippets of text and to sample the service. • Using the AWS API (supported languages are C++, Go, Java, JavaScript, .NET, Node.js, PHP, Python, and Ruby). • Via the AWS CLI. Uses for Amazon Translate Many companies use Amazon Translate together with other external services. Additionally, Amazon Translate can be integrated with other AWS services. For example, Translate can be used in conjunction with Amazon Comprehend to pull out predetermined entities, sentiments, or keywords from a social media feed and then translate the extracted terms. In another example, the service can be paired with Amazon S3 to translate document repositories, and with Amazon Polly to speak the translated text. However, using Amazon Translate does not mean that human translators no longer have a role. Some companies pair Amazon Translate with human translators to increase the speed of the translation process.
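A sketch of a single translation call through boto3; the sentence is an invented sample, and Spanish is chosen here purely for illustration:

```python
import boto3

translate = boto3.client("translate")

# Pay-per-use: you are only charged for the text you actually submit
result = translate.translate_text(
    Text="Machine learning makes large-scale translation affordable.",
    SourceLanguageCode="en",   # see the language-code table above
    TargetLanguageCode="es",
)
print(result["TranslatedText"])
```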