Artificial Intelligence - Anna Uni -v1.pdf

Artificial Intelligence
Foundational ideas, applications & issues
Dr. Jayanti Prasad
DISYS India Pvt. Ltd. Chennai
June 07, 2023, Anna University Chennai

Agenda
● Introduction
○ Fourth paradigm of science
○ Fourth Industrial revolution
○ Why AI today?
○ What is AI?
○ What is data science?
● Learning & Intelligence
○ Bayesian Learning
○ Types of AI
○ Deep Learning
● Natural Language Processing
○ Text Vectorization
○ Seq2Seq Modeling
○ Attention
○ Transformers
● Physics & AI
● Responsible AI
○ Explainable AI
○ Safe AI
○ Fair AI
● Conclusions

Fourth paradigm of science
1. Empirical observations
2. Theory
3. Computation
4. Data-driven discovery
“No longer restricted to data analysis, machine learning is now increasingly being used in
theory, experiment and simulation — a sign that data-intensive science is starting to
encompass all traditional aspects of research.”

Fourth Industrial revolution

Why AI today?
● Moore’s Law
● Data explosion

Algorithms timeline
From 2015:
● Seq2Seq
● Attention
● Transformers

ImageNet
● The ImageNet project is a large visual database designed for
use in visual object recognition software research.
● The ImageNet database has more than 14 million labelled images in
more than 20,000 categories.
● In the 2012 competition, a neural network named AlexNet
achieved about 84% top-5 accuracy, well ahead of the other entries.

AI revolution in Language
Citations | Title | Authors | Year
84,605 | Long Short-Term Memory | Sepp Hochreiter and Jürgen Schmidhuber | 1997
22,422 | Sequence to Sequence Learning with Neural Networks | Ilya Sutskever, Oriol Vinyals, Quoc V. Le | 2014
76,820 | Attention Is All You Need | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin | 2017
67,875 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | 2018
5,592 | Improving Language Understanding by Generative Pre-Training | Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever | 2018

Types of AI
● Thinking humanly
● Thinking rationally
● Acting humanly
● Acting rationally
Turing Test: capabilities required
1. Natural Language Processing (NLP)
2. Knowledge Representation
3. Automated Reasoning
4. Machine Learning – Learning from experience (data)
5. Computer Vision
6. Robotics

Machine Learning
● “A computer program is said to learn from experience ‘E’
with respect to some class of tasks ‘T’ and performance
measure ‘P’, if its performance at tasks in ‘T’, as measured
by ‘P’, improves with experience ‘E’.”
- Tom Mitchell
○ Experience (E): data
○ Task (T): classification / regression
○ Performance (P): loss, accuracy

Machine Learning
● Supervised
● Unsupervised
● Reinforcement
● Hybrid

Data Science
Data science combines math and statistics, specialized programming, advanced
analytics, artificial intelligence (AI), and machine learning with specific subject
matter expertise to uncover actionable insights hidden in an organization’s
data. These insights can be used to guide decision making and strategic
planning.
● Mathematics (Linear Algebra & Calculus)
● Probability & Statistics
● Specialized programming
● Artificial Intelligence & Machine Learning
● Subject matter expertise

Data science lifecycle
1. Business understanding / use case discovery
2. Data collection
3. Data pre-processing (cleaning, featurization, etc.)
4. Data modeling
5. Model evaluation
6. Model deployment

Bayesian probability
● Conditional probability
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
● Bayes’ theorem
$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$
● Posterior (model | data) ∝ Likelihood (data | model) × Prior (model)
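As a minimal numerical sketch of Bayes’ theorem, consider an event A (a condition) and an observation B (a positive test). The probabilities below are made-up illustrative values, and the function name `bayes_posterior` is hypothetical.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A)."""
    # Evidence P(B), obtained from the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Illustrative values only: P(A) = 0.01, P(B | A) = 0.95, P(B | not A) = 0.05
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(A | B) = {posterior:.3f}")   # P(A | B) = 0.161
```

Even with a fairly accurate test, the small prior keeps the posterior low, which is the usual reason to write the prior down explicitly.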

Bayesian Model Fitting
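Bayes’ theorem with A = Model and B = Data:
$P(\mathrm{Model} \mid \mathrm{Data}) = \frac{P(\mathrm{Data} \mid \mathrm{Model})\, P(\mathrm{Model})}{P(\mathrm{Data})}$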

Example
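● Model: fitting a line, $y = m x + c$
● Data: $D = \{(x_i, y_i)\}_{i=1}^{N}$
● Likelihood (assuming independent Gaussian noise of variance $\sigma^2$):
$P(D \mid m, c) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - m x_i - c)^2}{2\sigma^2}\right)$
● Posterior: $P(m, c \mid D) \propto P(D \mid m, c)\, P(m, c)$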
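The line-fitting example can be sketched numerically by evaluating the posterior on a grid of slopes and intercepts. This is only an illustration under stated assumptions: Gaussian noise with a known sigma, a flat prior over the grid, and synthetic data generated from a hypothetical line y = 2x + 1. All names (`log_likelihood`, `grid_posterior`) are invented for this sketch.

```python
import math
import random

def log_likelihood(m, c, xs, ys, sigma):
    """Gaussian log-likelihood of the data for the line y = m*x + c (constant terms dropped)."""
    return -sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / (2.0 * sigma ** 2)

def grid_posterior(xs, ys, sigma, m_grid, c_grid):
    """Posterior over an (m, c) grid with a flat prior, normalized to sum to 1."""
    log_post = {(m, c): log_likelihood(m, c, xs, ys, sigma)
                for m in m_grid for c in c_grid}
    peak = max(log_post.values())            # subtract the peak for numerical stability
    weights = {k: math.exp(v - peak) for k, v in log_post.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Synthetic data from an assumed "true" line y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
sigma = 0.5
xs = [0.5 * i for i in range(20)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, sigma) for x in xs]

m_grid = [1.0 + 0.05 * i for i in range(41)]   # slopes from 1.0 to 3.0
c_grid = [0.05 * i for i in range(41)]         # intercepts from 0.0 to 2.0
posterior = grid_posterior(xs, ys, sigma, m_grid, c_grid)
m_best, c_best = max(posterior, key=posterior.get)
print(f"Most probable parameters: m = {m_best:.2f}, c = {c_best:.2f}")
```

A grid is only practical for a couple of parameters; it is used here because it makes the "likelihood times prior, then normalize" step explicit.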

Discriminative vs Generative Models
● Discriminative models capture conditional
probability P(y | X)
● Generative models capture joint probability P(X, y)
or just P(X) if there are no labels.
● A generative model includes the distribution of the data itself and tells
you how likely a given example is. For example, models that predict
the next word in a sequence are typically generative models (usually
much simpler than GANs) because they can assign a probability to a
sequence of words.
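To illustrate how a next-word model assigns a probability to a sequence of words, here is a minimal bigram sketch. The tiny corpus and the function names are hypothetical, and a count-based bigram model is just one simple choice of generative model.

```python
from collections import Counter

def train_bigram(corpus):
    """Estimate P(next word | current word) by counting adjacent word pairs."""
    unigram = Counter(corpus[:-1])
    bigram = Counter(zip(corpus[:-1], corpus[1:]))
    return {pair: n / unigram[pair[0]] for pair, n in bigram.items()}

def sequence_probability(model, words):
    """P(w2, ..., wn | w1) as a product of bigram probabilities (0 for unseen pairs)."""
    prob = 1.0
    for pair in zip(words[:-1], words[1:]):
        prob *= model.get(pair, 0.0)
    return prob

# Tiny made-up corpus, for illustration only.
corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
p_seen = sequence_probability(model, ["the", "cat", "sat"])
p_unseen = sequence_probability(model, ["the", "mat", "sat"])
print(p_seen, p_unseen)
```

The first sequence gets probability 2/3 × 1/2 = 1/3, while the second contains a pair never seen in the corpus and gets probability zero.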

Supervised Machine Learning

Supervised Machine Learning
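● Data: $D = \{(X_i, y_i)\}_{i=1}^{N}$, where $X_i \in \mathbb{R}^d$ and either $y_i \in \mathbb{R}$ (regression) or
$y_i \in \{1, \dots, K\}$ (classification)
● Model: $\hat{y} = f(X; W)$; in general $f$ is a non-linear function.
● Loss function: quantifies the mismatch between the actual
output $y$ and the predicted output $\hat{y}$, $L(y, \hat{y})$
● Training: minimize the loss with respect to the weights / parameters $W$ of
the model.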

Common Loss functions
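● Mean Squared Error (regression)
$L = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$, with $\hat{y}_i = W \cdot X_i$ (for linear cases)
● Cross Entropy (classification)
$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(X_i) + (1 - y_i) \log\left(1 - p(X_i)\right) \right]$
● Activation (non-linear) function:
Sigmoid: $p(X) = \frac{1}{1 + e^{-W \cdot X}}$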
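The loss functions named above can be written in a few lines of plain Python. This is a sketch assuming the standard definitions of mean squared error, binary cross entropy and the sigmoid; the function names are chosen here for illustration.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real z into [0, 1], written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
probs = [sigmoid(z) for z in (2.0, -1.0, 0.0)]
bce = binary_cross_entropy([1, 0, 1], probs)
print(f"MSE = {mse:.4f}, cross entropy = {bce:.4f}")
```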

Artificial Neuron / Perceptron

Optimization
● Supervised machine learning is all about finding the model
parameters or ‘weights’ which can represent the non-linear
mapping between the input (features) and the output.
● Finding a set of parameters which minimizes the ‘mismatch’
or loss between the actual output and predicted output is
called ‘optimization’.
● There are many optimization schemes and one of the most
common is ‘gradient descent’.

Gradient descent
● It is an optimization scheme to find the model parameters
iteratively by minimizing the loss function:
$W_{t+1} = W_t - \eta\, \nabla_W L(W_t)$, where $\eta$ is the learning rate.
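A minimal sketch of gradient descent, assuming a mean squared error loss and a straight-line model y = w*x + b. The data, learning rate and step count are illustrative choices, not values from the talk.

```python
def gradient_descent_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of L = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # points lying exactly on y = 2x + 1
w, b = gradient_descent_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")   # w = 2.000, b = 1.000
```

If the learning rate is too large the iteration diverges; if it is too small it needs many more steps, which is why it is usually tuned.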

Two key properties of
artificial neural networks

Data Classification
● Numerical (value)
○ Continuous
○ Discrete
● Symbolic (meaning)
○ Ordinal
○ Nominal

Challenges in modeling text
● Computers are good at numerical computation / number
crunching.
● Numerical operations (+, -, ×, /) are not possible on text.
● There is a way to get relations like the following one:
King – Man + Woman = Queen!
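The King/Queen relation becomes ordinary arithmetic once words are represented as vectors. The sketch below uses hand-picked two-dimensional toy vectors (roughly "royalty" and "gender"), which are assumptions for illustration; real word vectors are learned from data and have hundreds of dimensions.

```python
import math

# Hand-picked toy vectors (royalty, gender); not learned embeddings.
vectors = {
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))   # queen
```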

Vectorization / Latent space
● Let us consider the text:
Data = “quick brown fox jumped over the lazy dog”
● Assign a unique id to each word/symbol and build a
vocabulary.
● Now represent each word with a vector (one-hot encoding) of
the size of the vocabulary.
● Unfortunately, the dimensionality of the vector is very high
and all the entries except one are zeroes.
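The steps above can be sketched directly for the example sentence. Assigning ids in order of first appearance is an arbitrary choice made here; any unique assignment works.

```python
text = "quick brown fox jumped over the lazy dog"
words = text.split()

# Vocabulary: a unique integer id for each distinct word, in order of first appearance.
vocab = {}
for word in words:
    vocab.setdefault(word, len(vocab))

def one_hot(word):
    """Vector of vocabulary size with a single 1 at the word's id."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

print(vocab["fox"])     # 2
print(one_hot("fox"))   # [0, 0, 1, 0, 0, 0, 0, 0]
```

With only eight words the vectors are short, but the length grows with the vocabulary, which is the dimensionality problem noted in the last bullet.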

Word Embedding
● Let us consider a one-hot encoded vector $X \in \mathbb{R}^{N}$, where $N$ is
the size of the vocabulary.
● We can find a matrix $W \in \mathbb{R}^{M \times N}$ such that $Y = W X$.
Here $Y \in \mathbb{R}^{M}$ with $M \ll N$.
● Commonly used English has around 40,000 words
($N$ = 40,000) and we generally have $M$ = 300.
● There are many pre-trained models to get the vectorized form of
words, such as GloVe and Word2Vec.
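A small sketch of the embedding step, assuming it is a matrix product Y = W X with W of shape M × N (the symbols W and Y are notation adopted here). The matrix below is filled with random numbers purely for illustration; in practice it is learned or taken from a pre-trained model.

```python
import random

N, M = 8, 3   # toy sizes; the slide quotes N = 40,000 and M = 300
rng = random.Random(0)
# Hypothetical embedding matrix W (M rows, N columns) with random entries.
W = [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(M)]

def one_hot(index, size):
    """One-hot vector of the given size with a 1 at position index."""
    return [1 if i == index else 0 for i in range(size)]

def embed(W, x):
    """Dense vector Y = W X computed as a plain matrix-vector product."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

x = one_hot(2, N)
y = embed(W, x)
column = [row[2] for row in W]
print(len(y))        # 3
print(y == column)   # True: a one-hot input simply selects one column of W
```

Because the product only selects a column, libraries usually implement embeddings as a table lookup rather than an actual multiplication.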

Three revolutionary
neural network
architectures

(1) Seq2Seq Model
“The encoder processes each item in the input sequence, it compiles the information it captures
into a vector (called the context). After processing the entire input sequence, the encoder
sends the context over to the decoder, which begins producing the output sequence item by item.”
RNN (LSTM)
Encoder-Decoder
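The data flow described in the quote can be sketched with a toy scalar recurrent network: the encoder folds the input sequence into a single context value, and the decoder unrolls from that context one output at a time. The weights are fixed, hypothetical numbers and nothing is trained; a real Seq2Seq model uses LSTM cells with learned vector-valued states.

```python
import math

def encode(inputs, w_x=0.5, w_h=0.8):
    """Run a scalar RNN over the inputs and return the final hidden state (the context)."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
    return h

def decode(context, steps, w_h=0.8, w_in=0.5, w_y=1.5):
    """Start from the context and emit one output per step, feeding each output back in."""
    h, y, outputs = context, 0.0, []
    for _ in range(steps):
        h = math.tanh(w_h * h + w_in * y)
        y = w_y * h
        outputs.append(y)
    return outputs

context = encode([1.0, -0.5, 2.0])
outputs = decode(context, steps=4)
print(context, outputs)
```

Note that the whole input is squeezed into one fixed-size context, whatever the sequence length.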

Responsible AI
● Explainable AI
● Safe AI
● Fair AI / Unbiased

References
1. A high-bias, low-variance introduction to Machine Learning for physicists
2. Statistical Mechanics of deep learning
3. Discovering Physical Concepts with Neural Networks
4. Physics-informed machine learning
5. On scientific understanding with artificial intelligence
6. AI Feynman: A physics-inspired method for symbolic regression
7. Defining physicists’ relationship with AI
8. Machine intelligence – Nature insight
9. Rise of the Machines – Science special issue
10. https://prasad-jayanti.medium.com/
