2. Agenda
● Introduction
○ Fourth paradigm of science
○ Fourth industrial revolution
○ Why AI today
○ What is AI
○ What is data science?
● Learning & Intelligence
○ Bayesian Learning
○ Types of AI
○ Deep Learning
● Natural Language Processing
○ Text Vectorization
○ Seq2Seq Modeling
○ Attention
○ Transformers
● Physics & AI
● Responsible AI
○ Explainable AI
○ Safe AI
○ Fair AI
● Conclusions
3. Fourth paradigm of science
1. Empirical observations
2. Theory
3. Computation
4. Data-driven discovery
“No longer restricted to data analysis, machine learning is now increasingly being used in
theory, experiment and simulation — a sign that data-intensive science is starting to
encompass all traditional aspects of research.”
7. ImageNet
● The ImageNet project is a large visual database designed for
use in visual object recognition software research.
● The ImageNet database has more than 14 million labelled images in
about 20,000 categories.
● In the 2012 ImageNet competition (ILSVRC), a neural network named AlexNet
achieved about 84% top-5 accuracy, a large step toward human-level accuracy.
9. AI revolution in Language
Citations | Title | Authors | Year
84,605 | Long Short-Term Memory | Sepp Hochreiter and Jürgen Schmidhuber | 1997
22,422 | Sequence to Sequence Learning with Neural Networks | Ilya Sutskever, Oriol Vinyals, Quoc V. Le | 2014
76,820 | Attention Is All You Need | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin | 2017
67,875 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | 2018
5,592 | Improving Language Understanding by Generative Pre-Training | Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever | 2018
12. Machine Learning
● “A computer program is said to learn from experience ‘E’
with respect to some class of tasks ‘T’ and performance
measure ‘P’, if its performance at tasks in ‘T’, as measured
by ‘P’, improves with experience ‘E’.”
- Tom Mitchell
Experience (E) - Data
Task (T) - Classification / Regression
Performance (P) - Loss, accuracy
14. Data Science
Data science combines math and statistics, specialized programming, advanced
analytics, artificial intelligence (AI), and machine learning with specific subject
matter expertise to uncover actionable insights hidden in an organization’s
data. These insights can be used to guide decision making and strategic
planning.
Mathematics (Linear Algebra & Calculus)
Probability & Statistics
Specialized programming
Artificial Intelligence & Machine Learning
Subject matter expertise
15. Data science lifecycle
Business Understanding / use case discovery
Data collection
Data pre-processing (cleaning, featurization, etc.)
Data modeling
Model evaluation
Model deployment
19. Example
● Model: fitting a line, $y = w_0 + w_1 x$
● Data: $D = \{(x_i, y_i)\}_{i=1}^{n}$
● Likelihood: $P(D \mid w)$
● Posterior: $P(w \mid D) \propto P(D \mid w)\, P(w)$
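A hedged sketch of how this line-fitting example is commonly written out in full. The Gaussian noise model with known variance $\sigma^2$ and the zero-mean Gaussian prior with variance $\tau^2$ are assumptions made for illustration; they are not taken from the slide.

```latex
\begin{aligned}
\text{Model:}\quad & y = w_0 + w_1 x + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2) \\
\text{Data:}\quad & D = \{(x_i, y_i)\}_{i=1}^{n} \\
\text{Likelihood:}\quad & P(D \mid w) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(y_i - w_0 - w_1 x_i)^2}{2\sigma^2}\right) \\
\text{Prior (assumed):}\quad & P(w) = \mathcal{N}(w \mid 0, \tau^2 I) \\
\text{Posterior:}\quad & P(w \mid D) = \frac{P(D \mid w)\, P(w)}{P(D)} \propto P(D \mid w)\, P(w)
\end{aligned}
```

Under these assumptions, maximizing the posterior is equivalent to minimizing $\sum_i (y_i - w_0 - w_1 x_i)^2 + (\sigma^2 / \tau^2)\,\lVert w \rVert^2$, that is, least squares with an L2 penalty.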
20. Discriminative vs Generative Models
● Discriminative models capture conditional
probability P(y | X)
● Generative models capture joint probability P(X, y)
or just P(X) if there are no labels.
● A generative model includes the distribution of the data itself and tells
you how likely a given example is. For example, models that predict
the next word in a sequence are typically generative models (usually
much simpler than GANs) because they can assign a probability to a
sequence of words.
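As a minimal sketch of a generative model that assigns a probability to a sequence of words, the snippet below builds a bigram model with add-alpha smoothing. The tiny corpus, the function names and the smoothing constant are illustrative assumptions, not part of the original slides.

```python
from collections import Counter


def train_bigram(tokens):
    """Count words that have a successor, and adjacent word pairs."""
    unigrams = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams


def sequence_probability(words, unigrams, bigrams, vocab_size, alpha=1.0):
    """P(w_2, ..., w_n | w_1) under an add-alpha smoothed bigram model."""
    prob = 1.0
    for prev, cur in zip(words[:-1], words[1:]):
        prob *= (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab_size)
    return prob


# Toy corpus (assumption: one sentence is enough to show the idea).
corpus = "the quick brown fox jumped over the lazy dog".split()
unigrams, bigrams = train_bigram(corpus)
vocab_size = len(set(corpus))

# A word order seen in the corpus versus a shuffled one.
p_seen = sequence_probability(["the", "quick", "brown"], unigrams, bigrams, vocab_size)
p_unseen = sequence_probability(["the", "brown", "quick"], unigrams, bigrams, vocab_size)
print(p_seen, p_unseen)
```

The model gives the word order it has seen a higher probability than the shuffled one, which is the sense in which it "tells you how likely a given example is".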
22. Supervised Machine Learning
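● Data: $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ and either $y_i \in \mathbb{R}$ (regression) or
$y_i \in \{1, \dots, K\}$ (classification), where $K$ is the number of classes.
● Model: $\hat{y} = f(x; w)$; in general $f$ is a non-linear function.
● Loss function: quantifies the mismatch between the actual
output $y$ and the predicted output $\hat{y}$, written $L(y, \hat{y})$.
● Training: minimize the loss with respect to the weights / parameters $w$ of
the model.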
23. Common Loss functions
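● Mean Squared Error (regression)
$L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, with $\hat{y}_i = w \cdot x_i$ (for linear cases)
● Cross Entropy (classification)
$L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p(x_i) + (1 - y_i) \log\left(1 - p(x_i)\right) \right]$
● Activation (non-linear) function:
Sigmoid: $p(x) = \sigma(w \cdot x) = \frac{1}{1 + e^{-w \cdot x}}$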
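The loss functions and the sigmoid activation named on this slide can be sketched in NumPy as follows. This is a minimal illustration that assumes binary labels in {0, 1} for the cross entropy; the function names and the clipping constant `eps` are choices made here, not taken from the slides.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted outputs."""
    return np.mean((y_true - y_pred) ** 2)


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Cross entropy for labels in {0, 1} and predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))


# Regression example: only the last prediction is off, by 2.
y_reg = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])
mse = mean_squared_error(y_reg, y_hat)

# Classification example: a model that outputs probability 0.5 for every input.
labels = np.array([1.0, 0.0])
probs = sigmoid(np.array([0.0, 0.0]))
bce = binary_cross_entropy(labels, probs)
print(mse, bce)
```

A classifier that always predicts 0.5 has a cross entropy of log 2, a useful baseline: a trained model should score below it.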
26. Optimization
● Supervised machine learning is all about finding the model
parameters or ‘weights’ which can represent the non-linear
mapping between the input (features) and the output.
● Finding a set of parameters which minimizes the ‘mismatch’
or loss between the actual output and predicted output is
called ‘optimization’.
● There are many optimization schemes and one of the most
common is ‘gradient descent’.
27. Gradient descent
● Gradient descent is an optimization scheme that finds the model parameters
iteratively, repeatedly stepping against the gradient of the loss function.
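A minimal sketch of gradient descent, assuming a straight-line model y = w x + b trained with the mean squared error. The update is w ← w − η ∂L/∂w (and likewise for b); the learning rate η, the step count and the noise-free toy data are illustrative assumptions.

```python
import numpy as np


def fit_line_gradient_descent(x, y, learning_rate=0.1, n_steps=2000):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        residual = (w * x + b) - y            # predicted minus actual output
        grad_w = 2.0 * np.mean(residual * x)  # dL/dw
        grad_b = 2.0 * np.mean(residual)      # dL/db
        w -= learning_rate * grad_w           # step against the gradient
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from a known line (slope 2, intercept 1), without noise.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

w, b = fit_line_gradient_descent(x, y)
print(w, b)
```

The recovered slope and intercept approach the values used to generate the data. A learning rate that is too large makes the iteration diverge, and one that is too small makes it slow.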
37. Challenges in modeling text
● Computers are good at numerical computation / number
crunching.
● Numerical operations (+, -, x, /) cannot be applied directly to text.
● We would like to capture relations such as the following:
King - Man + Woman = Queen
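To make the King/Queen relation concrete, here is a toy sketch with hand-made three-dimensional vectors. The numbers are invented purely for illustration (roughly "royalty", "male", "female") and do not come from any trained model; real embeddings are learned from data.

```python
import numpy as np

# Hand-made toy vectors (assumption): [royalty, male, female].
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.1, 0.1]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def analogy(a, b, c, embeddings):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))


result = analogy("king", "man", "woman", embeddings)
print(result)
```

Once words are vectors, ordinary arithmetic and a similarity measure are enough to express the relation.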
38. Vectorization / Latent space
● Let us consider the text:
Data = “quick brown fox jumped over the lazy dog”
● Assign a unique id to each word/symbol and build a
vocabulary.
● Now represent each word with a vector (one-hot encoding) of
the size of the vocabulary.
● Unfortunately, the dimensionality of the vector is very high
and all the entries except one are zero.
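The steps above can be sketched in a few lines. Splitting on whitespace and assigning ids in alphabetical order are assumptions made here for simplicity; any consistent id assignment works.

```python
import numpy as np

text = "quick brown fox jumped over the lazy dog"
tokens = text.split()

# Build the vocabulary: one unique integer id per distinct word.
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}


def one_hot(word, vocab):
    """Vector of vocabulary size with a single 1 at the word's id."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[vocab[word]] = 1
    return vec


fox = one_hot("fox", vocab)
print(vocab)
print(fox)
```

With eight distinct words each vector has eight entries, seven of which are zero. For a realistic vocabulary the same scheme gives vectors with tens of thousands of entries.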
39. Word Embedding
● Let us consider a one-hot encoded vector $X \in \mathbb{R}^N$, where $N$ is
the size of the vocabulary.
● We can find a matrix $W \in \mathbb{R}^{M \times N}$ such that $Z = W X$.
Here $Z \in \mathbb{R}^M$ and $M \ll N$.
● Commonly used English has around 40,000 words
(N = 40,000), and we generally choose M = 300.
● There are many pre-trained models that provide vectorized forms of
words, such as GloVe and Word2Vec.
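A small sketch of the embedding as a matrix product, assuming Z = W X with W of shape M x N. The matrix here is random and stands in for a learned one, and the sizes are toy values rather than N = 40,000 and M = 300.

```python
import numpy as np

N, M = 8, 3                      # toy vocabulary size and embedding size
rng = np.random.default_rng(0)
W = rng.normal(size=(M, N))      # random stand-in for a learned embedding matrix

word_id = 2                      # id of some word in the vocabulary
X = np.zeros(N)
X[word_id] = 1.0                 # one-hot vector of length N

Z = W @ X                        # dense embedding of length M

# Multiplying by a one-hot vector just selects one column of W.
same_as_lookup = np.allclose(Z, W[:, word_id])
print(Z, same_as_lookup)
```

Because the product only picks out one column, practical implementations store the matrix as a lookup table and skip the multiplication.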
44. (1) Seq2Seq Model
“The encoder processes each item in the input sequence, it compiles the information it captures
into a vector (called the context). After processing the entire input sequence, the encoder
sends the context over to the decoder, which begins producing the output sequence item by item.”
RNN (LSTM)
Encoder-Decoder
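The encoder-decoder flow described in the quote can be sketched with a plain tanh RNN in NumPy. This is an illustration under stated assumptions: the weights are random and untrained, a simple RNN cell is used instead of an LSTM, decoding is greedy, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 5

# Randomly initialised, untrained weights (assumption: illustration only).
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # encoder input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # encoder hidden -> hidden
W_yh = rng.normal(scale=0.1, size=(hidden_dim, output_dim))  # decoder previous output -> hidden
U_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # decoder hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # decoder hidden -> output scores


def encode(inputs):
    """Process the input sequence item by item and return the final hidden state."""
    h = np.zeros(hidden_dim)
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h  # the context vector


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def decode(context, n_steps):
    """Start from the context and emit one output token id per step (greedy)."""
    h = context
    y_prev = np.zeros(output_dim)  # placeholder for a start-of-sequence token
    outputs = []
    for _ in range(n_steps):
        h = np.tanh(W_yh @ y_prev + U_hh @ h)
        token = int(np.argmax(softmax(W_hy @ h)))
        outputs.append(token)
        y_prev = np.eye(output_dim)[token]  # feed the chosen token back in
    return outputs


source = [rng.normal(size=input_dim) for _ in range(6)]  # six input items
context = encode(source)
target = decode(context, n_steps=3)
print(context.shape, target)
```

The whole input sequence is squeezed into a single fixed-size context vector, whatever its length.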