Artificial Intelligence, Machine
Learning, and (Large) Language
Models: A Quick Introduction
Hiroki Sayama
sayama@binghamton.edu
Outline
1. The Origin: Understanding
“Intelligence”
2. Key Ingredient I: Statistics
& Data Analytics
3. Key Ingredient II:
Optimization
4. Machine Learning
5. Artificial Neural Networks
6. (Large) Language Models
7. Challenges
2
The Origin:
Understanding
“Intelligence”
3
Alan Turing and the
Turing Machine (1936)
4
https://www.felienne.com/archives/2974
Turing Test (1950) – a.k.a.
“the Imitation Game”
5
https://en.wikipedia.org/wiki/Turing_test
McCulloch-Pitts Model
(1943)
6
The first formal model of
computational mechanisms of
(artificial) neurons
Basis of
Modern
Artificial
Neural
Networks
7
Multilayer perceptron
(Rosenblatt 1958)
Backpropagation
(Rumelhart, Hinton &
Williams 1986)
Deep learning
https://commons.wikimedia.org/wiki/File:Example_of_a_deep_neural_network.png
Cybernetics (1940s-80s)
8
“Cybernetics” as a
Precursor to “AI”
9
Norbert Wiener
(This is where the word “cyber-” came from!)
Good Old-Fashioned AI:
Symbolic Computation and
Reasoning
▪ Herbert Simon et al.’s “Logic Theorist” (1956)
▪ Functional programming, list processing (e.g.,
LISP (1955-))
▪ Logic-based chatbots (e.g., ELIZA (1966))
▪ Expert systems
▪ Fuzzy logic (Zadeh, 1965)
10
“AI Winters”
11
Key
Ingredient I:
Statistics &
Data Analytics
12
Pattern Discovery,
Classic Way
▪ Descriptive statistics
▪ Distribution, correlation,
regression
▪ Inferential statistics
▪ Hypothesis testing, estimation,
Bayesian inference
▪ Parametric / non-parametric
approaches
13
https://en.wikipedia.org/wiki/Statistics
Regression
▪ Legendre, Gauss (early 1800s)
▪ Representing the behavior of a
dependent variable (DV) as a
function of independent
variable(s) (IV)
▪ Linear regression, polynomial
regression, logistic regression,
etc.
▪ Optimization (minimization) of
errors between model and data
14
https://en.wikipedia.org/wiki/Regression_analysis
https://en.wikipedia.org/wiki/Polynomial_regression
Hypothesis Testing
▪ Original idea dates back to
1700s
▪ Pearson, Gosset, Fisher (early
1900s)
▪ Set up hypotheses and
assess how (un)likely it is that the
observed data could be
explained by them
▪ Type-I error (false positive),
Type-II error (false negative)
15
https://en.wikibooks.org/wiki/Statistics/Testing_Statistical_Hypothesis
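As a toy illustration of this workflow (my own sketch, assuming SciPy; the data values are made up), a two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=10.0, scale=2.0, size=30)  # made-up control group
treated = rng.normal(loc=11.0, scale=2.0, size=30)  # made-up treated group

# Null hypothesis: the two groups share the same mean.
t_stat, p_value = stats.ttest_ind(control, treated)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value means the observed difference would be unlikely if the
# null hypothesis were true. Rejecting a true null is a Type-I error
# (false positive); failing to reject a false null is a Type-II error.
```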
Bayesian Inference
▪ Bayes & Price (1763), Laplace
(1774)
▪ Probability as a degree of belief
that an event or a proposition is
true
▪ Estimated probabilities (beliefs) are
updated as additional data are obtained
▪ Empowered by Markov Chain
Monte Carlo (MCMC) numerical
integration methods (Metropolis
1953; Hastings 1970)
16
https://en.wikipedia.org/wiki/Bayes%27_theorem
https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo
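A minimal sketch of Bayesian updating (my own toy example, assuming SciPy): the belief about a coin's bias is a Beta distribution whose parameters are simply incremented as flips are observed; MCMC methods become necessary when the posterior has no such closed form:

```python
from scipy import stats

# Prior belief about a coin's probability of heads: Beta(1, 1), i.e., uniform.
a, b = 1.0, 1.0

# Made-up observations arriving in batches of (heads, tails).
batches = [(7, 3), (12, 8), (55, 45)]
for heads, tails in batches:
    a += heads        # conjugate Beta-Binomial update: Bayes' rule
    b += tails        # reduces to simple counting
    posterior = stats.beta(a, b)
    print(f"after {heads}H / {tails}T: mean belief = {posterior.mean():.3f}")
```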
Key
Ingredient II:
Optimization
17
Least Squares Method
▪ Legendre, Gauss (early 1800s)
▪ Find the formula that minimizes
the sum of squared errors
(residuals) analytically
18
https://en.wikipedia.org/wiki/Least_squares
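A minimal sketch of the analytical solution for a straight-line fit, assuming NumPy and made-up data:

```python
import numpy as np

# Made-up data: y is roughly 2x + 1 plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix [x, 1] for the line y = a*x + b.
A = np.column_stack([x, np.ones_like(x)])

# Least squares: choose (a, b) minimizing the sum of squared residuals ||A w - y||^2.
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"fitted line: y = {a:.2f} x + {b:.2f}")
```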
Gradient Methods
▪ Find local minimum of a
function computationally
▪ Gradient descent (Cauchy
1847) and its variants
▪ More than 150 years later,
this is still what modern
AI/ML/DL systems are
essentially doing!!
▪ Error minimization
19
https://commons.wikimedia.org/wiki/File:Gradient_descent.gif
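The same toy fitting problem solved numerically by gradient descent, as a minimal sketch (the learning rate and iteration count are arbitrary illustrative choices):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

a, b = 0.0, 0.0          # initial guess for the line y = a*x + b
lr = 0.02                # learning rate (step size)

for step in range(2000):
    pred = a * x + b
    err = pred - y
    # Gradient of the mean squared error with respect to a and b.
    grad_a = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    a -= lr * grad_a     # move downhill along the gradient
    b -= lr * grad_b

print(f"gradient descent result: y = {a:.2f} x + {b:.2f}")
```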
Linear/Nonlinear/Integer/
Dynamic Programming
▪ Extensively studied and used in
Operations Research
▪ Practical optimization algorithms
under various constraints
20
https://en.wikipedia.org/wiki/Linear_programming
https://en.wikipedia.org/wiki/Integer_programming
https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm
Evolutionary Algorithms
▪ Original idea by Turing (1950)
▪ Genetic algorithm (Holland 1975)
▪ Genetic programming (Cramer 1985, Koza 1988)
▪ Differential evolution (Storn & Price 1997)
▪ Neuroevolution (Stanley & Miikkulainen 2002)
21
https://becominghuman.ai/my-new-genetic-algorithm-for-time-series-f7f0df31343d https://en.wikipedia.org/wiki/Genetic_programming
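A minimal genetic-algorithm sketch (my own toy example): bit-string genomes evolve toward a simple "one-max" objective through selection and mutation only (no crossover, for brevity):

```python
import random

random.seed(42)
GENOME_LEN, POP_SIZE, GENERATIONS, MUT_RATE = 20, 30, 50, 0.05

def fitness(genome):
    # Toy objective: maximize the number of 1s ("one-max" problem).
    return sum(genome)

def mutate(genome):
    return [1 - g if random.random() < MUT_RATE else g for g in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]                 # truncation selection
    offspring = [mutate(random.choice(parents)) for _ in range(POP_SIZE // 2)]
    population = parents + offspring

best = max(population, key=fitness)
print("best fitness:", fitness(best), "out of", GENOME_LEN)
```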
Other Population-Based
Learning & Optimization
▪ Ant colony optimization
(Dorigo 1992)
▪ Particle swarm optimization
(Kennedy & Eberhart 1995)
▪ And various other metaphor-based metaheuristic algorithms
https://en.wikipedia.org/wiki/List_of_metaphor-based_metaheuristics
22
https://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms
https://en.wikipedia.org/wiki/Particle_swarm_optimization
Machine
Learning
23
Pattern Discovery,
Modern Way
▪ Unsupervised learning
▪ Find patterns in the data
▪ Supervised learning
▪ Find patterns in the input-output mapping
▪ Reinforcement learning
▪ Learn the world by taking actions and receiving
rewards from the environment
24
Unsupervised Learning
▪ Clustering
▪ k-means, agglomerative
clustering, DBSCAN,
Gaussian mixture, community
detection, Jarvis–Patrick, etc.
▪ Anomaly detection
▪ Feature
extraction/selection
▪ Dimension reduction
▪ PCA, t-SNE, etc.
25
https://reference.wolfram.com/language/ref/FindClusters.html
https://commons.wikimedia.org/wiki/File:T-SNE_and_PCA.png
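A minimal sketch combining two of the listed techniques, k-means clustering and PCA-based dimension reduction, assuming scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: three Gaussian blobs in 5 dimensions.
centers = rng.normal(scale=5.0, size=(3, 5))
X = np.vstack([c + rng.normal(size=(100, 5)) for c in centers])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X2 = PCA(n_components=2).fit_transform(X)   # dimension reduction to 2D

print("cluster sizes:", np.bincount(labels))
print("2D embedding shape:", X2.shape)
```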
Supervised Learning
▪ Regression
▪ Linear regression, Lasso, polynomial
regression, nearest neighbors,
decision tree, random forest,
Gaussian process, gradient boosted
trees, neural networks, support vector
machine, etc.
▪ Classification
▪ Logistic regression, decision tree,
gradient boosted trees, naive Bayes,
nearest neighbors, support vector
machine, neural networks, etc.
▪ Risk of overfitting
▪ Addressed by model selection, cross-
validation, etc.
26
https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
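A minimal scikit-learn sketch of the supervised-learning workflow, with cross-validation used to check generalization before evaluating on held-out data (the dataset and classifier choice are arbitrary illustrative picks):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validation estimates generalization without touching the test set.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
print("cross-validation accuracy:", round(float(cv_scores.mean()), 3))

clf.fit(X_train, y_train)
print("held-out test accuracy:", round(float(clf.score(X_test, y_test)), 3))
```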
Reinforcement Learning
▪ Environment typically
formulated as a Markov
decision process (MDP)
▪ State of the world + agent’s
action
→ next state of the world +
reward
▪ Monte Carlo methods
▪ TD learning, Q-learning
27
https://en.wikipedia.org/wiki/Markov_decision_process
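A minimal tabular Q-learning sketch on a made-up 5-state chain MDP (the agent moves left or right and is rewarded only for reaching the rightmost state):

```python
import random

random.seed(0)
N_STATES, ACTIONS = 5, (0, 1)          # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

for episode in range(500):
    s = 0
    for _ in range(20):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print("learned greedy action per state:",
      [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)])
```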
Artificial
Neural
Networks
28
Hopfield Networks
▪ Hopfield (1982)
▪ A.k.a. “attractor networks”
▪ Fully connected networks with
symmetric weights can recover
imprinted patterns from imperfect
initial conditions
▪ “Associative memory”
Input Output
29
https://github.com/nosratullah/hopfieldNeuralNetwork
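A minimal Hopfield-network sketch, assuming NumPy: a single pattern is imprinted with Hebbian weights and then recovered from a corrupted copy, illustrating associative memory:

```python
import numpy as np

rng = np.random.default_rng(1)
pattern = rng.choice([-1, 1], size=25)           # a made-up 25-unit pattern

# Hebbian imprinting: symmetric weights, no self-connections.
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0.0)

# Corrupt 5 of the 25 units and let the network relax.
state = pattern.copy()
flip = rng.choice(25, size=5, replace=False)
state[flip] *= -1

for _ in range(10):                               # repeated threshold updates
    for i in rng.permutation(25):
        state[i] = 1 if W[i] @ state >= 0 else -1

print("recovered the imprinted pattern:", bool(np.array_equal(state, pattern)))
```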
Boltzmann Machines
▪ Hinton & Sejnowski (1983),
Hinton & Salakhutdinov (2006)
▪ Stochastic, learnable variants
of Hopfield networks
▪ The restricted (bipartite) Boltzmann
machine was at the core of the
Hinton & Salakhutdinov (2006)
Science paper that ignited the
current boom of “deep learning”
30
https://en.wikipedia.org/wiki/Boltzmann_machine
https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
Feed-Forward NNs and
Backpropagation
▪ Multilayer perceptron
(Rosenblatt 1958)
▪ Backpropagation (Werbos
1974; Rumelhart, Hinton &
Williams 1986)
▪ Minimization of errors by
gradient descent method
▪ Note that this is NOT how our
brain learns
▪ “Vanishing gradient” problem
31
[Diagram: forward computation flows from input to output; error corrections propagate backward through the layers]
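A minimal NumPy sketch of a one-hidden-layer feed-forward network trained by backpropagation (gradient descent on squared error) on the XOR task; the architecture and hyperparameters are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)        # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)        # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for _ in range(5000):
    # Forward pass (computation).
    H = sigmoid(X @ W1 + b1)
    out = sigmoid(H @ W2 + b2)
    # Backward pass (error correction): gradients of the squared error.
    d_out = (out - Y) * out * (1 - out)
    d_H = (d_out @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_H
    b1 -= lr * d_H.sum(axis=0)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0] (may vary with initialization)
```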
Autoencoders
▪ Rumelhart, Hinton & Williams
(1986) (again!)
▪ Feed-forward ANNs that try
to reproduce the input
▪ Smaller intermediate layers
→ dimension reduction,
feature learning
▪ The Hinton & Salakhutdinov (2006)
Science paper also used restricted
Boltzmann machines as stacked
autoencoders
32
https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
https://doi.org/10.1126/science.1127647
Recurrent Neural
Networks
▪ Hopfield (1982);
Rumelhart, Hinton &
Williams (1986) (again!!)
▪ ANNs that contain
feedback loops
▪ Have internal states and
can, in principle, learn temporal
behaviors with long-term
dependencies
▪ But suffer from practical problems
of vanishing or exploding
long-term gradients
33
https://commons.wikimedia.org/wiki/File:Neuronal-Networks-Feedback.png
https://en.wikipedia.org/wiki/Recurrent_neural_network
[Diagram: an RNN with feedback weights V, unfolded in time into hidden states h(t−1), h(t), h(t+1) and outputs o(t−1), o(t), o(t+1)]
Long Short-Term Memory
(LSTM)
▪ Hochreiter & Schmidhuber
(1997)
▪ An improved neural module
for RNNs that can learn long-
term dependencies
effectively
▪ Vanishing gradient problem
resolved by hidden states
and error flow control
▪ “The most cited NN paper of
the 20th century”
34
Reservoir Computing
▪ Actively studied since 2000s
▪ Use inherent behaviors of
complex dynamical systems
(usually a random RNN) as
a “reservoir” of various
solutions
▪ Learning takes place only at
the readout layer (i.e., no
backpropagation needed)
▪ Discrete-time, continuous-
time versions
35
https://doi.org/10.1515/nanoph-2016-0132
https://doi.org/10.1103/PhysRevLett.120.024102
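A minimal echo-state-network sketch, assuming NumPy: the random recurrent reservoir is left untrained, and only a linear readout is fitted (here by ridge regression) to predict the next value of a made-up input signal:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200                                       # reservoir size
t = np.arange(0, 60, 0.1)
u = np.sin(t) * np.sin(0.2 * t)               # made-up input signal

W_in = rng.uniform(-0.5, 0.5, size=N)         # fixed input weights
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius below 1

# Drive the reservoir and collect its states.
x = np.zeros(N)
states = []
for ut in u:
    x = np.tanh(W @ x + W_in * ut)
    states.append(x.copy())
S = np.array(states)

# Train only the readout: predict u(t+1) from the reservoir state at t.
target = u[1:]
S_train = S[:-1]
ridge = 1e-6
W_out = np.linalg.solve(S_train.T @ S_train + ridge * np.eye(N),
                        S_train.T @ target)

pred = S_train @ W_out
print("readout training error:", float(np.mean((pred - target) ** 2)))
```

Only W_out is learned; the reservoir weights W and W_in stay fixed, so no backpropagation through time is needed.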
Deep Neural Networks
▪ Ideas originally around since
the beginning of ANNs
▪ Became feasible and popular
in 2010s because of:
▪ Huge increase in available
computational power thanks
to GPUs
▪ Wide availability of training
data over the Internet
36
https://commons.wikimedia.org/wiki/File:Example_of_a_deep_neural_network.png
https://www.techradar.com/news/computing-components/graphics-cards/best-graphics-cards-1291458
Convolutional Neural
Networks
▪ Fukushima (1980), Homma
et al. (1988), LeCun et al.
(1989, 1998)
▪ DNNs with convolution
operations between layers
▪ Layers represent spatial
(and/or temporal) patterns
▪ Many great applications to
image/video/time series
analyses
37
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
https://cs231n.github.io/convolutional-networks/
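A minimal sketch of the convolution operation itself, the layer-to-layer computation that gives CNNs their name (plain NumPy; the image and kernel are made up):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (really cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((8, 8))
image[:, 4:] = 1.0                           # made-up image: dark and bright halves
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)    # responds to vertical edges

print(conv2d(image, kernel))                 # nonzero responses along the edge
```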
Adversarial Attacks and
Generative Adversarial
Networks (GAN)
38
https://arxiv.org/abs/1412.6572
https://en.wikipedia.org/wiki/Generative_adversarial_network
▪ Goodfellow et al. (2014a,b)
▪ DNNs are vulnerable
to adversarial attacks
▪ GANs exploit this to create co-
evolutionary systems of a
generator and a discriminator
https://commons.wikimedia.org/wiki/File:A-Standard-GAN-and-b-conditional-GAN-architecturpn.png
Graph Neural
Networks
▪ Scarselli et al. (2008),
Kipf & Welling (2016)
▪ Non-regular graph
structure used as
network topology
within each layer of
DNN
▪ Applications to graph-
based data modeling,
e.g., social networks,
molecular biology, etc.
39
https://tkipf.github.io/graph-convolutional-networks/
https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780
Other ANNs
▪ Self-organizing map (Kohonen 1982)
▪ Neural gas (Martinetz & Schulten 1991)
▪ Spiking neural networks (1990s-)
▪ Hierarchical Temporal Memory (2004-)
etc…
40
https://en.wikipedia.org/wiki/Self-organizing_map
https://doi.org/10.1016/j.neucom.2019.10.104
https://numenta.com/neuroscience-research/sequence-learning/
(Large)
Language
Models
41
History of “Chatbots”
▪ ELIZA (Weizenbaum 1966)
▪ A.L.I.C.E. (Wallace 1995)
▪ Jabberwacky (Carpenter 1997)
▪ Cleverbot (Carpenter 2008)
(and many others)
42
https://en.wikipedia.org/wiki/ELIZA#/media/File:ELIZA_conversation.png
http://chatbots.org/
https://www.youtube.com/watch?v=WnzlbyTZsQY (by Cornell CCSL)
Language Models
“With great power comes great _____”
43
Probability of the next word … depends on the context
The function P(next word | context) can be defined as an explicit dataset,
a heuristic algorithm, a simple statistical distribution,
a (deep) neural network, or anything else
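A minimal sketch of the simplest choice for such a function: a bigram model in which P(next word | context) is just a table of counts over a tiny made-up corpus:

```python
from collections import Counter, defaultdict

corpus = ("with great power comes great responsibility "
          "with great patience comes great reward").split()

# Count how often each word follows each context word.
counts = defaultdict(Counter)
for context, nxt in zip(corpus, corpus[1:]):
    counts[context][nxt] += 1

def P(next_word, context):
    total = sum(counts[context].values())
    return counts[context][next_word] / total if total else 0.0

print(P("power", "great"))            # 0.25
print(P("responsibility", "great"))   # 0.25
```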
“Large” Language Models
▪ Language models meet
(1) massive amount of data
and (2) “transformers”!
▪ Vaswani et al. (2017)
▪ DNNs with self-attention
mechanism for natural language
processing
▪ Enhanced parallelizability
leading to shorter training time
than LSTM
▪ BERT (2018) for Google
search
▪ OpenAI’s GPT (2020-) and
many others
44
https://arxiv.org/abs/1706.03762
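A minimal NumPy sketch of the scaled dot-product self-attention at the heart of the transformer (Vaswani et al. 2017): each token's new representation is a similarity-weighted average of all tokens' values (the dimensions and data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                  # 5 tokens, 8-dimensional embeddings
X = rng.normal(size=(seq_len, d_model))  # made-up token embeddings

Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d_model)      # scaled dot-product similarity
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row

attended = weights @ V                   # every token attends to every token
print(attended.shape)                    # (5, 8)
```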
GPT/LLM
Architecture
Details
45
https://www.youtube.com/watch?v=wjZofJX0v4M
https://www.youtube.com/watch?v=eMlx5fFNoYc
3Blue1Brown
offers some great
video explanations!
Getting Larger
46
https://informationisbeautiful.net/visualizations/the-rise-of-generative-ai-large-language-models-llms-like-chatgpt/
The New “Chatbots”
47
“ChatGPT and the Evolution
of Artificial Intelligence”
48
https://www.youtube.com/watch?v=SzbKJWKE_Ss
LLMs Becoming
Multimodal
49
Example: NExT-GPT architecture
https://medium.com/@cout.shubham/exploring-multimodal-large-language-models-a-step-forward-in-ai-626918c6a3ec
Promising Applications
▪ Coding aid
▪ Personalized tutoring
▪ Conversation partners
▪ Modality conversion for people
with disabilities
▪ Analysis of qualitative scientific
data
(… and many
others)
50
“Foundation” Models
▪ General-purpose AI
models “that are
trained on broad
data at scale and
are adaptable to a
wide range of
downstream tasks”
− Stanford Institute for
Human-Centered Artificial
Intelligence (2021);
https://arxiv.org/abs/2108.07258
51
https://philosophyterms.com/the-library-of-babel/
Consciousness in LLMs?
52
Challenges
(Especially from Systems
Science Perspectives)
53
Various Societal
Concerns About AI
▪ “Artificial General Intelligence” (AGI)
and the “existential crisis of humanity”
▪ Significant job loss caused by AI
▪ Fake information generated by AI
▪ Biases and social (in)justice
▪ Lack of transparency and over-concentration of AI
power
▪ Huge energy costs of deep learning and LLMs
▪ Rights of AI and machines
54
AI as a Threat to Humanity?
55
But Some Simple Tasks
Are Still Difficult for AI
▪ Words, numbers, facts
▪ Simple logic and
reasoning
▪ Maintaining stability and plasticity
▪ Catastrophic forgetting
56
https://spectrum.ieee.org/openai-dall-e-2
https://www.invistaperforms.org/getting-ahead-forgetting-curve-training/
57
58
“Hallucination”
(B.S.-ing)
Wrong Use Cases of AI
59
Contamination of AI-
Generated Data
60
Another “AI Winter”
Coming?
61
System-Level Challenge:
Idea Homogenization and
Social Fragmentation
▪ Widespread use of
common AI tools may
homogenize human ideas
▪ Over-consumption of
catered AI-generated
information may accelerate
echo chamber formation
and social fragmentation
▪ How can we prevent these
negative outcomes?
62
(Centola et al. 2007)
System-Level Challenge:
Critical Decision Making in
the Absence of Data
63
Fall 2020: “How to
safely reopen the
campus”
How can we make
informed decisions
in a critical situation
when no prior data
are available?
System-Level
Challenge:
Open-Endedness
64
https://en.wikipedia.org/wiki/Tree_of_life_(biology)
How can we make AI able to
keep producing new things?
Are We Getting Any
Closer to the
Understanding of
True “Intelligence”?
65
Final Remarks
▪ Don’t drown in the vast
ocean of methods and tools
▪ Hundreds of years of history
▪ Buzzwords and fads keep changing
▪ Keep the big picture in mind –
focus on what your real problem
is and how you will solve it
▪ Being able to think and develop
unique, original, creative
solutions is key to differentiating
your intelligence from
AI/LLMs/machines
66
Thank You
67
@hirokisayama
