This presentation surveys the landscape of AI/ML in 2023: a quick summary of the last ten years of progress, the current situation, and a look at what is happening behind the scenes.
1. HyunJoon Jung, PhD | Director of Applied Research at Adobe | April 2023
Landscape of AI/ML in 2023
ML/AI: past, present, and future
2. Disclaimer
This presentation is intended for educational purposes only.
Statements of fact and views expressed are based on the presenter's personal viewpoints and judgments. They do not represent any organization the presenter has been or is affiliated with.
7. What are we seeing now?
Inflection point of technology
Wang et al. 2023. Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
8. What are we seeing now?
Past, Current, and Forecast
What happened before the emergence of GPT-3 and Stable Diffusion?
Emergence of various Foundation ML Models
What's happening behind the scenes?
9. What are we seeing now?
Inflection point of technology
What happened before GPT-3 and Stable Diffusion?
Emergence of various Foundation ML Models
11. Word2Vec (2013)
Motivation
Build a simple, shallow neural model that learns from a huge corpus
Goal
Predict the middle word from its neighbors within a fixed-size context window
Efficient Estimation of Word Representations in Vector Space, T. Mikolov, K. Chen, G. Corrado, and J. Dean, 2013. http://arxiv.org/pdf/1301.3781.pdf
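A minimal sketch of this setup with gensim's Word2Vec in CBOW mode (the toy corpus and hyperparameters below are illustrative assumptions, not from the talk):

# CBOW Word2Vec sketch with gensim: predict the middle word from a fixed-size window of neighbors.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

# sg=0 selects the CBOW architecture; window=2 is the fixed-size context window.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

print(model.wv.most_similar("cat"))  # nearest neighbors in the learned vector space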
12. Neural Machine Translation (2015)
Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al. 2015
Proposing an efficient way to align tokens through the attention mechanism
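Roughly, the mechanism scores each encoder state h_i against the previous decoder state s_{t-1}, normalizes the scores with a softmax, and forms a weighted context vector for the next prediction:

e_{t,i} = v^T tanh(W s_{t-1} + U h_i)
alpha_{t,i} = softmax_i(e_{t,i})
c_t = sum_i alpha_{t,i} h_i

The attention weights alpha_{t,i} act as a soft alignment between source and target tokens.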
13. Attention is All You Need (2017)
Attention Is All You Need, Vaswani et al. 2017
Proposing a new architecture, the Transformer
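Its core building block is scaled dot-product attention over queries Q, keys K, and values V:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where d_k is the key dimension; multi-head attention runs several such attentions in parallel over learned projections, replacing recurrence entirely.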
14. Generative Pre-Training (2018)
Improving Language Understanding by Generative Pre-Training, Radford et al. 2018
Showing how a generative language model can acquire knowledge and process long-range dependencies without supervision by pre-training on a large and diverse set of data.
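The pre-training objective is plain next-token prediction: maximize the log-likelihood of each token given its preceding context,

L(theta) = sum_t log P(x_t | x_1, ..., x_{t-1}; theta)

and the same pretrained network is then fine-tuned (with a small task-specific head) on each downstream task.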
15. BERT (2019)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. 2019
Showing the possibility of pre-training and fine-tuning approaches across various downstream language-based tasks.
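A minimal sketch of that pre-train / fine-tune pattern using the Hugging Face transformers library (the checkpoint name and two-label task are illustrative):

# Load a pretrained BERT encoder, attach a fresh classification head,
# and fine-tune on a downstream task (training loop omitted in this sketch).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
outputs = model(**inputs)      # logits from the (not yet fine-tuned) classification head
print(outputs.logits.shape)    # torch.Size([1, 2])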
16. BERT (2019)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. 2019
Diverse BERT-based variants for many different cases
17. Evolution Trends in Language
Representation Learning and Self-supervised Learning
Easier and Scalable Data and Label Acquisition
Evolution of Computing Power
19. Generative Adversarial Network (2014)
https://developers.google.com/machine-learning/gan/gan_structure
Inflection Point of GenAI by Combining a Discriminator with a Generator
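The two networks are trained against each other in a minimax game: the discriminator D learns to separate real samples from generated ones, while the generator G learns to fool it:

min_G max_D  E_{x ~ p_data}[log D(x)] + E_{z ~ p_z}[log(1 - D(G(z)))]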
20. YOLO (2015)
Fast and Efficient Object Detection Algorithm, Widely Adopted
You Only Look Once: Unified, Real-Time Object Detection, Redmon et al. 2016
21. DeepLab (2017)
Proposing a semantic image segmentation method combining atrous convolution and conditional random fields (CRFs).
Widely adopted for image segmentation tasks.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen et al. 2017
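Atrous (dilated) convolution enlarges the receptive field without adding parameters or reducing resolution; in PyTorch it is just the dilation argument (channel and input sizes below are illustrative):

# A 3x3 convolution with dilation=2 covers a 5x5 receptive field
# while keeping the same weight count and the same output resolution.
import torch
import torch.nn as nn

atrous = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding=2, dilation=2)
x = torch.randn(1, 256, 64, 64)
print(atrous(x).shape)  # torch.Size([1, 256, 64, 64])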
22. Mask R-CNN (2017)
Proposing a new approach for object detection and instance segmentation that extends the Faster R-CNN architecture with a mask prediction branch.
Mask R-CNN, He et al. 2017
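The reference implementation in torchvision makes the extra mask branch easy to see (weights="DEFAULT" assumes torchvision >= 0.13; older versions use pretrained=True):

# Pretrained Mask R-CNN: Faster R-CNN boxes and labels plus a per-instance mask head.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 480, 640)          # dummy RGB image tensor in [0, 1]
predictions = model([image])[0]
print(predictions.keys())                # dict_keys(['boxes', 'labels', 'scores', 'masks'])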
23. StyleGAN (2018)
A learned mapping function to separate the latent space representation of an image into disentangled style and content vectors
Karras et al. Analyzing and Improving the Image Quality of StyleGAN, 2019
24. ObjGAN (2019)
Understand captions, sketch layouts, and refine the details -> Vue.ai
Li et al. Object-driven Text-to-Image Synthesis via Adversarial Training, 2019
25. Evolution Trends in Computer Vision
CNN-based Detection and Segmentation
GAN-based Style Transfer, Image Generation, and Super Resolution
Faster, lighter models for productization, while bigger, heavier models for research
26. What are we seeing now?
Inflection point of technology
What happened
Applications
What's happening behind the scenes?
27. Vision Transformer (2020)
A seminal paper proposing the use of the Transformer architecture in the computer vision domain, sparking a variety of subsequent research.
Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020
Khan et al., Transformers in Vision: A Survey, 2021
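The title's arithmetic: a 224x224 image cut into 16x16 patches gives (224/16)^2 = 196 patches; each patch is flattened (16 x 16 x 3 = 768 values) and linearly projected into a token embedding, so the image becomes a sequence of 196 "words" plus a class token fed to a standard Transformer encoder.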
28. T5 (2020) and FLAN (2022)
T5: a massive pre-trained language model with 11 billion parameters, introducing a unified framework that converts every language problem into a text-to-text format.
Raffel et al., Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2020
FLAN: Wei et al., Finetuned Language Models Are Zero-Shot Learners, 2022
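A minimal text-to-text sketch with the Hugging Face T5 implementation (the small checkpoint and prompt are illustrative):

# Every task is cast as text in, text out; here, English-to-German translation
# using one of T5's original task prefixes.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))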
29. GPT-3 (2021)
Brown et al., Language Models are Few-Shot Learners, 2020
A massive neural network with 175 billion parameters, generating high-quality language with unprecedented accuracy and coherence.
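Few-shot "in-context learning" means the task is specified entirely in the prompt, with no gradient updates; a small illustrative prompt:

Classify the sentiment of each review.
Review: "The plot dragged on forever." -> negative
Review: "A delightful surprise from start to finish." -> positive
Review: "The acting was wooden and the pacing was off." ->

The model is expected to continue with "negative", having inferred the task from the two examples.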
30. CLIP (2021)
Radford et al., Learning Transferable Visual Models From Natural Language Supervision, 2021
A large contrastive language-vision transformer model trained on 400 million image-text pairs, providing pre-trained encoders that can be used zero-shot or fine-tuned for various downstream tasks such as image classification, object detection, and visual question answering.
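A minimal zero-shot classification sketch with the transformers CLIP implementation (checkpoint, image file, and label prompts are illustrative):

# CLIP scores an image against free-form text prompts; the highest-scoring prompt
# acts as the predicted label, with no task-specific training.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # any local image
texts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))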
33. Imagen (2022) and Imagen Editor (2023)
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al. 2022
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting, Wang et al., 2023
LLMs are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.
34. PaLM (2022)
PaLM: Scaling Language Modeling with Pathways, Chowdhery et al. 2022
A 540-billion-parameter, dense, decoder-only Transformer model trained with the Pathways system
36. OPT (2022)
OPT: Open Pre-trained Transformer Language Models, Zhang et al., 2022
A suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which the authors aim to fully and responsibly share with interested researchers.
41. Evolution Trends in AI/ML from 2020 to 2023
Multi-modal ML Models
Diffusion Models and the Transformers
Bigger, Bigger, and even Bigger
44. Some Questions ahead of us
Challenges of Foundation Models
Human and AI Interaction
ML Research vs. Productization
Big Data / Big Model
45. Starting from customer value rather than technology?
Customer Value
Functional Value
Technological Value
We need to think about why many ML start-ups have failed.
46. ML Product Life Cycle?
Problem Definition
Data Acquisition
ML Modeling
Model Evaluation
47. ML Product Life Cycle?
Iterative, Interactive, Data-driven, and Engineering Factors
Problem Definition
Data Acquisition
Data Analysis
ML Modeling
Model Evaluation
Deployment / Product Integration
Failure, Failure, Failure, ⋯ and Success.
48. Some Questions ahead of us
Challenges of
Foundation Models
Human and AI Interaction
ML Research
vs. Productization
Big Data / Big Model
49. Data Gets Bigger
https://www.lesswrong.com/posts/asqDCb9XzXnLjSfgL/trends-in-training-dataset-sizes, by Pablo Villalobos
• Vision and language datasets have historically grown at 0.1 and 0.2 orders of magnitude (OOM) per year, respectively.
• There seems to be some transition around 2014-2015, after which training datasets became much bigger and (in the case of language) smaller datasets disappeared. This might be just an artefact of our small sample size.
50. Increasing Model Parameters and the Parameter Gap?
Machine Learning Model Sizes and the Parameter Gap, Villalobos et al. 2022
• Since 2018, the model size of notable Machine Learning systems has grown ten times faster than before.
• GPT-3 initiated the gap by 'jumping' one order of magnitude in size over previous systems. The gap has been maintained because researchers are incentivized to build the cheapest model that can outperform previous models. Those competing with GPT-3 are above the gap; the rest are below.
51. Is Scaling Model Size Beneficial?
Training Compute-Optimal Large Language Models, Hoffmann et al., 2022
• This study demonstrates that many modern LLMs are (i) oversized and (ii) not trained on enough data.
• How do we train a model effectively with data? (Data-to-Model Alignment)
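The paper's compute-optimal rule of thumb works out to roughly 20 training tokens per parameter: the 70B-parameter Chinchilla model was trained on about 1.4 trillion tokens and outperformed the 280B-parameter Gopher, which saw only about 300 billion tokens. By the same rule, a 175B-parameter model such as GPT-3 would call for on the order of 3.5 trillion training tokens, far more than the roughly 300 billion it actually saw.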
52. Some Questions ahead of us
Challenges of Foundation Models
Human and AI Interaction
ML Research vs. Productization
Big Data / Big Model
53. Challenges in Foundation Model Development
Cost
Polarization of AI Research and Development
Bias and Safety
Hallucination (Fidelity and Consistency)
55. Google Bard, OpenAI ChatGPT, Wikipedia
Hallucination (Fidelity and Consistency)
Strong need for fact grounding via an in-memory DB, knowledge graph (ontology), or information retrieval engine
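A minimal retrieval-augmented sketch of that grounding idea (the retriever object and ask_llm function are hypothetical placeholders, not any specific product's API):

# Ground the model's answer in retrieved evidence instead of parametric memory alone.
def answer_with_grounding(question, retriever, ask_llm, k=3):
    # 1. Retrieve the k most relevant passages from a search index or knowledge base.
    passages = retriever.search(question, top_k=k)  # hypothetical retriever API
    context = "\n".join(f"[{i + 1}] {p.text}" for i, p in enumerate(passages))
    # 2. Ask the LLM to answer only from the retrieved context, citing passages.
    prompt = (
        "Answer the question using only the context below. "
        "Cite passage numbers; say 'unknown' if the context is insufficient.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return ask_llm(prompt)  # hypothetical LLM call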
57. LLM Hype and the Dunning-Kruger Effect
Image source: HFS Research
58. Polarization of AI Research and Development
Data: Data Commercialization, Data Usage Restriction
Model: Model vs. Pipeline
Computing Resource: Cloud vs. On-device
59. Interesting Things to Discuss?
Challenges of Foundation Models
Human and AI Interaction
ML Research vs. Productization
Big Data / Big Model
61. Leveraging LLMs for Agent Planning
https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/
https://github.com/microsoft/JARVIS
https://www.camel-ai.org/
Cost issues (from iterations), Overkill, Hallucination
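These systems share a simple pattern: an LLM decomposes an objective into tasks, executes them one by one, and proposes follow-up tasks from the results. A minimal sketch of that loop (the llm callable and prompts are hypothetical, not any specific framework's API):

# Task-driven agent loop: plan -> execute -> re-plan, all mediated by an LLM.
from collections import deque

def run_agent(objective, llm, max_steps=10):
    plan = llm(f"Break this objective into a short numbered task list: {objective}")
    tasks = deque(t for t in plan.splitlines() if t.strip())
    results = []
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        result = llm(f"Objective: {objective}\nTask: {task}\nDo the task and report the result.")
        results.append((task, result))
        # Each extra iteration adds cost and another chance to hallucinate.
        follow_up = llm(f"Given this result, list any new tasks still needed (or 'none'):\n{result}")
        if follow_up.strip().lower() != "none":
            tasks.extend(t for t in follow_up.splitlines() if t.strip())
    return results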
62. LLMs for Agent-based Planning
Generative Agents: Interactive Simulacra of Human Behavior, Park et al., 2023
Generative agents create believable simulacra of human behavior for interactive applications. In this work, the authors demonstrate generative agents by populating a sandbox environment, reminiscent of The Sims, with twenty-five agents.
Users can observe and intervene as agents plan their days, share news, form relationships, and coordinate group activities.
63. LLMs for Agent-based Planning
Generative Agents: Interactive Simulacra of Human Behavior, Park et al., 2023
Demonstrating the feasibility of creating virtual agents and controlling them through context by leveraging LLMs such as ChatGPT.
65. Recent progress in AI allows us to accelerate our creativity and productivity from 0 to 70-80% with ease. However, it may still require human intervention for the last 20-30%. That's where interactive AI systems will have more opportunities to provide fine-grained controls and a seamless experience for users.
66. The Book of Why: The New Science of Cause and Effect, Judea Pearl and Dana Mackenzie, 2018
The Book of Why